Senior DevOps/Site Reliability Engineer
About Us:
Deep expertise, personal and industry evolution, and impeccable craft are the foundational principles of Limit Break. Founded by global industry leaders in mobile gaming, we are dedicated to unlocking the potential of digital markets to create vibrant communities and new opportunities. Our mission is to push the boundaries of gaming economies for players and traders worldwide, leveraging technology, cryptocurrency, and creative vision to connect people globally.
Limit Break is backed by prominent investors Buckley Ventures and Paradigm Ventures. We are riding the wave of the rapidly growing crypto market and the future of technology, with gaming leading the way forward.
About the Role:
Limit Break is seeking a Senior Site Reliability Engineer with expertise in CI/CD, Kubernetes, and security. We are looking for a skilled and inquisitive engineer who is passionate about problem-solving, security, and driving continuous improvement.
Responsibilities:
- Identify, propose, and implement enhancements to performance and scalability in our multi-cluster EKS environment on AWS.
- Analyze system health, scalability, and performance metrics to pinpoint areas for optimization.
- Deploy services, address production issues, and write code to solve operational challenges within the Limit Break Infrastructure and Platform.
- Collaborate with the engineering team to create production-like testing environments for manual and automated testing.
- Develop and refine SLOs, SLIs, monitoring, alerting, and incident response practices, and enhance our observability stack (Grafana, Thanos, Loki) for global scalability.
Qualifications:
- 5+ years of experience in SRE, DevOps, or systems engineering.
- Proficiency in Kubernetes, including managing multiple EKS clusters in a production environment.
- Extensive knowledge of Terraform and Ansible.
- Experience with CI/CD and automation tools like GitHub Actions, Jenkins, or GitLab CI.
- Strong background in AWS, with expertise in Aurora, RDS (MySQL/SQL), and networking.
- Willingness to participate in an on-call rotation.
- Strong communication skills, with the ability to explain your reasoning clearly.
- Excellent collaboration abilities to work closely with product engineers and owners to design effective solutions.
- Skilled in implementing monitoring and observability tools such as Grafana, Thanos, and Loki.
- Familiarity with the Elasticsearch stack or similar solutions for log aggregation.
- Knowledge of Cloudflare, CDN technologies, and edge/perimeter networking.
- Exposure to cloud security tools such as Wiz, Amazon GuardDuty, Cloudflare Zero Trust, and secrets management platforms.
- Experience in addressing vulnerabilities and driving remediation efforts.
- Ability to implement real-time monitoring and protection tools for the environment.
Preferred Skills:
- Experience with containerized and serverless architectures.
- Prior involvement in blockchain projects.
- Experience working on large greenfield projects.
