Staff SRE - Cloud Infrastructure
Circle is a fintech company building the internet of finance, where value moves around the world as easily as digital information. This new internet layer makes global transfers faster, cheaper, and near-immediate, a significant advance for the financial landscape.
Company Culture:
At Circle, transparency and stability are fundamental to our ethos. As we expand internationally, speed and effectiveness drive our progress. Our employees embody our core values, such as a Multistakeholder approach, Mindfulness, Commitment to Excellence, and High Integrity. In our remote work environment, our teams thrive through collective effort, fostering flexibility, diversity, and a culture of innovation that encourages new ideas and inclusive participation.
Responsibilities:
We are seeking a Senior Site Reliability Engineer to oversee the design, construction, and maintenance of Circle's infrastructure, which serves a globally distributed customer base across multiple regions. You will use your expertise to keep Circle's products and critical systems running consistently, efficiently, and at high performance. This role offers a chance to sharpen your skills, collaborate with diverse team members, and thrive in a dynamic, fast-paced setting dedicated to delivering excellent customer experiences.
Key Responsibilities:
- Provide agile support to multiple development teams via a responsive and efficient CI/CD platform for delivering quality builds with quantifiable performance and reliability.
- Manage, enhance, secure, and scale cloud-based infrastructure using Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, and Ansible.
- Automate operational tasks using languages such as Go and Python, along with AWS Lambda and Kubernetes Jobs.
- Design, oversee, and monitor Kubernetes clusters for diverse production workloads.
- Advance Circle's blockchain infrastructure by configuring and managing nodes across blockchains such as Algorand, Ethereum, Hedera, Flow, Solana, Stellar, and Tron.
- Participate in an on-call rotation to promptly address disruptions affecting production systems while conducting root cause analyses of incidents.
- Plan and assess disaster recovery scenarios for a resilient microservices architecture.
- Collaborate with the Security team to establish and maintain security-centric tools and frameworks and to strengthen cybersecurity measures.
- Mentor team members to support the growth and scalability of the team.
Required Skills:
Senior Site Reliability Engineer:
- Possess 4+ years of experience in DevOps or SRE roles with emphasis on tooling, automation, and cloud infrastructure on a major public cloud provider.
- Proficient in coding/scripting in languages such as Go, Python, and shell.
- Have a minimum of 3 years of experience in constructing and managing CI/CD platforms and supporting agile engineering teams in building microservices.
- Experienced in building Docker images and deploying containers to Kubernetes clusters; working with modern CI/CD platforms and deployment strategies such as blue-green, canary, and A/B testing; and managing blockchain systems and databases such as PostgreSQL, Redis, and OpenSearch.
- Knowledge of network routing, DNS, load balancing, monitoring tools, Helm charts, IaC with Terraform, and deploying resources across public cloud providers.
- Strong troubleshooting, observability, and performance optimization skills, with the ability to communicate effectively and explain technical concepts simply to stakeholders.
Staff Site Reliability Engineer:
- Bring a minimum of 7 years of DevOps/SRE experience with advanced skills in tooling, automation, and managing infrastructure on major public cloud platforms.
- Well-versed in API design, REST principles, cloud services (AWS, Google Cloud, Microsoft Azure), containers and Kubernetes, SQL databases, coding standards, and thorough test coverage.