Lead Site Reliability Engineer

FactSet

Remote, UK

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.

Responsibilities:

  • Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
  • Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
  • Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
  • Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
  • Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
  • Take a proactive approach to continuously improving reliability.
  • Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
  • Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
  • Stay up to date with emerging technologies and industry trends, actively contributing to ongoing system improvements.
  • Participate in the on-call rotation.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Proven track record of deploying and managing large-scale distributed systems.
  • Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
  • Proficiency in programming languages such as Python, C++, or Go.
  • Familiarity with monitoring and observability tools.
  • Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
  • Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.

Desirable Qualifications:

  • Familiarity with security best practices and experience implementing security measures in a production environment.
  • Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
  • Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
  • Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).

Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment.