Site Reliability Engineer (SRE), Apple Services Engineering, Observability
London, England
Job description
Summary
Key Qualifications
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration
- Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
- Experience with the Prometheus ecosystem
- The ability to design, author, and release code in languages like Go or Python
- Acute drive to automate manual operations and to improve them through repeated iteration
- Understanding of the Linux operating system and of standard networking protocols and components
- Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker)
- Experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks
- Excellent troubleshooting and problem-solving skills
- Experience with scale testing, disaster recovery, and capacity planning
- Familiarity with microservices architecture and container orchestration with Kubernetes
Description
Education & Experience
Additional Requirements