Job description
This global financial services firm contributes to the stability of the financial markets. They help clients cut through complexity and mitigate the risks of financial transactions. This key role supports their ambition to facilitate and accelerate a sustainable global financial system.
Role Purpose
As a Senior Site Reliability Engineer, you'll be working closely alongside the DevEx and Cloud Engineering teams. They're a group of engineers who are passionate about learning new technologies and fostering a collaborative and inclusive environment.
Primary Responsibilities
Gather and analyse metrics from servers and services to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service-level objectives
Proactively manage TLS/SSL certificates for server infrastructure
Manage client integration accreditation testing and sign-off
Apply failure engineering practices (chaos, failure, resilience and recovery)
Qualifications
Knowledge of and experience with building, running and supporting Kubernetes clusters in a highly available, high-traffic production environment
Experience working with cloud-based infrastructure (AWS)
Familiarity with one or more programming languages, preferably Go, Python, Ruby or Node.js
Troubleshooting experience in complex environments using monitoring and logging tools (we use Grafana, Loki, Tempo, Prometheus and Graylog, to name a few)
Knowledge and experience with Terraform