Job description
My client is the world’s leading specialist in air transport communications and information technology. They are looking for a Site Reliability Engineer to join their busy London team.
Site Reliability Engineering (SRE) is a production-oriented discipline combining deep software engineering and systems skills. As a Site Reliability Engineer (SRE) you possess a combination of software engineering and service engineering skills, and importantly you write code. You know and are interested in production-related technologies, take engineering-based approaches to solving operations problems, and have a predisposition to automate rather than do things manually. You care deeply about software quality and how software is constructed and designed, with a production-oriented focus on monitoring, release processes, safe deployments, performance optimisation, capacity management, crisis response, resiliency engineering and similar concerns beyond pure functionality which are relevant to improving reliability.
You seek to understand distributed systems and services, taking holistic approaches across the infrastructure stack, from front to back through storage, compute, and networking, from clients to servers and everything in between. You engage in the operational behaviour of a product or service and may participate in production on-call activities. Collaboration, cross-team influence and partnership are essential, as is the ability to reason about complex systems and behaviours.
As an SRE, you will look to help the different parts of our organisation to work together through better communication and collaboration. In conjunction with our Service Delivery Engineers, you will help to build out the appropriate Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Our customers want us to move faster than ever before. As an SRE, you will promote frequent releases, continually updating the product and keeping team members on their toes about new and relevant technology. You will encourage us to move quickly by reducing the costs of failure.
One of the main focal points for SREs is automation. You should promote and drive automation as much as possible - as long as the automation provides value to developers and operations by removing manual tasks. In other words, you will strive to minimise manual effort to drive long-term value to the product or service we are providing.
An automated workflow that moves fast is something that needs constant monitoring. DevOps teams and SREs both need to make sure that they’re moving in the right direction, and they do so by measuring everything. SREs consider everything a software problem, and hence devise prescriptive ways of measuring things such as uptime, availability, toil, outages etc.
IMPORTANT - SUCCESSFUL APPLICANT MUST HAVE EXPERIENCE/SKILLS USING DYNATRACE OR SIMILAR and the following:
- Dynatrace
- AppDynamics
- Datadog
- SCOM
- LogicMonitor
- Kibana / Elasticsearch / Splunk
- Docker or Kubernetes
- DevOps
Overview of the typical activities we expect the SRE to do/contribute heavily towards
- safeguard, support, and advance the software and systems they are responsible for
- proactively help to monitor capacity, performance, latency, and availability of the products they are responsible for
- increase the reliability and efficiency of the products they are responsible for
- help shape and deliver any cloud migration strategies of the products they are responsible for
- own log and secret management, project dockerisation, and the continuous integration and delivery pipelines for the products they are responsible for
- ensure monitoring and observability tooling and methods are adopted consistently across the business
- work with development, infrastructure, operations and support teams to deliver the above
Benefits that you will bring to the areas of the business that you work with
- Modernise - given an SRE's comprehensive viewpoint and deep understanding of contemporary technologies and best practices, the SRE can modernise the products' end-to-end delivery pipelines, ensuring we are able to deliver faster and to a higher quality than we do now (yes, it is 100% possible to deliver faster with higher quality)
- Early prevention of issues that can affect end users - tight iteration and development cycles and production releases not only keep us one step ahead in the market but also mitigate various problems by detecting bugs and vulnerabilities early.
- Enhanced monitoring - SREs will endeavour to understand the systems they are using and apply automation and machine learning, e.g. to build a procedure in which alarms are automatically forwarded to whoever is most qualified to resolve them. SREs remove these problems through proactive troubleshooting.
- Enhanced metrics reporting - clarity around end-to-end metrics relating to bugs, efficiency, production, overall service health, and other factors.
- Generate more time for DevOps teams to, well, develop - DevOps teams will have a lot more time to devote to developing new features and enhancements if the error detection and resolution process is more effective and deployment pipelines are faster and more automated (they will need to do far less baby-sitting)
- Better Operations responsiveness - freeing up operations teams to do direct configuration, testing, and maintenance.
- Focused on improving the customer experience - the SRE drives continuous improvement via empirical data and maintains SLAs, SLOs and SLIs. This is not the remit of the delivery teams.
- Continuous cultural improvement - SREs continually search for areas of improvement to optimise the reliability of services and products
- Increased automation - SREs continuously find the best way to automate and modernise the workflow of product engineers. At the same time, they also improve their workflow by detecting vulnerabilities and bugs early. Thus, automation will increase the reliability of services or systems.
What We Offer
My client’s workplace is all about diversity. Many different countries and cultures are represented in the workforce, and colleagues who’ve been working there for decades collaborate with those just out of college and early in their careers. My client’s workplace is a place of change and constant improvement, where they're always pushing themselves to find better ways of doing things: smarter, quicker and easier, for them and for their customers too!
And they offer all the good stuff you’d expect like holidays, bonus, flexible benefits, medical policy, pension plan and access to world class learning.
Location - Aldershot or Hayes, Greater London | Hybrid working role (3x per week in the office)
Job Types: Full-time, Permanent
Salary: £65,000.00-£70,000.00 per year
Benefits:
- Additional leave
- Company car
- Company events
- Company pension
- Employee mentoring programme
- Employee stock ownership plan
- Flexitime
- Free parking
- Gym membership
- Health & wellbeing programme
- Life insurance
- On-site parking
- Private medical insurance
- Work from home
Schedule:
- Flexitime
- Monday to Friday
- Weekend availability
Supplemental pay types:
- Bonus scheme
- Yearly bonus
Application question(s):
- Do you have the right to work for any employer in the UK?
- Are you located in London/Greater London and able to work 3x per week Hybrid in either Hayes or Aldershot?
- Do you have experience working with Dynatrace or similar? Please elaborate.
- Do you have skills in Kibana, Elasticsearch or Splunk?
- Do you have any previous experience in the airline/aviation industry?
- What is your notice period?
- What is your desired salary for this role?
Work Location: Hybrid remote in London, UB3