Job description
- Are excited to be part of a vibrant engineering community that values diversity, hard work, and continuous learning.
- Love solving complex real-world business problems.
- Recognize that cross-functional collaboration is a core component of success for the team.
- Believe there are multiple ways to solve most technical problems and are willing to debate the trade-offs.
- Have become a stronger engineer by making mistakes and learning from them.
- Are a doer, someone who wants to grow their career and gain experience across technologies and business functions.
- Continuously invest in a high-performance and inclusive culture, in which a diversity of backgrounds, experiences and viewpoints is celebrated and valued.
- Encourage career mobility, so you can benefit from learning different functions and technologies, and we gain the benefits of your experience across teams.
- Run technology pro bono programs that help the non-profit community and give our engineering community opportunities to volunteer and participate.
- Offer education reimbursements and ongoing training in technology, communication, and diversity & inclusion.
- Embrace knowledge sharing through lunch-and-learns, demos, and technical forums.
- Consider our people to be our greatest asset—we will help you learn what PIMCO Technology has to offer so you can participate in activities that benefit your career while delivering impactful technology solutions.
- Build telemetry and automation solutions that align with broader technology platforms; support incident response and blameless postmortems, and design and implement improvements to prevent incidents from recurring.
- Modernize technology observability practices with an emphasis on top-down and white-box monitoring.
- Analyze effort patterns (user queries, service requests, incidents, workflows) for optimization.
- Design, code, test, and deliver software to eliminate manual operational work.
- Implement self-healing and resiliency patterns, and exercise failure cases regularly to validate resilience assumptions.
- Plan, lead, supervise and optimize production-related software and infrastructure for capacity and resiliency
- Minimum of 3 years of experience working in a similar capacity
- Ability to write scripts in multiple languages (Bash, Python, AWK, JavaScript)
- Knowledge of build and configuration tools (for example: GitLab, SolarWinds, Chef, Puppet, Ansible, TeamCity)
- Knowledge of scheduling tools (for example: Autosys, Cron, Bob)
- Knowledge of profiling tools (for example: Datadog, Valgrind)
- Knowledge of monitoring tools (for example: Geneos, PagerDuty, Nagios)
- System and network administration and troubleshooting skills (Linux/Unix and Windows). Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Strong communication skills to be able to manage stakeholder expectations
- Strong problem solving and troubleshooting skills
- Respond to and prioritize multiple issues at a time in a timely manner and perform a structured analysis of the root cause
- Strong curiosity and bias for proactive planning, action, ownership, learning and continuous improvement.
- Strong interpersonal skills and ability to nurture relationships with all internal/external partners, promoting diversity of perspectives, ideas and culture
- Proficiency with any major RDBMS
- Bachelor’s degree in Computer Science or equivalent.
- Experience with 12-factor applications
- Strong experience with Python
- Experience running Splunk queries and building dashboards
- Experience with Datadog
- Experience with Terraform
- Experience with AWS CDK
- Knowledge of fixed income and/or equities products
- An understanding of ITIL support standard methodologies and experience with Service Management (Incident/Problem/Change Management, etc.)
Equal Employment Opportunity and Affirmative Action Statement