Job description
Senior Storage/Platform Engineer
The team is composed of extremely talented, multidisciplinary individuals with unrestricted access across a large environment. We believe that one cannot build a truly great service without the ability to make changes across the stack. We take great care to focus on solving real business problems, reducing operational overhead, and working together as a team.
The platform and infrastructure team is responsible for managing WorldQuant’s infrastructure – both engineering and operations:
- Storage platform engineering, performance and capacity management and operations
- Data modelling, database tuning & query optimization
- HPC job scheduling
- Container orchestration
- POSIX and object storage systems
- On premise:
  - Bare metal compute (Linux)
  - System tuning
  - Configuration management
  - Performance tuning
  - Network configuration management
  - Compute, storage, network system purchases / evaluations
- Cloud:
  - Environment provisioning and management
  - Storage engineering and performance
- Data backups and restores
Qualifications/Skills Required
We are looking for individuals with extensive experience in parallel file systems, particularly GPFS/Spectrum Scale:
- GPFS/Spectrum Scale - required
  - Minimum five years of experience in storage environments at scale (e.g. billions of files/inodes)
  - Experience deploying and managing petabyte-scale systems supporting varied workloads
  - Mature approach to assessing price/performance, tiering and backup requirements
  - Ability to understand business processes and correlate them with storage management and performance tuning
  - Broad experience with storage products like GPFS, NetApp, Pure, Lightbits, GCP PDs or other NVMe-specific products
- Linux - required
  - Experience using configuration management systems (e.g. SaltStack, Ansible)
  - Understanding of Linux kernel components (e.g. VFS, scheduler, memory management, network)
  - Solid troubleshooting experience using gdb, OS & application tracing/profiling mechanisms
  - Experience with some of Docker, LXD/LXC, Kerberos, eBPF and virtualization technologies
- Container Orchestration (Kubernetes) – nice to have
  - Experience with PSPs, Helm, admission/mutation controllers, PVs/PVCs, kube-router, BGP – generally, a demonstrated ability to dig deep into the k8s projects to solve hard problems
  - Mature approach to dealing with operational complexities and gaps of the Kubernetes platform
- Workflow management and batch processing – nice to have
  - Experience with the challenges of workflow management in heavily multi-tenant environments
  - Mature approach to dealing with/avoiding task failure and system failure
  - Experience with products like Airflow, NiFi, GNUbatch, GCP Cloud Composer, AWS SageMaker
- Software Engineering – nice to have
  - Proficient in OO development (we use Python), git and CI/CD concepts
  - Comfortable contributing to a large codebase with varied technologies
In addition to the above, the following qualifications always apply:
- Comfortable working with business stakeholders to fully understand their applications, business processes, and priorities, and translate them into technical solutions.
- Ability to review and/or extend open source platforms to satisfy business requirements
- A passion for technology and automation, a deep sense of curiosity, and a willingness to always question
- A passion for in-depth understanding of technology, and building large-scale systems.
- Excellent verbal and written communication skills.
- Collaborative mindset – comfortable working with a team of engineers.