You are highly experienced in building customer-focussed solutions. We are a team of big thinkers who love to push boundaries and create new solutions. Together we will build tomorrow's bank today, using world-leading technology and innovation. Do work that matters.
Job Description
The role of Staff Platform Engineer is to design, build, run & evolve tools, infrastructure, templates and capabilities that our data science community and other engineers use to deliver business value, and to write code that automates running our infrastructure and environments.
Work collaboratively with customer-facing product owners and platform engineers to design, build and run 'platforms' that they can use to deliver customer value with greater quality, velocity, and safety.
Help to make our platforms loved by our engineers and data science community.
Key Responsibilities
* Provide strategic technical leadership and mentorship, driving best practices for ML platform architecture, deployment and scaling
* Oversee the design and development of scalable and resilient AI infrastructure, and architect its core components with a focus on performance and reliability
* Create a standardised set of tooling for deploying and running applications and setting them up with best practices.
* Work collaboratively with customer-facing product owners and engineers to design, build and run platforms that they can use to deliver customer value with greater quality, velocity, and safety.
* Make all platforms entirely self-service, secure, and available within minutes without human approval.
* Collaborate with data scientists, engineers and stakeholders to define and implement technical requirements.
* Translate stakeholder needs into technical solutions, and ensure the platform's reliability through robust monitoring, logging, and alerting systems
* Develop and maintain comprehensive documentation, including architecture blueprints and best practices, and conduct workshops and training sessions to educate and align the team on platform usage
* Stay up to date with the latest developments in ML, MLOps, LLMs, GPUs and related fields
* Conduct R&D on emerging AWS technologies.
Requirements
We're interested in hearing from people who:
* Are experts in the Full Cycle model, where engineers are involved in Design, Build, Change, and Run
* Have a passion for designing, developing, deploying and running high quality modern machine learning platforms.
* Contribute to a culture where quality, inclusiveness and excellence are championed.
* Have a natural drive to educate, communicate and positively influence various stakeholder groups including high level executives.
Tech Skills
* AWS Services: In-depth knowledge of AWS services such as EC2, ECS, S3, Lambda, Step Functions, RDS, DynamoDB, IAM, VPC, Route 53, CloudWatch, EKS
* ML Services: Expertise in AWS ML services like SageMaker, AWS Glue, Amazon EMR.
* Familiarity with Amazon Bedrock, Amazon Q services, NVIDIA GPUs and related frameworks, LLMs.
* Model Lifecycle: Experience with the end-to-end ML lifecycle, including data preprocessing, feature engineering, model training, evaluation, and deployment
* Scripting: Proficient in automation and scripting (Bash, Python).
* IaC Tools: Hands-on experience with infrastructure as code tools like AWS CloudFormation.
* Version Control: Proficiency with version control and CI tooling like GitHub and GitHub Actions.
* Monitoring & Observability: Expertise in tools like Grafana, Prometheus.
* Engineering Tooling: Artifactory, Snyk, Docker.