Site Reliability Engineer - SKA Low Telescope

CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and pay our respects to their Elders past and present.

The opportunity

The SKA Observatory (SKAO) is a next-generation radio astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics. Enabled by cutting-edge technology, it promises to have a major impact on society, in science and beyond.

In Australia, the SKAO is collaborating with CSIRO to operate and support the construction of the low frequency telescope (SKA-Low) in remote Western Australia on Wajarri Yamaji Country. The SKA-Low Site Reliability Engineer (SRE) will be responsible for developing a globally consistent and integrated SRE response system for the SKA-Low Telescope. They will maintain compute platform and infrastructure stability, reliability and robustness.

Your duties will include:

- Work with key SKAO stakeholders to develop and maintain a globally consistent and integrated SRE response system.
- Implement and maintain SKA-Low compute platform and infrastructure stability, reliability and robustness.
- Define, measure and refine Site Reliability Engineering Service Level Objectives (SLOs) for the compute, platform and network infrastructure team, and the corresponding Service Level Indicators (SLIs).
- Implement and continuously improve monitoring systems for system/service health and behaviour observability.

Location: Indefinite – Full-Time, Part-Time or Job-Share
Reference: 99276

To be considered you will need:

- A tertiary qualification in Software Engineering, Computer Science, or equivalent work experience.
- Demonstrated experience implementing SRE practices and procedures, such as automation, to deliver and maintain reliable and robust systems/services.
- Experience in using infrastructure provisioning tools (such as Ansible, Puppet) in a cloud or virtualised environment.
- Experience in compute hardware and data centre infrastructure management.
- Ability to communicate in a professional yet friendly and effective manner, both orally and in writing, with colleagues who span a wide range of cultures and backgrounds.
- Proficiency in containerisation, orchestration, and automation platforms such as Kubernetes.
- Experience in administering medium to large scale Linux application servers/clusters in an HPC environment, with tools such as Slurm.
- Experience in Continuous Integration / Deployment (CI/CD) pipelines (Jenkins, GitLab, GoCD, Travis CI), automated testing, configuration management and continuous monitoring.
- Demonstrated understanding of, and enthusiasm for, working based on lean/agile principles.

Applications for this position are open to Australian/New Zealand citizens, Australian permanent residents, and those who hold, or are able to obtain, a valid working visa for the duration of the specified term. Appointment to this role is subject to provision of a national police check and may be subject to other security/medical/character requirements.

Flexible working arrangements

We work flexibly at CSIRO, offering a range of options for how, when and where you work.

About CSIRO

At CSIRO, Australia's national science agency, we solve the greatest challenges through innovative science and technology. We put the safety and wellbeing of our people above all else and earn trust everywhere because we only deal in facts. We collaborate widely and generously and deliver solutions with real impact.

How to apply

To apply for this role, please apply online, providing your CV and a cover letter clearly addressing the essential criteria of this role and your motivation for applying. Under CSIRO policy, only those who are able to demonstrate how they can meet the essential criteria may be appointed.