* Build a culture of reliability and develop robust reliability patterns
* Software Engineering background
* Sydney-based, 2 days working from home
As a Senior Site Reliability Engineer, you will work closely with software engineering teams and stakeholders to ensure the health and performance of our infrastructure and software. You will play a key role in developing and implementing monitoring, logging, and alerting technologies, and in establishing best practices for SLIs, SLOs, metrics, and error budgets.
Your responsibilities:
* Apply observability principles to infrastructure, environments, and software, ensuring performance and reliability.
* Develop and implement monitoring, logging, and alerting solutions for proactive issue detection and resolution.
* Define and manage SLIs, SLOs, metrics, and error budgets in line with best practices.
* Establish best practices for release engineering, including strategies for canary releases, feature toggling, and rollback mechanisms.
* Design and implement software reliability patterns and practices.
* Automate processes to reduce toil and increase operational efficiency.
Your experience:
* Strong experience in software engineering with a deep understanding of modern software development practices, tools, and technologies.
* Expertise in building and managing SLOs, metrics, logging, tracing, and error budgets.
* Proven track record of automating repetitive tasks and driving efficiencies in software engineering and operations.
* Deep knowledge of DevOps fundamentals and experience with cloud technologies.
* Proficiency in programming languages such as Java and Go, and experience with microservices architectures.
* Excellent collaboration and communication skills, with the ability to influence and lead teams to adopt best practices.
* A passion for reliability, observability, and improving system performance.
This role is open only to Sydney-based candidates with Australian or New Zealand citizenship or permanent residency (PR).