Senior site reliability engineer

Swansea (NSW)

Buscojobs

Posted: 31 March

Offer description

Tactiq is a rapidly growing Sydney-based series-A SaaS startup that uses generative AI to extract knowledge from conversations. Users and teams from over 20,000 global organisations use Tactiq to process over 2 million monthly meetings.

As we continue to experience strong growth, this role presents a unique opportunity to solve real customer problems.

Join our product engineering team, where you'll be instrumental in developing, maintaining, and monitoring our cloud infrastructure and delivery pipelines. You'll collaborate closely with the engineering, product, and design teams to ensure stable and scalable operations for our product and users. You will empower software engineers to maintain their own infrastructure and services, improve their CI/CD pipelines, and have proper observability over their operations. You'll help them implement and maintain best practices for security, reliability, performance, monitoring, and alerting.

This role includes an on-call rotation to respond to incidents within and outside normal business hours.

Requirements

* 5+ years of senior-level hands-on experience as a DevOps, cloud infrastructure, or systems reliability engineer, preferably in SaaS
* Expertise in cloud computing platforms, ideally GCP or Azure
* Expertise in containerising applications with Kubernetes and Docker
* Expertise in CI/CD pipelines in GitHub Actions, including custom scripting, automation, and integration with other development tools (Linear, Slack)
* Expertise in Terraform IaC, along with cloud-based providers like HashiCorp Cloud Platform or Spacelift
* Expertise in logging, monitoring, and alerting tools, ideally Datadog, New Relic, PagerDuty
* Expertise in creating and maintaining fixed and on-demand application environments, including configuration and secret management
* Expertise in cloud application availability, capacity planning, performance, and observability strategies
* Expertise in responding to, triaging, and diagnosing production incidents, including root cause analysis (RCA)
* Strong understanding of cloud networking, including TCP/IP, DNS, and load balancing
* Strong understanding of infrastructure cost and billing monitoring and management
* Strong understanding of edge security, ideally Cloudflare, and intrusion detection systems
* Experience with specific GCP services like GKE, Firestore, Batch, Scheduler, and Tasks is a plus
* Experience with Temporal.io, Elastic, Retool, and deploying GPU workloads is a plus
* Strong communication skills for effective interaction and knowledge transfer with technical and non-technical colleagues
* Proven capability to lead infrastructure, deployment, and monitoring efforts together with software engineers, empowering them to maintain their code in production
* Pragmatism in choosing solutions and flexibility in adjusting along the way
* Resourcefulness in finding minimal but viable solutions to new requirements
* A commitment to delivering high-quality, stable cloud infrastructure that appropriately scales and supports the needs of the product
* Curiosity in finding and experimenting with new technologies and tools

What we offer

* Collaborative culture with rapid feedback, empowerment, and openness
* Flexibility with a remote work model and available office space
* Close collaboration with the product and UX design team
* Modern tooling: VSCode, GitHub CI pipelines, dynamic feature environments, and advanced monitoring and analytics via Datadog and Mixpanel
* Direct interaction with customers and real-time feedback, impacting hundreds of thousands of users around the world
* MacBooks are our standard, and we're happy to get you whatever equipment helps you get your job done