Job Description
AWS Utility Computing (UC) is a driving force behind product innovations, including foundational services like Amazon's Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), as well as consistently released new product innovations that set AWS services apart in the industry.
As a member of the UC organization, you will support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including specialized security solutions for cloud services.
Our Region Services Team
We redefine the way AWS designs, builds, and operates AWS regions to enable new AWS Cloud Infrastructure and Services offerings for customers of every size and in every industry, including start-ups, enterprises, and public sector organizations.
Key Responsibilities
* Act as technical leaders for their team, becoming subject matter experts in multiple services.
* Efficiently deliver the right things with limited guidance.
* Use technology to solve complex problems.
* Improve or drive improvements to these services and features for their team.
* Understand how their work impacts multiple teams.
* Take a project from undefined problem to delivery on schedule, applying appropriate technologies and industry systems-engineering best practices.
* Make effective implementation trade-off decisions and assist others with trade-off decisions.
* Identify when implementations have defects or are not highly available, performant, secure, stable, or maintainable, and determine short- and long-term mitigations.
* Have a depth and breadth of understanding in multiple areas of technology and be excellent diagnosticians.
* Use Linux expertise to troubleshoot, devise fixes and workarounds, keep software up to date, and provide data and metrics to manage the capacity and efficiency of services.
* Lead team design, scoping, and prioritization discussions.
* Understand the impact of technical decisions on their team's and other teams' services, and recommend opportunities and solutions for the problems identified.
* Optimize and/or automate team processes to meet organizational goals.
* Lead implementation reviews, including designs, code, operational readiness, deployment plans, system changes, etc.
* Anticipate and mitigate patterns affecting performance, reliability, or availability of their team's systems, creating automation to simplify workloads that scale operations.
* Drive the creation of SOPs, documentation, and runbooks, reviewing them for accuracy and promoting the use of best practices.
* Actively recruit and interview for their team, and help others leverage their expertise by coaching and mentoring.
* Communicate clearly and collaborate with others to deliver results.
* Be self-starters, comfortable dealing with ambiguity and change.
* Be customer-obsessed, always looking to understand customer pain points and find resolutions quickly and completely.
About This Role
You will dedicate a substantial portion of your time to reviewing the operational health of services within your team's responsibility. You will identify anomalies and craft actionable bug reports, with the aim of improving the overall efficiency and performance of your systems.
Offer constructive feedback on change management documents and work earnestly to address your team's operational backlog.
Navigate challenges and ensure seamless functionality of your systems.
Engage in the development and testing of scripts to provide practical solutions to enhance workflows.
Assume the role of an educator, sharing insights on the complexities of the Cloud with service teams to contribute to the collective knowledge of the team and foster a culture of mutual understanding.
This work reflects your commitment to continuous learning and improvement, acknowledging that every effort contributes to the collective success of your team and the reliability of your software systems.
You will occasionally participate in 'on-call' rotations to resolve incidents occurring out-of-hours.
About Our Team
Region Services provides high-caliber Operational Solutions and Cleared Support for services within our Regions.
We provide 'hands-on-keyboard' support to our service teams by deploying changes into isolated regions, monitoring results, and reporting any issues observed.
Diverse Experiences: We value diverse experiences.
Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage you to apply.
Why AWS: Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform.
We pioneered cloud computing and never stopped innovating - that's why customers from successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Basic Qualifications
* 8+ years of systems engineering experience, working with hardware, software, networking, and operating systems.
* Advanced knowledge in Linux system administration.
* Significant experience with systems automation using Bash, Python, Perl, etc.
* Significant experience with network fundamentals (DNS, DHCP, TCP/IP, routing, switching, HTTP).
* Excellent troubleshooting skills at all levels, from application to network to host.
* Experience writing technical documents, project plans, and progress reports to leadership and stakeholders.
Preferred Qualifications
* Experience building and operating systems at scale.
* Advanced knowledge of configuration management systems, such as Puppet, Chef, Ansible, or related systems.
* Significant experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic, or similar).