Overview

The Microsoft 1ES team is on the lookout for a Senior Data Engineer to join the Engineering Thrive initiative. This pivotal role involves crafting and refining metrics that capture the performance and productivity of engineering systems, in line with the SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow). The successful candidate will have a direct impact on Microsoft's engineering culture and strategy by contributing to dashboards that surface productivity metrics for company leaders.

The ideal candidate will bring a wealth of experience from working on large-scale software development projects, whether in the commercial sector or within the open-source community. Analytical skills, data query experience, and effective communication skills are essential, as the role demands interaction with various engineering stakeholders and partners. The ability to articulate, justify, and validate the metrics, as well as to respond to feedback and queries, is crucial. Experience in software development engineering and a keen understanding of productivity are also key to driving success in this dynamic and influential position.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

- Collaborates with appropriate stakeholders across teams and escalates concerns around data requirements by assessing and conducting feature estimation. Informs clients of the feasibility of data needs and suggests transformations or strategies for acquiring data if requirements cannot be met.
- Negotiates agreements with partners and system owners to align on project delivery, data ownership between the parties, and the shape and cadence of data extraction for one or more features.
- Proposes new data metrics or measures to assess data across varied service lines.
- Leads the design of a data model appropriate for the project and prepares design specification documents to model the flow and storage of data through a data pipeline. Designs assigned components of the data model for a functional area of a project.
- Partners with stakeholders (e.g., Data Science Specialists) to make iterative improvements to design specifications, data visualizations, data models, or data schemas.
- Considers tradeoffs between analytical requirements and compute/storage consumption, and anticipates costs influenced by the cadence of extracting, transforming, and loading data into moderately complex data products or datasets in cloud and local environments. Demonstrates an advanced understanding of the costs associated with data, used to assess total cost of ownership (TCO).
- Identifies data sources and builds code to extract raw data from identified upstream sources using query languages, while ensuring accuracy, validity, and reliability of the data across several pipeline components.
- Contributes to the code review process by providing feedback and suggestions for implementation.
- Leverages reduction techniques and aggregation approaches to validate the quality of extracted data across a data pipeline, consistent with the service level agreement.
- Offers feedback on methods and tools used to track and maintain data source control and versioning.
- Applies deep knowledge of the data to validate that the correct data is ingested and that the data is applied accurately across the pipeline.
- Plans and creates efficient techniques and operations (e.g., inserting, aggregating, joining) to transform raw data into a form compatible with downstream data sources, databases, and visualizations.
- Independently uses software, query languages, and computing tools to transform raw data across end-to-end pipelines.
- Evaluates data to ensure quality and completeness using queries, data wrangling, and statistical techniques.
- Merges data into distributed systems, products, or tools for further processing.
- Writes code to implement performance monitoring protocols across data pipelines. Builds visualizations and smart aggregations.
- Develops and updates troubleshooting guides (TSGs) and operating procedures for reviewing, addressing, and/or fixing advanced problems/anomalies.
- Supports and monitors platforms. Performs root cause analysis in response to detected problems/anomalies to identify the reason for alerts or customer escalations and implements solutions that minimize points of failure.
- Implements and monitors self-healing processes across multiple product features to prevent issues from recurring and to retain data quality and optimal performance (e.g., latency, cost) throughout the data lifecycle. Documents the problem and solutions through postmortem reports and shares insights with the team and the customer.
- Provides data-based insights into the health of data products owned by the team according to service level agreements (SLAs) across multiple features.
- Anticipates the need for data governance and designs data modeling and data handling procedures, with direct support and partnership from Corporate, External, and Legal Affairs (CELA), to ensure compliance with applicable laws and policies across all aspects of the data pipeline.
- Tags data based on categorization (e.g., personally identifiable information (PII), pseudo-anonymized, financial). Documents data type, classifications, and lineage to ensure traceability. Governs accessibility of data within assigned data pipelines.
- Provides guidance on contributions to the data glossary to document the origin, usage, and format of data for each program.
- Participates in a 24/7 on-call rotation to support first-party customers.
- Embodies our culture and values.