Principal Site Reliability Engineer

Microsoft
United States, Washington, Redmond
Oct 22, 2025
Overview

The Core Services Infrastructure and Security team in Microsoft Teams provides the foundational infrastructure, network, security, monitoring, and governance needed to run the planet-scale distributed systems and microservices architecture that power Microsoft Teams. Security and reliability are at the heart of what this team does day in and day out.

As a Principal Site Reliability Engineer on the Core Services Infrastructure and Security team, you will be responsible for the infrastructure and network security, front door, routing, gateway, CDN, DNS, and monitoring layers for the microservices powering Microsoft Teams. This role lets you improve the reliability of these mission-critical layers by investing in active-active architectures at every possible level. You will also sharpen your security acumen by working with experts in the space and become adept at troubleshooting and securing the network layer.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

You will drive strategic improvements in network security, monitoring, and troubleshooting across the service and with other stakeholders, while prioritizing development and implementation efforts.
You will obsess over using metrics and monitors to improve the reliability of mission-critical dial-tone services and scenarios.
You will apply subject-matter expertise in cross-product features, working with appropriate stakeholders (e.g., project managers) to drive multiple groups' project plans, release plans, and work items.
You will hold accountability as a Designated Responsible Individual (DRI), mentoring engineers across products and solutions and working on-call to monitor systems, products, and services for degradation, downtime, or interruptions.
You will proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products, while driving consistency in monitoring and operations at scale and sharing knowledge with other engineers.
You will embody our culture and values.