
Site Reliability Engineer II - CTJ - Poly

Microsoft
United States, Virginia, Reston
Sep 21, 2025
Overview

The Resiliency Services team is seeking a Site Reliability Engineer II to help drive the reliability, scalability, and operational excellence of our Azure-based solutions. Our team owns and operates several critical services within the AGC, including Azure Automation (streamlining and automating cloud operations), Azure Backup (secure, scalable data protection), Azure Site Recovery (disaster recovery and business continuity), Azure Migrate (cloud migration planning and execution), and Learn Documents (comprehensive technical documentation and training resources). We are a geographically distributed, collaborative group with in-person coverage at Reston, Elkridge, and Annapolis Junction, and we pride ourselves on fostering a fun, supportive, and high-performing team environment.

We are looking for an individual who is quality-focused, proactive, and passionate about reliability. The ideal candidate identifies issues and drives solutions, communicates clearly, and thrives as a team player. You'll have the opportunity to work across a diverse set of Azure services, ensuring they meet the highest standards for resiliency and customer experience. If you enjoy solving problems, collaborating with talented colleagues, and making a real impact, you'll find our team both rewarding and enjoyable to work with.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

Participate in the team's on-call rotation to ensure rapid response to and resolution of service incidents, minimizing downtime and customer impact.
Monitor, maintain, and improve the reliability and availability of Azure Automation, Azure Backup, Azure Site Recovery, Azure Migrate, and Learn Documents, proactively identifying and addressing potential issues before they affect users.
Implement and optimize automation solutions using Azure Automation to streamline operational tasks, reduce manual intervention, and improve service efficiency.
Drive continuous improvement in backup, recovery, and migration processes by maintaining Azure Backup and Azure Site Recovery, ensuring robust disaster recovery and business continuity strategies.
Support and enhance cloud migration initiatives with Azure Migrate, helping teams plan, execute, and validate migrations to the Azure platform.
Contribute to the creation and maintenance of technical documentation and training resources (Learn Documents), ensuring clarity, accuracy, and accessibility for both internal teams and external customers.
Collaborate effectively with team members and stakeholders, communicate clearly about issues and solutions, and foster a positive, fun, and supportive team environment, always striving for quality and taking the initiative to fix what needs fixing.
Embody our culture and values.