
Principal Data Scientist - Microsoft Azure

Microsoft
remote work
United States, Texas, Irving
7000 State Highway 161
Jul 31, 2025
Overview

Get ready to operate at hyperscale! We are building new systems to optimize the millions of server nodes underlying the Microsoft Azure cloud. You will be part of a dynamic and collaborative team chartered to understand and improve how hardware and software ingredients come together to form our Azure virtual machine (VM) products. Our work includes fleet optimization, server health/performance testing, representative benchmark construction, competitive comparisons, customer collaborations, server platform definition, and more.

As a Principal Data Scientist in this role, you will craft sampling experiments and perform analysis to characterize the behavior of Azure's mainstream server fleet. You will define sound approaches, based on statistical inference, to assess fleet health and performance, assess similarity between customer workloads and internal proxies, detect server behavior anomalies using telemetry, define sensible key performance indicators, and more. Based on the understanding you create, you will partner with software engineers and machine learning engineers to improve Azure products and customer experiences. Along the way, you will gain experience in hyperscale cloud computing, a fast-moving industry that permeates nearly every aspect of modern life.

Our team is based in Redmond, WA, but this is a hybrid work opportunity, and you have the option to work from home.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Define experimental methodology for assessing health and performance consistency across a fleet of millions of server nodes.
Select benchmarks through statistical methods that achieve both representativeness and coverage of workload behavior based on collected telemetry.
Create anomaly detection methodologies, both online and offline, for identifying problematic server hardware or configurations.
Identify opportunities to improve operational efficiency and customer experience through statistical and predictive models.
Create key performance indicators, dashboards, data views, and reports based on your experiments and analysis.