Senior Data Scientist, Brain Tumor Institute

Children's National Medical Center
United States, D.C., Washington
Mar 28, 2025
Description

The Brain Tumor Institute (BTI) Bioinformatics Core at Children's National Hospital is seeking a highly skilled Senior Bioinformatics Scientist/Engineer to join our team. This position will play a critical role in advancing the research of multiple PIs focused on uncovering oncogenic mechanisms in pediatric brain tumors and identifying novel therapeutic targets. The Senior Bioinformatics Scientist will engage in basic and translational research projects and contribute to tool development, such as interactive applications for visualizing complex genomic data. The role involves close collaboration with researchers and clinicians both within Children's National and at external partners.

The successful candidate will report to the Director of the BTI Bioinformatics Core and will lead workflow creation and implementation using CWL and/or Nextflow, benchmark new core pipelines, contribute bioinformatics analyses to focused projects based on PI needs, participate in collaborative BTI activities such as code review and/or workshop training, and contribute to grant applications and scientific manuscripts. In addition, this candidate will support core engineering needs such as database/API/UI development and automation.

Key Responsibilities:
* Collaborate with bioinformatics scientists and PIs to benchmark and optimize new production-scale analysis pipelines and workflows that generate high-quality outputs with high data integrity.
* Support project-specific engineering needs, such as database/API/UI development.
* Collaborate with IT to ensure AWS IAM and bucket security and to optimize resource use.
* Create and maintain clear documentation for data engineering workflows, including codebases, data pipelines, validation, testing, and CI/CD processes.
* Perform high-quality bioinformatics analyses on pediatric oncology datasets, including genomic, transcriptomic, and epigenomic data.
* Design and implement downstream analytical workflows for high-throughput data using GitHub, Docker, and AWS infrastructure, focusing on reproducibility, code efficiency, and scalability.
* Utilize cloud-computing environments (e.g., AWS EC2) and/or high-performance computing (HPC) to support large-scale or memory-intensive analyses.
* Actively and positively participate in sprints and code reviews, ensuring high standards for reproducibility and documentation.
* Engage with multidisciplinary teams, providing bioinformatics expertise to support collaborative research initiatives.

Application Process:
This position will be remote. Candidates should be prepared to share their GitHub handle and present a recent project as part of the interview process.

----------------------------------------

Build scalable, production-ready machine learning and statistical models to improve healthcare data latency through automation. This role will focus on advanced statistical and machine learning solutions: collecting, cleansing, and interpreting large volumes of data from varying sources; designing and delivering production-ready models; and monitoring and maintaining models' health in production, all while communicating key findings to stakeholders.

Qualifications

Preferred Skills:
* Ph.D. in Bioinformatics, Computational Biology, or a related field, or equivalent industry experience.
* At least ten years of experience in bioinformatics, including cancer, with expertise in Bash, R or Python, and RShiny and/or Python GUI applications.
* Proficiency with cloud-based or high-performance computing environments for bioinformatics workflows.
* Strong experience with tools and best practices for reproducibility, including Git and Docker.
* Proven experience with genomic data types such as single nucleotide variants (SNVs), copy number variants, fusions, RNA expression, methylation, proteomics, splicing, and single-cell datasets.
* Commitment to open science practices, including sharing and collaborating on code, data, and documentation.
* Extensive experience with current standard parallel computing and data processing workflows (e.g., Snakemake, Nextflow, CWL, WDL).
* Experience diagnosing and troubleshooting pipeline errors and unexpected behaviors. This includes taking initiative, whether through debugging, online searches, or contacting software authors, and generally seeking assistance as needed.
* Experience with reproducible pipeline development, including software version control, use and creation of Docker and/or Singularity images, and collaborative code review.
* Demonstrated ability to develop and implement best practices for bioinformatics systems integration, testing, and deployment is required.
* Interest in learning AWS cloud architecture, design, and automation.
* Strong organizational and project management skills, with the ability to work on multiple projects and teams.
* Excellent communication skills, with the ability to work in cross-disciplinary teams.

----------------------------------------

Minimum Education
* Bachelor's Degree: A bachelor's degree in a quantitative/statistical or business field (e.g., Statistics, Mathematics, Engineering, Computer Science). (Required)
* Master's Degree: Master's preferred. (Preferred)

Minimum Work Experience
* 6 years: Requires deep functional knowledge with 6+ years of related experience, or equivalent experience acquired through accomplishments of applicable knowledge, duties, scope, and skill reflective of the level of this position. (Required)

Required Skills/Knowledge
* Experience working in a heavily regulated industry.
* Healthcare experience is a plus.
* Advanced course in machine learning and programming.
* Experience working with globally distributed, multicultural teams.
* Experience with agile leadership.
* Experience building, delivering, and maintaining production-ready machine learning models.
* Knowledge of statistical data analysis and machine learning, such as linear models, time series forecasting, neural networks, random forests, and NLP models.
* Expert in Python coding and in the use of machine learning and statistical packages for modeling.
* Experience with databases (SQL, NoSQL) and coding for ETL.
* In-depth understanding of machine learning algorithms such as random forests, neural networks, graph models, and NLP.
* Familiarity with Spark, Azure, Databricks, MLFlow AutoML.
* Experience and familiarity with backlog management tools and resources, ideally JIRA and Confluence.
* Seeks to acquire knowledge in area of specialty.
* Ability to identify basic problems and procedural irregularities, collect data, establish facts, and draw valid conclusions.
* Ability to work independently.
* Demonstrated analytical skills.
* Demonstrated project management skills.
* Demonstrates a high level of accuracy, even under pressure.
* Demonstrates excellent judgment and decision-making skills.
* Ability to communicate and make recommendations to leadership.
* Ability to drive multiple projects to successful completion.
* Possesses technical aptitude.
* Excellent verbal and written communication skills, including the ability to communicate complex findings in a clear and understandable manner.
* Excellent facilitation ability to host sessions and elicit ideas from others, understanding their issues and encouraging group participation.
* Attention to detail.
* Communicate complex findings in a clear and understandable manner.
* Collaborate effectively with cross-functional teams.
* Adapt to changing priorities and thrive in a dynamic environment.

Functional Accountabilities
* Designs, develops, and delivers statistical and/or machine learning models that solve business problems, and works with engineers to make them production ready.
* Develops, utilizes, and monitors the end-to-end machine learning pipeline, from data ETL to model delivery, for productionization.
* Leads rapid prototyping for new business problems to support feasibility analysis for AI products.
* Shares complex ideas verbally and visually with a broad audience from technical and non-technical backgrounds.
* Builds and adopts solutions to automate and integrate data science processes.
* Researches the latest and best solutions to solve the data challenges at hand.
* Interprets and communicates results of complex models to cross-functional teams and stakeholders.
* Generates internal implementations to achieve results.
* Works closely with the software engineering teams to drive scalable, production-ready implementations.
* Collaborates with teams across the organization.
* Documents technical work as part of the production deployment process.
* Contributes to our evolving cloud infrastructure and data engineering pipeline.
* Contributes to scientific software engineering efforts utilizing professional coding standards.
* Collaborates with business partners to develop new models and concepts for continuous improvement.
* Performs other duties as assigned.
* Complies with all policies and standards.

Organizational Accountabilities (Staff)

Organizational Commitment/Identification
* Anticipate and respond to customer needs; follow up until needs are met

Teamwork/Communication
* Demonstrate collaborative and respectful behavior
* Partner with all team members to achieve goals
* Receptive to others' ideas and opinions

Performance Improvement/Problem-solving
* Contribute to a positive work environment
* Demonstrate flexibility and willingness to change
* Identify opportunities to improve clinical and administrative processes
* Make appropriate decisions, using sound judgment

Cost Management/Financial Responsibility
* Use resources efficiently
* Search for less costly ways of doing things

Safety
* Speak up when team members appear to exhibit unsafe behavior or performance
* Continuously validate and verify information needed for decision making or documentation
* Stop in the face of uncertainty and take time to resolve the situation
* Demonstrate accurate, clear, and timely verbal and written communication
* Actively promote safety for patients, families, visitors, and co-workers
* Attend carefully to important details, practicing Stop, Think, Act and Review in order to self-check behavior and performance

Primary Location: District of Columbia-Washington
Work Locations: Research & Innovation Campus, 7144 13th Place NW, Washington 20012
Job: Information Technology
Organization: Ctr Cancer & Immunology Rsrch
Position Status: R (Regular) - FT - Full-Time
Shift: Day
Work Schedule: 9:00-5:30 PM
Job Posting: Mar 27, 2025, 6:13:13 PM
Full-Time Salary Range: 109116.8 - 181854.4