Job Description
Position Overview
We are seeking an accomplished Senior Manager / Lead Data Scientist to lead a high-performing team of data scientists and engineers focused on clinical data standardization, ETL workflows, and regulatory-ready data products. This leadership role requires deep expertise in CDISC standards (SDTM, ADaM, TLFs), the OMOP Common Data Model, and genomic variant data, combined with a proven ability to guide technical teams, architect scalable ETL pipelines, and ensure regulatory compliance across real-world data (RWD), EHR systems, and clinical trial datasets.
The ideal candidate will drive the strategic direction of our clinical data operations, mentor a diverse team of data professionals, and serve as the primary technical authority for OMOP/SDTM transformations and regulatory submissions to agencies such as the FDA and PMDA.
Key Responsibilities
Leadership & Strategy
·
Lead, mentor, and develop a team of data scientists, data engineers, and analysts working on clinical data standardization and ETL workflows
·
Define and execute the technical roadmap for OMOP- and CDISC-compliant data pipelines, ensuring alignment with business objectives and regulatory requirements
·
Foster a culture of technical excellence, continuous improvement, and collaborative problem-solving across multidisciplinary teams
·
Partner with senior leadership to shape data strategy for precision medicine, regulatory submissions, and real-world evidence generation
·
Drive adoption of best practices in metadata-driven automation, reproducible workflows, and quality assurance frameworks
Technical Architecture & Delivery
·
Design and oversee end-to-end ETL architectures for converting heterogeneous clinical, EHR, and real-world data sources into OMOP CDM, SDTM, ADaM, and TLF formats
·
Establish and maintain production-grade pipelines using open-source workflow orchestration tools (Airflow, Prefect, Nextflow, Luigi) and proprietary systems (SAS DI, Informatica, cloud-native platforms)
·
Champion the use of OHDSI tools (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, Achilles, DataQualityDashboard) for OMOP transformations and quality validation
·
Ensure adherence to CDISC 360 metadata standards, Define.xml generation, controlled terminology management, and SDTM/ADaM conformance
·
Implement robust data quality, validation, and reconciliation processes across all stages of ETL, leveraging Pinnacle 21 and custom QC frameworks
Regulatory & Compliance
·
Serve as the subject matter expert for regulatory submission-ready datasets, ensuring timely and accurate delivery of SDTM/ADaM/TLFs to FDA, EMA, and PMDA
·
Collaborate with biostatistics, clinical operations, regulatory affairs, and quality assurance teams to meet submission timelines and compliance standards
·
Provide expert guidance on data privacy, security, and governance in alignment with HIPAA, GDPR, ICH GCP, and ISO 27001/27701 standards
·
Review and approve Define.xml, Reviewer's Guides, aCRFs, and other submission documentation for regulatory packages
Genomic & Variant Data Specialization
·
Lead initiatives for curating, harmonizing, and annotating genomic variant datasets from public and proprietary sources (ClinVar, ClinGen, HGMD, CADD, gnomAD, dbSNP, COSMIC, RefSeq, REVEL)
·
Oversee ETL pipelines for mapping VCF annotation files to OMOP genomic tables and CDISC submission formats
·
Ensure quality control of variant annotations, reference genome build consistency (GRCh37/38), and adherence to HGVS nomenclature
·
Stay current with emerging variant annotation standards, genomic data formats (VCF, BED, GFF), and translational research methodologies
Stakeholder Engagement
·
Act as the primary liaison between technical teams, clinical operations, statistical programming, and external partners on data standards and interoperability
·
Translate complex technical challenges into business-friendly solutions and communicate risks, trade-offs, and opportunities to senior stakeholders
·
Represent the organization in industry forums, CDISC working groups, OHDSI community events, and regulatory interactions
Required Qualifications
Education
·
Ph.D. in Bioinformatics, Health Informatics, Computational Biology, Genomics, Biomedical Engineering, Clinical Data Science, or a related quantitative field
·
M.S. with an exceptional leadership track record and 7+ years of relevant experience may be considered
Experience
·
7+ years of progressive experience in clinical data science, bioinformatics, or health data engineering roles
·
3+ years in a leadership or team lead capacity, managing cross-functional technical teams (data scientists, engineers, analysts)
·
Proven track record of delivering regulatory-ready SDTM/ADaM datasets for FDA/EMA/PMDA submissions
·
Deep hands-on experience with OMOP CDM and the OHDSI ecosystem (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles)
·
Extensive experience building and maintaining production ETL pipelines for clinical trials, RWD, EHR, and genomic data
·
Demonstrated expertise in CDISC standards (SDTM, ADaM) and associated documentation (Define.xml, Reviewer's Guides, aCRF)
Technical Skills (Core)
·
Programming & Scripting: Expert-level proficiency in Python, R, SQL; strong working knowledge of SAS (Base, Macro, Studio)
·
ETL & Workflow Orchestration: Hands-on experience with Airflow, Nextflow, Prefect, Luigi, dbt, or equivalent platforms
·
Clinical Data Standards: OMOP CDM, CDISC SDTM, ADaM, controlled terminologies (MedDRA, SNOMED CT, LOINC, RxNorm, ICD-10)
·
OHDSI Tools: WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles, DataQualityDashboard
·
Genomic Data: VCF, BED, GFF formats; reference genomes (GRCh37/38); HGVS nomenclature; variant annotation databases
·
Data Quality & Validation: Pinnacle 21, custom QC frameworks, automated testing, Define.xml validation
·
Cloud & Databases: SQL (PostgreSQL, MySQL, SQL Server), cloud platforms (AWS, GCP, Azure), data warehousing concepts
·
Version Control & DevOps: Git/GitHub/GitLab, CI/CD pipelines, Docker, Kubernetes (basic understanding)
Domain Knowledge
·
In-depth understanding of clinical trials, real-world evidence studies, precision medicine, and translational research
·
Knowledge of ontologies and controlled vocabularies (ClinVar terms, Sequence Ontology, HPO, OMIM)
·
Familiarity with cohort-building tools (ATLAS, i2b2, TriNetX) and EHR/claims data structures
·
Understanding of data harmonization, linkage, and interoperability across heterogeneous sources
·
Awareness of HL7 FHIR, DICOM, and other health data exchange standards
Leadership & Soft Skills
·
Proven ability to lead, mentor, and develop technical teams, with an emphasis on coaching junior and mid-level data scientists
·
Strong strategic thinking and the ability to translate business needs into technical solutions
·
Excellent communication and presentation skills, with experience presenting to executive leadership and regulatory authorities
·
Collaborative mindset, capable of working across functions (clinical, biostatistics, IT, regulatory, quality)
·
Problem-solving mentality, detail-oriented, and committed to data integrity and quality excellence
·
Fluent in English; additional languages a plus
Preferred Qualifications
·
Certifications: CDISC SDTM/ADaM training certification; HL7 FHIR Proficiency; AWS Certified Solutions Architect / GCP Professional Data Engineer / Azure Data Engineer Associate
·
Statistical Programming: Experience with SAS statistical procedures, double programming workflows, TLF shell development
·
NLP & AI: Exposure to natural language processing applications on clinical narratives, adverse event coding, or generative AI for SDTM/ADaM automation
·
Data Visualization: Proficiency in Tableau, Power BI, or custom dashboards (Plotly, Shiny) for stakeholder reporting
Keywords
OMOP, SDTM, ADaM, TLFs, CDISC, OHDSI, ETL, EHR, RWD, Clinical Trials, Regulatory Submissions, FDA, PMDA, WhiteRabbit, Rabbit-in-a-Hat, Pinnacle 21, Genomic Variants, VCF, HGVS, ClinVar, Python, R, SAS, SQL, Airflow, Nextflow, Define.xml, Data Quality, Team Leadership, Bioinformatics, Precision Medicine