
Senior Manager / Lead Data Scientist (Clinical Data Standardization & ETL Operations)

📍 India

information services Molecular Connections

Job Description

Position Overview

We are seeking an accomplished Senior Manager / Lead Data Scientist to lead a high-performing team of data scientists and engineers focused on clinical data standardization, ETL workflows, and regulatory-ready data products. This leadership role requires deep expertise in CDISC standards (SDTM, ADaM, TLFs), OMOP Common Data Model, and genomic variant data, combined with proven ability to guide technical teams, architect scalable ETL pipelines, and ensure regulatory compliance across real-world data (RWD), EHR systems, and clinical trial datasets. The ideal candidate will drive the strategic direction of our clinical data operations, mentor a diverse team of data professionals, and serve as the primary technical authority for OMOP/SDTM transformations and regulatory submissions to agencies such as the FDA and PMDA.

Key Responsibilities

Leadership & Strategy

· Lead, mentor, and develop a team of data scientists, data engineers, and analysts working on clinical data standardization and ETL workflows
· Define and execute the technical roadmap for OMOP and CDISC-compliant data pipelines, ensuring alignment with business objectives and regulatory requirements
· Foster a culture of technical excellence, continuous improvement, and collaborative problem-solving across multidisciplinary teams
· Partner with senior leadership to shape data strategy for precision medicine, regulatory submissions, and real-world evidence generation
· Drive adoption of best practices in metadata-driven automation, reproducible workflows, and quality assurance frameworks

Technical Architecture & Delivery

· Design and oversee end-to-end ETL architectures for converting heterogeneous clinical, EHR, and real-world data sources into OMOP CDM, SDTM, ADaM, and TLF formats
· Establish and maintain production-grade pipelines using open-source workflow orchestration tools (Airflow, Prefect, Nextflow, Luigi) and proprietary systems (SAS DI, Informatica, cloud-native platforms)
· Champion the use of OHDSI tools (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, Achilles, DataQualityDashboard) for OMOP transformations and quality validation
· Ensure adherence to CDISC 360 metadata standards, Define.xml generation, controlled terminology management, and SDTM/ADaM conformance
· Implement robust data quality, validation, and reconciliation processes across all stages of ETL, leveraging Pinnacle 21 and custom QC frameworks

Regulatory & Compliance

· Serve as the subject matter expert for regulatory submission-ready datasets, ensuring timely and accurate delivery of SDTM/ADaM/TLFs to FDA, EMA, and PMDA
· Collaborate with biostatistics, clinical operations, regulatory affairs, and quality assurance teams to meet submission timelines and compliance standards
· Provide expert guidance on data privacy, security, and governance in alignment with HIPAA, GDPR, ICH GCP, and ISO 27001/27701 standards
· Review and approve Define.xml, Reviewer's Guides, aCRFs, and other submission documentation for regulatory packages

Genomic & Variant Data Specialization

· Lead initiatives for curating, harmonizing, and annotating genomic variant datasets from public and proprietary sources (ClinVar, ClinGen, HGMD, CADD, gnomAD, dbSNP, COSMIC, RefSeq, REVEL)
· Oversee ETL pipelines for mapping VCF annotation files to OMOP genomic tables and CDISC submission formats
· Ensure quality control of variant annotations, reference genome build consistency (GRCh37/38), and adherence to HGVS nomenclature
· Stay current with emerging variant annotation standards, genomic data formats (VCF, BED, GFF), and translational research methodologies

Stakeholder Engagement

· Act as the primary liaison between technical teams, clinical operations, statistical programming, and external partners on data standards and interoperability
· Translate complex technical challenges into business-friendly solutions and communicate risks, trade-offs, and opportunities to senior stakeholders
· Represent the organization in industry forums, CDISC working groups, OHDSI community events, and regulatory interactions

Required Qualifications

Education

· Ph.D. in Bioinformatics, Health Informatics, Computational Biology, Genomics, Biomedical Engineering, Clinical Data Science, or related quantitative field
· M.S. with exceptional leadership track record and 7+ years of relevant experience may be considered

Experience

· 7+ years of progressive experience in clinical data science, bioinformatics, or health data engineering roles
· 3+ years in leadership or team lead capacity, managing cross-functional technical teams (data scientists, engineers, analysts)
· Proven track record of delivering regulatory-ready SDTM/ADaM datasets for FDA/EMA/PMDA submissions
· Deep hands-on experience with OMOP CDM and OHDSI ecosystem (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles)
· Extensive experience building and maintaining production ETL pipelines for clinical trials, RWD, EHR, and genomic data
· Demonstrated expertise in CDISC standards (SDTM, ADaM) and associated documentation (Define.xml, Reviewer's Guides, aCRF)

Technical Skills (Core)

· Programming & Scripting: Expert-level proficiency in Python, R, SQL; strong working knowledge of SAS (Base, Macro, Studio)
· ETL & Workflow Orchestration: Hands-on experience with Airflow, Nextflow, Prefect, Luigi, dbt, or equivalent platforms
· Clinical Data Standards: OMOP CDM, CDISC SDTM, ADaM, controlled terminologies (MedDRA, SNOMED CT, LOINC, RxNorm, ICD-10)
· OHDSI Tools: WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles, DataQualityDashboard
· Genomic Data: VCF, BED, GFF formats; reference genomes (GRCh37/38); HGVS nomenclature; variant annotation databases
· Data Quality & Validation: Pinnacle 21, custom QC frameworks, automated testing, Define.xml validation
· Cloud & Databases: SQL (PostgreSQL, MySQL, SQL Server), cloud platforms (AWS, GCP, Azure), data warehousing concepts
· Version Control & DevOps: Git/GitHub/GitLab, CI/CD pipelines, Docker, Kubernetes (basic understanding)

Domain Knowledge

· In-depth understanding of clinical trials, real-world evidence studies, precision medicine, and translational research
· Knowledge of ontologies and controlled vocabularies (ClinVar terms, Sequence Ontology, HPO, OMIM)
· Familiarity with cohort-building tools (ATLAS, i2b2, TriNetX) and EHR/claims data structures
· Understanding of data harmonization, linkage, and interoperability across heterogeneous sources
· Awareness of HL7 FHIR, DICOM, and other health data exchange standards

Leadership & Soft Skills

· Proven ability to lead, mentor, and develop technical teams, with emphasis on coaching junior and mid-level data scientists
· Strong strategic thinking and ability to translate business needs into technical solutions
· Excellent communication and presentation skills, with experience presenting to executive leadership and regulatory authorities
· Collaborative mindset, capable of working across functions (clinical, biostatistics, IT, regulatory, quality)
· Problem-solving mentality, detail-oriented, and committed to data integrity and quality excellence
· Fluent in English; additional languages a plus

Preferred Qualifications

· Certifications: CDISC SDTM/ADaM training certification; HL7 FHIR Proficiency; AWS Certified Solutions Architect / GCP Professional Data Engineer / Azure Data Engineer Associate
· Statistical Programming: Experience with SAS statistical procedures, double programming workflows, TLF shell development
· NLP & AI: Exposure to natural language processing applications on clinical narratives, adverse event coding, or generative AI for SDTM/ADaM automation
· Data Visualization: Proficiency in Tableau, Power BI, or custom dashboards (Plotly, Shiny) for stakeholder reporting

Keywords: OMOP, SDTM, ADaM, TLFs, CDISC, OHDSI, ETL, EHR, RWD, Clinical Trials, Regulatory Submissions, FDA, PMDA, WhiteRabbit, Rabbit-in-a-Hat, Pinnacle 21, Genomic Variants, VCF, HGVS, ClinVar, Python, R, SAS, SQL, Airflow, Nextflow, Define.xml, Data Quality, Team Leadership, Bioinformatics, Precision Medicine

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.

Job Details

Posted Date: February 25, 2026
Job Type: information services
Location: India
Company: Molecular Connections
