6 days ago
Senior Data Engineer, Drug Discovery Data Engineering
South San Francisco, CA
$217,000-$229,000 / year
full-time, senior, Biotechnology/Pharmaceuticals
Description
You will join as the founding member of the Drug Discovery Data Engineering group, acting as a technical bridge between scientific teams. You will drive projects from requirements-gathering to production deployment, engineering high-performance data systems that integrate with molecular databases, inventory systems, and internal platforms. Your work will accelerate the drug discovery feedback loop by defining data flows and building tools for stakeholder review.
Requirements
- BS/MS/PhD in Computer Science, Data Science, or a related technical field, or equivalent practical experience
- 5+ years of professional software or data engineering experience on the small molecule and antibody informatics side of pharmaceutical R&D
- Proficiency in applying laboratory informatics systems such as CDD Vault, Titian Mosaic, and Benchling to the drug discovery process
- Fluency in Python with a strong grasp of software and data engineering principles (testing, modularity, design patterns, data modeling)
- Demonstrated experience developing and deploying cloud-based applications on Google Cloud Platform (GCP) (preferred), AWS, or Azure
- Strong experience with modern web frameworks and infrastructure, specifically FastAPI, React, Kubernetes, and Terraform
- Proven ability to lead complex projects involving diverse stakeholders (e.g. both bench scientists and Machine Learning engineers) from concept to production
- Experience enforcing robust data governance policies and compliance with internal information security standards and best practices
- Must be willing to work onsite at least four days per week
Responsibilities
- End-to-End Project Ownership: Collaborating with scientists in Assay Technology, Medicinal Chemistry, and Protein Sciences to gather requirements, architect solutions, and deploy production-grade software that facilitates data movement and analysis
- System Integration: Designing and implementing robust integrations between internal pipelines and third-party platforms, specifically the CDD molecular database, Mosaic inventory systems, and Benchling ELN
- Data Flow Architecture: Defining and optimizing data flows across the organization (e.g., ensuring seamless data handover from Machine Learning to Protein Sciences to Assay Technologies) to accelerate the drug discovery feedback loop
- Full-Stack Tool Development: Developing data systems and internal web applications (using React and Python) that allow stakeholders to review, visualize, and communicate complex scientific data
- Mentorship & Leadership: Serving as a senior technical voice within a larger Engineering team; providing mentorship to junior engineers across Calico and helping onboard future hires into the Drug Discovery Data Engineering team
- Engineering Excellence: Championing best practices for infrastructure-as-code, CI/CD, and containerization while helping to set standards for data engineering at Calico