Data Infrastructure Engineer
Berkeley, CA
Full-time | Senior | Hybrid | Biotechnology
Description
You will design, build, and maintain the data systems that connect our nanopore sequencing instruments to downstream analysis. You will transform raw instrument output into clean, queryable datasets and enable scientists to self-serve routine analyses.
Requirements
- MS or PhD in Computer Science, Bioinformatics, Computational Biology, Data Engineering, or related field
- 4+ years of hands-on infrastructure engineering experience with multiomics datasets
- Experience building and maintaining bioinformatics or scientific data pipelines (Nextflow, Snakemake, or equivalent)
Responsibilities
- Own and extend end-to-end Nextflow pipelines on AWS (Seqera Platform) for nanopore sequencing data processing
- Build metadata-driven pipeline orchestration with sample sheets, automated run naming, and integration with Jira and Confluence
- Automate generation of standard analysis outputs (QC metrics, classification reports, signal diagnostics) for every sequencing run
- Design and implement a data model and schema for nanopore sequencing data, building ETL workflows for a centralized data lake on AWS
- Deploy and maintain data visualization tools for scientists to explore sequencing metrics independently