Research Scientist, Interpretability

San Francisco, CA
Full-time · Senior · Artificial Intelligence

Description

You will join Anthropic's Interpretability team to reverse-engineer how trained neural networks work, using mechanistic interpretability to make AI systems safe and trustworthy. Your work centers on discovering how a network's learned parameters map to meaningful, human-understandable algorithms.

Requirements

  • Strong background in neural networks and interpretability
  • Ability to conduct original research in mechanistic interpretability
  • Track record of publications in interpretability or related fields
  • Proficiency in programming and in analyzing model internals

Responsibilities

  • Reverse-engineer how trained neural networks work
  • Develop mechanistic understanding of AI systems
  • Apply interpretability methods to improve AI safety
  • Build tools for analyzing neural network internals