Research Scientist, Interpretability
San Francisco, CA
Full-time · Senior · Artificial Intelligence
Description
You will join Anthropic's Interpretability team to reverse-engineer how trained neural networks work, using mechanistic interpretability to make AI systems safe and trustworthy. Your work centers on discovering how a network's learned parameters map onto meaningful, human-understandable algorithms.
Requirements
- Strong background in neural networks and interpretability
- Ability to conduct research in mechanistic interpretability
- Publication record in interpretability or related fields
- Proficiency in programming and model analysis
Responsibilities
- Reverse-engineer how trained neural networks work
- Develop mechanistic understanding of AI systems
- Apply interpretability methods to improve AI safety
- Build tools for analyzing neural network internals