Research Scientist, Interpretability

San Francisco, CA
Full-time · Senior · Artificial Intelligence

Description

You will join Anthropic's Interpretability team to reverse-engineer how trained neural networks work, using mechanistic interpretability to make AI systems safe and trustworthy. Your work centers on discovering how a network's learned parameters map to meaningful, human-understandable algorithms.

Requirements

  • Strong background in neural networks and interpretability
  • Ability to conduct original research in mechanistic interpretability
  • Track record of publications in interpretability or related fields
  • Proficiency in programming and in analyzing model internals

Responsibilities

  • Reverse-engineer how trained neural networks work
  • Develop mechanistic understanding of AI systems
  • Apply interpretability methods to improve AI safety
  • Build tools for analyzing neural network internals