Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

By
- Nathan Labenz
- Erik Torenberg
Episode
- 328
Published
- March 5, 2026
Publisher
- Turpentine

Episode: 328 of 360
Length: 1h 49m
Language: English
Category: Economics & Business

Dan Balsam and Tom McGrath from Goodfire return to explore the frontier of mechanistic interpretability and their new research pillar, Intentional Design. They explain the shift from sparse autoencoders to understanding geometric structure in latent spaces, and share a proof-of-concept method for reducing hallucinations using probes and RL. The conversation tackles concerns about reward hacking, principles for shaping the loss landscape instead of fighting backprop, and what this means for aligning powerful models. They also discuss recent Goodfire results on Alzheimer’s prediction, disentangling memorization vs reasoning weights, and how they balance commercial growth with a public benefit mission.

Nathan uses Granola to uncover blind spots in conversations and AI research. Try it at granola.ai/tcr with code TCR — and if you’re already using it, test his blind spot recipe here: https://bit.ly/granolablindspot

LINKS:

Detecting PII for Rakuten

Interpretability for Alzheimer's biomarker detection

You and Your Research Agent

Adversarial examples and superposition

Discovering rare behaviors with model diff

Priors in time for interpretability

Belief dynamics in in-context learning

Mixing mechanisms in language models

Sparse autoencoder scaling with manifolds

Sponsors:

VCX:

VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com

Claude:

Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr

Serval:

Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week 4 at https://serval.com/cognitive

Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

PRODUCED BY:

https://aipodcast.ing
