Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

0 Anmeldelser
0
Episode
328 of 326
Længde
1T 49M
Sprog
Engelsk
Format
Kategori
Økonomi & Business

Dan Balsam and Tom McGrath from Goodfire return to explore the frontier of mechanistic interpretability and their new research pillar, Intentional Design. They explain the shift from sparse autoencoders to understanding geometric structure in latent spaces, and share a proof-of-concept method for reducing hallucinations using probes and RL. The conversation tackles concerns about reward hacking, principles for shaping the loss landscape instead of fighting backprop, and what this means for aligning powerful models. They also discuss recent Goodfire results on Alzheimer’s prediction, disentangling memorization vs reasoning weights, and how they balance commercial growth with a public benefit mission.

Nathan uses Granola to uncover blind spots in conversations and AI research. Try it at granola.ai/tcr with code TCR — and if you’re already using it, test his blind spot recipe here: https://bit.ly/granolablindspot

LINKS:

Detecting PII for Rakuten

Interpretability for Alzheimer's biomarker detection

You and Your Research Agent

Adversarial examples and superposition

Discovering rare behaviors with model diff

Priors in time for interpretability

Belief dynamics in in-context learning

Mixing mechanisms in language models

Sparse autoencoder scaling with manifolds

Sponsors:

VCX:

VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com

Claude:

Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr

Serval:

Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week 4 at https://serval.com/cognitive

Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

PRODUCED BY:

https://aipodcast.ing


Lyt når som helst, hvor som helst

Nyd den ubegrænsede adgang til tusindvis af spændende e- og lydbøger - helt gratis

  • Lyt og læs så meget du har lyst til
  • Opdag et kæmpe bibliotek fyldt med fortællinger
  • Eksklusive titler + Mofibo Originals
  • Opsig når som helst
Prøv nu
DK - Details page - Device banner - 894x1036
Cover for Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

Other podcasts you might like ...