Death by Uptime

Episode: 166 of 166
Length: 1 hr
Language: English
Category: Non-fiction

We hit a new (and disturbing!) failure mode recently when a production rack that had been up for several months saw every (!) compute sled's service processor become simultaneously unresponsive. Bryan and Adam were joined by the members of the Oxide team who debugged the vexing issue -- and reached its surprising root cause.

In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleagues Cliff Biffle, Matt Keeter, and Will Chandler.

Previously, on Oxide and Friends:

  • OxF s05e03 – Holistic Engineering with Robert Mustacchi
  • OxF s04e14 – Rebooting a datacenter: A decade later
  • OxF s01e26 – The Pragmatism of Hubris
  • OxF s05e20 – Debugger-Driven Development (omdb)
  • OxF s05e07 – Transparency in Hardware/Software Interfaces
  • OxF s05e31 – Futurelock
  • OxF s05e33 – A Grown-up ZFS Data Corruption Bug

Some of the topics we hit on, in the order that we hit them:

  • hubris #2304: STM32H7 Ethernet driver stops yielding CPU after many packets
  • gist: Summarizing the Hubris side of investigations
  • Matt's blog: Hunting a spooky ethernet driver bug

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5 p.m. Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

