#503: The PyArrow Revolution

Episode: 502 of 505
Length: 1 hr 8 min
Language: English
Category: Nonfiction

Pandas is at the core of virtually all data science done in Python, which is to say virtually all data science. Since its beginning, Pandas has been based upon NumPy. But changes are afoot to update those internals, and you can now optionally use PyArrow. PyArrow comes with a ton of benefits, including its columnar format, which makes answering analytical questions faster, support for a range of high-performance file formats, inter-machine data streaming, faster file IO, and more. Reuven Lerner is here to give us the low-down on the PyArrow revolution.

Episode sponsors

NordLayer

Auth0

Talk Python Courses

Links from the show

Reuven: github.com/reuven

Apache Arrow: github.com

Parquet: parquet.apache.org

Feather format: arrow.apache.org

Python Workout Book (45% off with code talkpython45): manning.com

Pandas Workout Book (45% off with code talkpython45): manning.com

Pandas: pandas.pydata.org

PyArrow CSV docs: arrow.apache.org

Future string inference in Pandas: pandas.pydata.org

Pandas NA/nullable dtypes: pandas.pydata.org

Pandas `.iloc` indexing: pandas.pydata.org

DuckDB: duckdb.org

Pandas user guide: pandas.pydata.org

Pandas GitHub issues: github.com

Watch this episode on YouTube: youtube.com

Episode transcripts: talkpython.fm

--- Stay in touch with us ---

Subscribe to Talk Python on YouTube: youtube.com

Talk Python on Bluesky: @talkpython.fm at bsky.app

Talk Python on Mastodon: talkpython

Michael on Bluesky: @mkennedy.codes at bsky.app

Michael on Mastodon: mkennedy

