An Architecture for Fast and General Data Processing on Large Clusters

Series

1 of 64

Language
English

Facts

The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to clusters. Today, myriad data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept up with the size of data. As a result, organizations increasingly need to scale out their computations over clusters.

At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common. And in addition to batch processing, streaming analysis of real-time data is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications too.

This book, a revised version of the 2014 ACM Dissertation Award-winning dissertation, proposes an architecture for cluster computing systems that can tackle emerging data processing workloads at scale. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping MapReduce's scalability and fault tolerance. And whereas most deployed systems only support simple one-pass computations (e.g., SQL queries), ours also extends to the multi-pass algorithms required for complex analytics like machine learning. Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing.

We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using synthetic and real workloads. Spark matches or exceeds the performance of specialized systems in many domains, while offering stronger fault tolerance properties and allowing these workloads to be combined. Finally, we examine the generality of RDDs from both a theoretical modeling perspective and a systems perspective.
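The data-sharing idea above can be illustrated with a small toy. The following is a minimal single-process sketch in plain Python, not Spark's API: the class `ToyDataset` and its methods `from_items`, `persist`, `drop_cache`, and `collect` are hypothetical names chosen for illustration. It assumes that a dataset is described by a recipe for computing it, that a persisted dataset can be reused by several passes without being recomputed, and that a lost in-memory copy can be rebuilt by rerunning the recipe. The rebuild-on-loss behaviour follows how RDDs are generally described and is not spelled out in the text above.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyDataset:
    """Hypothetical single-process stand-in for a shared dataset (not Spark's API)."""

    def __init__(self, compute: Callable[[], Iterable[Any]]) -> None:
        self._compute = compute                    # recipe that can rebuild the data
        self._persist = False                      # keep the result in memory after first use?
        self._cache: Optional[List[Any]] = None    # in-memory copy, if any
        self.computations = 0                      # how many times the recipe has run

    @classmethod
    def from_items(cls, items: Iterable[Any]) -> "ToyDataset":
        data = list(items)
        return cls(lambda: data)

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        # The new dataset only records how to derive itself from this one.
        return ToyDataset(lambda: (fn(x) for x in self.collect()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyDataset":
        return ToyDataset(lambda: (x for x in self.collect() if pred(x)))

    def persist(self) -> "ToyDataset":
        self._persist = True
        return self

    def drop_cache(self) -> None:
        # Simulates losing the in-memory copy, e.g. after a machine failure.
        self._cache = None

    def collect(self) -> List[Any]:
        if self._cache is not None:
            return self._cache
        self.computations += 1
        data = list(self._compute())
        if self._persist:
            self._cache = data
        return data


# Two passes over the same persisted dataset share one computation.
squares = ToyDataset.from_items(range(1, 6)).map(lambda x: x * x).persist()
total = sum(squares.collect())                          # first pass
evens = squares.filter(lambda x: x % 2 == 0).collect()  # second pass reuses the cache
runs_after_two_passes = squares.computations

# Losing the cached copy is not fatal: the recipe is simply run again.
squares.drop_cache()
rebuilt = squares.collect()
```

In this toy, `runs_after_two_passes` stays at 1 because the second pass reads the persisted result, while the call after `drop_cache()` triggers a second computation. A real cluster system would partition the data across machines and rebuild only the lost partitions, which this sketch does not attempt.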

This version of the dissertation makes corrections throughout the text and adds a new section on the evolution of Apache Spark in industry since 2014. It has also been edited and reformatted, and links have been added for the references.

© 2016 ACM Books (E-book): 9781970001587

Release date

E-book: May 1, 2016
