Anastasia Khomyakova

Retrospective of Spark at SBTB 2019

Updated: Jun 30, 2023



Blast from the past!


🎉 Announcing Retrospective of Spark at SBTB week!


Today we invite you to revisit the 2019 edition, our last in-person gathering, to unlock some valuable insights and explore lessons learned.


Let's relive the magic of the Spark! ⚡️


While preparing for the 2019 edition, we interviewed Russell Spitzer and asked him what the most misunderstood thing about Spark is. He believed the key misconception is that Spark guarantees an instant speed boost. Its true strength lies in processing distributed data effectively; for tasks that fit on a single machine, stick to one machine. Later, in his talk, Russell discussed three trends shaping the industry's future and whether #Spark may be the right fit for one's data problems. So, "To Spark or not to Spark?" Find the answer in his video, and enjoy the whole interview here.




Two incredible engineers from IBM, Prashant Sharma and Nick Pentreath, explored how to deploy an end-to-end ML pipeline with Apache Spark Streaming and Kubernetes, using Structured Streaming for large-scale, real-time machine learning. They showed how to preprocess data, host models using IBM MAX, and scale pipelines with Spark and Kubernetes. The talk included a live demo of real-time object detection in images. Key takeaways: reusing ML models with IBM MAX, and scaling online ML applications with Spark and Kubernetes. Demo code


Spark & Delta Lake? Sure! Michael Armbrust showed how Delta Lake, an open-source storage layer, brings ACID transactions, scalable metadata handling, and unified streaming and batch processing to Spark. He explored Delta Lake's advantages:

- ACID transactions on Spark: Ensuring consistent data for readers.
- Scalable metadata handling: Managing petabyte-scale tables with ease.
- Streaming & batch unification: Simplifying data ingest, backfill, and queries.
- Schema enforcement: Preventing bad record insertion during ingestion.
- Time travel: Allowing rollbacks, audit trails, and reproducible ML experiments.
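To make schema enforcement and time travel a bit more concrete, here is a tiny pure-Python toy. It is not Delta Lake's API and not how the talk demonstrated it; `ToyVersionedTable`, `SchemaMismatch`, and the sample rows are hypothetical names made up for illustration. The sketch only assumes the general idea of an append-only commit log, where every successful write becomes a new table version and a bad write is rejected as a whole.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class SchemaMismatch(ValueError):
    """Raised when a row does not match the table schema."""


@dataclass
class ToyVersionedTable:
    """Toy illustration only: an in-memory table backed by an append-only log."""

    schema: Dict[str, type]
    # Entry i of the log holds the rows added by commit (version) i.
    _log: List[List[Row]] = field(default_factory=list)

    def _validate(self, row: Row) -> None:
        # Schema enforcement: same column names, and values of the declared types.
        if set(row) != set(self.schema):
            raise SchemaMismatch(f"unexpected columns: {sorted(row)}")
        for name, expected in self.schema.items():
            if not isinstance(row[name], expected):
                raise SchemaMismatch(f"column {name!r} expects {expected.__name__}")

    def append(self, rows: List[Row]) -> int:
        """Validate every row first, then commit all of them or none."""
        batch = [dict(r) for r in rows]
        for r in batch:
            self._validate(r)
        self._log.append(batch)
        return len(self._log) - 1  # version number of this commit

    def read(self, version: Optional[int] = None) -> List[Row]:
        """Read the latest state, or the state as of an earlier version."""
        if version is None:
            upto = len(self._log)
        else:
            if not 0 <= version < len(self._log):
                raise ValueError(f"unknown version {version}")
            upto = version + 1
        return [dict(r) for commit in self._log[:upto] for r in commit]


table = ToyVersionedTable(schema={"id": int, "label": str})
v0 = table.append([{"id": 1, "label": "cat"}])
v1 = table.append([{"id": 2, "label": "dog"}])

try:
    # The second row has a string id, so the whole batch is rejected.
    table.append([{"id": 3, "label": "bird"}, {"id": "4", "label": "fish"}])
    rejected = False
except SchemaMismatch:
    rejected = True

latest_rows = table.read()             # both committed rows
rows_at_v0 = table.read(version=0)     # "time travel" to the first commit
```

Because the rejected batch never reaches the log, readers keep seeing only fully committed versions, which is the intuition behind consistent reads, rollbacks, and reproducible experiments. The real thing works on distributed storage through Spark, which this toy does not attempt to model.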



A gentle reminder: If you've got awesome Spark ideas to share, don't forget to submit a CFP for Scale By The Bay!

Get Early Bird tickets 🐦 while they're hot!

See you by the Bay!
