Tim is a teacher, author, and technology leader with StarTree, where he serves as the VP of Developer Relations. He is a regular speaker at conferences and a presence on YouTube explaining complex technology topics in an accessible way. He tweets as @tlberglund and blogs every few years at https://timberglund.com. He lives with his wife in Mountain View, CA, USA. He has three grown children and three grandchildren.
Pinot, Why Are You So Fast?
Apache Pinot™ is not the first database optimized for analytical queries, so why has it found its way into game-changing applications at companies like LinkedIn, Stripe, and Uber, and why is it being embraced with such enthusiasm by people building real-time, event-driven systems? With so many databases to choose from, and so many ways to process streaming data in real time, why this database, and why in these use cases? At the risk of oversimplification: because it's fast.
Pinot can ingest more than a million events per second directly from Kafka, making it a natural fit for streaming systems. But when our goal is to expose insights about these events to users immediately, we need query latencies low enough to serve UI features in real time. So Pinot needs a solid Kafka ingestion story, but it also needs fast, fast reads.
In this talk, we'll look at how the Pinot read path scales out, dividing query processing among arbitrarily many individual nodes. We'll spend significant time looking at the fascinating set of Pinot indexing strategies. Now, there is no real magic in this: to make reads fast, we either need to scan less or scan faster, and Pinot's indexes artfully help it do both.
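To make the "scan less" idea concrete, here is a toy sketch (illustrative only, not Pinot code) of the principle behind an inverted index, one of the indexing strategies the talk covers: instead of checking every row of a column, the query jumps straight to the row IDs that match a value.

```python
from collections import defaultdict

rows = ["US", "DE", "US", "FR", "US", "DE"]  # a toy "country" column

def full_scan(column, value):
    # Baseline: touch every row to find matches.
    return [i for i, v in enumerate(column) if v == value]

# Build an inverted index once: value -> list of matching row IDs.
index = defaultdict(list)
for i, v in enumerate(rows):
    index[v].append(i)

def indexed_lookup(value):
    # Jump directly to the matching rows; no per-row scan at query time.
    return index.get(value, [])

assert full_scan(rows, "US") == indexed_lookup("US") == [0, 2, 4]
```

Both paths return the same answer; the index simply trades a one-time build cost for dramatically less work on every subsequent query.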
Come to this talk to dive into some Pinot internals, learn how distributed column-oriented databases are built, and see how Pinot is becoming the choice of more and more leading-edge real-time, user-facing analytics applications.