Distributed Bitmapped Indexing Layer For Big Data




1 November 2018


This month, we have Brandwatch's Phil Messenger talking about Mnemosyne, Brandwatch's solution to a difficult scaling issue.

The Brandwatch Audiences product allows ad hoc queries joining hundreds of millions of social network profiles with billions of posts and tens of billions of follower graph edges, all updated in real time at thousands of transactions per second.

Faced with scalability limitations in our original data backend, we opted to build Mnemosyne, our own distributed indexing layer. Fusing succinct data structures, free-text search, in-memory computing on the JVM, CUDA and Kafka, the final system is able to ingest millions of entities a second whilst still answering complex queries.
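The abstract does not describe Mnemosyne's internals. As a rough, hypothetical illustration of the bitmap indexing idea in the talk's title, the sketch below keeps one bitset per (field, value) pair and answers a conjunctive query with bitwise AND. The class and method names are invented for this example, and Python integers stand in for the compressed bitmaps a production system would more likely use. It is not Brandwatch's implementation.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy in-memory bitmap index (hypothetical, for illustration only).

    Each (field, value) pair maps to a Python int used as a bitset:
    bit i is set if row i has that value for that field.
    """

    def __init__(self):
        self._bitmaps = defaultdict(int)  # (field, value) -> bitset
        self._size = 0  # number of rows added; row ids are 0..size-1

    def add(self, row):
        """Append a row (dict of field -> value) and return its row id."""
        row_id = self._size
        for field, value in row.items():
            # Set this row's bit in the bitmap for each of its values.
            self._bitmaps[(field, value)] |= 1 << row_id
        self._size += 1
        return row_id

    def lookup(self, field, value):
        """Bitset of rows where field == value (0 if no row matches)."""
        return self._bitmaps.get((field, value), 0)

    def query_and(self, **criteria):
        """Row ids matching every field=value criterion, via bitwise AND."""
        result = (1 << self._size) - 1  # start with every row's bit set
        for field, value in criteria.items():
            result &= self.lookup(field, value)
        # Decode the surviving bits back into row ids.
        return [i for i in range(self._size) if (result >> i) & 1]
```

For example, after adding the rows `{"country": "UK", "lang": "en"}`, `{"country": "US", "lang": "en"}` and `{"country": "UK", "lang": "fr"}`, the call `query_and(country="UK", lang="en")` returns `[0]`. The appeal of this layout is that combining filters costs a few word-wide AND operations per 64 rows rather than a per-row comparison.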

This talk is the story of this build, diving into how Mnemosyne works and revealing some surprising things we learned along the way. We'll cover CAP theorem trade-offs, how brute-force approaches are sometimes better than indexes, the data structures and techniques required to sort billions of records in milliseconds, how GPUs can solve unexpected problems and how to do all this on the JVM.

Phil is currently a Principal Engineer at Brandwatch, where he works on building distributed systems consuming social network firehose data. In the past he's held VPE, Chief Scientist and Technical Director roles at various companies working with machine learning and big data.

Hot food and a selection of soft and alcoholic drinks will be provided by this month's sponsor, Crunch.