Intro To Predictive Data Analysis Using Apache Kafka, Spark, Zeppelin On JVM





2 October 2019

New York


Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events.

Predictive analysis is increasingly becoming a necessary skill for data analysts. There are many ways to do it, with many products, solutions and alternative methods on the market today, so choosing a practical, intuitive and effective platform as a starting point is not easy.

A viable alternative, and a good way to start, is to use Apache Kafka and Spark together with Apache Zeppelin. Combined, this stack provides event collection (Kafka), a powerful analytics engine for responsive, large-scale data processing and computation (Spark), and an interactive notebook for exploring the data (Zeppelin). We will use linear regression to illustrate predictive analysis.
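Linear regression fits a straight line to observed data and uses it to predict unseen values. In the meetup this runs on Spark; the following is only a minimal, dependency-free sketch of ordinary least squares for a single feature in plain Java, so the arithmetic is visible without a cluster. The class name `SimpleLinearRegression` and the rain/delay numbers are hypothetical illustration data, not results from the talk.

```java
import java.util.Arrays;

/** Hypothetical helper: fits y = intercept + slope * x by ordinary least squares. */
final class SimpleLinearRegression {
    final double slope;
    final double intercept;

    private SimpleLinearRegression(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    /** Closed form: slope = sum(dx * dy) / sum(dx * dx), intercept = meanY - slope * meanX. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX; // deviation of x from its mean
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new SimpleLinearRegression(slope, meanY - slope * meanX);
    }

    /** Predicts y for a new x using the fitted line. */
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Hypothetical illustration data: hours of rain vs. minutes of flight delay.
        double[] rainHours = {0, 1, 2, 3, 4};
        double[] delayMinutes = {5, 9, 13, 17, 21};
        SimpleLinearRegression model = fit(rainHours, delayMinutes);
        System.out.println("slope = " + model.slope + ", intercept = " + model.intercept); // prints slope = 4.0, intercept = 5.0
        System.out.println("predicted delay for 5 h of rain: " + model.predict(5) + " min"); // prints predicted delay for 5 h of rain: 25.0 min
    }
}
```

The sample data lies exactly on y = 5 + 4x, so the fit recovers slope 4 and intercept 5. On real, noisy data the same formulas give the best-fitting line in the least-squares sense; for large datasets Spark's MLlib provides a distributed `LinearRegression` estimator instead.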

At this meetup, you will learn:

- How Kafka can act as an event source that collects weather data from, say, IoT sensors.
- How Spark can be called from, and run inside, a few simple Java applications: Word Count, and basic functional operations such as map, reduce and fold.
- How an Apache Zeppelin `Notebook` lets you interact with in-memory Resilient Distributed Datasets (RDDs), which support parallelized predictive analysis on single and multiple raw datasets (e.g. how flight delay data may correlate with weather data).
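Spark's Java API expresses Word Count with RDD operations such as `flatMap`, `mapToPair` and `reduceByKey`, and RDDs also provide `fold`. As a hedged, single-JVM sketch of the same map/reduce/fold pattern without any Spark dependency, the following uses `java.util.stream`; the class `LocalWordCount` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical single-JVM word count illustrating map, reduce and fold. */
final class LocalWordCount {
    /** map: split each line into lowercase words and pair each with 1; reduce: sum the 1s per word. */
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
                .filter(word -> !word.isEmpty()) // split can yield a leading empty token
                .collect(Collectors.toMap(word -> word, word -> 1L, Long::sum, TreeMap::new));
    }

    /** fold: combine all per-word counts into one total, starting from the zero value 0. */
    static long totalWords(Map<String, Long> counts) {
        return counts.values().stream().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The end");
        Map<String, Long> counts = wordCount(lines);
        System.out.println(counts);             // prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(totalWords(counts)); // prints 9
    }
}
```

Here `toMap(..., Long::sum, ...)` plays the role of a reduce-by-key, and `reduce(0L, Long::sum)` is a fold with zero value `0`. Spark runs the same steps across the partitions of an RDD rather than over a single in-memory list.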