Large Scale Machine Learning with Apache Spark

Spark offers a number of advantages over its predecessor, MapReduce, that make it well suited to large-scale machine learning. For example, Spark includes MLlib, a library of machine learning algorithms for large datasets. The presentation will cover the current state of MLlib and the details of some of the scalable algorithms it includes, focusing mainly on K-means.
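K-means scales well because each iteration splits into two data-parallel phases: a "map" that assigns every point to its nearest centroid, and a "reduce by key" that sums the points assigned to each cluster so the centroids can be recomputed. The following is a minimal single-machine sketch of that structure in plain Python (Lloyd's algorithm with fixed starting centroids). The helper names `closest`, `kmeans_step` and `kmeans` are hypothetical, and this is not MLlib's implementation, which differs in details such as how initial centroids are chosen.

```python
def closest(point, centroids):
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_step(points, centroids):
    """One K-means iteration written as a map phase plus a reduce-by-key phase."""
    # Map phase: emit (cluster index, (point, count=1)) for every point.
    pairs = [(closest(p, centroids), (p, 1)) for p in points]

    # Reduce-by-key phase: per cluster, sum the coordinates and the counts.
    sums = {}
    for idx, (p, n) in pairs:
        if idx in sums:
            total, count = sums[idx]
            sums[idx] = ([a + b for a, b in zip(total, p)], count + n)
        else:
            sums[idx] = (list(p), n)

    # New centroid = mean of its assigned points; an empty cluster keeps its old centroid.
    return [
        tuple(x / sums[i][1] for x in sums[i][0]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of iterations from the given starting centroids."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Because the same points are scanned on every iteration, this access pattern benefits from keeping the dataset in memory between iterations, which Spark supports and classic MapReduce jobs do not.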

Date: Wednesday, May 21, 2014

Description

Apache Spark is easy to develop with and fast to run. Learn how to use K-means to cluster data and then find anomalies, points that deviate from the typical patterns, for applications such as fraud detection and network intrusion detection. You will also see how Spark takes advantage of Resilient Distributed Datasets (RDDs): fault-tolerant, partitioned collections built through parallel transformations on data in stable storage.
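One common way to use a K-means model for anomaly detection is to score each point by its distance to the nearest centroid and flag points that fall unusually far from every cluster. The minimal plain-Python sketch below assumes the centroids have already been computed (for example by K-means) and that a distance `threshold` has been chosen, e.g. from the distribution of distances on normal data; the function names and the threshold value are hypothetical illustrations, not part of MLlib.

```python
import math


def distance_to_nearest(point, centroids):
    """Euclidean distance from point to its closest centroid."""
    return min(
        math.sqrt(sum((x - c) ** 2 for x, c in zip(point, centroid)))
        for centroid in centroids
    )


def find_anomalies(points, centroids, threshold):
    """Return the points whose nearest-centroid distance exceeds threshold."""
    return [p for p in points if distance_to_nearest(p, centroids) > threshold]


# Hypothetical centroids from a previous clustering run, plus new observations.
centroids = [(0.0, 0.5), (10.0, 10.5)]
observations = [(0, 0), (10, 11), (5, 5)]
print(find_anomalies(observations, centroids, threshold=3.0))  # [(5, 5)]
```

Here (5, 5) lies roughly 6.7 units from the nearest centroid, well outside both clusters, so it is the only point flagged.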

Next Steps