[Streamed version. Front & back trimmed. Slide issue in beginning.]
An edited version is available: https://www.youtube.com/watch?v=ANqB72b0r38
Greg Makowski, Director of Data Science, LigaDATA
This talk will start with a number of complex real-time data use cases, such as a) complex event processing, b) supporting the modeling work of a data mining department and c) developing enterprise applications on Apache big-data systems. While Hadoop and big data have been around for a while, banks and healthcare companies tend not to be early IT adopters. What are some of the security concerns or other roadblocks in Apache big data systems for industries with such strict requirements?
Data mining models can be trained in dozens of packages, but what can simplify the deployment of models regardless of where they were trained or with what algorithm? Predictive Model Markup Language (PMML) is an XML format with specific support for 15 families of data mining algorithms. Data mining software such as R, KNIME, Knowledge Studio and SAS Enterprise Miner are PMML producers. The new open-source product, Kamanja, is the first open-source, real-time PMML consumer (scoring system). One advantage of PMML systems is that they can reduce the time to deploy production models from 1-2 months to 1-2 days - a pain point that may be less obvious if your data mining exposure is competitions or MOOCs. Kamanja is free on GitHub and supports Kafka, MQ, Spark, HBase and Cassandra, among other things. As a new open-source product, Kamanja initially supports rules, trees and regression.
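To make the producer/consumer split concrete, here is a minimal sketch of what a PMML consumer does for the simplest case, a linear regression model. It is not Kamanja's implementation or API. The model, its field names (amount, txn_per_hour, risk) and its coefficients are hypothetical, and the parser assumes the PMML 4.2 namespace and handles only NumericPredictor terms with the default exponent of 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document, as a producer such as R or KNIME might export it.
# Field names and coefficients are made up for illustration.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="Toy risk score, for illustration only"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="txn_per_hour" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy_risk" functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="txn_per_hour"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.25"/>
      <NumericPredictor name="txn_per_hour" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Assumption: the document uses the PMML 4.2 namespace.
NS = {"p": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Read the intercept and numeric coefficients of a PMML RegressionModel."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    # Only NumericPredictor terms are read; exponents other than 1,
    # categorical predictors and interaction terms are ignored here.
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Score one input record: intercept + sum(coefficient * field value)."""
    intercept, coefficients = model
    return intercept + sum(
        coef * float(record[name]) for name, coef in coefficients.items()
    )


MODEL = load_linear_model(PMML_DOC)
print(score(MODEL, {"amount": 100.0, "txn_per_hour": 3.0}))
```

The scoring code never needs to know which tool trained the model. Swapping in a retrained model means replacing the XML document, not redeploying code, which is the kind of shortcut behind the faster deployment times mentioned above.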
I will cover the architecture of a sample application that uses multiple real-time, open-source data feeds, for example following social network campaigns and tracking sentiment for a bank client and its competitors. Other real-time architectures cover credit card fraud detection. A brief demo of the social network analysis application, with text mining, will be given.
An overview of products in this space will cover popular Apache big data systems, real-time systems and PMML systems.
For more details:
Venue sponsored by eBay,
Food and live streaming sponsored by LigaDATA,
San Jose, CA, July 27, 2015
Chapter Chair Bill Bruns
Data Science SIG Program Chair Greg Makowski
Vice Chair Ashish Antal
Volunteer Coordinator Liana Ye
Volunteers Joan Hoenow, Stephen McInerney, Derek Hao, Vinay Muttineni
Camera Tom Moran
Production Alex Sokolsky
Copyright © 2015 ACM
San Francisco Bay Area Professional Chapter