Non-interactive big-data analysis prohibits experimentation and can interrupt the analyst's train of thought, but analyzing data and drawing insights in real time is no easy task when jobs often take minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data sets with acceptable performance. Popular distributed systems such as Hadoop or Spark can execute these jobs, but their per-job overhead rules out sub-second response times. Learn how an in-memory computing framework enabled us to run complex analysis jobs over massive numbers of data points with sub-second response times, allowing us to plug it into a simple, drag-and-drop Web 2.0 interface.
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Analysis Using In-Memory Computing
1. Real-time Interactive Big Data Analysis Using In-Memory Computing
Mike Joyce – Manager, Software Engineer, iCrossing
Shawn Nguyen – Lead Software Engineer, iCrossing
2. CONNECTED MARKETING PLATFORM
TECHNOLOGY
• Bid Management / Trading Desk
• Data Management Platform (Core Audience)
STRATEGY & PLANNING
• Market Research
• Analytics
• Strategy & Planning
PROGRAM DESIGN
• Media Planning & Buying
• Creative & Experience Design
• Content Creation & Management
AUDIENCE ENGAGEMENT
• Search Marketing Programs
• Social Media / Mobile
• Technology & App Development
• Measurement & Optimization
3. Leveraging audience insights:
• 20+ brands
• 30+ TV networks
• 50+ newspapers
• 300+ magazines
DIGITAL AGENCY INSIDE A CONTENT EMPIRE
4. Big Data - Cookies!
300+ million unique cookies
• Subscribers
• Visitors
• International
• Multiple devices
6. Cookies + Audience Attributes = Super Big Data!
90M+ Cookies
Average user: 800+ attributes (e.g. Male; Age 20 - 35; Sports Enthusiasts; Iowa; High Income; iPad, iPhone; Drives Mini Van; Foodie)
72B+ Attribute-User pairs
7. Audiences – Targeting vs Discovering
• Who are you targeting?
• How do you connect with them?
• What describes them?
8. Data Scientists Discovering Audience Attributes
1. Define an audience using attributes
2. Identify all attributes of cookies in the audience
3. Calculate highly indexing attributes
9. 1) Define the Audience
Population: 90M Cookies
Audience: 300K Cookies
• Age: 20-35
• US > North Dakota
• Gender: Male
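Slide 9's audience definition is a logical AND of attribute criteria. The talk does not say how the tool stores or evaluates these criteria, so the sketch below assumes a hypothetical inverted index (attribute → set of cookie IDs) and computes the audience as the intersection of the matching sets. The names `define_audience` and `index` and the toy cookie IDs are illustrative only.

```python
from typing import Dict, Iterable, Set

def define_audience(index: Dict[str, Set[str]], criteria: Iterable[str]) -> Set[str]:
    """Return the cookie IDs that match every criterion (logical AND).

    `index` is a hypothetical inverted index: attribute -> set of cookie IDs.
    """
    matches = [index.get(criterion, set()) for criterion in criteria]
    if not matches:
        return set()  # no criteria supplied: nothing to intersect
    # Start from the smallest set so intermediate results stay small.
    matches.sort(key=len)
    audience = set(matches[0])
    for cookie_ids in matches[1:]:
        audience &= cookie_ids
    return audience

# Toy data standing in for the 90M-cookie population.
index = {
    "Age: 20-35": {"c1", "c2", "c3", "c5"},
    "US > North Dakota": {"c1", "c2", "c4"},
    "Gender: Male": {"c1", "c2", "c3", "c4"},
}
audience = define_audience(index, ["Age: 20-35", "US > North Dakota", "Gender: Male"])
print(sorted(audience))  # ['c1', 'c2']
```

Intersecting precomputed sets avoids re-checking every cookie's full attribute list, at the cost of maintaining the index.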
10. 2) Audience Attributes
Attributes of Cookies in Audience (Audience: 300K Cookies)
• Interest: Sports Enthusiast
• Interest: Moose Hunting
• Intent: Auto Purchase > Truck
• US > North Dakota > Fargo
• Pet Supplies > Dog Food
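Step 2 tallies how often each attribute appears among the audience's cookies. A minimal sketch, assuming each cookie's attributes are available as a set in a hypothetical `cookie_attrs` dictionary (the talk does not describe the actual data layout):

```python
from collections import Counter
from typing import Dict, Iterable, Set

def attribute_frequencies(cookie_attrs: Dict[str, Set[str]],
                          audience: Iterable[str]) -> Dict[str, float]:
    """Fraction of audience cookies that carry each attribute."""
    audience = list(audience)
    if not audience:
        return {}
    counts = Counter()
    for cookie in audience:
        # Each attribute of this cookie increments its tally by one.
        counts.update(cookie_attrs.get(cookie, set()))
    return {attr: n / len(audience) for attr, n in counts.items()}

# Hypothetical attributes for four audience cookies.
cookie_attrs = {
    "c1": {"Interest: Moose Hunting", "US > North Dakota > Fargo"},
    "c2": {"Interest: Moose Hunting", "Intent: Auto Purchase > Truck"},
    "c3": {"Interest: Sports Enthusiast"},
    "c4": {"Interest: Moose Hunting"},
}
freqs = attribute_frequencies(cookie_attrs, ["c1", "c2", "c3", "c4"])
print(freqs["Interest: Moose Hunting"])  # 0.75
```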
11. 3) Index the Attributes
Attribute | Audience Frequency | Population Frequency
Interest: Sports Enthusiast | 24% | 27%
Interest: Moose Hunting | 40% | 6%
Intent: Auto Purchase > Truck | 17% | 4%
US > North Dakota > Fargo | 30% | 2%
Pet Supplies > Dog Food | 6% | 9%
(Chart: Attributes of Cookies in Audience, indexed against the population)
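The slide gives the two frequencies but not the formula behind the index. Assuming the common marketing definition (index = audience frequency ÷ population frequency × 100, so 100 means "as common as in the population"), the table's numbers can be ranked as below. `FREQUENCIES` and `index_score` are illustrative names, not the tool's code.

```python
# (audience %, population %) per attribute, taken from the slide-11 table.
FREQUENCIES = {
    "Interest: Sports Enthusiast": (24, 27),
    "Interest: Moose Hunting": (40, 6),
    "Intent: Auto Purchase > Truck": (17, 4),
    "US > North Dakota > Fargo": (30, 2),
    "Pet Supplies > Dog Food": (6, 9),
}

def index_score(audience_pct: float, population_pct: float) -> float:
    """Assumed definition: 100 = as common in the audience as in the population."""
    if population_pct <= 0:
        raise ValueError("population frequency must be positive")
    return audience_pct / population_pct * 100

# Rank attributes from most to least over-indexing.
ranked = sorted(
    ((attr, round(index_score(aud, pop))) for attr, (aud, pop) in FREQUENCIES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for attr, idx in ranked:
    print(f"{idx:>5}  {attr}")
```

Under this assumed definition, Sports Enthusiast appears in about a quarter of the audience yet under-indexes (89), while Fargo (1500) and Moose Hunting (667) are the attributes that actually set this audience apart.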
12. Data Scientists' Development Ask
1. Make it accessible to “normals”
2. Exportable visualizations & calculations
3. Reduce query time from 1 hr to 1 sec
13. Why is this Hard?
(Figure repeated from slide 6: 90M+ Cookies; average user: 800+ attributes; 72B+ Attribute-User pairs)
Algorithm
1. Check every cookie to see whether it satisfies the audience criteria
2. Collect all attributes for every audience cookie
3. Calculate percentages & index
Within 1 sec!!!!!!
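The three-step algorithm on this slide can be sketched as a single full scan. Every query touches every cookie, and 90M cookies × 800 attributes = 72 billion attribute-user pairs, which is the scale the one-second target has to cover. The sketch assumes each cookie is held as a set of attribute strings and the audience criteria as a predicate; `naive_discovery` is a hypothetical name, not the tool's code, and the index formula is the same assumed audience-over-population ratio.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Cookie = Set[str]  # assumption: a cookie is modeled as its set of attributes

# Slide-13 scale: 90M+ cookies x 800+ attributes for the average user.
PAIRS = 90_000_000 * 800  # 72,000,000,000 attribute-user pairs

def naive_discovery(population: List[Cookie],
                    in_audience: Callable[[Cookie], bool]
                    ) -> Dict[str, Tuple[float, float, float]]:
    """Return {attribute: (audience %, population %, index)} via one full scan."""
    pop_counts, aud_counts = Counter(), Counter()
    audience_size = 0
    for attrs in population:
        pop_counts.update(attrs)
        # 1. Check every cookie against the audience criteria.
        if in_audience(attrs):
            audience_size += 1
            # 2. Collect all attributes of every audience cookie.
            aud_counts.update(attrs)
    # 3. Calculate percentages & index (audience % / population % x 100, assumed).
    result = {}
    for attr, count in aud_counts.items():
        aud_pct = 100 * count / audience_size
        pop_pct = 100 * pop_counts[attr] / len(population)
        result[attr] = (aud_pct, pop_pct, 100 * aud_pct / pop_pct)
    return result

population = [
    {"Gender: Male", "Interest: Moose Hunting"},
    {"Gender: Male", "Interest: Sports Enthusiast"},
    {"Gender: Female", "Interest: Sports Enthusiast"},
    {"Gender: Female", "Interest: Moose Hunting"},
]
result = naive_discovery(population, lambda attrs: "Gender: Male" in attrs)
print(result["Gender: Male"])  # (100.0, 50.0, 200.0)
```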
14. The Answer – Audience Discovery Tool
“Making science accessible”
• Audience discovery
– Cookie Attributes
– Frequency vs Population
• Built for non-technical users
– Strategy
– Sales / Account
– Anyone
• Flexible
– Research tool
– In-meeting, iterative discovery
• Approachable
– Real-time
– Results in seconds
– Simple, elegant interface
– Multiple export formats
24. The Challenges
• Tedious manual data distribution
• Gar building and deployment issues
• Development challenges
25. What We Learned
• Indexed data requiring minor calculations: databases (relational & NoSQL) work great
• Large non-indexed data: the data & processing need to live in the same (memory) space
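One way to read the second lesson: keep each slice of the cookie data in memory alongside the code that scans it, run the filter-and-count locally, and ship only small per-partition tallies back to be merged. The sketch below is a single-process illustration under that assumption. The thread pool merely stands in for the nodes of an in-memory compute grid (Python threads do not give CPU parallelism here), and `count_partition`, `merge` and `distributed_counts` are hypothetical names; the talk does not show its framework's actual API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Set, Tuple

Cookie = Set[str]
Partial = Tuple[Counter, Counter, int]  # (population counts, audience counts, audience size)

def count_partition(partition: List[Cookie],
                    in_audience: Callable[[Cookie], bool]) -> Partial:
    """Runs next to the data: filter and count one in-memory partition."""
    pop, aud, size = Counter(), Counter(), 0
    for attrs in partition:
        pop.update(attrs)
        if in_audience(attrs):
            aud.update(attrs)
            size += 1
    return pop, aud, size

def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial results; only these small tallies cross partitions."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def distributed_counts(partitions: List[List[Cookie]],
                       in_audience: Callable[[Cookie], bool]) -> Partial:
    # Each worker thread stands in for a grid node that owns one partition.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: count_partition(p, in_audience), partitions))
    return reduce(merge, partials, (Counter(), Counter(), 0))

partitions = [
    [{"Gender: Male", "Interest: Moose Hunting"}, {"Gender: Female"}],
    [{"Gender: Male"}],
]
pop, aud, size = distributed_counts(partitions, lambda attrs: "Gender: Male" in attrs)
print(size, pop["Gender: Male"], aud["Interest: Moose Hunting"])  # 2 2 1
```

The merged tallies feed directly into the percentage and index calculation, so the only data moving between partitions is one pair of counters per partition rather than the raw attribute-user pairs.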