The Hadoop ecosystem is continuously growing to meet the needs of Big Data. This slideshare presentation will help you understand the role of each component of the Hadoop ecosystem.
2. The popularity of Hadoop has grown in the last few years, because
it meets the needs of many organizations for flexible data analysis
capabilities with an unmatched price-performance curve.
The Hadoop ecosystem is continuously growing to meet the needs
of Big Data. Let's understand the role of each component of the
Hadoop ecosystem.
3. The Hadoop ecosystem comprises the following 12 components:
Hadoop HDFS
HBase
SQOOP
Flume
Apache Spark
Hadoop MapReduce
Pig
Impala
Hadoop Hive
Cloudera Search
Oozie
Hue
7. HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
Provides file permissions and authentication
Streaming access to file system data
Hadoop provides a command line interface to interact with HDFS
Suitable for distributed storage and processing
A storage layer for Hadoop
8. HBASE
Stores data in HDFS
Mainly used when you need random, real-time, read/write access to your Big Data
A NoSQL, or non-relational, database
Supports high volumes of data and high throughput
A table can have thousands of columns
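To make the HBase points above concrete, here is a minimal sketch of the wide-column idea in plain Python. It is an illustration only: `ToyTable`, `put`, and `get` are hypothetical names and not the HBase client API, and the sketch keeps everything in memory rather than in HDFS. It shows random reads and writes by row key, and rows that hold only the columns actually written.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a wide-column table (not the HBase API)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; columns never written take no space
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell, addressed directly by row key and column name."""
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Read one cell, or the whole row as a dict when no column is given."""
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


if __name__ == "__main__":
    table = ToyTable()
    table.put("user#1", "info:name", "Ada")
    table.put("user#1", "clicks:2024-01-01", 17)
    table.put("user#2", "info:name", "Lin")
    print(table.get("user#1"))
    print(table.get("user#2", "clicks:2024-01-01"))
```

Because each row stores only its own columns, two rows in the same table can have entirely different column sets, which is what makes tables with thousands of columns practical.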
9. SQOOP
Sqoop is a tool designed to transfer data between Hadoop and relational database servers
It is used to import data from relational databases such as Oracle and MySQL to HDFS, and to export data from HDFS to relational databases
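The import direction described above can be sketched conceptually in Python. This is not Sqoop itself: it assumes an in-memory SQLite database in place of Oracle or MySQL and a text buffer in place of an HDFS file, and `import_table` and `demo` are hypothetical helper names. It only shows the basic idea of reading rows from a relational table and writing them out as delimited text.

```python
import csv
import io
import sqlite3


def import_table(conn, table, out):
    """Copy every row of `table` into `out` as comma-delimited text lines.

    Returns the number of rows written. The table name is interpolated
    directly, so it must come from a trusted source.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def demo():
    """Build a tiny source table and 'import' it into a text buffer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)", [(1, "Ada"), (2, "Lin")]
    )
    out = io.StringIO()
    rows_written = import_table(conn, "employees", out)
    conn.close()
    return rows_written, out.getvalue()


if __name__ == "__main__":
    rows_written, text = demo()
    print(rows_written)
    print(text)
```

A real transfer tool also has to split the table across parallel tasks and map database types to file formats, which this sketch leaves out.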
11. SPARK
An open-source cluster computing framework
Supports machine learning, business intelligence, streaming, and batch processing
Can run some in-memory workloads up to 100 times faster than MapReduce
Components:
Spark Core and Resilient Distributed Datasets (RDDs)
Spark SQL
Spark Streaming
Machine Learning Library (MLlib)
GraphX
12. HADOOP MAPREDUCE
Commonly used
An extensive and mature fault tolerance framework
The original Hadoop processing engine, which is primarily Java based
Based on the map and reduce programming model
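The map and reduce programming model mentioned above can be illustrated with a word count in plain Python. This is a single-process sketch, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are chosen for illustration, and a real job would distribute each phase across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The map and reduce functions never share state, which is what lets a framework run many copies of each in parallel and rerun a failed copy without affecting the rest.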
14. IMPALA
High-performance SQL engine that runs on a Hadoop cluster
Ideal for interactive analysis
Very low latency, measured in milliseconds
Supports a dialect of SQL (Impala SQL)
16. CLOUDERA SEARCH
One of Cloudera's near-real-time access products
Users do not need SQL or programming skills to use Cloudera Search
A fully integrated data processing platform
Enables non-technical users to search and explore data stored in or ingested into Hadoop and HBase
17. OOZIE
Oozie is a workflow and coordination system used to manage Hadoop jobs
[Diagram: the Oozie Coordinator Engine and the Oozie Workflow Engine; a workflow runs from Start through Action1, Action2, and Action3 to End]
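The Start, Action, End flow on this slide can be sketched as a tiny workflow runner in Python. This is a toy under stated assumptions, not how Oozie is implemented or configured: `run_workflow` is a hypothetical name, actions are plain Python callables run strictly in sequence, and a failure routes to a "kill" node instead of "end".

```python
def run_workflow(actions):
    """Run named actions in order and stop at the first failure.

    `actions` is a list of (name, callable) pairs. Returns the list of
    nodes visited, starting at "start" and finishing at "end" on success
    or "kill" if an action raises an exception.
    """
    visited = ["start"]
    for name, action in actions:
        visited.append(name)
        try:
            action()
        except Exception:
            visited.append("kill")
            return visited
    visited.append("end")
    return visited


if __name__ == "__main__":
    steps = [
        ("Action1", lambda: print("ingest data")),
        ("Action2", lambda: print("transform data")),
        ("Action3", lambda: print("publish report")),
    ]
    print(run_workflow(steps))
```

Keeping the control flow separate from the actions themselves is the point of a workflow system: the same runner can chain any mix of jobs without the jobs knowing about each other.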
18. HUE
Hue is an open-source Web interface for analyzing data with Hadoop
It provides SQL editors for Hive, Impala, MySQL, Oracle, PostgreSQL, Spark SQL, and Solr SQL