What is Apache Hive? Understanding Hive


In this video, you will get a quick overview of Apache Hive, one of the most popular data warehouse components in the big data landscape. It is mainly used to complement the Hadoop Distributed File System (HDFS) with a SQL-like query interface.
Hive was originally developed by Facebook and is now maintained as Apache Hive by the Apache Software Foundation. It is also used and developed by large companies such as Netflix and Amazon.

Why was Hive developed?
=======================
The Hadoop ecosystem is not just scalable but also cost-effective when it comes to processing large volumes of data. It is also a relatively new framework that packs a lot of punch. However, traditional data warehouses are based on SQL, and the organizations that run them have users and developers who rely on SQL queries to extract data.

For these users, getting used to the Hadoop ecosystem is an uphill task, and that is exactly why Hive was developed.

Hive provides a SQL interface on top of Hadoop, so users can write SQL-like queries in HQL (Hive Query Language, also known as HiveQL) to extract data from Hadoop. The Hive component converts these queries into MapReduce jobs, and that is how it talks to the Hadoop ecosystem and HDFS.
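To make the query-to-MapReduce conversion more concrete, here is a minimal Python sketch of how a simple aggregation such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` can be expressed as map, shuffle, and reduce phases. The `page_views` table and its columns are hypothetical, and the sketch simulates the idea in a single process; it is not Hive's actual query compiler or execution plan.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Conceptual HiveQL query, assuming a hypothetical table page_views(user_id, page):
#   SELECT page, COUNT(*) FROM page_views GROUP BY page;


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    # Each mapper emits (group-by key, 1) for every input row.
    for _user_id, page in rows:
        yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # The framework groups all emitted values by key before the reduce step.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Each reducer sums the values for one key, which yields COUNT(*).
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    page_views = [("u1", "home"), ("u2", "search"), ("u1", "home"), ("u3", "home")]
    print(run_group_by_count(page_views))  # prints {'home': 3, 'search': 1}
```

In a real cluster, the map and reduce functions run on many machines in parallel and the shuffle moves data across the network, which is one reason such queries have noticeable startup latency.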

How and when can Hive be used?
==============================
- Hive can be used for OLAP (online analytical processing).
- It is scalable, flexible, and fast at batch processing over large datasets.
- It is a great platform for SQL users to write SQL-like queries against large datasets that reside in HDFS.

Here is what Hive cannot be used for:
=====================================
- It is not a relational database.
- It cannot be used for OLTP (online transaction processing).
- It cannot be used for real-time updates or queries.
- It cannot be used where low-latency data retrieval is expected, because converting Hive queries into MapReduce jobs and running them adds latency.

Some of the finest features of Hive
===================================
- It supports different file formats, such as SequenceFile, text file, Avro, ORC, and RCFile.
- Metadata is stored in an RDBMS, such as the Apache Derby database.
- Hive supports several compression codecs, such as Snappy and gzip, and can query the compressed data.
- Users can write SQL-like queries that Hive converts into MapReduce, Tez, or Spark jobs to query Hadoop datasets.
- Users can plug custom logic into Hive queries through UDFs (user-defined functions) and custom map/reduce scripts.
- Specialized joins, such as map joins, are available to help improve query performance.
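As an illustration of plugging a custom script into a query, Hive's `TRANSFORM ... USING` clause streams rows through an external program: by default, each row reaches the script's standard input as one tab-separated line, and each tab-separated line the script prints becomes an output row. The sketch below is a hypothetical Python script; the file name `normalize_pages.py`, the `page_views` table, and its columns are assumptions. Note that this is a streaming script, which is a different mechanism from a Java UDF.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical usage from Hive (file, table, and column names are assumptions):
#   ADD FILE normalize_pages.py;
#   SELECT TRANSFORM(user_id, page)
#          USING 'python3 normalize_pages.py'
#          AS (user_id, page)
#   FROM page_views;
#
# Hive writes each input row to stdin as a tab-separated line and parses
# each tab-separated line written to stdout as an output row.


def transform_lines(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            # Skip malformed rows instead of failing the whole query.
            continue
        user_id, page = fields
        # Example custom logic: trim whitespace and lowercase the page name.
        yield f"{user_id}\t{page.strip().lower()}"


if __name__ == "__main__":
    for out_line in transform_lines(sys.stdin):
        print(out_line)
```

Keeping the row logic in a plain function that takes any iterable of lines makes the script easy to test locally without a Hive cluster.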

If any of the above terms are unfamiliar, that is fine. We will look at these features in detail in our upcoming videos.
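One of the specialized joins Hive offers is the map join (also called a map-side or broadcast join): when one side of a join is small enough to fit in memory, it is loaded into a hash table on every mapper, and the large table is joined while it is streamed, avoiding the shuffle and reduce stages. The minimal Python sketch below shows the idea in a single process; the `pages` and `page_views` tables and their columns are hypothetical, and it assumes `page_id` is unique in the small table.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical tables: a small dimension table pages(page_id, category)
# and a large fact table page_views(user_id, page_id).


def map_side_join(
    small_table: Iterable[Tuple[str, str]],
    large_table: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str]]:
    # Build phase: load the small table into an in-memory hash map.
    # In a real map join, every mapper holds its own copy of this map.
    # Assumes page_id is unique; a duplicate key would keep only the last row.
    lookup = {page_id: category for page_id, category in small_table}

    # Probe phase: stream the large table once, with no shuffle or reduce step.
    for user_id, page_id in large_table:
        category = lookup.get(page_id)
        if category is not None:  # inner-join semantics: drop unmatched rows
            yield user_id, page_id, category


if __name__ == "__main__":
    pages = [("p1", "news"), ("p2", "sports")]
    views = [("u1", "p1"), ("u2", "p3"), ("u3", "p2")]
    for row in map_side_join(pages, views):
        print(row)
```

Because the large table is read once and never shuffled, the cost is dominated by scanning it; the trade-off is that the small table must fit in each mapper's memory.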

Comments
========

Yo! Thanks for the video, really insightful and concrete. 5:23 minutes of my life well spent.

ricardomarino

I didn't like the comparison between Hive and RDBMS. Hive is for processing data and RDBMS is for storing data. You could say Hive+HDFS to avoid confusion. Anyway, thanks for the introduction!

Ayoub-adventures

Sir, will you please answer this? What approach should we take to load thousands of small 1 KB files using Hive? Should we load them one by one, or merge them together and load them at once, and how do we do this?

deepikakumari

What does HDFS stand for?
An explanation, please...

iskandarsyah

A commercial RDBMS machine can have tens of terabytes of RAM alone. An RDBMS can manage datasets much larger than 10 terabytes.

pradeep