Explaining Hadoop's HDFS (Hadoop Distributed File System)

Big data is awash with acronyms at the moment, none more widely used than HDFS. Let's cut to the chase... it stands for Hadoop Distributed File System.

This is the system of distributing files that allows Hadoop to work on huge data sets at speed. It splits files into blocks, spreads those blocks across different servers, and stores duplicate copies of each block on distinct machines.
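To make that concrete, here is a minimal toy sketch (not the real HDFS implementation, and the server names are invented) of the two ideas just described: chopping data into fixed-size blocks, and placing each block's replicas on distinct servers.

```python
BLOCK_SIZE = 4   # toy block size; real HDFS defaults to 128 MB blocks
REPLICATION = 3  # HDFS's default replication factor is 3

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Chop a byte string into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, servers, replication=REPLICATION):
    """Assign each block to `replication` distinct servers (simple round-robin;
    real HDFS placement is rack-aware and more sophisticated)."""
    placement = {}
    for idx in range(len(blocks)):
        placement[idx] = [servers[(idx + r) % len(servers)]
                          for r in range(replication)]
    return placement

servers = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"hello big data!")
layout = place_replicas(blocks, servers)
```

Each block ends up on three different servers, so losing any one server never loses a block.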

Let's see why with an example.
Sarianne works in the financial markets and runs a lot of predictive models to keep the risk on her investments to a minimum.

Utilising HDFS, her queries through Hadoop can run quickly because the data blocks are stored on separate servers -- meaning the computation can run on all of them in parallel, rather than queuing up behind one another.

As an added benefit, if one server fails (as one is bound to, given the number of servers and disk drives needed to run big data projects), it won't stop Sarianne's models from pulling the data they need, because HDFS keeps duplicate copies of those blocks on other servers -- meaning Hadoop can still return Sarianne's results without interruption.
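The failover behaviour described above can be sketched in a few lines (hypothetical names, not real HDFS APIs): a read tries each replica location in turn and skips servers that are down.

```python
def read_block(block_id, placement, live_servers, fetch):
    """Try each replica location for a block in turn, skipping failed servers."""
    for server in placement[block_id]:
        if server in live_servers:
            return fetch(server, block_id)
    raise IOError(f"all replicas of block {block_id} are unavailable")

# Toy cluster: block 0 is replicated on three servers, and node1 has failed.
placement = {0: ["node1", "node2", "node3"]}
store = {("node1", 0): b"hell", ("node2", 0): b"hell", ("node3", 0): b"hell"}
live = {"node2", "node3"}

# The read still succeeds, served from a surviving replica.
data = read_block(0, placement, live, lambda server, block: store[(server, block)])
```

Only when every server holding a replica is down does the read actually fail, which is why three copies make data loss so unlikely.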
Comments

Explained well, but the music is distracting.
Can you also explain "Data Blocks in HDFS"?

lakhvinderkaur

Please try to turn the music off, because it's hard to catch what you say, although you clearly put great effort into the videos in this playlist.

sarahelmowafy

Couldn't watch to the end. Great content, but the music was unbearable.

pasitaf