How to Sort a very large file | External Sorting technique

This video explains a very important interview question: how to sort a very large file that cannot fit in main memory. We can only apply a sorting algorithm to data in main memory, because accessing and operating on the hard disk directly would be extremely slow. So how do we sort optimally? I have shown an intuitive solution to this problem that uses both main memory and the hard disk to do optimized sorting with the external sorting technique. We divide the entire data into chunks of the size available in main memory, sort each chunk, and save all the sorted chunks to the hard disk. Then we apply the technique of merging K sorted lists. This is how we can do external sorting for very large data sizes.
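The steps described above can be sketched in Python. This is a minimal illustration, not the code from the video's CODE LINK: it assumes integers stored one per line in a text file, and the chunk size is illustrative. `heapq.merge` plays the role of the "merge K sorted lists" step, streaming the sorted runs so only one current value per run is held in memory.

```python
import heapq
import os
import tempfile

def external_sort(input_path, output_path, chunk_size=1000):
    """Sort a file too large for RAM: sort fixed-size chunks in memory,
    spill each sorted run to disk, then k-way merge the runs."""
    run_paths = []
    # Phase 1: read chunk_size numbers at a time, sort in RAM, spill to disk.
    with open(input_path) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(chunk_size), f)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)
    # Phase 2: stream-merge the sorted runs. heapq.merge keeps only one
    # "current" element per run in memory, i.e. O(k) space for k runs.
    files = [open(p) for p in run_paths]
    try:
        with open(output_path, "w") as out:
            merged = heapq.merge(*((int(line) for line in fh) for fh in files))
            out.writelines(f"{x}\n" for x in merged)
    finally:
        for fh in files:
            fh.close()
        for p in run_paths:
            os.remove(p)
```

In a real system each run would be read and written in large blocks to amortize disk latency, but the two-phase structure (sort chunks, then merge runs) is exactly the one described in the video.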

CODE LINK is present below as usual. If you find any difficulty or have any query, do COMMENT below. PLEASE help our channel by SUBSCRIBING, and LIKE the video if you found it helpful... CYA :)

#techdose #externalsort #sortlargefile
Comments

Hello Sir,
I have one doubt here.
As we are preparing sorted chunks, when we go to merge them, won't we again need chunk1's space + chunk2's space in RAM?
Will this case not happen? Sorry if I have understood it wrong.

rohanseth

Nice explanation. But I see a problem in the second step, i.e. merging the 4 sorted chunks, each of size 2 GB. How will this be achieved with 2 GB of RAM? We would need O(k) space for merging k sorted chunks, and k is the number of chunks, i.e. 4. Hence 2 GB of RAM won't be enough to achieve the final sorted list? Let me know if I'm going wrong anywhere!

harshitpandey

I think merge sort itself requires O(n) memory apart from the input space.
How can we have a chunk size of 2 GB when we have only 2 GB of RAM?

priyankarrajgupta

Bro... I have 4 years of experience in an IT services-based company and I know Java. Could you please guide me on how to get placed in top product-based tech companies? I mean interview preparation guidance for experienced people... Please help me 🙂

venkatj

Hi TECH DOSE, one doubt here while merging:

> Once all the chunks are sorted, while merging we pull the first elements from each chunk and sort them using the min-heap method, as you've mentioned in your video, for the final sorting & merging.
> Now the RAM is full with 2 GB of sorted data, and each chunk in the large file will be partially consumed; let's say C1/C2/C3/C4 is 1/4th empty.
> We need to swap this first 2 GB of sorted, merged data back out of RAM and start sorting the remaining data. Won't this create a problem, since C1/C2/C3/C4 are partially empty, and how do we manage that situation?
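One way the merge can avoid this problem (my own sketch of the standard buffered technique, not something shown in the video) is to never load whole chunks at once: each sorted run contributes only a small read buffer that is refilled from disk as it empties, and merged output is flushed to disk in blocks. The buffer sizes below are tiny for illustration.

```python
import heapq

def buffered_reader(path, buffer_size=2):
    """Yield ints from a sorted run file, refilling a small in-memory
    buffer from disk each time it runs out (so a partially consumed
    run never needs to sit fully in RAM)."""
    with open(path) as f:
        while True:
            block = [int(line) for _, line in zip(range(buffer_size), f)]
            if not block:
                return
            yield from block

def merge_runs(run_paths, output_path, write_buffer=4):
    """K-way merge of sorted run files, flushing output in blocks."""
    out_buf = []
    with open(output_path, "w") as out:
        for x in heapq.merge(*(buffered_reader(p) for p in run_paths)):
            out_buf.append(x)
            if len(out_buf) >= write_buffer:
                out.writelines(f"{v}\n" for v in out_buf)
                out_buf.clear()
        out.writelines(f"{v}\n" for v in out_buf)  # flush the remainder
```

With k runs and buffer size b, the merge only ever holds about k·b input values plus one output block in memory, regardless of how large or how partially consumed the runs are.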

hariprasadcr

Hello sir,
Can you please tell me which software you are using for drawing and writing?

jayakrishna

What type of software tool is used for this type of presentation?

gateeasycse

Great explanation.
However, I have one query. For sorting the individual chunks, won't we need to pick an algorithm with O(1) space complexity? Merge sort will require O(N) space (2 GB in our case), which will exceed the 2 GB of RAM since our chunk itself takes 2 GB.
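A side note on this concern (my own sketch, not from the video): the in-memory sort need not be merge sort. An in-place algorithm such as heapsort uses O(n log n) time but only O(1) extra space, so each chunk can be sorted within the same RAM that holds it.

```python
def heapsort(a):
    """In-place heapsort: O(n log n) time, O(1) extra space."""
    def sift_down(start, end):
        # Restore the max-heap property for the subtree rooted at start.
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1          # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # build a max-heap
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):           # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
```

(In practice even library sorts with O(n) auxiliary space are fine if each chunk is simply chosen somewhat smaller than the available RAM.)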

YashDalmia

Thank you for explaining the concepts. I have one doubt: what happens to the time taken if the RAM size is reduced to half for sorting the C·S records? My understanding is that the number of chunks is doubled and S is halved, keeping the product 2C·(S/2) equal to C·S. When I applied this to O(C·S·log S + C·S·log C), the change in time seems to be zero. Do you agree with this? For some reason I think this conclusion is wrong, since it proves reducing RAM has no effect on the overall time taken. Where am I going wrong?
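The commenter's arithmetic can be checked directly (my own worked example, not from the video). Modeling the comparison cost as N·log₂S for chunk sorting plus N·log₂C for the k-way merge, with N = C·S, halving the RAM does leave the total unchanged, because N·log₂S + N·log₂C = N·log₂(S·C) = N·log₂N regardless of how N is split:

```python
from math import log2

N = 2 ** 20                      # total records (illustrative)
for C in (4, 8):                 # halving RAM doubles C and halves S
    S = N // C
    cost = N * log2(S) + N * log2(C)
    print(f"C={C}, S={S}: comparison cost = {cost}")
```

So within this model the conclusion is actually correct: the *comparison count* is invariant. What the model ignores is disk I/O; smaller RAM means more runs, more seeks, and (if the merge fan-in is limited) possibly extra merge passes, which is where reduced RAM really costs time.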

naidurongali

The first half was clear, but you didn't explain the merge part: how is it handled in memory?

BindassShooter

I was asked in an interview: how will you do the sum of 10000 array elements efficiently? Do you know what the interviewer is expecting?

HariKrishnan-ffhf

Isn't the space complexity O(S) and not O(C)? This is because we can store at most S elements in the RAM, which is the size of each chunk.

RaviKiran_Me

When we apply the merge-k-lists algorithm, we will need O(N) extra space because the data is not in linked-list form,
and then we will copy the sorted data to the log file.

aabhassaxena

If you're going to merge the sorted chunks and sort again, why on earth would you sort each chunk first? Just merge everything and sort it there... your approach takes double the time. Geez, another 50-IQ YouTube instructor.

ericforpm