SELECT COUNT(*) is Slow, Estimate it Instead (with Example in Node JS and Postgres)

preview_player
Показать описание
Get my database course

In this video, I'll show how we can use the database query planner to estimate the count with the table statistics and gain enormous performance. I'm going to use Postgres as my database and express node js as my backend.

Source Code

Get my database course


Become a Member on YouTube

🧑‍🏫 Courses I Teach

🏭 Backend Engineering Videos in Order

💾 Database Engineering Videos

🎙️Listen to the Backend Engineering Podcast

Gears and tools used on the Channel (affiliates)

🖼️ Slides and Thumbnail Design
Canva

🎙️ Mic Gear
Shure SM7B Cardioid Dynamic Microphone

Cloudlifter

XLR cables

Focusrite Audio Interface

📷 Camera Gear
Canon M50 Mark II

Micro HDMI to HDMI 

Video capture card

AC Wall for constant power

Stay Awesome,
Hussein
Рекомендации по теме
Комментарии
Автор



Learn the fundamentals of database systems to understand and build performant backend apps

Check out my udemy Introduction to Database Engineering


hnasr
Автор

That is the most beautiful frontend app I've seen in my entire life.

sadhlife
Автор

the only "efficient" (that I know of) way to keep count of rows, is to store the count somewhere in the database, that way u increase the number when you post something, decrease when u delete, etc.
Startups have gotten huge bills for making large amounts of queries, just to get a count or get a sum of something.

PASTRAMIKick
Автор

This is nice, although I can't think of any situation where an estimate would be good enough when looking for the actual count.

GergiH
Автор

For the question at 8:20, you can do a .then when you create the original promise and use Promise.all on the promises returned from then. So basically something like:
gradeCalls.push( generateGradeCall().then( updateOnGradeChange ) );
Promise.all(gradeCalls).then( updateOnFinishEverything );

You can chain as many .then's as you want, since they return promises. It may not make for clean code though.

vaishnavsm
Автор

@8:41 Just do
grades.forEach( g => gradeCalls.push( fetch(url).then(res => lblDuration.textContent = blahblahblah...) );
Promise.all(gradeCalls);
Fetch just returns a promise, so you can chain an additional promise that updates the UI and push that to your gradeCalls. Now Promise.all() will execute the fetch calls AND the updates to the UI, which will happen after each fetch is completed instead of after all the fetch calls are completed.

jaygrollman
Автор

The estimates are around 1.01 and 1.10 times as much as the actual values. You could actually then render some % (as a constant) of the estimates to align closer with the real values

CallumAtwal
Автор

I was so close to running select count(*) today but remembered this video and had to come back and steal this! Thanks a lot

amiratakhodaparast
Автор

@Hussien - i have a doubt as you are sending 10 requests in parallel, aren't browser limit of 6 connect per host queue the other 4 requests?

ramanjay
Автор

he literally never has a dislike on this content !! the best teacher for a reason.. :) Thanks for sharing

thechhavibansal
Автор

The problem you mentioned in at 8:20, is actually really simple with JS promises! Basically right after the fetch() call, you would add a .then() where you can update each element. Since you still get a promise after calling .then(), the rest of your code with Promse.all will continue to work as-is.

I just built a small demo to make sure it works, and here's the JS code I used:

function apiPromise(elem) {
return fetch('/api/', {
method: 'GET'
})
.then(response => {
return response.json()
})
.then(json => {
elem.innerText = json.value
})
}

document.getElementById("button").addEventListener("click", function(event) {
event.target.innerText = "loading"
var outputElems =
outputElems.forEach(elem => {
elem.innerText = "..."
})

var promises = outputElems.map(apiPromise)
=> {
event.target.innerText = "done"
})
});

vivekseth
Автор

IMPORTANT COMMENT!
Please read until the end.

I have no experience with backend and
I'm SQL newcomer but Veteran competitive programmer so there is a solution crossed my mind while watching the video.

We want to count the number of records of a specific table efficiently and accurately, right ?

Why not to make a small additional table this table will store the number of records of all our tables in the database. I said small because if we have 100 or even 10000 tables (actually I have no idea if this number is realistic or not) still small to make fast query on it.

Our additional table (let call it Foo) has two columns the first column is maybe something like Table_id and the second column is Count (that stores the number of records of a table)

So how that works ?
Imagine we add new record in Grades table, then I will go to Foo table and go to the cell responsible for the Grades table and increase its value by the number of the new records we added in Grades table. And the same thing is done with deleting records.

This idea is accurate and very fast because when we want to know the number of records of specific table we just make a query on Foo table which is small.

Well, maybe it's difficult to update Foo table when delete many records with single query statement because (according to my knowledge) we don't know the exact number of deleted records

But I think it works fine when counting the number of likes in social media apps
Why?
Cus no deleting many records with single query made on it.
Just when some user click like add him to table and increase the number of records in Foo table by one. When click unlike delete it and decrease Foo by one.
That works efficiently and give as the exact number of likes, right ?

Finally, as I said I'm beginner with SQL so Please if there is something wrong with this idea correct me :)
And sorry for my bad English

omar-elaraby
Автор

This channel is a gold mine . I get gold every time I visit

bijeesraj
Автор

Had to reopen the video today. To actually use the explain query.
Thanks @Nasser

scholarshiphelp
Автор

In my experience on a real scenario (100M+ rows for Pushpad), the estimates were completely wrong by several millions (much worse than your results). A real alternative / much better alternative is to keep the Visibility Map (VM) clean with frequent vacuum and then use index-only counts, which are fast. The problem is that it is not easy to always keep the VM perfectly clean. When a block is not clean PG goes to the heap, and this is the origin of all performance problems. There should be a way to tell Postgresql to just ignore the VM and consider all the index entries visible. Something like SELECT ESTIMATE(*) that considers all the index entries visible - without going to the VM or to the heap to check if they were modified or deleted recently.

collimarco
Автор

as soon as you click on the estimate button NATIONAL GEOGRAPHY channel pop up on my mobile screen 😂😂😂

vijayroy
Автор

So the point is the query "explain (format jaon) select * from grades where g between $1 and $2". When we just use explain query, dbms just shows the query plan result with some estimated counts.

c_
Автор

1. is there a reason you are not creating an index on the column you're filtering? without it you're essentially benchmarking sequential scan for (g between a and b), not count(*) per se.

2. the estimate approach without index kinda works only if you're lucky and the value-ranges you're counting are somewhat uniformly distributed.
Consider this example, where I use a second column with a different distribution of values:

postgres=# create table grades (id serial, g int, gg int);
CREATE TABLE
postgres=# insert into grades (g, gg) select 100*random(), (10*random())^2 from
INSERT 0

The query plan is not guaranteed to be a simple sequential scan, and if it's more complicated, the root node's "Plan Rows" value is not the estimate of how many rows the query would return.

unclesheo
Автор

I don't think there's a need for compromise here: There are good tools for handling fast changing values accurately and much faster. Why not use an in memory database like Redis for this?

ChumX
Автор

@8:41 I could be wrong here, but I don't think the behavior you want is possible using a Promise (since they only resolve once... right?). Instead, maybe try using an Observable?

tjalferes