SELECT COUNT(*) is Slow, Estimate it Instead (with Example in Node JS and Postgres)

Показать описание

Get my database course

In this video, I'll show how we can use the database query planner to estimate the count with the table statistics and gain enormous performance. I'm going to use Postgres as my database and express node js as my backend.

Source Code

Get my database course

Become a Member on YouTube

🧑‍🏫 Courses I Teach

🏭 Backend Engineering Videos in Order

💾 Database Engineering Videos

🎙️Listen to the Backend Engineering Podcast

Gears and tools used on the Channel (affiliates)

🖼️ Slides and Thumbnail Design
Canva

🎙️ Mic Gear
Shure SM7B Cardioid Dynamic Microphone

Cloudlifter

XLR cables

Focusrite Audio Interface

📷 Camera Gear
Canon M50 Mark II

Micro HDMI to HDMI

Video capture card

AC Wall for constant power

Stay Awesome,
Hussein

Рекомендации по теме

Комментарии

Learn the fundamentals of database systems to understand and build performant backend apps

Check out my udemy Introduction to Database Engineering

hnasr

That is the most beautiful frontend app I've seen in my entire life.

sadhlife

the only "efficient" (that I know of) way to keep count of rows, is to store the count somewhere in the database, that way u increase the number when you post something, decrease when u delete, etc.
Startups have gotten huge bills for making large amounts of queries, just to get a count or get a sum of something.

PASTRAMIKick

This is nice, although I can't think of any situation where an estimate would be good enough when looking for the actual count.

GergiH

For the question at 8:20, you can do a .then when you create the original promise and use Promise.all on the promises returned from then. So basically something like:
gradeCalls.push( generateGradeCall().then( updateOnGradeChange ) );
Promise.all(gradeCalls).then( updateOnFinishEverything );

You can chain as many .then's as you want, since they return promises. It may not make for clean code though.

vaishnavsm

@8:41 Just do
grades.forEach( g => gradeCalls.push( fetch(url).then(res => lblDuration.textContent = blahblahblah...) );
Promise.all(gradeCalls);
Fetch just returns a promise, so you can chain an additional promise that updates the UI and push that to your gradeCalls. Now Promise.all() will execute the fetch calls AND the updates to the UI, which will happen after each fetch is completed instead of after all the fetch calls are completed.

jaygrollman

The estimates are around 1.01 and 1.10 times as much as the actual values. You could actually then render some % (as a constant) of the estimates to align closer with the real values

CallumAtwal

I was so close to running select count(*) today but remembered this video and had to come back and steal this! Thanks a lot

amiratakhodaparast

@Hussien - i have a doubt as you are sending 10 requests in parallel, aren't browser limit of 6 connect per host queue the other 4 requests?

ramanjay

he literally never has a dislike on this content !! the best teacher for a reason.. :) Thanks for sharing

thechhavibansal

The problem you mentioned in at 8:20, is actually really simple with JS promises! Basically right after the fetch() call, you would add a .then() where you can update each element. Since you still get a promise after calling .then(), the rest of your code with Promse.all will continue to work as-is.

I just built a small demo to make sure it works, and here's the JS code I used:

function apiPromise(elem) {
return fetch('/api/', {
method: 'GET'
})
.then(response => {
return response.json()
})
.then(json => {
elem.innerText = json.value
})
}

document.getElementById("button").addEventListener("click", function(event) {
event.target.innerText = "loading"
var outputElems =
outputElems.forEach(elem => {
elem.innerText = "..."
})

var promises = outputElems.map(apiPromise)
=> {
event.target.innerText = "done"
})
});

vivekseth

IMPORTANT COMMENT!
Please read until the end.

I have no experience with backend and
I'm SQL newcomer but Veteran competitive programmer so there is a solution crossed my mind while watching the video.

We want to count the number of records of a specific table efficiently and accurately, right ?

Why not to make a small additional table this table will store the number of records of all our tables in the database. I said small because if we have 100 or even 10000 tables (actually I have no idea if this number is realistic or not) still small to make fast query on it.

Our additional table (let call it Foo) has two columns the first column is maybe something like Table_id and the second column is Count (that stores the number of records of a table)

So how that works ?
Imagine we add new record in Grades table, then I will go to Foo table and go to the cell responsible for the Grades table and increase its value by the number of the new records we added in Grades table. And the same thing is done with deleting records.

This idea is accurate and very fast because when we want to know the number of records of specific table we just make a query on Foo table which is small.

Well, maybe it's difficult to update Foo table when delete many records with single query statement because (according to my knowledge) we don't know the exact number of deleted records

But I think it works fine when counting the number of likes in social media apps
Why?
Cus no deleting many records with single query made on it.
Just when some user click like add him to table and increase the number of records in Foo table by one. When click unlike delete it and decrease Foo by one.
That works efficiently and give as the exact number of likes, right ?

Finally, as I said I'm beginner with SQL so Please if there is something wrong with this idea correct me :)
And sorry for my bad English

omar-elaraby

This channel is a gold mine . I get gold every time I visit

bijeesraj

Had to reopen the video today. To actually use the explain query.
Thanks @Nasser

scholarshiphelp

In my experience on a real scenario (100M+ rows for Pushpad), the estimates were completely wrong by several millions (much worse than your results). A real alternative / much better alternative is to keep the Visibility Map (VM) clean with frequent vacuum and then use index-only counts, which are fast. The problem is that it is not easy to always keep the VM perfectly clean. When a block is not clean PG goes to the heap, and this is the origin of all performance problems. There should be a way to tell Postgresql to just ignore the VM and consider all the index entries visible. Something like SELECT ESTIMATE(*) that considers all the index entries visible - without going to the VM or to the heap to check if they were modified or deleted recently.

collimarco

as soon as you click on the estimate button NATIONAL GEOGRAPHY channel pop up on my mobile screen 😂😂😂

vijayroy

So the point is the query "explain (format jaon) select * from grades where g between $1 and $2". When we just use explain query, dbms just shows the query plan result with some estimated counts.

c_

1. is there a reason you are not creating an index on the column you're filtering? without it you're essentially benchmarking sequential scan for (g between a and b), not count(*) per se.

2. the estimate approach without index kinda works only if you're lucky and the value-ranges you're counting are somewhat uniformly distributed.
Consider this example, where I use a second column with a different distribution of values:

postgres=# create table grades (id serial, g int, gg int);
CREATE TABLE
postgres=# insert into grades (g, gg) select 100*random(), (10*random())^2 from
INSERT 0

The query plan is not guaranteed to be a simple sequential scan, and if it's more complicated, the root node's "Plan Rows" value is not the estimate of how many rows the query would return.

unclesheo

I don't think there's a need for compromise here: There are good tools for handling fast changing values accurately and much faster. Why not use an in memory database like Redis for this?

ChumX

@8:41 I could be wrong here, but I don't think the behavior you want is possible using a Promise (since they only resolve once... right?). Instead, maybe try using an Observable?

tjalferes

SELECT COUNT(*) is Slow, Estimate it Instead (with Example in Node JS and Postgres)

SELECT COUNT(*) is Slow, Estimate it Instead (with Example in Node JS and Postgres)

Databases: Postgresql extremely slow count (with index, simple query) (4 Solutions!!)

Learn Quantrix Modeler: CountIf Too Slow. Forget It! Go with CountA Select Episode #0097

Databases: Mysql group by and count queries are slow (2 Solutions!!)

Postgres, MVCC, and you or, Why COUNT(*) is slow (David Wolever)

A Deep Dive in How Slow SELECT * is

Slow count | Postgres.FM 114 | #PostgreSQL #Postgres podcast

How do I improve this slow count query on MySQL? (3 Solutions!!)

Find your slow queries, and fix them! - Stephen Frost: PGCon 2020

Why Is My Query Slow? More Reasons Storing Dates as Numbers Is Bad

Databases: PostgreSQL extremely slow counts

PT-01 Identify slow query and Explain plan in Postgresql

Databases: SELECT count(\*) from a materialized view with unique index is slow

MySQL : mysql slow count in join query

Slow query! Why didn't the optimizer use my index?

SQL : postgresql: offset + limit gets to be very slow

Postgres Performance Diagnostics: Busting that slow running SQL

Find your slow queries, and fix them!

Why is my Power BI refresh so SLOW?!? 3 Bottlenecks for refresh performance

Databases: Slow bitmap heap scan on group/count query in a large table

51 - PostgreSQL : Slow long running and blocked queries in PostgreSQL

Finding and Fixing Slow SQL Queries

Identifying Slow Queries and Fixing Them!

Postgres Open 2016 - Identifying Slow Queries and Fixing Them!