But how can it fit so much music? (The FLAC Codec #2 – Lossless Audio Compression)
UPDATE: Martijn van Beurden, one of the two authors of the IETF FLAC spec, has left a comment with some corrections and additional explanations! The comment is pinned below.
Episode 2 of the deep-dive series into FLAC, digital audio and its lossless compression. This video deals with the ideas and theory behind FLAC’s compression methods, including Linear Predictive Coding, Exponential Golomb Coding and Interchannel Decorrelation.
Chapters:
00:00 - Previously...
01:43 - Audio compression basics
03:16 - Excursion 1: Polynomials
04:13 - Linear Predictive Coding
09:02 - Excursion 2: Why these specific predictors?
11:40 - Excursion 2: A more formal method
15:54 - Linear Predictive Coding: Final remarks
17:20 - Residuals
19:55 - Residuals: Variable bit depth
22:36 - Residuals: Arbitrary order Exponential Golomb Coding
25:25 - Residuals: Extension to whole numbers
26:09 - Interchannel Decorrelation
27:26 - Ending
Footnotes (square brackets in the video):
[1] It’s more complicated than that. MP3 first splits the samples into subbands with a filter bank, then transforms each subband into the frequency domain using the MDCT (Modified Discrete Cosine Transform, a Fourier-related transform); the resulting frequency lines are quantized and reduced in resolution, throwing away information. There are also additional steps before and after the transform.
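If you want to see what the MDCT actually computes, here is a direct, unoptimized rendering of its defining formula in Python (my own toy illustration, not from the video; real codecs use overlapping windowed blocks and fast algorithms):

    import numpy as np

    def mdct(x):
        # MDCT of one block of 2N samples -> N frequency coefficients:
        # X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
        two_n = len(x)
        half = two_n // 2
        n = np.arange(two_n)
        k = np.arange(half)
        basis = np.cos(np.pi / half * (n[None, :] + 0.5 + half / 2) * (k[:, None] + 0.5))
        return basis @ np.asarray(x, dtype=float)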
[3] This is of course a simplification; not all functions can be approximated well by Taylor polynomials. However, there are ways of proving or disproving that a Taylor series converges to the original function, in which case the infinite Taylor “polynomial” (the series) is exactly equal to the approximated function.
[4] Fun fact: Computers calculate sine and cosine with polynomial approximations very much like these (typically after reducing the argument to a small range first).
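As a small, self-contained illustration of footnotes [3] and [4], here’s a truncated Taylor series for sine in Python (a toy of mine, not from the video; production math libraries add range reduction and use tuned polynomials):

    import math

    def taylor_sin(x, terms=10):
        # sin(x) = x - x^3/3! + x^5/5! - ..., truncated after `terms` terms
        total = 0.0
        for i in range(terms):
            n = 2 * i + 1
            total += (-1) ** i * x ** n / math.factorial(n)
        return total

    # For small x the truncation error is far below double precision:
    print(taylor_sin(1.0), math.sin(1.0))  # agree to about 15 significant digits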
[5] This way of writing series might seem weird to the math-inclined, but it’s how signals are sometimes notated in digital signal processing theory. You might also see regular function notation (that’s what my German textbook does), but I don’t use it here, to emphasize that we’re in discrete-signal land rather than continuous-function land.
[8] In case you’re wondering: there’s also the regular Golomb code. Instead of 2^k as its parameter it takes an arbitrary positive integer M, and it’s just quotient-remainder encoding where the divisor is M. Framed this way, Exponential Golomb is a generalization of Elias gamma coding (its order-0 case is Elias gamma, shifted to include zero), and the Golomb codes with M = 2^k are known as Rice codes.
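For a concrete feel for the residual coding chapters, here’s a minimal order-k Exponential Golomb encoder in Python, together with one common way of extending it from natural numbers to all whole numbers (the zigzag mapping). The function names are mine, and the zigzag ordering is just one standard choice:

    def zigzag(n):
        # Interleave signs so whole numbers become non-negative:
        # 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
        return 2 * n if n >= 0 else -2 * n - 1

    def exp_golomb(x, k=0):
        # Order-k Exponential Golomb code of non-negative x, as a bit string;
        # the run of prefix zeros tells the decoder how long the payload is.
        m = x + (1 << k)
        return "0" * (m.bit_length() - k - 1) + format(m, "b")

    for n in (-2, -1, 0, 1, 2):
        print(n, exp_golomb(zigzag(n), k=1))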
A general note about the LPC coefficient derivation section: I’m skipping over a significant amount of formalism, since the distance between samples is technically a constant h. However, as mentioned at the end of the geometric motivation, our approximations assume h = 1, which makes all the h-related factors disappear. I therefore left them out entirely to simplify the equations.
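To see the h = 1 predictors in action, here’s a sketch of the fixed polynomial predictors of orders 0 through 4; the alternating binomial coefficients are what the derivation in the video produces, while the code itself is my illustration:

    # Coefficients of the fixed predictors: predict sample i from the
    # `order` samples before it. They are alternating binomial coefficients.
    PREDICTORS = {
        0: [],
        1: [1],
        2: [2, -1],
        3: [3, -3, 1],
        4: [4, -6, 4, -1],
    }

    def residuals(samples, order):
        # The residuals (prediction errors) are what actually gets encoded.
        coeffs = PREDICTORS[order]
        out = []
        for i in range(order, len(samples)):
            prediction = sum(c * samples[i - 1 - j] for j, c in enumerate(coeffs))
            out.append(samples[i] - prediction)
        return out

    # A linear ramp is predicted perfectly by the order-2 predictor:
    print(residuals([3, 5, 7, 9, 11], 2))  # [0, 0, 0]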
Another general note about the whole video: I’m leaving out a lot of the concrete implementation considerations, so if you’re already wondering how on earth some of these things can work together, those questions will probably be addressed in the third video.
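Finally, to make the interchannel decorrelation chapter concrete: the standard lossless mid/side transform looks like this in Python (a sketch with my own function names; the parity trick is what keeps the halved mid exactly invertible):

    def encode_mid_side(left, right):
        mid = (left + right) >> 1   # floor of the average; drops one bit...
        side = left - right         # ...which the parity of side recovers
        return mid, side

    def decode_mid_side(mid, side):
        # left+right and left-right always have the same parity, so the
        # bit lost when halving mid can be restored from side.
        left = mid + ((side + (side & 1)) >> 1)
        return left, left - side

    print(decode_mid_side(*encode_mid_side(1000, 998)))  # (1000, 998)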