Discrete Cosine Transform
Discrete Cosine Transform (DCT) linearly transforms data (i.e., a set of numbers, e.g., a_{0}, a_{1}, …, a_{n-1}) into the frequency domain, where the data can be represented by a set of coefficients (w_{0}, w_{1}, …, w_{n-1}) in the frequency domain. The advantage of DCT is that the energy of the original data (e.g., a_{0}, a_{1},…, a_{n-1}) may be concentrated in only a few frequency coefficients of DCT depending on the correlation in the data. This means that most of the DCT coefficients (w_{0}, w_{1}, …, w_{n-1}) may be zero or very small values. DCT expresses a signal (i.e., data, a set of numbers) in terms of a sum of cosine functions with different frequencies. I understand that some are not comfortable with the term signal, if you do not like it, you can simply say “a set of numbers”, instead. A signal is nothing but a set of numbers, nothing scary.
signal = data = a set of numbers
Let’s try to transform our first data, a set A of n values:
A = {a_{0}, a_{1}, …, a_{n-1} }.
This is a one dimensional data and we will compute its one dimensional discrete cosine transform coefficients using the formula below:
We have n elements in A, so we will have n DCT coefficients. We will later see that the discrete cosine transform coefficients are actually the weights that indicate how strong the frequency components are in the data.
L Orthogonality of Discrete Cosine Transform
We will now have some boring stuff here, but the good point is that you can simply ignore if you are not interested in the orthogonality. Orthogonal transforms are known to have the tendency of de-correlating the given signal. To make DCT values orthogonal, we multiply the terms by scale factors and . If two lines are orthogonal, they are perpendicular at their point of intersection. Similarly, two vectors, a and b, are orthogonal if and only if their dot product is zero:
where and denote the lengths or magnitudes of a and b. In addition, two vectors in an inner product space are orthonormal if they are orthogonal and both of unit length.
J
Come back to the happy zone. Let’s refine the definition of discrete cosine transform coefficients by adding the boring orthogonality stuff above.
Step1: Expansion of the DCT Definition for k=0, the DC Coefficient
Let us write the coefficient w_{0} (k=0) in the expanded form,
w_{0}, which is the first DCT coefficient, is called DC (direct current, zero frequency) coefficient. gives us the mean of the set A.
Step2: Expansion of the DCT Definition for k > 0, AC Coefficients
Let us expand the DCT formula for the rest of the coefficients, where k is greater than zero. These coefficients are called AC (alternative current) coefficients.
Let us have a break here and remember the advantage of using DCT transform. If the set A consists of correlated values (there is redundancy in the data), then most of the DCT coefficients will be either zero or very small numbers, and only a few are large. Can we estimate which coefficients will be zero or small? Actually, yes. The early AC coefficients (e.g., w_{1}, w_{2}, w_{3} ) correspond to low-frequency components, and the low-frequency coefficients contain most of the important information about the data. The later AC coefficients (e.g., w_{n-3}, w_{n-2}, w_{n-1}) correspond to high-frequency components. I know that the term “information about the data” sounds very vague, but no worries, you will understand very well when we work on an example. For now, we can say that the low-frequency coefficients (they are usually large numbers) are more important than high-frequency coefficients (they are usually small numbers). If you are not familiar with the terms high-frequency and low-frequency, that is fine. Low-frequency refers to the slow change, and high-frequency refers to the rapid change.
Step3: Compute DCT Coefficients for a Given Set
Too much talk and no action. Let us work on an example, where we will use the following set
A = {1, 2, 3, 4, 5, 6, 7, 8}.
The set has 8 elements, so n is 8. As you know, in computer science we love to use the powers of two. Moreover, numbers in the set are correlated, an obvious pattern, incremented by one. It is time to compute DCT coefficients. There are 8 numbers in the set A, so there will be 8 DCT coefficients in the frequency domain. If you use the DCT formula above and compute the coefficients for k =0, 1, 2, 3, 4, 5, 6, 7, you will see that the DCT coefficients are
w_{A} = {12.7279, -6.4423, 0, -0.6735, 0, -0.2009, 0, -0.0507}
Let us talk about the values of the coefficients we computed. The DC coefficient, w_{0}, is 12.7279. The low-frequency coefficient, w_{1}, is -6.4423. As you see the high-frequency coefficients are small numbers. This is something we expected because we have correlated numbers in the set A. If you compute DCT coefficients of a set elements where the numbers are not correlated, we cannot say anything about the values of the coefficients. All may be large numbers.
Anyway, let’s go back to our example. We know that if the value of a coefficient is small, it does not carry much information, so we can ignore the coefficients with absolute values smaller than one. We now have
w_{A} = {12.7279, -6.4423, 0, 0, 0, 0, 0, 0}
Step4: Reconstruct the Data Using the DCT Coefficients
DCT is a transform and accordingly there is something called inverse DCT. The inverse discrete cosine transform (IDCT) reconstructs the data from its discrete cosine transform coefficients. IDCT is defined as
In summary, we use DCT in order to compute the DCT coefficients {w_{0}, w_{1}, …, w_{n-1}} of the set {a_{0}, a_{1}, …, a_{n-1}}. Then, we use IDCT to reconstruct the set {a_{0}, a_{1}, …, a_{n-1}} from its DCT coefficients {w_{0}, w_{1}, …, w_{n-1}}.
If we apply IDCT transform to {12.7279, -6.4423, 0, 0, 0, 0, 0, 0}, we will obtain the data set {1.3407, 1.8217, 2.7104, 3.8716, 5.1284, 6.2896, 7.1783, 7.6593}. Let’s compare the actual data set and the data set we reconstructed from the first two DCT coefficients (remember we ignored the rest of the coefficients).
Original data set |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
Reconstructed date set |
1.3407 |
1.8217 |
2.7104 |
3.8716 |
5.1284 |
6.2896 |
7.1783 |
7.6593 |
If we compute the mean squared error between the original data set and the reconstructed date set, we will see that the error is very small, 0.4965. Why? The energy of the signal A is concentrated only on first two coefficients because the elements in A are correlated. If you like, we can calculate the energy of the signal A and how much of its energy is concentrated on the first two coefficients. The formulation below gives the energy of A.
So, E_{A} = 1^{2}+2^{2}+3^{2}+4^{2}+5^{2}+6^{2}+7^{2}+8^{2 }= 204. And let’s compute the energy stored in w_{A} = {12.7279, -6.4423, 0, -0.6735, 0, -0.2009, 0, -0.0507}:
E_{W} = 12.7279^{2}+(-6.4423)^{ 2}+0^{2}+(-0.6735)^{ 2}+0^{2}+(-0.2009)^{ 2}+0+(-0.0507)^{2} = 204.
E_{A} and E_{W} are the same! We are not surprised, right? This is because of the principle of conservation of energy. And, the first two DCT coefficients carry 99.75% of the total energy:
(12.72792^{2} +6.4423^{2})/204= 0.9975
Here is the Matlab code for this experiment. If you like you can try Octave Online:
A_raw = 1:8;
w = dct(A);
w_cof = w;
w_cof(abs(w_cof)<1) = 0;
A_rec = idct(w_cof);
mse = mean(sum((A_raw-A_rec).^2))
Step5: Understand that DCT is not magic, it depends on the correlation
Let’s repeat the same experiment for a set of uncorrelated numbers. The DCT transform will now result in coefficients with large values because the energy will be scattered among the coefficients. Consider the set B given below:
B = { 1, 5, -9, -8, 7, 1, 0, 9}
If we apply DCT to B, we will obtain the following DCT coefficients
w_{B} = {2.1213, -6.0855, 7.5688, 5.2571, 4.2426, -11.8857, -3.9005, 1.0293}
Therefore, if you plan to use DCT for data compression, you should first find a good way to represent your data where the elements are correlated. For example, let us naively sort the elements of B to improve the correlation in the set:
B_sorted = {-9, -8, 0, 1, 1, 5, 7, 9}
and then apply DCT to the sorted set and we have:
w_{B_sorted} = {2.1213, -16.4520, -2.0719, -3.5681, -0.7071, 1.8680, 2.3890, 0.3323}
Let us compare the energy concentrated on the first 3 elements of w_{B} and w_{B_sorted} try to observe the sorting effects the results.
E_{w_B} = 2.1213^{2}+(-6.0855)^{ 2}+7.5688^{2}
E_{w_B} = 98.82
E_{w_B_sorted} = 2.1213^{2}+(-16.4520)^{ 2}+(-2.0719)^{2}
E_{w_B_sorted} = 279.46
As you see, E_{w_B_sorted} is much greater than E_{w_B} because of the improved correlation amount the elements of the B_sorted.
Here is the Matlab code for this experiment. If you like you can try Octave Online:
B = [1 5 -9 -8 7 1 0 9];
W = dct(B);
B_sorted = sort(B);
W_B_sorted = dct(B_sorted);
E_W_B = sum((W(1:3).^2))
E_W_B_sorted = sum((W_B_sorted(1:3).^2))
Step6: Understand the definition of DCT, the base vectors of the transform
One of the most interesting (and maybe weird) features of DCT is that the value of n, that is the number of the elements in the input data set A, actually plays a critical role in the transform’s base vectors. What is a base vector, please wait for a moment. Before going there, let us remember the definition of DCT:
As you see, n is in the heart of the DCT definition. Therefore, most applications employing the DCT specifically use 8 as the value of n. If there is more than 8 elements in the input data set A, they basically divide A into groups of 8 elements and then apply DCT to each group individually. The table below shows the value of
in degrees for
· n = 8,
· t = 0, ..., n-1 , and
· k = 0, ..., n-1.
n = 8 |
t=0 |
t=1 |
t=2 |
t=3 |
t=4 |
t=5 |
t=6 |
t=7 |
k=0 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
k=1 |
11.25 |
33.75 |
56.25 |
78.75 |
101.25 |
123.75 |
146.25 |
168.75 |
k=2 |
22.50 |
67.50 |
112.50 |
157.50 |
202.50 |
247.50 |
292.50 |
337.50 |
k=3 |
33.75 |
101.25 |
168.75 |
236.25 |
303.75 |
11.25 |
78.75 |
146.25 |
k=4 |
45.00 |
135.00 |
225.00 |
315.00 |
45.00 |
135.00 |
225.00 |
315.00 |
k=5 |
56.25 |
168.75 |
281.25 |
33.75 |
146.25 |
258.75 |
11.25 |
123.75 |
k=6 |
67.50 |
202.50 |
337.50 |
112.50 |
247.50 |
22.50 |
157.50 |
292.50 |
k=7 |
78.75 |
236.25 |
33.75 |
191.25 |
348.75 |
146.25 |
303.75 |
101.25 |
Before going to the details of the value of n, let us try to understand what happens when the value of k changes. Please look at the values at each row. For example, for k=0, there is no change at all, that is the DC component. For k=1,
k=1 |
11.25 |
33.75 |
56.25 |
78.75 |
101.25 |
123.75 |
146.25 |
168.75 |
there is a slow change, that corresponds to the low frequency. For k=7,
k=7 |
78.75 |
236.25 |
33.75 |
191.25 |
348.75 |
146.25 |
303.75 |
101.25 |
there is a rapid change (value jumps from 78.75 to 236.25, then goes to 33.75 ), that corresponds to the high frequency.
We use the first row to calculate the DCT coefficient w_{0} and similarly use the second row to calculate the DCT coefficient w_{1}, and so on so forth. When we calculate DCT coefficients, we have a set of n numbers as an input, and we have n coefficients as an output. For each DCT coefficient, we have a base frequency f and accordingly a base vector. And the value of n changes the set of the base vectors for the DCT transform. In order words, we sample the cosine signal, and where we take the samples on the cosine signal depends the value of n. This may sound hard to understand, so let’s draw the cosine signals (i.e., values) given at each row in the table above. Hopefully, it will be clearer to see how the cosine signal changes with the increasing frequency.
Here is the Matlab code for this experiment. If you like you can try Octave Online:
clear all;
close all;
clc;
n=8;
step = 0.1;
t=0:step:n-1;
f=0:n-1;
cosval = zeros(size(f,2),size(t,2));
angle = zeros(size(f,2),size(t,2));
for fr=1:size(f,2)
for i=1:size(t,2)
frequency = fr-1;
angle(fr,i) = rad2deg(((2*t(i)+1)*frequency*pi)/(2*n));
cosval(fr,i) = cos(deg2rad(angle(fr,i)));
end
end
line='-'
plot(cosval(1,:),['r' line]),hold on;
plot(cosval(2,:),['g' line])
plot(cosval(3,:),['b' line])
plot(cosval(4,:),['m' line])
hleg1 = legend('f_0','f_1', 'f_2', 'f_3')
grid on;
figure,
plot(cosval(5,:),['r' line]),hold on;
plot(cosval(6,:),['g' line])
plot(cosval(7,:),['b' line])
plot(cosval(8,:),['m' line])
hleg1 = legend('f_4','f_5','f_6','f_7');
grid on;
Now, let us look at the DCT more deeply. Remember that we have a set A of 8 values
A = {a_{0}, a_{1}, …, a_{7} }
and accordingly the DCT should produce 8 DCT coefficients:
w = {w_{0}, w_{1}, …, w_{7}}
To be able to generate 8 coefficients, we divide π by 8, and we use 8 cosine base vectors with increasing frequencies:
One by one, we multiply each cosine base vector by the input vector A to compute the DCT coefficients:
So, we kind of take samples of the input signal at different points for each coefficient and compute some kind of mean of the sample. Accordingly, we can compute the coefficient w_{5} by multiplying A = {1, 2, 3, 4, 5, 6, 7, 8} by f^{5}. Here is the Matlab code for this experiment. If you like you can try Octave Online:
clear all;
close all;
clc;
A=1:0.1:8;
n=8;
step = 0.1;
t=0:step:n-1;
f=0:n-1;
cosval = zeros(size(f,2),size(t,2));
angle = zeros(size(f,2),size(t,2));
for fr=1:size(f,2)
for i=1:size(t,2)
frequency = fr-1;
angle(fr,i) = rad2deg(((2*t(i)+1)*frequency*pi)/(2*n));
cosval(fr,i) = cos(deg2rad(angle(fr,i)));
end
end
line='-'
plot(A,['m' line]),hold on;
plot(cosval(4,:),['r' line])
plot(A.*cosval(4,:))
hleg1 = legend('A', 'f_5', 'A*f_5')
grid on;
This is the
end of the tutorial. I hope you enjoyed it. Please feel free to contact me at haberdar AT gmail dot
com. You can use anything here as long
as you give credit as follows:
Hakan Haberdar, "Discrete Cosine Transform
",
Computer Science Tutorials [online], (Accessed MM-DD-YYYY)
Available from: http://www.haberdar.org/discrete-cosine-transform-tutorial.htm