Team C:
R09922028 施承志
R09922071 劉俊緯
R09922091 張庭與
R09944021 鍾起鳴
The JPEG codec divides an image into 8x8 blocks for encoding. However, real-world images often contain large regions with few details. We would like to design a new variant of the JPEG codec that divides the image based on its local detail.
Canny sometimes fails to detect smoother edges, such as the pillar on the left of the first sample. However, this lets us leave moderately low-detail blocks unmarked, so they stay eligible for further merging.
Sample | Detected Edge | Initial Block Division |
---|---|---|
- Red stripes: 8x8 blocks with edge
- Blue squares: 8x8 blocks without any edge
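The initial division above can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes the Canny output is already available as a 2-D boolean array (for example from an external edge detector), and the function name `blocks_with_edges` is hypothetical.

```python
import numpy as np

def blocks_with_edges(edge_map: np.ndarray, block: int = 8) -> np.ndarray:
    """Return a boolean grid that is True where a block contains an edge pixel."""
    h, w = edge_map.shape
    # Pad with "no edge" so the image size becomes a multiple of the block size.
    pad_h, pad_w = -h % block, -w % block
    padded = np.pad(edge_map.astype(bool), ((0, pad_h), (0, pad_w)))
    grid_h, grid_w = padded.shape[0] // block, padded.shape[1] // block
    # Split into (grid_h, block, grid_w, block) tiles and test each tile.
    tiles = padded.reshape(grid_h, block, grid_w, block)
    return tiles.any(axis=(1, 3))

# A 16x24 edge map with a single edge pixel inside the block at grid cell (0, 1).
edges = np.zeros((16, 24), dtype=bool)
edges[3, 10] = True
has_edge = blocks_with_edges(edges)
```

Cells that are `True` correspond to the red stripes and cells that are `False` to the blue squares.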
Contiguous mergeable 8x8 blocks are merged into larger blocks (16x16, 32x32 or 64x64) if allowed.
- Legend:
- Red stripes: 8x8 blocks that are not merged
- Orange squares: 16x16 merged blocks
- Blue squares: 32x32 merged blocks
- Green squares: 64x64 merged blocks
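One possible merging strategy is sketched below. The report does not state the exact rule, so this assumes a greedy raster scan that tries the largest size first and requires every covered 8x8 cell to be edge-free and not yet claimed. The name `merge_blocks` is hypothetical.

```python
import numpy as np

def merge_blocks(has_edge: np.ndarray, sizes=(64, 32, 16)) -> list:
    """Greedily merge edge-free 8x8 cells; returns (row, col, size) in pixels."""
    grid_h, grid_w = has_edge.shape
    taken = np.zeros((grid_h, grid_w), dtype=bool)
    blocks = []
    for r in range(grid_h):
        for c in range(grid_w):
            if taken[r, c]:
                continue  # already covered by an earlier merged block
            chosen = 1  # side length in 8x8 cells; 1 means "not merged"
            for size in sizes:
                k = size // 8
                if r + k > grid_h or c + k > grid_w:
                    continue  # the candidate would leave the image
                window_edges = has_edge[r:r + k, c:c + k]
                window_taken = taken[r:r + k, c:c + k]
                if not window_edges.any() and not window_taken.any():
                    chosen = k
                    break
            taken[r:r + chosen, c:c + chosen] = True
            blocks.append((r * 8, c * 8, chosen * 8))
    return blocks

# A 32x32 image (4x4 cells) with an edge only in the top-left cell.
grid = np.zeros((4, 4), dtype=bool)
grid[0, 0] = True
merged = merge_blocks(grid)
```

Because the scan is greedy, merged blocks are not necessarily aligned to a multiple of their own size; a stricter quadtree-style alignment would be an equally plausible reading of "if allowed".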
- The 4 different block sizes are represented with 2 bits each
- Blocks are represented with a bitstream, in a top-to-bottom, left-to-right manner.
- Positions already covered by a coded block are skipped, which keeps the bitstream concise and unambiguous.
- Encoded bitstream:
00 10 01 00 00 00 00 00 00 00 01 00 01 10 00 ...
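The block-size stream can be illustrated with a small round trip. The mapping from 2-bit codes to sizes is not given in the report, so the table below (`00` for 8x8 up to `11` for 64x64) is an assumption, as is the row-by-row scan order. The function names are hypothetical.

```python
SIZE_TO_CODE = {8: "00", 16: "01", 32: "10", 64: "11"}  # assumed mapping
CODE_TO_SIZE = {code: size for size, code in SIZE_TO_CODE.items()}

def encode_sizes(blocks):
    """blocks: (row, col, size) tuples listed in scan order of their top-left corner."""
    return "".join(SIZE_TO_CODE[size] for _, _, size in blocks)

def decode_sizes(bits, height, width):
    """Rebuild block positions; height and width must be multiples of 8."""
    grid_h, grid_w = height // 8, width // 8
    placed = [[False] * grid_w for _ in range(grid_h)]  # occupancy bitmap
    blocks, pos = [], 0
    for r in range(grid_h):
        for c in range(grid_w):
            if placed[r][c]:
                continue  # covered by an earlier block, so no code was written
            size = CODE_TO_SIZE[bits[pos:pos + 2]]
            pos += 2
            k = size // 8
            for rr in range(r, r + k):
                for cc in range(c, c + k):
                    placed[rr][cc] = True
            blocks.append((r * 8, c * 8, size))
    return blocks

# Hypothetical layout of a 32x32 image.
layout = [(0, 0, 8), (0, 8, 16), (0, 24, 8), (8, 0, 8),
          (8, 24, 8), (16, 0, 16), (16, 16, 16)]
stream = encode_sizes(layout)
restored = decode_sizes(stream, 32, 32)
```

The decoder never needs explicit coordinates: the occupancy bitmap tells it which position the next 2-bit code belongs to.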
DCT for blocks of different sizes is straightforward, as the block size is simply a parameter of the DCT algorithm.
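To illustrate why the block size is just a parameter, here is a minimal sketch of an orthonormal 2-D DCT-II built from a size-dependent basis matrix. The orthonormal scaling is an assumption; the report does not say which normalisation the codec uses.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # the DC row uses a different scale factor
    return m

def dct2(block: np.ndarray) -> np.ndarray:
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    m = dct_matrix(coeffs.shape[0])
    return m.T @ coeffs @ m

# The same two functions handle every block size used by the codec.
rng = np.random.default_rng(0)
errors = {}
for n in (8, 16, 32, 64):
    block = rng.uniform(-128, 127, size=(n, n))
    errors[n] = float(np.abs(idct2(dct2(block)) - block).max())
```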
Standard luminance table | Standard chrominance table |
---|---|
|
|
Quantisation tables for larger blocks are upsampled from the standard tables.
upscaled_table = ((std_table * qualityScale + 50) / 100).clip(1,255)
quantized_value = DCT(block) / upscaled_table
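The two formulas above can be made concrete as follows. This is a sketch under several assumptions that the report does not confirm: the table is upsampled by nearest-neighbour repetition, `qualityScale` follows the common libjpeg convention, and the quotient is rounded to the nearest integer. The table is the standard luminance table from the JPEG specification.

```python
import numpy as np

STD_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quality_scale(quality: int) -> int:
    """Assumed libjpeg-style mapping from quality (1..100) to a percentage."""
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def quant_table(std_table: np.ndarray, block_size: int, quality: int) -> np.ndarray:
    factor = block_size // std_table.shape[0]
    # Nearest-neighbour upsampling: every entry becomes a factor x factor tile.
    upsampled = np.kron(std_table, np.ones((factor, factor), dtype=std_table.dtype))
    scaled = (upsampled * quality_scale(quality) + 50) // 100
    return np.clip(scaled, 1, 255)

def quantize(coeffs: np.ndarray, table: np.ndarray) -> np.ndarray:
    # Rounding to the nearest integer is an assumption, not stated in the report.
    return np.round(coeffs / table).astype(np.int32)

table16 = quant_table(STD_LUMA, 16, 80)
```

Scaling before or after upsampling gives the same table when the upsampling only repeats entries.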
The range of DC values from larger blocks is greater than that from smaller blocks. Hence, the DC values from blocks of different sizes are normalised to keep them on the same scale.
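A small numerical illustration of why the ranges differ: with an orthonormal 2-D DCT, the DC term of an N x N block equals N times the block mean, so it grows with the block size. The rescaling by 8/N shown here is only one plausible normalisation and is an assumption, as are the function names.

```python
import numpy as np

def dc_coefficient(block: np.ndarray) -> float:
    """DC term of an orthonormal 2-D DCT-II: sum of all samples divided by N."""
    return float(block.sum() / block.shape[0])

def normalised_dc(block: np.ndarray, reference: int = 8) -> float:
    """Rescale the DC term to the range it would have for a reference-size block."""
    return dc_coefficient(block) * reference / block.shape[0]

flat8 = np.full((8, 8), 100.0)
flat64 = np.full((64, 64), 100.0)
raw = (dc_coefficient(flat8), dc_coefficient(flat64))
scaled = (normalised_dc(flat8), normalised_dc(flat64))
```

For two flat blocks with the same brightness, the raw DC values differ by a factor of 8, while the normalised values coincide.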
3.2. Zigzagging
Line-based encoding can introduce a 'great hop' in DC value between lines, which reduces the efficiency of the dDC table. We introduce zigzagging in the hope of reducing such hops.
Zigzagging index tables for the 4 different block sizes are pre-generated and coded in.
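The index tables could be pre-generated along these lines. The sketch assumes the conventional JPEG zigzag direction, generalised from 8x8 to any square size; the name `zigzag_indices` is hypothetical.

```python
def zigzag_indices(n: int) -> list:
    """Flat (row-major) indices of an n x n block in zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]

    def key(cell):
        r, c = cell
        diagonal = r + c
        # Odd anti-diagonals run downwards (row ascending),
        # even ones run upwards (row descending).
        return (diagonal, r if diagonal % 2 else -r)

    cells.sort(key=key)
    return [r * n + c for r, c in cells]

# One table per supported block size.
ZIGZAG = {n: zigzag_indices(n) for n in (8, 16, 32, 64)}
```

Applying a table is a plain lookup, for example `[flat_block[i] for i in ZIGZAG[16]]` for a flattened 16x16 block.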
The intermediate bitstream format in our codec is very similar to that of the JPEG codec.
BlockSizes(Y) | DC(Y) | AC(Y) |
BlockSizes(Cb) | DC(Cb) | AC(Cb) |
BlockSizes(Cr) | DC(Cr) | AC(Cr) |
- DCT, DQT, zigzag
- Block sizes
- Written in the format mentioned in section 1.3.
- DC
- Kept as-is for the first block, and encoded with DPCM (Differential Pulse-Code Modulation) for the rest of the blocks.
- AC
- Scanned with the zigzag pattern, as mentioned in section 2.2.
- Values:
- 0xXY: Next X values are 0s, and the value after the 0s is Y bits long.
- 0x00: All remaining values are 0s
- 0xF0: Next 16 values are 0s
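The AC symbol values listed above can be produced as in the sketch below. It assumes the coefficients are already in zigzag order with the DC term removed, and that every non-zero value fits in at most 15 bits so that its length fits in the low nibble. The function name `ac_symbols` is hypothetical.

```python
def ac_symbols(ac):
    """Turn zigzag-ordered AC coefficients into (symbol, value) pairs."""
    out, run = [], 0
    # Index of the last non-zero coefficient; -1 when the block is all zeros.
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    for v in ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:
                out.append((0xF0, None))  # sixteen zeros in a row
                run = 0
            continue
        size = abs(v).bit_length()        # Y: bits needed for the value
        out.append(((run << 4) | size, v))  # X = run of zeros, Y = size
        run = 0
    if last_nonzero + 1 < len(ac):
        out.append((0x00, None))          # all remaining values are zero
    return out

example = ac_symbols([5, 0, 0, -3] + [0] * 59)
long_run = ac_symbols([0] * 17 + [1] + [0] * 45)
```

In `example`, the value 5 needs 3 bits with no preceding zeros (`0x03`), and -3 needs 2 bits after two zeros (`0x22`).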
- Read block sizes
- Reconstructed in a top-to-bottom, left-to-right manner (using a bitmap to check whether a position is already occupied)
- Skipping already-occupied positions
- Read DC
- Read the bitstream until it matches one of the codes in the DC table
- Then, read a number of bits determined by the index of the matched code
- Use the 1's complement representation to reconstruct the original DC or DPCM value
- Read AC
- Read the bitstream until it matches one of the codes in the AC table
- The matched code gives us the byte it represents
- We parse the byte in this manner
- The upper 4 bits give the number of zeros preceding the AC value
- The lower 4 bits give the number of bits to read next, from which we reconstruct the original AC value
- Inverse zigzag, IDQT, then IDCT
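The value reconstruction in the DC and AC steps above can be sketched as follows. It assumes the usual JPEG convention, where a negative value is stored as the bitwise complement of its magnitude, and a DPCM predictor that starts at 0 so that the first difference is the first DC value itself. All function names are hypothetical.

```python
def encode_amplitude(value: int) -> str:
    """Bits for a coefficient; negatives use the complemented magnitude."""
    if value == 0:
        return ""
    size = abs(value).bit_length()
    if value < 0:
        value += (1 << size) - 1  # same as flipping every bit of |value|
    return format(value, f"0{size}b")

def decode_amplitude(bits: str) -> int:
    """Inverse of encode_amplitude; the bit count comes from the matched code."""
    if not bits:
        return 0
    value = int(bits, 2)
    if bits[0] == "1":
        return value  # a leading 1 marks a positive value
    return value - (1 << len(bits)) + 1

def undo_dpcm(diffs):
    """Recover DC values from differences (predictor starts at 0)."""
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

dc_values = undo_dpcm([100, 5, -3])
```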
GIMP JPEG spec:
Quality 90,
Subsample: 4:4:4
DCT: Integer
Optimized, Progressive
- Comparison of different methods (Variation with quality 80)
File Name | Original PPM (Bytes) | Variation-8x8 (Bytes) | Variation-Adaptive (Bytes) | GIMP JPEG (Bytes) |
---|---|---|---|---|
balls.ppm | 10880560 (100%) | 200338 (1.8%) | 260329 (2.4%) | 260170 (2.4%) |
test_1.ppm | 90426514 (100%) | 1814081 (2.0%) | 1095085 (1.2%) | 1708280 (1.9%) |
test_2.ppm | 96039186 (100%) | 2377799 (2.5%) | 2140897 (2.2%) | 3022289 (3.1%) |
test_3.ppm | 89758153 (100%) | 1569522 (1.7%) | 752873 (0.8%) | 1215101 (1.3%) |
The compressed versions of all test cases with 8x8 blocks and quality 80 deviate from the originals by at most 1 in every RGB channel of every pixel.
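A per-pixel check of this kind can be written in a few lines. The sketch assumes both images are already loaded as `uint8` arrays of shape (height, width, 3); the function name is hypothetical and the arrays below are toy data, not the test images.

```python
import numpy as np

def max_channel_deviation(original: np.ndarray, decoded: np.ndarray) -> int:
    """Largest absolute difference over all pixels and RGB channels."""
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    diff = original.astype(np.int16) - decoded.astype(np.int16)
    return int(np.abs(diff).max())

a = np.array([[[0, 255, 10], [7, 7, 7]]], dtype=np.uint8)
b = np.array([[[1, 254, 10], [7, 8, 6]]], dtype=np.uint8)
deviation = max_channel_deviation(a, b)
```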
In the case of balls.ppm, the adaptively compressed image is larger than the 8x8 compressed one. On the other hand, all other test cases show a decline in file size from the 8x8 compression to the adaptive one.
Original ppm | Variation-8x8 |
---|---|
Variation-Adaptive | GIMP JPEG |
- Comparison of different quality using 8x8 (quantization table coefficients)
| File Name | Original (Bytes) | Quality 100 (Bytes) | Quality 80 (Bytes) | Quality 60 (Bytes) |
| ---------- | -------- | ------------ | ------------- | ------------------ |
| balls.ppm | 10880560 (100%) | 1071485 (9.8%) | 200338 (1.8%) | 161873 (1.5%) |
| test_1.ppm | 90426514 (100%) | 6145913 (6.8%) | 1814081 (2.0%) | 1490622 (1.6%) |
| test_2.ppm | 96039186 (100%) | 10951132 (11.4%) | 2377799 (2.5%) | 1769368 (1.8%) |
| test_3.ppm | 89758153 (100%) | 4672494 (5.2%) | 1569522 (1.7%) | 1256616 (1.4%) |
We support quantization tables of different quality. The table above shows that the compression rate increases as we lower the quality. Even at lower quality, it is hard to tell the images apart, as we show below.
Original Image | Quality 100 |
---|---|
Quality 80 | Quality 60 |
The compressed versions of all test cases with 8x8 blocks and quality 80 deviate from the originals by at most 1 in every RGB channel of every pixel, which we consider a success.
The adaptive compression suffers from severe blocking effects and unnatural artifacts. After inspecting the compressed images, we found that our original idea of relying on the Canny algorithm was not good enough and that the larger blocks badly degrade perceived image quality.
However, we still learned from the process and became more familiar with the algorithms and maths used in JPEG.
- Class handouts