CS180 Project 1

Introduction

In this project, we implement an image alignment algorithm that colorizes glass plate images from the Prokudin-Gorskii photo collection by aligning the three color channels of each image.
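For reference, here is a minimal sketch of the surrounding pipeline. It assumes the scanned plate is a 2-D NumPy array whose three exposures are stacked vertically in blue, green, red order from top to bottom, as in the course data. The names `split_plate` and `compose` are hypothetical helpers, not the exact code behind the results below.

```python
import numpy as np

def split_plate(plate):
    """Split a stacked glass-plate scan into three equal-height channels.
    Assumption: exposures are ordered blue, green, red from top to bottom."""
    h = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    return b, g, r

def compose(r, g, b):
    """Stack the aligned channels into an (H, W, 3) RGB image."""
    return np.dstack([r, g, b])
```

The green and red channels are then each aligned to the blue one before calling `compose`.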

Part 1: Simple implementation on smaller `.jpg` images

In this part, my implementation is simple: it searches exhaustively over displacements in a window of \([-15, 15]\) pixels and, for each of the red and green channels, picks the displacement that minimizes the error against the blue channel. Below are my results, each listed with the displacements of the red and green channels:
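A minimal sketch of this exhaustive search, assuming the channels are float NumPy arrays of equal shape and that a displacement is a (row, column) pair applied with `np.roll`, which wraps pixels around the border. The names `mse` and `align_exhaustive` are hypothetical; this is an illustration, not the exact code used for the results below.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equally sized channels."""
    return float(np.mean((a - b) ** 2))

def align_exhaustive(moving, reference, window=15):
    """Try every (dy, dx) in [-window, window]^2 and return the shift of
    `moving` that minimizes MSE against `reference`."""
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))  # wraps at borders
            err = mse(shifted, reference)
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift
```

Usage would be `align_exhaustive(g, b)` and `align_exhaustive(r, b)`, with the channels converted to float first so the subtraction does not overflow on integer images.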

Part1 image 1 — Cathedral
R: (12, 3) G: (5, 2)
Runtime: 0.63s

Part1 image 2 — Monastery
R: (3, 2) G: (-3, 2)
Runtime: 0.57s

Part1 image 3 — Tobolsk
R: (6, 3) G: (3, 3)
Runtime: 1.67s

It is worth noting that I tried two error metrics: MSE (Mean Squared Error, \(\frac{1}{hw}\sum_{i=1}^h\sum_{j=1}^w (\mathbf{X}_{ij} - \mathbf{Y}_{ij})^2\), to be minimized) and NCC (Normalized Cross-Correlation, \(\frac{\mathbf{x}^T\mathbf{y}}{\Vert\mathbf{x}\Vert_2\Vert\mathbf{y}\Vert_2}\), to be maximized), where \(\mathbf{X}, \mathbf{Y}\) are two \(h \times w\) channels and \(\mathbf{x}, \mathbf{y}\) are their flattened vectors. Both produced visually good alignments, but NCC was slower (1.11s versus 0.63s for MSE on cathedral.jpg), so I used MSE for all images. It is also important to crop each image before processing: the black borders distort the error computation and make the alignment less accurate. I cropped 5% of the height and 10% of the width of each image before processing.

Part 2: Coarse-to-fine pyramid speedup on large `.tif` images

Next, I applied the algorithm to the larger images. To reduce runtime, I built a 4-level image pyramid in which each coarser level averages the pixels of the finer one in 2x2 blocks. At the coarsest level, the optimal shift is searched within \(([-windowH_k, windowH_k], [-windowW_k, windowW_k])\). Once the optimal shift \((i_k, j_k)\) at level \(k\) is known, the shift \((i_{k+1}, j_{k+1})\) at the next, finer level is searched within \(([2i_k - windowH_{k+1}, 2i_k + windowH_{k+1}], [2j_k - windowW_{k+1}, 2j_k + windowW_{k+1}])\), since one pixel at level \(k\) spans two pixels along each axis at level \(k+1\). The window sizes at each level (from coarse to fine) were set to [8, 6, 4, 2], [16, 8, 4, 2], and [18, 9, 6, 3]. Here are the results:
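A minimal sketch of the coarse-to-fine search, under these assumptions: each level is built by 2x2 block averaging, every level uses an `np.roll`-based MSE search, and the window schedules are passed separately for height and width (the defaults use [8, 6, 4, 2], one of the schedules listed above). Each finer search is centered at \((2i_k, 2j_k)\). All function names are hypothetical.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks.
    An odd trailing row/column is dropped."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(moving, reference, center, win_h, win_w):
    """Exhaustive MSE search over shifts within +/-(win_h, win_w) of `center`."""
    ci, cj = center
    best, best_err = center, np.inf
    for di in range(ci - win_h, ci + win_h + 1):
        for dj in range(cj - win_w, cj + win_w + 1):
            shifted = np.roll(moving, (di, dj), axis=(0, 1))
            err = np.mean((shifted - reference) ** 2)
            if err < best_err:
                best, best_err = (di, dj), err
    return best

def align_pyramid(moving, reference, windows_h=(8, 6, 4, 2), windows_w=(8, 6, 4, 2)):
    """Coarse-to-fine alignment; windows are listed from coarsest to finest."""
    levels = len(windows_h)
    pyr_m, pyr_r = [moving], [reference]  # index 0 is full resolution
    for _ in range(levels - 1):
        pyr_m.append(downsample2(pyr_m[-1]))
        pyr_r.append(downsample2(pyr_r[-1]))
    i, j = 0, 0
    for k in range(levels):
        lvl = levels - 1 - k  # walk from the coarsest image to the finest
        # Center the window at twice the coarser shift (0 at the coarsest level).
        i, j = search(pyr_m[lvl], pyr_r[lvl], (2 * i, 2 * j), windows_h[k], windows_w[k])
    return i, j
```

With four levels, the coarsest window of 8 pixels covers up to 64 pixels of displacement at full resolution, while each finer level only refines within a few pixels, which is where the speedup over a full-resolution search comes from.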

Part2 image 1 — Church
R: (58, -4) G: (25, 3)
Runtime: 10.26s

Part2 image 2 — Emir
R: (107, 40) G: (49, 23)
Runtime: 16.21s

Part2 image 3 — Harvesters
R: (124, 13) G: (61, 16)
Runtime: 13.09s

Part2 image 4 — Icon
R: (90, 23) G: (41, 17)
Runtime: 13.44s

Part2 image 5 — Italil
R: (77, 35) G: (38, 21)
Runtime: 13.40s

Part2 image 6 — Lastochikino
R: (76, -9) G: (-3, -2)
Runtime: 13.30s

Part2 image 7 — Lugano
R: (93, -29) G: (41, -17)
Runtime: 13.36s

Part2 image 8 — Melons
R: (179, 12) G: (85, 9)
Runtime: 22.56s

Part2 image 9 — Self Portrait
R: (176, 36) G: (81, 29)
Runtime: 23.20s

Part2 image 10 — Siren
R: (97, -25) G: (50, -7)
Runtime: 13.92s

Part2 image 11 — Three Generations
R: (112, 10) G: (55, 13)
Runtime: 13.22s

Extra self-selected images:

Part2 image 12 — Kivach
R: (126, 19) G: (37, 11)
Runtime: 14.02s

Part2 image 13 — Isfandiyar
R: (105, 1) G: (41, 5)
Runtime: 13.45s

Part2 image 14 — Religious Painting
R: (52, 38) G: (35, 24)
Runtime: 13.18s

Part 3: Bells and whistles

Aligning emir.tif was harder than I expected: the output was still badly misaligned after several refinements to my algorithm. I therefore switched to using image gradients, computed with the Sobel operator, as the alignment cue:

Part3 image 1.1 — Without using gradient
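A minimal sketch of the gradient cue, assuming the Sobel gradient magnitude is what gets compared. It is written in plain NumPy (rather than a library filter) to stay self-contained, and replicating the edge pixels at the border is an assumption. The displacement search itself is unchanged; it simply runs on `sobel_magnitude(channel)` instead of the raw channel, and the resulting shift is then applied to the original channel.

```python
import numpy as np

def sobel_magnitude(channel):
    """Gradient magnitude from the 3x3 Sobel kernels (edges replicated)."""
    p = np.pad(channel.astype(float), 1, mode="edge")
    # Horizontal derivative, kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative, the transposed kernel
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Gradients respond to edges rather than absolute brightness, which helps when the same region has very different intensities in different channels.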

Meanwhile, the colors of some images were not as realistic as expected, being either too bluish or too yellowish. I adjusted the white balance of these images by scaling each channel so that its mean matches that of an assigned reference channel:

Part3 image 2.1 — Church, before adjusting WB

Part3 image 2.2 — Church, after adjusting WB

Part3 image 2.3 — Lastochikino, before adjusting WB

Part3 image 2.4 — Lastochikino, after adjusting WB
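A minimal sketch of mean matching, assuming a float RGB image in [0, 1] with channels last and that matching is done by multiplicative gains. Which channel serves as the reference is not stated above, so `reference_channel` (defaulting to green) is a hypothetical parameter.

```python
import numpy as np

def match_means(rgb, reference_channel=1):
    """Scale each channel so its mean equals the mean of `reference_channel`.
    Assumes an (H, W, 3) float image in [0, 1]."""
    rgb = rgb.astype(float)
    means = rgb.reshape(-1, 3).mean(axis=0)  # per-channel means
    gains = means[reference_channel] / means  # a zero-mean channel would need special handling
    return np.clip(rgb * gains, 0.0, 1.0)
```

Clipping to [0, 1] can pull a strongly boosted channel's mean slightly below the target, so the match is approximate on images with many bright pixels.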

Summary

This project was not very hard, since the path to building the algorithm was clear. It was still an interesting one, even though I had already done other image processing projects before. I expect the later projects in this course to be even more interesting.
