Introduction & Background
One of the fundamental limitations of a camera is its inability to capture the world in the same exquisite manner that a human mind can achieve. For decades, we have been chipping away at these limitations, attempting to address high dynamic range imaging, perceptual invariance to illumination changes, and a host of other problems that our minds seem able to process so effortlessly. One of these avenues seeks to mitigate the constraints imposed by a camera’s inherently limited field of view, so that we might be able to represent a scene that is closer to what we’re capable of perceiving at any given time. Or, in fact, so we may represent even more than what the human observer is capable of perceiving at any given time. Hence, we seek an algorithm that will allow us to perform panoramic image stitching.
I originally set out to implement an automatic version of panorama stitching, which would allow the user to input a series of related images, then return a coherent, blended image of the subject with an extended field of view. Ideally, I hoped that my algorithm would not require any user input aside from the images themselves. Unfortunately, quite a few parameters still require the user's attention to get the best result, such as the neighborhood size to consider around a given pixel, or the values of several necessary thresholds. These produce drastically different results from image to image, so for now it seems necessary that the user remain in charge of them.
I was also unable to achieve the entirety of what I set out to do in my proposal, as I underestimated the subtlety of some of the issues I ran into. One such bug, which took a few days to fully resolve, was a memory problem causing non-deterministic behavior in an otherwise deterministic program. However, I was able to successfully implement my own version of several important numerical and vision-related algorithms, which brought me very near the goal I had originally set out to achieve.
Panorama stitching is a very well-studied problem, and as such I ended up learning about and implementing many well-known Computer Vision and Machine Learning techniques in my project. The main workflow can be observed in the following pseudocode:
// PANORAMA STITCHING PSEUDOCODE
identify the central image of the panorama
detect feature points & descriptors for the central image
for each peripheral image im_i:
    detect feature points & descriptors
    identify correspondences between im_i and the central image
    compute a homography using RANSAC
mosaic the images, applying the homographies to warp them all toward the central image
return the final stitched panorama
Construction & Testing: C++
To create my panorama stitcher, I essentially implemented the elements of the above algorithm in reverse order: each piece removed another burden from the user and pushed the codebase closer to my goal of a fully automated system. My process, along with a collection of intermediate results, is explained below.
After familiarizing myself with the Eigen library, I began by simply creating a method by which you could apply a known homography to an image. I compared my results to ground truth results I computed in previous courses [CS83 top, CS70 bottom] for the following images:
This required an inverse transformation: rather than looping over the pixels of the input image and applying the homography directly to see where each value ought to go, we loop over the pixels of the output image and apply the inverse homography to see where each value should come from. This ensures that the output image will be smooth, with no gaps.
Next, I ensured that I could compute a homography matrix from a given set of feature correspondences. Again, I obtained my ground truth comparison from exercises implemented in CS70 and CS83, respectively. I was able to estimate it with decent accuracy, as the images and numerical evidence below will suggest:
The overlay pictured above was achieved by a rough weighted blending of the warped image and the target image, where the weight was 0.5 for both images in places where their content overlapped, and 1 for each image in pixel locations where only a single image had content.
Next, I wanted to relax the need for the user to specify the correct correspondences, so I wrote functions which only require the user to supply the feature points themselves, but not the correspondences between them. First, I extracted information about each point’s vectorized neighborhood from the source image. I used a very simple (albeit very unstable) feature descriptor: the raw intensity values of the neighboring pixels. Then, I computed candidate correspondences between these feature points by thresholding the Euclidean distance between each potential pair against a user-specified value.
Now, with this set of noisy correspondences, I implemented an algorithm called RANSAC, or Random Sample Consensus, to compute the most likely homography between the two images. Then, following the same steps as before, I applied the homography to arrive at the following result:
After computing a reasonable result with my automatically-established correspondences, I sought to further automate the process by computing the feature points as well. I implemented the Harris Corner detection method, which is based on the eigenvalue analysis of the second moment matrix for each pixel. First, I computed the gradient of the images, and obtained the following intermediate results:
My algorithm identified the following feature points in each image:
Next, I implemented non-maximum suppression, which trims the collection of candidate feature points by keeping only those pixels whose Harris corner response is a local maximum. Below is a visualization of the corners it selected (highlighted in red):
And finally, for this section, I’ve generated some visualizations of the feature correspondences that I was able to generate:
These correspondences were all made in preparation for stitching the 5 images together based on their common relationship to the center image (featured on the right in each of these images).
Here is another example of successfully chosen correspondences — it’s intuitive to see that these correspondences make sense, because they all exhibit a similar direction. That is, the global motion appears coherent enough that it fits our expectations for the camera displacement between these two scenes.
There is certainly room for improvement in my correspondence detection (as is most clearly shown by the middle bedroom figure above). This issue could be mitigated slightly by spending time adjusting various parameters for each individual case. However, this poor performance is largely due to the fact that I am using such a simple, sensitive feature descriptor — namely, raw intensity data. For such a simple descriptor, this algorithm actually performs remarkably well!
Now, looking at the pseudocode above, it seems clear that we have implemented all of the components needed for the panorama stitching algorithm.
Unfortunately, despite all this positive evidence for success, when I incorporated this automatic feature detection into my panorama stitching algorithm, I was met with some rather undesirable (albeit interesting) results:
The reason for this strange behavior is still unknown to me, since all of the components seem to be working properly in isolation, but I intend to keep searching for the errors in my code, because I would love to see this become a functional product. I have also outlined some other directions I intend to take this project in the future, for my own enjoyment.
In addition to identifying and fixing the existing bugs in my code, I would like to make this system fully automatic, to the point where you could simply feed in a stream of potentially unrelated images, and the system would:
- Detect which images are related: segment the entire image stack into sub-groups based on relation
- Establish how those images are related: compute the traditional feature-based panorama-stitching pipeline as I have done here
- Create a coherent, blended panorama from each subgroup of related images
This would ideally include the implementation of a more powerful feature descriptor, such as SIFT.
I would also like to experiment with OpenGL to use some of these vision techniques to obtain 3D structure from a similarly related series of images.
Authors, Sources, and Contributors
The base code for this assignment included code that was sourced from our assignments throughout the term (credits provided in each individual file), as well as a number of other files that were specifically created for and related to this Panoramic Stitching project. I used the following sources to assist me with concepts, implementation, and testing along the way:
- CS 83/183: Computer Vision [Personal Notes and Assignments; Dartmouth College, Professor Lorenzo Torresani, Fall 2015]
- CS 70/170: Numerical and Computational Tools for Applied Science [Personal Notes and Assignments; Dartmouth College, Professor Hany Farid, Spring 2015]
- Robust Panoramic Image Stitching [Chau & Karol; http://cvgl.stanford.edu/teaching/cs231a_winter1415/prev/projects/Chau_Karol_CS231a_Final_Paper.pdf]
- MIT Lecture Notes (Lectures 10-13) [MIT, Professor Fredo Durand, Spring 2015; http://stellar.mit.edu/S/course/6/sp15/6.815/materials.html?showKind=lectureNotes]
With the exception of the base code given in class and the Eigen library for C++, all code was written by Liane Makatura. This project was created as a Final Project for Computer Science 89: Computational Photography, taught by Professor Wojciech Jarosz (Dartmouth College, Fall 2015).