Swipe Mosaic

Date: 2016
Status: Completed
Project Type: Independent Project

Image mosaicking is the process of blending several overlapping photos of a scene or object that cannot be captured in a single frame into one relatively seamless mosaic. In this project, the distinctive features of the images are first extracted and then matched to find correspondences. The matched points are used to compute the transformation matrix from one image to another. The transformation matrix is applied to align the images, which are then warped onto a single projection surface.

Data Acquisition

For this project, videos of two spaces were recorded with an iPhone 6 camera and converted to image sequences using a piece of software called Adapter. Note that images used to create a panorama must overlap considerably, by at least 15-30%. Each video was captured from a single location while the camera rotated to sweep the space from left to right.

Extraction and Matching of Key Points

The SIFT algorithm is used to detect salient, distinctive features in two images, and a correspondence is then established between the detected features. The code is adapted from VLFeat [1]. It finds the distinctive key points in each image, defines a region around each key point, and extracts and normalizes the contents of that region; a local descriptor is then computed from the normalized region. The descriptors for each image are computed with the function vl_sift. The descriptors of image 1 are compared to those of image 2 with the function vl_ubcmatch to find matching descriptors, and the descriptors of image 2 and image 3 are compared in the same way. If there were more than three images and their order were unknown, descriptors would be matched between all pairs of images, and the number of matching descriptors would determine which images are stitched next to each other. The vl_ubcmatch function returns a 2-by-N matrix, N being the number of matches found; each column holds the index of a point in image 1 and the index of its matching point in image 2.
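As a rough illustration of the kind of matching vl_ubcmatch performs, here is a minimal Python/NumPy sketch of nearest-neighbour descriptor matching with a ratio test (the project itself uses the MATLAB VLFeat functions; match_descriptors and its ratio parameter are illustrative names, not part of VLFeat):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=1.5):
    """Match descriptors with a ratio test, in the spirit of vl_ubcmatch.

    d1: (N1, D) descriptors from image 1; d2: (N2, D) from image 2.
    Returns a 2-by-M array of index pairs (row 0: index in d1,
    row 1: index in d2), like the matrix vl_ubcmatch returns.
    """
    matches = []
    for i, desc in enumerate(d1):
        # Squared Euclidean distance to every descriptor in image 2.
        dist = np.sum((d2 - desc) ** 2, axis=1)
        order = np.argsort(dist)
        best, second = order[0], order[1]
        # Accept only if the best match clearly beats the runner-up.
        if dist[best] * ratio < dist[second]:
            matches.append((i, best))
    return np.array(matches).T
```

The ratio test rejects ambiguous matches: a descriptor that is almost equally close to two different descriptors in the other image is more likely to be a false match than a distinctive one.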

Homography

Now that we have matching points, the transformation matrix known as the homography can be computed between each pair of overlapping images. The homography is a 3-by-3 matrix that transforms the homogeneous coordinates of points in one image into those of the corresponding points in the other.

The homography is computed from four matching points: say (x1, y1), (x2, y2), (x3, y3) and (x4, y4) in image 1 correspond to (u1, v1), (u2, v2), (u3, v3) and (u4, v4) in image 2. Each correspondence yields two linear equations, so these four point pairs (16 coordinates in all) are stacked into an 8-by-9 matrix A.

π‘¨πœ™ = 0

Phi is what we are after. To avoid getting 0 as phi, we compute the first matrix (A). The singular value decomposition of A is acquired, which gives:

SVD (A) = [U S V]

The last column of V is used as the homography matrix (H)
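This direct linear transform can be sketched as follows (an illustrative Python/NumPy version; the project's own code is MATLAB, and dlt_homography is a made-up name):

```python
import numpy as np

def dlt_homography(pts1, pts2):
    """Estimate the homography mapping pts1 to pts2 (N >= 4 matches).

    Stacks two rows of A per correspondence, solves A @ phi = 0 with
    the SVD, and reshapes the right singular vector of the smallest
    singular value (the last column of V) into the 3-by-3 matrix H.
    """
    rows = []
    for (x, y), (u, v) in zip(pts1, pts2):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)  # A = U @ diag(S) @ Vt
    H = Vt[-1].reshape(3, 3)     # last row of Vt == last column of V
    return H / H[2, 2]           # fix the arbitrary scale
```

With exactly four correspondences in general position, A has a one-dimensional null space and the result is exact; with more correspondences the same code returns the least-squares solution under the unit-norm constraint.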

RANSAC Homography

The SIFT algorithm will incorrectly match some points; these incorrect matches are known as outliers. To make sure that the best homography is computed, the RANSAC algorithm is used to eliminate them. RANSAC is essentially a sampling approach to estimating the homography.

The RANSAC algorithm implemented computes the best homography over 100 iterations. In each iteration, H is computed from four randomly selected matched points and then applied to every potentially matching pair, projecting each point (x, y) to (x', y'). The Euclidean distance between each projected point and its matched point is computed, pairs whose distance falls below a threshold are counted as inliers, and the homography with the maximum number of inliers is selected.

The reference frame is set to image 2, so the homographies computed are H_21 and H_23: H_21 maps points in image 1 to points in image 2, and H_23 maps points in image 3 to points in image 2. With more than three images, it would be easier to set the reference frame to image 1 and align all other images to it. In that case the homography taking image 3 into image 1's frame would be obtained by chaining: H_23 maps image 3 into image 2, the inverse of H_21 maps image 2 into image 1, and their product is a single homography that maps image 3 directly into image 1's frame and can be applied to the image.

Stitching

Two images may not have perfectly matching pixels in the regions where they overlap; image blending is therefore used to smooth the transition from one image to the other so that the boundary line is not visible. The corners of each incoming image are warped to determine the size of the output image, giving the maximum extent of each image after warping. All other images are warped to image 2's viewpoint (the reference image), whose homography is set to the 3-by-3 identity matrix. The bounding box, computed with code adapted from [2], is the size of the new panorama: the minimum and maximum coordinates are found for each warped image, and the overall minimum row, maximum row, minimum column and maximum column across all three images become the bounds of the new panorama image. Inverse mapping with linear interpolation is used to map each pixel in the output image back into the planes defined by the source images [3].
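The bounding-box computation can be sketched as follows (illustrative Python/NumPy; warp_corners and panorama_bounds are made-up names, not from the project's code):

```python
import numpy as np

def warp_corners(H, width, height):
    """Project an image's four corners with homography H, which maps
    this image's coordinates into the reference (image 2) frame."""
    corners = np.array([[0, 0, 1], [width - 1, 0, 1],
                        [0, height - 1, 1],
                        [width - 1, height - 1, 1]], dtype=float)
    p = corners @ H.T
    return p[:, :2] / p[:, 2:3]  # back from homogeneous coordinates

def panorama_bounds(homographies, sizes):
    """Bounding box of the mosaic: min/max over all warped corners.

    homographies: one 3x3 H per image (identity for the reference);
    sizes: matching list of (width, height) pairs.
    Returns (xmin, ymin, xmax, ymax).
    """
    pts = np.vstack([warp_corners(H, w, h)
                     for H, (w, h) in zip(homographies, sizes)])
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return xmin, ymin, xmax, ymax
```

Only the corners need to be warped to find the extent, since a homography maps the straight image border to straight lines, so the warped image lies inside the quadrilateral spanned by its warped corners.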

Results

References
