Cython / Python wrapper for VLFeat library

This blog post describes my work during the GSoC coding period (May 23 – Aug 15). It also serves as a tutorial for the extended features of cyvlfeat.

This project is an extension of an existing project named cyvlfeat.

Brief introduction: We intend for this to be a light wrapper around the VLFeat toolbox. cyvlfeat provides a mixture of pure Python and Cython code that aims to replicate the existing MATLAB toolbox. The role of Cython in this regard is to re-implement the MATLAB mex files.

The VLFeat open source library implements popular computer vision algorithms specializing in image understanding as well as local features extraction and matching. Algorithms include Fisher Vector encodings, VLAD, SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, SLIC superpixels, quickshift superpixels, large scale SVM training, and many others. It is written in C for efficiency and compatibility, with interfaces in MATLAB for ease of use. The corresponding documentation is detailed throughout both parts. It supports Windows, Mac OS X, and Linux. The latest version of VLFeat is 0.9.20.

GSOC mentor: Simon Niklaus

Other mentors: Patrick Snape, Mikael Rousson

Current contributors: Patrick Snape, James Booth, Benoit Seguin, Alexis Mignon, Simmi Mourya.

This project wouldn’t be possible without my mentors Simon, Patrick and Mikael. All of them have been really supportive to me. A big thanks to all of you for your encouragement and support. 🙂

Simon, you are a wonderful mentor, friend and everything one could look for in a good mentor. Thank you for your words of encouragement and support. I will remain grateful. You have been an inspiration. I hope we can work together again soon.

I will continue working on cyvlfeat after GSOC as well.

When I started the project, the following methods from VLFeat were exposed:

  • fisher
    • fisher
  • generic
    • set_simd_enabled, get_simd_enabled, cpu_has_avx, cpu_has_sse3, cpu_has_sse2
    • get_num_cpus, get_max_threads, set_num_threads, get_thread_limit
  • hog
    • hog
  • kmeans
    • kmeans
    • kmeans_quantize
    • ikmeans, ikmeans_push
    • hikmeans, hikmeans_push
  • sift
    • dsift
    • sift

I exposed the following methods of the VLFeat library during summer ’16 (PR links included). They are outlined one by one below.

  1. Phow
  2. Plotsiftdescriptor
  3. Siftdescriptor
  4. VLAD
  5. SLIC
  6. LBP
  7. quickshift
  8. Flatmap
  9. Binsum (Work in progress)
  10. Quickseg
  11. Imseg
  12. Kdtreebuild
  13. Random
  14. References

1. Phow: Extract PHOW features. (Equivalent of vl_phow in VLFeat’s MATLAB Toolbox.)

The PHOW features [1] are a variant of dense SIFT descriptors, extracted at multiple scales. A combination of dsift and scipy.ndimage.filters.gaussian_filter can be used to easily and efficiently compute such features. By default, phow computes the gray-scale variant of the descriptor. The color option can be used to compute the color variant instead.

Unlike the MATLAB wrapper of VLFeat, cyvlfeat pre-smooths the image at the desired scale level with a Gaussian filter provided by SciPy. cyvlfeat includes a simple wrapper, phow, that does exactly this:

Example usage:

>>> import scipy.ndimage
>>> import matplotlib.colors
>>> import numpy as np
>>> from cyvlfeat.sift import phow
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> frames, descriptors = phow(img, verbose=True)

Please consult the pull request for further details.

2. Plotsiftdescriptor: Plots SIFT descriptors. (Equivalent of vl_plotsiftdescriptor in VLFeat’s MATLAB Toolbox.)

plotsiftdescriptor(d) plots the SIFT descriptor d. If d is a matrix, it plots one descriptor per column. d has the same format used by sift().

plotsiftdescriptor(d, f) plots the SIFT descriptors warped to the SIFT frames f, specified as columns of the matrix f. f has the same format used by sift().

The function assumes that the SIFT descriptors use the standard configuration of 4×4 spatial bins and 8 orientation bins. The parameters num_spatial_bins and num_orientation_bins, respectively, can be used to change this.

Example usage:

 >>> import numpy as np
 >>> from cyvlfeat.sift import sift, plotsiftdescriptor
 >>> from cyvlfeat.test_util import lena
 >>> img = lena().astype(np.float32)
 >>> result = sift(img, compute_descriptor=True)
 >>> F = result[0]
 >>> D = result[1]
 >>> plotsiftdescriptor(D, F)
 >>> # or
 >>> plotsiftdescriptor(D)  # Default frames will be used in this case.

Demo plots:

1. plotsiftdescriptor(d)


2. plotsiftdescriptor(d, f)


Please consult the pull request for further details.

3. Siftdescriptor: Computes SIFT descriptors for given frames. (Equivalent of vl_siftdescriptor in VLFeat’s MATLAB Toolbox.)

Calculates the SIFT descriptors of the keypoint frames on the pre-processed image gradient_image.

In order to match the standard SIFT descriptor, the gradient should be calculated after mapping the image to the keypoint scale. This is obtained by smoothing the image with a Gaussian kernel of variance equal to the scale of the keypoint. Additionally, SIFT assumes that the input image is pre-smoothed at scale 0.5 (this roughly compensates for the effect of the CCD integrator), so the amount of smoothing that needs to be applied is slightly less.

Example usage:

 >>> import scipy.ndimage
 >>> import numpy as np
 >>> from cyvlfeat.sift import siftdescriptor
 >>> from cyvlfeat.test_util import lena
 >>> img = lena().astype(np.float32)
 >>> # Create a single frame in the center of the image
 >>> frames = np.array([[256, 256, 1.0, np.pi / 2]])
 >>> sigma = np.sqrt(frames[0, 2] ** 2 - 0.25) # 0.25 = 0.5^2
 >>> img_smooth = scipy.ndimage.filters.gaussian_filter(img, sigma)
 >>> y, x = np.gradient(img_smooth)
 >>> mod = np.sqrt(x * x + y * y)
 >>> ang = np.arctan2(y, x)
 >>> gradient_image = np.stack((mod, ang), axis=0)
 >>>
 >>> d = siftdescriptor(gradient_image, frames, verbose=True)

Returns:

descriptors : (F, 128) uint8 or float32 ndarray
 F is the number of keypoints (frames) used. One 128-dimensional
 descriptor is extracted per keypoint. uint8 by default.

Please consult the pull request for further details.

4. VLAD: Vector of Locally Aggregated Descriptors. (Equivalent of vl_vlad in VLFeat’s MATLAB Toolbox.)

The Vector of Locally Aggregated Descriptors is similar to Fisher vectors, but

(i) it does not store second-order information about the features, and

(ii) it typically uses k-means instead of GMMs to generate the feature vocabulary (although the latter is also an option).

The VLAD encoding of a set of features is obtained by using the function vl_vlad_encode internally in Cython.

vl_vlad_encode requires a visual dictionary, for example obtained by using K-means clustering.

In a typical pipeline, the vocabulary is first created using k-means clustering, the points to be encoded are then assigned to their nearest vocabulary words, and finally the VLAD encoding routine runs (here without any normalization option); at the end of the process the encoding is stored in the enc variable. The example below uses random data in place of real descriptors, centers, and assignments.

Example usage:

>>> from cyvlfeat.vlad.vlad import vlad
>>> import numpy as np
>>> N = 1000
>>> K = 512
>>> D = 128
>>> x = np.random.uniform(size=(D, N)).astype(np.float32)
>>> means = np.random.uniform(size=(D, K)).astype(np.float32)
>>> assignments = np.random.uniform(size=(K, N)).astype(np.float32)
>>> enc = vlad(x, means, assignments)

Returns:

enc : (k,) float32 ndarray
 A vector of size k = n_data_dimensions * n_clusters.

Please consult the pull request for further details.
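In the example above the assignments are random placeholders. In a real pipeline each descriptor is hard-assigned to its nearest k-means center; a minimal NumPy sketch of that step (hard_assignments is a hypothetical helper, not part of cyvlfeat):

```python
import numpy as np

def hard_assignments(x, means):
    """Hard-assign each column of `x` (D, N) to its nearest center in
    `means` (D, K) and return a one-hot (K, N) float32 assignment matrix."""
    # Squared Euclidean distance between every point and every center: (K, N)
    dists = ((means[:, :, None] - x[:, None, :]) ** 2).sum(axis=0)
    nearest = dists.argmin(axis=0)  # index of the closest center per point
    assignments = np.zeros((means.shape[1], x.shape[1]), dtype=np.float32)
    assignments[nearest, np.arange(x.shape[1])] = 1.0
    return assignments
```

The resulting one-hot matrix can then be passed to vlad as the assignments argument.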

5. SLIC: Extracts the SLIC superpixels from the image. (Equivalent of vl_slic in VLFeat’s MATLAB Toolbox.)

SLIC [2] is a simple and efficient method to decompose an image in visually homogeneous regions. It is based on a spatially localized version of k-means clustering. Similar to mean shift or quickshift (quickshift.h), each pixel is associated to a feature vector

Ψ(x, y) = [λx, λy, I(x, y)]

and then k-means clustering is run on those. As discussed below, the coefficient λ balances the spatial and appearance components of the feature vectors, imposing a degree of spatial regularization to the extracted regions.

SLIC takes two parameters: the nominal size of the regions (superpixels), region_size, and the strength of the spatial regularization, regularizer. The image is first divided into a grid with step region_size. The center of each grid tile is then used to initialize a corresponding k-means center (up to a small shift to avoid image edges). Finally, the k-means centers and clusters are refined using the Lloyd algorithm, yielding a segmentation of the image. As a further restriction and simplification, during the k-means iterations each pixel can be assigned only to the 2×2 centers corresponding to grid tiles adjacent to the pixel.
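The grid initialization described above can be sketched as follows (a simplified illustration; the small edge-avoiding shift is omitted, and init_grid_centers is a hypothetical helper, not part of cyvlfeat):

```python
import numpy as np

def init_grid_centers(height, width, region_size):
    """Place initial cluster centers at the centers of a grid with step
    `region_size`, as in the first stage of SLIC (edge-avoiding shift omitted)."""
    ys = np.arange(region_size // 2, height, region_size)
    xs = np.arange(region_size // 2, width, region_size)
    return [(int(y), int(x)) for y in ys for x in xs]
```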

Example usage:

>>> from cyvlfeat.slic.slic import slic
>>> import numpy as np
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> segment = slic(img, region_size=10, regularizer=10)

Returns:

segments : float32 ndarray
 Contains the superpixel identifier for each image pixel.

Please consult the pull request for further details.

6. LBP: Local Binary Patterns. (Equivalent of vl_lbp in VLFeat’s MATLAB Toolbox.)

An LBP is a string of bits obtained by binarizing a local neighborhood of pixels with respect to the brightness of the central pixel. VlLbp implements only the case of 3×3 pixel neighborhoods (this setting performs best in applications). Thus the LBP at location (x, y) is a string of eight bits. Each bit is equal to one if the corresponding pixel is brighter than the central one. Pixels are scanned starting from the one to the right, in anti-clockwise order.

lbp computes the Local Binary Pattern (LBP) features for image, where image is divided into cells of size cell_size.
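The per-pixel binarization described above can be sketched for a single 3×3 patch (lbp_code is a hypothetical helper, and the exact neighbor ordering here is an assumption for illustration):

```python
import numpy as np

def lbp_code(patch):
    """LBP code of a 3x3 patch: one bit per neighbor, set when that neighbor
    is brighter than the central pixel. The scan starts at the neighbor to
    the right and proceeds anti-clockwise (this ordering is an assumption
    made for illustration)."""
    c = patch[1, 1]
    # (row, col) offsets: right, up-right, up, up-left, left, down-left, down, down-right
    neighbors = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    code = 0
    for bit, (r, col) in enumerate(neighbors):
        if patch[r, col] > c:
            code |= 1 << bit
    return code
```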

Example usage:

>>> from cyvlfeat.misc.lbp import lbp
>>> import numpy as np
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> result = lbp(img, cell_size=200)

Returns:

histograms : (h, w, 58) float32 ndarray
 h = floor(height / cell_size)
 w = floor(width / cell_size)
 A three-dimensional array containing one histogram of quantized LBP
 features per cell, where height and width are the dimensions of the
 image. The third dimension is 58.

Please consult the pull request for further details.

7. quickshift: quickshift image segmentation (Equivalent of vl_quickshift in VLFeat’s MATLAB Toolbox.)

quickshift is a mode-seeking algorithm (like mean shift) which, instead of iteratively shifting each point towards a local mean, forms a tree of links to the nearest neighbor which increases the density.

quickshift arranges all of the data points into a tree where parents in the tree are the nearest neighbors in the feature space which increase the estimate of the density. By imposing a limit on the distance between nearest neighbors (maxdist), we decrease the amount of computation required to search for the nearest neighbors. However, we also break our tree into a forest, because local modes of the density will now have no neighbor which is close enough in the feature space to form a link.
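The linking rule above can be sketched naively for generic feature points (an O(N²) illustration, not the cyvlfeat implementation; quickshift_links is a hypothetical helper):

```python
import numpy as np

def quickshift_links(points, density, max_dist):
    """Link each point to its nearest neighbor of higher density within
    `max_dist`. Points with no such neighbor become roots of the forest,
    mirroring the map/gaps semantics: roots have parent = self and gap = inf."""
    n = len(points)
    parents = np.arange(n)      # roots point at themselves
    gaps = np.full(n, np.inf)   # branch length; inf for roots
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(points[i] - points[j])
            if density[j] > density[i] and d <= max_dist and d < gaps[i]:
                parents[i] = j
                gaps[i] = d
    return parents, gaps
```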

If you pass medoid=True, it will run medoid shift on the image. By default it runs quickshift.

Example usage:

>>> import numpy as np
>>> from cyvlfeat.quickshift.quickshift import quickshift
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)

>>> maps, gaps, estimate = quickshift(img, kernel_size=2, max_dist=10)
>>> # medoid shift
>>> maps, gaps, estimate = quickshift(img, kernel_size=2, max_dist=10, medoid=True)

Returns:

 maps : (H, W) float64 ndarray. Array of the same size as the image.
 Each element (pixel) of maps is an index to the parent
 element in the forest.

 gaps : (H, W) float64 ndarray. Array of the same size as the image.
 gaps contains the corresponding branch length.
 Pixels which are at the root of their respective
 tree have maps(x) = x and gaps(x) = inf.

 estimate : (H, W) float64 ndarray, optional
 The estimate of the density. Only returned if
 max_dist is not None.

Please consult the pull request for further details.

8. Flatmap: Flatten a tree, assigning the label of the root to each node (Equivalent of vl_flatmap in VLFeat’s MATLAB Toolbox.)

Since flatmap depends on quickshift and quickshift hasn’t been merged in master yet, I developed these features on a single branch.

flatmap(map) labels each tree of the forest contained in map.

Example usage:

>>> import numpy as np
>>> from cyvlfeat.quickshift.quickshift import quickshift
>>> from cyvlfeat.quickshift.flatmap import flatmap
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)

>>> maps, gaps, estimate = quickshift(img,kernel_size=2,max_dist=10)
>>> labels, clusters = flatmap(maps)

Returns:

labels : contains the linear index of the root node in `map`.
clusters : contains a label between 1 and the number of clusters.
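The root-following idea behind flatmap can be sketched in NumPy (an illustration of the concept, not the cyvlfeat implementation; follow_roots is a hypothetical helper):

```python
import numpy as np

def follow_roots(parents):
    """Repeatedly jump each node to its parent's label until every node
    points at the root of its tree; this is MATLAB's map(map) iterated
    to a fixed point. `parents` holds linear parent indices."""
    labels = np.asarray(parents).copy()
    while not np.array_equal(labels[labels], labels):
        labels = labels[labels]
    return labels
```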

Please consult the pull request for further details.

9. Binsum: Binned summation. (Equivalent of vl_binsum in VLFeat’s MATLAB Toolbox.)

Work in progress. I have yet to find a trick to work with .def files in Cython. Here’s the question I asked on Stack Overflow regarding this very case.

10. Quickseg: Produce a quickshift segmentation of a grey-scale image. (Equivalent of vl_quickseg in VLFeat’s MATLAB Toolbox.)

Returns:

i_seg : A color image where each pixel is labeled by the mean color in its region.

labels : (H, W) float64 ndarray. Array of the same size as the image.
A labeled image where the number corresponds to the cluster identity.

maps : (H, W) float64 ndarray. Array of the same size as the image.
maps as returned by quickshift: for each pixel, the pointer to the nearest pixel which increases the estimate of the density.

gaps : (H, W) float64 ndarray. Array of the same size as the image.
gaps as returned by quickshift: for each pixel, the distance to the nearest pixel which increases the estimate of the density.

estimate : (H, W) float64 ndarray. Array of the same size as the image.
estimate as returned by quickshift: the estimate of the density.

Quickseg depends on binsum, whose development is in progress.

Work in progress (Quickseg).

11. Imseg: Color an image based on the segmentation (Equivalent of vl_imseg in VLFeat’s MATLAB Toolbox.)

imseg colors each cluster indicated by labels with the average color of that cluster in image.

Imseg depends on binsum, whose development is in progress.

Work in progress (Imseg).

12. Kdtreebuild: Build randomized kd-tree. (Equivalent of vl_kdtreebuild in VLFeat’s MATLAB Toolbox.) [Tutorial: vl_kdtreebuild]

VLFeat implements the randomized kd-tree forest from FLANN. This enables fast medium- and large-scale nearest-neighbor queries among high-dimensional data points (such as those produced by SIFT).

A kd-tree is a data structure used to quickly solve nearest-neighbor queries. A kd-tree is a hierarchical structure built by partitioning the data recursively along the dimension of maximum variance. At each iteration the variance of each column is computed and the data is split into two parts on the column with maximum variance.
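One level of that recursion can be sketched as follows (points are stored one per row here, unlike the MATLAB one-per-column convention; split_max_variance is a hypothetical helper, and the median threshold is one possible choice of split value):

```python
import numpy as np

def split_max_variance(points):
    """One kd-tree split: pick the dimension of maximum variance and
    partition the points at its median. `points` is (N, D), one per row."""
    dim = int(np.argmax(points.var(axis=0)))       # most spread-out dimension
    threshold = float(np.median(points[:, dim]))   # split value on that dimension
    left = points[points[:, dim] <= threshold]
    right = points[points[:, dim] > threshold]
    return dim, threshold, left, right
```

Building the full tree amounts to applying this split recursively to the left and right parts.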

vl_kdtreebuild in action:

forest = vl_kdtreebuild(X)

returns a structure forest containing the kd-tree indexing the data X. X is an M x N matrix of class DOUBLE or SINGLE with one data point per column.

Work in progress (Kdtreebuild).

13. Random: Random number generator (VLFeat C API)

The generator is based on the popular Mersenne Twister algorithm [3]. In cyvlfeat, a random number generator is implemented by an object of type VlRand.

The generator can be used to obtain random quantities of various types.

This feature is currently untested; I will test it along with kdtreebuild.

Work in progress (Random).

14. References:

[1] A. Bosch, A. Zisserman, and X. Munoz. Image classifcation using random forests and ferns. In Proc. ICCV, 2007.

[2] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels. Technical report, EPFL, 2010.

[3] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1), 1998.

[4] vlfeat.org

[5] vision.princeton.edu (VLFeat tutorials)


Today

Hi there! 🙂

This blog post took way more time than expected. Well, there are two reasons for that:

  1. Caught flu. :/
  2. kdtree is actually more challenging than I thought.

So, I am posting a doubt today:

I couldn’t find a way to define this in Cython: in vl/kdtree.h, lines 20 and 21:

#define VL_KDTREE_SPLIT_HEAP_SIZE 5
#define VL_KDTREE_VARIANCE_EST_NUM_SAMPLES 1024

Quite similar question: http://stackoverflow.com/questions/30805744/how-to-use-define-as-size-of-char-array-in-cython

If I define it like this in Cython:

cdef int VL_KDTREE_SPLIT_HEAP_SIZE "VL_KDTREE_SPLIT_HEAP_SIZE"
cdef int VL_KDTREE_VARIANCE_EST_NUM_SAMPLES "VL_KDTREE_VARIANCE_EST_NUM_SAMPLES"

I get the following error:

Not allowed in a constant expression.

It actually gives an error on line 104 of vl/kdtree.h:

VlKDTreeSplitDimension splitHeapArray [VL_KDTREE_SPLIT_HEAP_SIZE] ;

I wrote it like this in kdtree.pxd:

VlKDTreeSplitDimension splitHeapArray [VL_KDTREE_SPLIT_HEAP_SIZE]

I guess Cython requires the literal numbers 5 and 1024, because when I write it like this

VlKDTreeSplitDimension splitHeapArray [5]

or, in fact, any arbitrary number in place of 5, the code compiles without any error.
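For what it’s worth, one approach sometimes used for #define constants that size arrays is to declare them as members of an anonymous enum inside the extern block; a sketch (untested against VLFeat):

```cython
cdef extern from "vl/kdtree.h":
    # Declare the #define'd constants as an anonymous enum so Cython
    # treats the names as compile-time constants usable as array sizes.
    enum:
        VL_KDTREE_SPLIT_HEAP_SIZE
        VL_KDTREE_VARIANCE_EST_NUM_SAMPLES
```

With that, splitHeapArray [VL_KDTREE_SPLIT_HEAP_SIZE] should be accepted in the struct declaration, since the generated C just re-emits the names and lets the C compiler resolve them.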

Apart from that, I have ported random and kdtreebuild.

Debugging and finishing up. 🙂

Thanks for reading! Comment below if you know the answer.

Today

Hi there! 🙂
I am working on vl_kdtreebuild. It turns out kdtree.h needs VlRand, which is part of random.h.

So, I am porting this part right now. Half of it is finished. I’ll get back to kdtree once it works.

Today

Hi there 😀

Updates:

Last time, when I completed quickshift, there was actually a problem with the PR.
Travis CI gives this error: https://travis-ci.org/simmimourya1/cyvlfeat/jobs/146924358
I found a similar issue opened on Github: https://github.com/m-labs/artiq/issues/157

Need to fix that.

Moving ahead with the rest of the Quickshift package:

I was stuck with vl_flatmap. So here’s the catch:

In MATLAB: if map = [8, 90, 23, 77, 88] is a 1×5 array, then map([1, 4, 5]) would return [8, 77, 88].
So () just selects elements.

map(map): so basically, map indexes into itself.

How?

Every value that the inner map produces is used as an index into the outer map, which then selects the element at that position. [Yup! Sounds confusing. It takes a while.]

So it sounded a bit easy at first, but it did take a while.
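The same selection can be written in NumPy, keeping in mind that NumPy indexing is 0-based:

```python
import numpy as np

m = np.array([8, 90, 23, 77, 88])
# MATLAB's map([1, 4, 5]) becomes m[[0, 3, 4]] with 0-based indices:
print(m[[0, 3, 4]].tolist())  # [8, 77, 88]

# map(map) is m[m]: each value of m is used as an index back into m.
# That only makes sense when every value is itself a legal index, as with
# parent pointers (a hypothetical small forest below):
p = np.array([1, 2, 2, 0, 3])
print(p[p].tolist())  # [2, 2, 2, 1, 0] -- one round of "map(map)"
```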

Moving ahead:

Meanwhile I ported vl_quickseg.

Wrote skeleton code for vl_imseg. imseg also uses a function named vl_binsum, which I am going to need to port first to complete imseg.

flatmap is almost complete. Just writing some tests. Will post a PR soon.

After Quickshift, I’ll probably port kdtree methods.

Stay tuned. 🙂

Today

Hi there! 😀

Quickshift port is complete but there’s still a minor bug.

Return values are deviating a little from the ground reference, but I think I’ll be able to fix it before tonight. 🙂

Stay tuned for more updates.

Today : Coding period Day 60

Hi there! 😀

My last post says that I’d be implementing kdtree and related algorithms.

Well, there has been a change in plans. Had a discussion with my mentor. Will implement Quickshift first and then kdtree.

So coming up with quickshift, quickseg very soon. 🙂

Little introduction about the algorithm:

Quickshift is a fast mode-seeking algorithm, similar to mean shift. The algorithm segments an RGB image (or any image with more than one channel) by identifying clusters of pixels in the joint spatial and color dimensions. Segments are local (superpixels) and can be used as a basis for further processing.

How it works: given an image, the algorithm builds a forest of pixels whose branches are labeled with a distance value (vl_quickshift_get_parents, vl_quickshift_get_dists). This specifies a hierarchical segmentation of the image, with segments corresponding to subtrees. Useful superpixels can be identified by cutting the branches whose distance label is above a given threshold (the threshold can either be fixed by hand or determined by cross-validation).

Thanks for reading. 🙂

Have a great day ahead!