This blog post is about my work done during the GSoC coding period (May 23 – Aug 15). It also serves as a tutorial for the newly exposed features of cyvlfeat.
This project is an extension of an existing project named cyvlfeat.
Brief introduction: cyvlfeat is intended to be a light wrapper around the VLFeat toolbox, providing a mixture of pure Python and Cython code that replicates the existing MATLAB toolbox. The role of Cython here is to re-implement the MATLAB mex files.
The VLFeat open source library implements popular computer vision algorithms specializing in image understanding as well as local features extraction and matching. Algorithms include Fisher Vector encodings, VLAD, SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, SLIC superpixels, quickshift superpixels, large scale SVM training, and many others. It is written in C for efficiency and compatibility, with interfaces in MATLAB for ease of use. The corresponding documentation is detailed throughout both parts. It supports Windows, Mac OS X, and Linux. The latest version of VLFeat is 0.9.20.
GSOC mentor: Simon Niklaus
Other mentors: Patrick Snape, Mikael Rousson
Current contributors: Patrick Snape, James Booth, Benoit Seguin, Alexis Mignon, Simmi Mourya.
This project wouldn’t be possible without my mentors Simon, Patrick and Mikael. All of them have been really supportive to me. A big thanks to all of you for your encouragement and support. 🙂
Simon, you are a wonderful mentor, friend and everything one could look for in a good mentor. Thank you for your words of encouragement and support. I will remain grateful. You have been an inspiration. I hope we can work together again soon.
I will continue working on cyvlfeat after GSOC as well.
When I started the project, the following modules and methods of VLFeat were already exposed:
- fisher
  - fisher
- generic
  - set_simd_enabled, get_simd_enabled, cpu_has_avx, cpu_has_sse3, cpu_has_sse2
  - get_num_cpus
  - get_max_threads, set_num_threads, get_thread_limit
- hog
  - hog
- kmeans
  - kmeans
  - kmeans_quantize
  - ikmeans, ikmeans_push
  - hikmeans, hikmeans_push
- sift
  - dsift
  - sift
During summer '16, I exposed the following methods of the VLFeat library [PR links included]. They are outlined one by one below.
- Phow
- Plotsiftdescriptor
- Siftdescriptor
- VLAD
- SLIC
- LBP
- quickshift
- Flatmap
- Binsum (Work in progress)
- Quickseg
- Imseg
- Kdtreebuild
- Random
- References
1. Phow: Extract PHOW features. (Equivalent of vl_phow in VLFeat’s MATLAB Toolbox.)
The PHOW features [1] are a variant of dense SIFT descriptors, extracted at multiple scales. A combination of dsift and scipy.ndimage.filters.gaussian_filter can be used to easily and efficiently compute such features. By default, phow computes the gray-scale variant of the descriptor. The color option can be used to compute the color variant instead.
Unlike in the MATLAB wrapper of VLFeat, here the image is pre-smoothed at the desired scale level by a Gaussian filter provided by SciPy. cyvlfeat includes a simple wrapper, phow, that does exactly this:
Example usage:
>>> import numpy as np
>>> from cyvlfeat.sift import phow
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> frames, descriptors = phow(img, verbose=True)
Please consult the pull request for further details.
2. Plotsiftdescriptor: Plots sift descriptors. (Equivalent of vl_plotsiftdescriptor in VLFeat’s MATLAB Toolbox.)
plotsiftdescriptor(d) plots the SIFT descriptor d. If d is a matrix, it plots one descriptor per column. d has the same format used by sift().
plotsiftdescriptor(d, f) plots the SIFT descriptors warped to the SIFT frames f, specified as columns of the matrix f. f has the same format used by sift().
The function assumes that the SIFT descriptors use the standard configuration of 4×4 spatial bins and 8 orientation bins. The parameters num_spatial_bins and num_orientation_bins, respectively, can be used to change this.
Example usage:
>>> import numpy as np
>>> from cyvlfeat.sift import sift, plotsiftdescriptor
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> result = sift(img, compute_descriptor=True)
>>> F = result[0]
>>> D = result[1]
>>> plotsiftdescriptor(D, F)
>>> #or
>>> plotsiftdescriptor(D) # Default f will be used in this case.
Demo plots:
1. plotsiftdescriptor(d)

2. plotsiftdescriptor(d, f)

Please consult the pull request for further details.
3. Siftdescriptor: Computes SIFT descriptors for given frames. (Equivalent of vl_siftdescriptor in VLFeat’s MATLAB Toolbox.)
Calculates the SIFT descriptors of the keypoint frames on the pre-processed image gradient_image.
In order to match the standard SIFT descriptor, the gradient should be calculated after mapping the image to the keypoint scale. This is obtained by smoothing the image with a Gaussian kernel of variance equal to the scale of the keypoint. Additionally, SIFT assumes that the input image is pre-smoothed at scale 0.5 (this roughly compensates for the effect of the CCD integrator), so the amount of smoothing that needs to be applied is slightly less.
Example usage:
>>> import scipy.ndimage
>>> import numpy as np
>>> from cyvlfeat.sift import siftdescriptor
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> # Create a single frame in the center of the image
>>> frames = np.array([[256, 256, 1.0, np.pi / 2]])
>>> sigma = np.sqrt(frames[0, 2] ** 2 - 0.25) # 0.25 = 0.5^2
>>> img_smooth = scipy.ndimage.filters.gaussian_filter(img, sigma)
>>> y, x = np.gradient(img_smooth)
>>> mod = np.sqrt(x * x + y * y)
>>> ang = np.arctan2(y, x)
>>> gradient_image = np.stack((mod, ang), axis=0)
>>>
>>> d = siftdescriptor(gradient_image, frames, verbose=True)
Returns:
descriptors : (F, 128) uint8 or float32 ndarray
    F is the number of keypoints (frames) used; one 128-length vector is
    extracted per keypoint. uint8 by default.
Please consult the pull request for further details.
4. VLAD: Vector of Locally Aggregated Descriptors. (Equivalent of vl_vlad in VLFeat’s MATLAB Toolbox.)
The Vector of Locally Aggregated Descriptors is similar to Fisher vectors but
(i) it does not store second-order information about the features and
(ii) it typically uses k-means instead of GMMs to generate the feature vocabulary (although the latter is also an option).
The VLAD encoding of a set of features is obtained by using the function vl_vlad_encode internally in Cython.
vl_vlad_encode requires a visual dictionary, for example obtained by using K-means clustering.
In a typical pipeline, the vocabulary is first created using k-means clustering, then each point to be encoded is assigned to its nearest vocabulary word, and finally the VLAD encoding routine runs (here without any normalization options). At the end of the process the encoding is stored in the enc variable. The example below uses random data and random assignments for brevity:
>>> from cyvlfeat.vlad.vlad import vlad
>>> import numpy as np
>>> N = 1000
>>> K = 512
>>> D = 128
>>> x = np.random.uniform(size=(D, N)).astype(np.float32)
>>> means = np.random.uniform(size=(D, K)).astype(np.float32)
>>> assignments = np.random.uniform(size=(K, N)).astype(np.float32)
>>> enc = vlad(x, means, assignments)
enc : (k,) float32 ndarray
    A vector of size k = n_data_dimensions * n_clusters.
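For intuition, the unnormalized encoding computed above can be sketched directly in NumPy: for each cluster, sum the assignment-weighted residuals between the descriptors and the cluster mean, then flatten. This is a sketch of the definition, not the Cython implementation, and the flattening order here is illustrative:

```python
import numpy as np

def vlad_sketch(x, means, assignments):
    """Unnormalized VLAD sketch: for each cluster k, sum the residuals
    (x_n - mean_k) weighted by the assignment of point n to cluster k,
    then flatten cluster-by-cluster into a single vector."""
    D, N = x.shape
    K = means.shape[1]
    enc = np.empty((D, K), dtype=np.float32)
    for k in range(K):
        residuals = x - means[:, [k]]           # (D, N) residuals to center k
        enc[:, k] = residuals @ assignments[k]  # weighted sum over the N points
    return enc.T.ravel()                        # shape (K * D,)
```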
5. SLIC: Extracts the SLIC superpixels from the image. (Equivalent of vl_slic in VLFeat’s MATLAB Toolbox.)
SLIC [2] is a simple and efficient method to decompose an image into visually homogeneous regions. It is based on a spatially localized version of k-means clustering. Similar to mean shift or quickshift, each pixel is associated to a feature vector
Ψ(x, y) = [λx, λy, I(x, y)]
and then k-means clustering is run on those. As discussed below, the coefficient λ balances the spatial and appearance components of the feature vectors, imposing a degree of spatial regularization on the extracted regions.
SLIC takes two parameters: the nominal size of the regions (superpixels), region_size, and the strength of the spatial regularization, regularizer. The image is first divided into a grid with step region_size. The center of each grid tile is then used to initialize a corresponding k-means center (up to a small shift to avoid image edges). Finally, the k-means centers and clusters are refined by using the Lloyd algorithm, segmenting the image. As a further restriction and simplification, during the k-means iterations each pixel can be assigned only to the 2 × 2 centers corresponding to grid tiles adjacent to the pixel.
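As a rough illustration of the feature vector described above, the per-pixel features for a gray-scale image could be built as follows. The scaling lam = regularizer / region_size is an assumed simplification; VLFeat folds the regularization into its distance computation rather than building this array explicitly:

```python
import numpy as np

def slic_features(image, region_size, regularizer):
    """Build one [lambda*x, lambda*y, intensity] feature vector per pixel,
    for a 2-D (gray-scale) image. The coefficient lambda trades off the
    spatial coordinates against the appearance component."""
    h, w = image.shape
    lam = regularizer / region_size             # assumed scaling rule
    ys, xs = np.mgrid[:h, :w].astype(np.float32)
    return np.stack([lam * xs, lam * ys, image], axis=-1).reshape(-1, 3)
```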
Example usage:
>>> from cyvlfeat.slic.slic import slic
>>> import numpy as np
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> segment = slic(img, region_size=10, regularizer=10)
Returns:
segments: float32, ndarray
contains the superpixel identifier for each image pixel.
Please consult the pull request for further details.
6. LBP: Local Binary Patterns. (Equivalent of vl_lbp in VLFeat’s MATLAB Toolbox.)
An LBP is a string of bits obtained by binarizing a local neighborhood of pixels with respect to the brightness of the central pixel. VlLbp implements only the case of 3×3 pixel neighborhoods (this setting performs best in applications). Thus the LBP at location (x, y) is a string of eight bits. Each bit is equal to one if the corresponding pixel is brighter than the central one. Pixels are scanned starting from the one to the right, in an anti-clockwise sense.
lbp computes the Local Binary Pattern (LBP) features for image, where image is divided into cells of size cell_size.
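The bit string described above can be sketched in pure NumPy. The scan order below (starting from the right neighbor, moving anti-clockwise) follows the prose; the exact bit ordering and the 58-bin quantization used by VLFeat are omitted:

```python
import numpy as np

def lbp_codes(image):
    """Illustrative 3x3 LBP: each interior pixel gets an 8-bit code, one
    bit per neighbor that is brighter than the center. Offsets are given
    as (row, col) displacements, starting at the right neighbor and going
    anti-clockwise (upwards first, since rows grow downwards)."""
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    center = image[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = image[1 + dy: image.shape[0] - 1 + dy,
                         1 + dx: image.shape[1] - 1 + dx]
        codes |= (neighbor > center).astype(np.uint8) << bit
    return codes
```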
Example usage:
>>> from cyvlfeat.misc.lbp import lbp
>>> import numpy as np
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> result = lbp(img, cell_size=200)
Returns:
histograms: (h, w, 58), float32 ndarray
h = FLOOR(height/cell_size)
w FLOOR(width/cell_size)
histograms is a three-dimensional array containing one histograms of
quantized LBP features per cell. The width of histograms is
FLOOR(width/cell_size), where width is the width of the image.
The same for the height.The third dimension is 58.
Please consult the pull request for further details.
7. quickshift: quickshift image segmentation (Equivalent of vl_quickshift in VLFeat’s MATLAB Toolbox.)
quickshift is a mode seeking algorithm (like mean shift) which, instead of iteratively shifting each point towards a local mean, forms a tree of links to the nearest neighbor which increases the density.
quickshift arranges all of the data points into a tree where parents in the tree are the nearest neighbors in the feature space which increase the estimate of the density. By imposing a limit on the distance between nearest neighbors (maxdist), we decrease the amount of computation required to search for the nearest neighbors. However, we also break our tree into a forest, because local modes of the density will now have no neighbor which is close enough in the feature space to form a link.
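A toy 1-D version of this linking step makes the tree/forest structure concrete (illustrative only; the real implementation works on image pixels and kernel density estimates):

```python
import numpy as np

def quickshift_1d(x, density, max_dist):
    """Toy 1-D quickshift linking: each point links to its nearest
    neighbor with a higher density estimate; links longer than max_dist
    are never formed, so local modes become roots of a forest."""
    n = len(x)
    parent = np.arange(n)           # roots point to themselves
    gaps = np.full(n, np.inf)       # branch length; inf at roots
    for i in range(n):
        best = np.inf
        for j in range(n):
            d = abs(x[i] - x[j])
            if density[j] > density[i] and d < best and d <= max_dist:
                best, parent[i], gaps[i] = d, j, d
    return parent, gaps
```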
If you pass medoid=True, it will run medoid shift on the image; by default it runs quickshift.
Example usage:
>>> import numpy as np
>>> from cyvlfeat.quickshift.quickshift import quickshift
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> maps, gaps, estimate = quickshift(img, kernel_size=2, max_dist=10)
>>> # medoid shift
>>> maps, gaps, estimate = quickshift(img, kernel_size=2, max_dist=10, medoid=True)
Returns:
map : [H, W] float64 ndarray
    Array of the same size as the image. Each element (pixel) of map is an index to the parent element in the forest.
gaps : [H, W] float64 ndarray
    Array of the same size as the image. gaps contains the corresponding branch length. Pixels which are at the root of their respective tree have map(x) = x and gaps(x) = inf.
estimate : [H, W] float64 ndarray, optional
    The estimate of the density. Only returned if max_dist is not None.
Please consult the pull request for further details.
8. Flatmap: Flatten a tree, assigning the label of the root to each node (Equivalent of vl_flatmap in VLFeat’s MATLAB Toolbox.)
Since flatmap depends on quickshift and quickshift hasn’t been merged in master yet, I developed these features on a single branch.
flatmap(map) labels each tree of the forest contained in map.
Example usage:
>>> import numpy as np
>>> from cyvlfeat.quickshift.quickshift import quickshift
>>> from cyvlfeat.quickshift.flatmap import flatmap
>>> from cyvlfeat.test_util import lena
>>> img = lena().astype(np.float32)
>>> maps, gaps, estimate = quickshift(img, kernel_size=2, max_dist=10)
>>> labels, clusters = flatmap(maps)
Returns:
labels : contains the linear index of the root node in map.
clusters : contains a label between 1 and the number of clusters.
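The flattening itself can be sketched with a parent-pointer array (0-based indices here, unlike the 1-based cluster labels mentioned above):

```python
import numpy as np

def flatten_forest(parents):
    """Illustrative flatmap: follow parent pointers until every node
    points directly at its root (a node i with parents[i] == i), then
    relabel the roots as consecutive 0-based cluster ids."""
    parents = np.asarray(parents)
    labels = parents.copy()
    # Repeatedly jump to the parent's label until nothing changes.
    while not np.array_equal(labels, parents[labels]):
        labels = parents[labels]
    roots, clusters = np.unique(labels, return_inverse=True)
    return labels, clusters.reshape(labels.shape)
```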
Please consult the pull request for further details.
9. Binsum: Binned summation. (Equivalent of vl_binsum in VLFeat’s MATLAB Toolbox.)
Work in progress. I have yet to find a trick to work with .def files in Cython. Here’s the question I asked on Stack Overflow regarding this very case.
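For reference, the binned-summation operation itself (adding each value of X into the histogram bin selected by B, as I understand vl_binsum's semantics) can be sketched with np.add.at:

```python
import numpy as np

def binsum_sketch(hist, values, bins):
    """Illustrative binned summation: hist[bins[k]] += values[k] for every
    k, accumulating correctly even when a bin index repeats (np.add.at is
    unbuffered, unlike plain fancy-indexed +=)."""
    out = np.asarray(hist, dtype=np.float64).copy()
    np.add.at(out, bins, values)
    return out
```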
10. Quickseg: Produce a quickshift segmentation of a grey-scale image. (Equivalent of vl_quickseg in VLFeat’s MATLAB Toolbox.)
Returns:
i_seg : A color image where each pixel is labeled by the mean color in its region.
labels : [H, W] float64 ndarray
    Array of the same size as image. A labeled image where the number corresponds to the cluster identity.
maps : [H, W] float64 ndarray
    Array of the same size as image. maps as returned by quickshift: for each pixel, the pointer to the nearest pixel which increases the estimate of the density.
gaps : [H, W] float64 ndarray
    Array of the same size as image. gaps as returned by quickshift: for each pixel, the distance to the nearest pixel which increases the estimate of the density.
estimate : [H, W] float64 ndarray
    Array of the same size as image. estimate as returned by quickshift: the estimate of the density.
Quickseg depends on binsum whose development is under progress.
11. Imseg: Color an image based on the segmentation (Equivalent of vl_imseg in VLFeat’s MATLAB Toolbox.)
imseg colors each cluster indicated by labels with the average color of that cluster in image.
Imseg depends on binsum whose development is under progress.
12. Kdtreebuild: Build randomized kd-tree. (Equivalent of vl_kdtreebuild in VLFeat’s MATLAB Toolbox.) [Tutorial– vl_kdtreebuild]
VLFeat implements the randomized kd-tree forest from FLANN. This enables fast medium- and large-scale nearest neighbor queries among high dimensional data points (such as those produced by SIFT).
A kd-tree is a data structure used to quickly solve nearest-neighbor queries: a hierarchical structure built by partitioning the data recursively along the dimension of maximum variance. At each iteration, the variance of each dimension is computed and the data is split into two parts along the dimension with maximum variance.
vl_kdtreebuild in action:
forest = vl_kdtreebuild(X)
returns a structure forest containing the kd-tree indexing the data X. X is a M x N dimensional matrix of class DOUBLE or SINGLE with one data point per column.
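The max-variance partition step described above can be sketched in a few lines of NumPy (illustrative only; the real implementation builds full randomized trees, with some dimension sampling, in C):

```python
import numpy as np

def max_variance_split(points):
    """One kd-tree partition step: pick the dimension of maximum variance
    and split the points (one data point per column) at its median."""
    variances = points.var(axis=1)
    dim = int(np.argmax(variances))
    threshold = np.median(points[dim])
    left = points[:, points[dim] <= threshold]
    right = points[:, points[dim] > threshold]
    return dim, threshold, left, right
```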
13. Random: Random number generator (VLFeat C API)
The generator is based on the popular Mersenne Twister algorithm [3]. In cyvlfeat, a random number generator is implemented by an object of type VlRand.
The generator can be used to obtain random quantities of various types:
- vl_rand_int31, vl_rand_uint32 for 32-bit random integers;
- vl_rand_real1 for a double in [0,1];
- vl_rand_real2 for a double in [0,1);
- vl_rand_real3 for a double in (0,1);
- vl_rand_res53 for a double in [0,1) with high resolution.
This feature is currently untested; I will test it along with kdtreebuild.
14. References:
[1] A. Bosch, A. Zisserman, and X. Munoz. Image classifcation using random forests and ferns. In Proc. ICCV, 2007.
[2] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels. Technical report, EPFL, 2010.
[3] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1), 1998.
[4] vlfeat.org
[5] vision.princeton.edu (VLFeat tutorials)