BOVW "describe" images in terms of visual words

  • Thread starter: Afelium (Guest)
I'm trying to use the BOVW model for image classification (https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision)

Here's what I've done so far:

  • got an array of labeled numpy arrays by loading the images from my dataset
  • turned each image into a list of its SIFT feature descriptors (see the extraction sketch right after this list)
  • created a vocabulary of visual words by stacking all the descriptors into one array and feeding them to k-means clustering
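For reference, this is roughly what my feature extraction looks like (a minimal sketch of extractFeatures assuming OpenCV's SIFT, which needs an OpenCV build that includes it; the grayscale conversion and the empty fallback are just how I set it up):

Code:
import cv2
import numpy as np

def extractFeatures(img):
    # SIFT works on single-channel images, so convert colour inputs to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    sift = cv2.SIFT_create()
    # descriptors is an (n_keypoints, 128) array, or None if no keypoints were found
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors if descriptors is not None else np.empty((0, 128))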

Here's how I create the vocabulary:

Code:
import numpy as np
from sklearn.cluster import KMeans

def get_vocab(imgs, n_clusters=100):
    # collect the SIFT descriptors from every image into one (N, 128) array
    descriptors = np.array([f for img in imgs for f in extractFeatures(img)])

    # cluster the descriptors; every cluster centre becomes one visual word
    km = KMeans(n_clusters=n_clusters).fit(descriptors)
    return km.cluster_centers_
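I call this on my whole training set, something like the line below (training_images is just the list of loaded images from the first step, and 100 clusters is an arbitrary choice):

Code:
vocab = get_vocab(training_images, n_clusters=100)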

If I understood the model correctly, the next step is to compute a frequency histogram for each image: an array with one entry per visual word, storing how many times that word occurs in the image. For example, with a vocabulary of 5 visual words, one image's histogram might look like [3, 0, 7, 1, 2]. (I've put a rough sketch of what I have in mind after my questions below.)

My questions are:

  • How do I "count" the visual words occurring in an image? Do I take a feature (a numpy array), find the visual word closest to it, and add to that word's count?

  • How do I find the "closest" visual word to a given feature?

  • Can I "reuse" the data from the clustering algorithm (i.e. find out which cluster a feature was put into)?