Faster kNN Classification Algorithm in Python

  • Thread starter: Eoin Ó Coinnigh (Guest)

I want to code my own kNN algorithm from scratch because I need to weight the features. The problem is that my program is still really slow, despite removing for loops and using built-in NumPy functionality.

Can anyone suggest a way to speed this up? I don't use np.sqrt for the L2 distance because it's unnecessary (the square root does not change the ordering of the neighbours) and it actually slows everything down quite a bit.

Code:
import numpy as np


class GlobalWeightedKNN:
    """
    A k-NN classifier with feature weights

    Returns: predictions of k-NN.
    """

    def __init__(self):
        self.X_train = None
        self.y_train = None
        self.k = None
        self.weights = None
        self.predictions = list()

    def fit(self, X_train, y_train, k, weights):
        self.X_train = X_train
        self.y_train = y_train
        self.k = k
        self.weights = weights

    def predict(self, testing_data):
        """
        Takes a 2d array of query cases.

        Returns a list of predictions for k-NN classifier
        """

        # Reset so that repeated calls do not accumulate earlier predictions.
        self.predictions = list()
        # __helper appends to self.predictions and returns None, so a plain
        # loop replaces the np.fromiter call, whose result was discarded.
        for qc in testing_data:
            self.__helper(qc)
        return self.predictions


    def __helper(self, qc):
        # Distance from the query case to every training case, one call each.
        neighbours = np.fromiter((self.__weighted_euclidean(qc, x) for x in self.X_train), float)
        neighbours = np.array([neighbours]).T
        indexes = np.array([range(len(self.X_train))]).T
        neighbours = np.append(indexes, neighbours, axis=1)

        # Sort by second column - distances
        neighbours = neighbours[neighbours[:,1].argsort()]
        k_cases = neighbours[ :self.k]
        indexes = [x[0] for x in k_cases]

        y_answers = [self.y_train[int(x)] for x in indexes]
        answer = max(set(y_answers), key=y_answers.count)  # get most common value
        self.predictions.append(answer)


    def __weighted_euclidean(self, qc, other):
        """
        Custom weighted euclidean distance (squared, no square root)

        returns: floating point number
        """

        return np.sum( ((qc - other)**2) * self.weights )
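
One possible direction, offered as a minimal sketch rather than a definitive answer: the generator expressions passed to np.fromiter still call a Python function once per query/training pair, so most of the time is likely spent in the interpreter rather than in NumPy. The sketch below computes all pairwise distances in one matrix operation. The function name `weighted_knn_predict` and its argument order are hypothetical, not part of the code above, and it assumes non-negative weights, numeric features, and k no larger than the number of training cases.

```python
import numpy as np


def weighted_knn_predict(X_train, y_train, X_test, k, weights):
    """Feature-weighted k-NN using squared Euclidean distance (sketch)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    # Scaling each feature by sqrt(w) turns sum(w * (a - b)**2) into an
    # ordinary squared Euclidean distance. Assumes weights >= 0.
    scale = np.sqrt(np.asarray(weights, dtype=float))
    A = X_test * scale    # shape (n_test, n_features)
    B = X_train * scale   # shape (n_train, n_features)

    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 for every pair at once.
    # The result has shape (n_test, n_train).
    d2 = (A ** 2).sum(axis=1)[:, None] - 2.0 * (A @ B.T) + (B ** 2).sum(axis=1)[None, :]

    # argpartition selects the k smallest entries per row without a full sort.
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    neighbour_labels = y_train[nearest]

    # Majority vote per query; ties go to the smallest label value.
    preds = []
    for row in neighbour_labels:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)
```

The distance matrix holds n_test × n_train floats, so for large inputs the test set would need to be processed in chunks. Tie-breaking here differs from `max(set(y_answers), key=y_answers.count)`, which picks among tied labels in set-iteration order. Because the weights reduce to a per-feature rescaling, the same scaled arrays could also be handed to an existing k-NN implementation if writing it from scratch is not a hard requirement.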
 
