Explaining the poor accuracy of a music genre classifier

Thread starter: dsp_user (Guest)
I've been playing with this music genre classifier ( https://github.com/musikalkemist/De...e classification/code/mlp_genre_classifier.py ) and have been trying to improve the model accuracy, which is currently only about 60%. I've tried changing the network architecture by adding an additional hidden layer and used different values for the dropout, but that didn't help. I've also experimented with different values for the learning rate, but that didn't help either.

The dataset consists of a thousand 30-second tracks (100 per genre), which are first divided into segments (e.g. 10 segments per track), and then a number of MFCC vectors is calculated for each segment. The actual number of MFCC vectors per segment depends on the FFT/hop settings. For instance, with an FFT size of 2048 samples we get 130 MFCC vectors per segment, so the input shape is (9996, 130, 13). The audio preprocessing is done here: https://github.com/musikalkemist/De...n: Preparing the dataset/code/extract_data.py .
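For reference, the 130 frames per segment can be sanity-checked from the preprocessing parameters. A minimal sketch, assuming the typical values from the referenced preprocessing script (22050 Hz sample rate, 30-second tracks, 10 segments per track, hop length 512); these exact values are an assumption:

Code:
import math

SAMPLE_RATE = 22050   # assumption: rate used by the preprocessing script
TRACK_DURATION = 30   # seconds
NUM_SEGMENTS = 10
HOP_LENGTH = 512      # assumption: librosa default hop for n_fft=2048

samples_per_track = SAMPLE_RATE * TRACK_DURATION         # 661500
samples_per_segment = samples_per_track // NUM_SEGMENTS  # 66150

# librosa emits one MFCC vector per hop
frames_per_segment = math.ceil(samples_per_segment / HOP_LENGTH)
print(frames_per_segment)  # 130, matching the (9996, 130, 13) input shape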

The loading and splitting of the data is done like so:

Code:
X, y = load_data('genres.json')  # genres.json contains all the MFCC arrays as well as genre labels/mappings

indices = np.arange(len(y))
# create train/test split
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(X, y, indices, test_size=0.3)

I then noticed that there must be some data leakage between the training and test sets. The reason is that every track contributes multiple MFCC arrays, and arrays from the same track can end up in both the training and the test set. Ideally, all the MFCCs belonging to a single track should end up in either the train set or the test set, but not both; a group-aware split that enforces this is sketched below.
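A minimal sketch of such a group-aware split using scikit-learn's GroupShuffleSplit, assuming the segments are stored in track order with a fixed 10 segments per track (both assumptions about the JSON layout), so a track id can be derived from the segment index:

Code:
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X, y = load_data('genres.json')

# assumption: segments appear in track order, 10 per track
SEGMENTS_PER_TRACK = 10
track_ids = np.arange(len(y)) // SEGMENTS_PER_TRACK

# every segment of a given track lands on the same side of the split
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(gss.split(X, y, groups=track_ids))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]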

So, my idea was to simply calculate the MFCCs for the two sets and load them separately, so that no train/test split is needed at all. To do this, I created two folders holding the tracks for the training and test sets respectively.

Code:
X_train, y_train = load_data('genres_train.json')
X_test, y_test = load_data('genres_test.json')

This is simple enough, but for some reason the accuracy is now only around 40% (average over multiple runs).

I guess I must have made a mistake somewhere to get such a low accuracy, but I just can't seem to find it.
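One thing worth ruling out with the two-file setup: if the extraction script was run separately on each folder, the integer labels may not refer to the same genres in both files. A minimal check, assuming each JSON follows the layout of the original extract script, i.e. a "mapping" list of genre names alongside "mfcc" and "labels" (the key names are an assumption):

Code:
import json
import numpy as np

def load_json(path):
    with open(path, "r") as fp:
        return json.load(fp)

train = load_json("genres_train.json")
test = load_json("genres_test.json")

# the label -> genre mapping must be identical in both files,
# otherwise test labels are scored against the wrong classes
print("mappings match:", train.get("mapping") == test.get("mapping"))

# the class distribution should also be roughly uniform in both files
print("train counts:", np.bincount(np.array(train["labels"])))
print("test counts: ", np.bincount(np.array(test["labels"])))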

The rest of the implementation is pretty much the same as in the original article I referred to above.

Code:
import json
import os
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow.keras as keras
import matplotlib.pyplot as plt

# path to json file that stores MFCCs and genre labels for each processed segment
DATA_PATH_TEST = "genres_test.json"
DATA_PATH_TRAIN = "genres_train.json"


def plot_history(history):
    """Plots accuracy/loss for training/validation set as a function of the epochs
    :param history: Training history of model
    :return:
    """

    fig, axs = plt.subplots(2)

    # create accuracy subplot
    axs[0].plot(history.history["accuracy"], label="train accuracy")
    axs[0].plot(history.history["val_accuracy"], label="test accuracy")
    axs[0].set_ylabel("Accuracy")
    axs[0].legend(loc="lower right")
    axs[0].set_title("Accuracy eval")

    # create error subplot
    axs[1].plot(history.history["loss"], label="train error")
    axs[1].plot(history.history["val_loss"], label="test error")
    axs[1].set_ylabel("Error")
    axs[1].set_xlabel("Epoch")
    axs[1].legend(loc="upper right")
    axs[1].set_title("Error eval")

    plt.show()


def load_data(data_path):
    """Loads dataset from json file.
    :param data_path (str): Path to json file containing data
    :return X (ndarray): Inputs
    :return y (ndarray): Targets
    """

    with open(data_path, "r") as fp:
        data = json.load(fp)

    # convert lists to numpy arrays
    X = np.array(data["mfcc"])
    y = np.array(data["labels"])

    print("Data successfully loaded!")

    return X, y

if __name__ == "__main__":
    genreLabels = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal',
                   'pop', 'reggae', 'rock']

    # load data
    X_train, y_train = load_data(DATA_PATH_TRAIN)
    X_test, y_test = load_data(DATA_PATH_TEST)
    print(X_test.shape)
    print(y_test.shape)

    model = keras.Sequential([

        # input layer
        keras.layers.Flatten(input_shape=(X_train.shape[1], X_train.shape[2])),

        # 1st dense layer
        keras.layers.Dense(524, activation='relu'),
        keras.layers.Dropout(0.2),

        # 2nd dense layer
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dropout(0.2),

        # 3rd dense layer
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dropout(0.2),

        # output layer
        keras.layers.Dense(10, activation='softmax')
    ])

    print(X_train.shape)

    # compile model
    # model = keras.models.load_model(os.getcwd()+'genreModel')
    # print('model loaded')
    optimiser = keras.optimizers.Adam(learning_rate=0.0001)
    model.compile(optimizer=optimiser,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.summary()

    # train model
    history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=64, epochs=100)

    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print('Accuracy: %.3f' % acc)
    plot_history(history)
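To understand where the errors come from, it may help to look at per-genre confusions rather than the single accuracy number. A minimal sketch, assuming model, X_test, y_test and genreLabels as defined in the script above, and that labels 0-9 correspond to genreLabels in order:

Code:
import numpy as np
from sklearn.metrics import confusion_matrix

# predicted class = argmax over the 10 softmax outputs
y_pred = np.argmax(model.predict(X_test), axis=1)

# rows: true genre, columns: predicted genre (assumed ordered as genreLabels)
cm = confusion_matrix(y_test, y_pred)
for label, row in zip(genreLabels, cm):
    print(f"{label:>10}: {row}")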
 
