OiO.lk Community platform!

How to visualise top terms on each HDBSCAN cluster

  • Thread starter: J.Doe (Guest)

I'm currently trying to use HDBSCAN to cluster a collection of movie data, in order to group similar content together and come up with 'topics' that describe those clusters. I'm interested in HDBSCAN because, as I understand it, it supports soft clustering, unlike K-Means, and soft clustering seems more suitable for my goal.

After running HDBSCAN, I was able to find which movies were placed in each cluster. What I want now is to find which terms/words represent each cluster.
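
One possible way to get those terms without cluster centers, sketched here under assumptions: average the TF-IDF rows of the movies in each cluster and rank the terms by that mean weight. The helper `top_terms_per_cluster` and the toy matrix are hypothetical; the sketch assumes a dense NumPy TF-IDF matrix and skips label `-1`, which is how HDBSCAN marks noise points.

```python
import numpy as np


def top_terms_per_cluster(X, labels, feature_names, n_terms=5):
    """Hypothetical helper: {cluster_id: [top terms]} from each cluster's mean TF-IDF row.

    X: dense (n_docs, n_terms) array; convert a sparse matrix with X.toarray().
    labels: one cluster id per document; -1 (HDBSCAN's noise label) is skipped.
    """
    labels = np.asarray(labels)
    result = {}
    for cluster_id in np.unique(labels):
        if cluster_id == -1:  # noise points belong to no cluster
            continue
        # average the TF-IDF weights of all documents in this cluster
        mean_weights = X[labels == cluster_id].mean(axis=0)
        # term indices sorted by descending mean weight, keep the first n_terms
        top = np.argsort(mean_weights)[::-1][:n_terms]
        result[int(cluster_id)] = [feature_names[i] for i in top]
    return result


# Toy example (made-up values): 4 documents, 3 terms, two clusters plus one noise point.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.5, 0.5, 0.0],
])
labels = [0, 0, 1, -1]
terms = ["space", "love", "zombie"]
print(top_terms_per_cluster(X, labels, terms, n_terms=2))
```

On this toy input it returns `{0: ['space', 'love'], 1: ['zombie', 'love']}`; the noise document does not influence any cluster's terms.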

I've done something similar with KMeans (code below):

Code:
import pandas as pd
from sklearn.cluster import KMeans

# Assumed defined earlier: text (TF-IDF matrix), data (movie DataFrame),
# dataset_list2 (per-movie info) and tfidf_feature_names (vectorizer vocabulary)
n_clusters = 70
model = KMeans(n_clusters=n_clusters)
clusters = model.fit_predict(text)  # one cluster label per movie

titles = data['title'].tolist()
genres = data['genres'].tolist()

films_kmeans = {'title': titles, 'info': dataset_list2, 'cluster': clusters, 'genre': genres}
frame_kmeans = pd.DataFrame(films_kmeans)

print("Top terms per cluster:")
print()
# for each centroid, sort term indices by descending TF-IDF weight
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
for i in range(n_clusters):
    print("Cluster %d:" % i, end='')
    for ind in order_centroids[i, :5]:
        print(' %s' % tfidf_feature_names[ind], end='')
    print()
    print()

    print("Cluster %d titles:" % i, end='')
    # boolean mask also works when a cluster holds a single movie
    for title in frame_kmeans.loc[frame_kmeans['cluster'] == i, 'title']:
        print(' %s,' % title, end='')
    print()  # add whitespace
    print()  # add whitespace

print()
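
For reference, the `model.cluster_centers_.argsort()[:, ::-1]` line ranks the vocabulary separately for each centroid: `argsort` returns term indices in ascending order of weight, and `[:, ::-1]` reverses each row. A minimal sketch with made-up centroid values and a hypothetical four-word vocabulary:

```python
import numpy as np

# Two hypothetical centroids over a 4-term vocabulary (values made up for illustration).
cluster_centers = np.array([
    [0.10, 0.70, 0.05, 0.15],
    [0.40, 0.00, 0.50, 0.10],
])
feature_names = ["alien", "heist", "romance", "ship"]

# argsort sorts term indices by ascending weight per row; [:, ::-1] reverses each row.
order_centroids = cluster_centers.argsort()[:, ::-1]
for i, row in enumerate(order_centroids):
    print("Cluster %d:" % i, [feature_names[ind] for ind in row[:2]])
```

This prints `Cluster 0: ['heist', 'ship']` and `Cluster 1: ['romance', 'alien']`. Any matrix with one row of term weights per cluster can be ranked the same way; nothing in this step depends on K-Means itself.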

While this works fine for KMeans, I couldn't find a similar way to do this for HDBSCAN, since as far as I know it doesn't have cluster centers. I have been looking at the documentation, but I'm pretty new at this and haven't been able to solve the problem.
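
HDBSCAN does not produce centroids the way K-Means does, but a substitute can be built from the fitted labels. A minimal sketch, assuming a dense TF-IDF array plus the `labels_` and `probabilities_` attributes that both the `hdbscan` package and `sklearn.cluster.HDBSCAN` expose: each member's row is weighted by its membership strength, so core points count more than border points. `weighted_cluster_vectors` is a hypothetical helper, not a library function; recent scikit-learn versions can also store centroids directly via `HDBSCAN(store_centers='centroid')`, which is worth checking in the docs for your version.

```python
import numpy as np


def weighted_cluster_vectors(X, labels, probabilities):
    """Hypothetical helper: one "center" per cluster as the membership-weighted mean.

    X: dense (n_docs, n_terms) TF-IDF array.
    labels, probabilities: per-document cluster ids and membership strengths,
    e.g. labels_ and probabilities_ from a fitted HDBSCAN model.
    Returns (cluster_ids, centers) where centers[k] belongs to cluster_ids[k].
    """
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities, dtype=float)
    cluster_ids = [c for c in np.unique(labels) if c != -1]  # drop noise
    centers = np.vstack([
        # weighted average: points with higher membership strength count more
        np.average(X[labels == c], axis=0, weights=probabilities[labels == c])
        for c in cluster_ids
    ])
    return cluster_ids, centers


# Toy data (made up): 3 documents, 3 terms, one cluster and one noise point.
X = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
labels = [0, 0, -1]
probabilities = [0.75, 0.25, 0.0]
cluster_ids, centers = weighted_cluster_vectors(X, labels, probabilities)
order_centroids = centers.argsort()[:, ::-1]  # same ranking step as the KMeans code
```

The resulting `centers` matrix plays the role of `model.cluster_centers_`, so the top-terms loop from the K-Means code can be reused, with `cluster_ids[k]` as the label for row `k`.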

Any ideas would be very much appreciated! Thank you for your time.