I am working with a dataframe containing abstracts from various NLP conferences, along with the names of the respective authors and the keywords they have associated with their abstracts, e.g.:
- abstract
- author1, author2
- kw1, kw2, kw3
My objective is to cluster authors who frequently write about similar topics, as indicated by shared keywords. For the visualisation I am thinking of using t-SNE. However, I am unsure how to obtain 'cluster labels' without manual intervention. Which algorithms would be suitable for this task*?
*E.g. would K-means be a viable option, given that the number of clusters must be provided in advance? Or should I opt for methods such as DBSCAN or Affinity Propagation? Should I treat each keyword as its own cluster, at the risk of an explosion of clusters because of the large number of keywords?
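
To make the question concrete, here is a minimal sketch of the kind of grouping I have in mind. It is neither DBSCAN nor Affinity Propagation: it simply links authors whose keyword sets overlap enough (Jaccard similarity) and takes the connected groups, so the number of clusters is not fixed in advance. The toy rows, the author names, the `cluster_authors` helper and the `threshold=0.3` value are all made up for illustration and are not taken from my real data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rows: (authors, keywords) for each abstract.
rows = [
    (["alice", "bob"], ["parsing", "syntax"]),
    (["bob"], ["parsing", "treebanks"]),
    (["carol"], ["translation", "alignment"]),
    (["dave", "carol"], ["translation", "evaluation"]),
    (["erin"], ["speech"]),
]

# Collect the set of keywords each author has used across all abstracts.
author_keywords = defaultdict(set)
for authors, keywords in rows:
    for author in authors:
        author_keywords[author].update(keywords)


def jaccard(a, b):
    """Fraction of keywords two authors share (0 = none, 1 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_authors(author_keywords, threshold=0.3):
    """Link authors with Jaccard similarity >= threshold; return connected groups."""
    parent = {author: author for author in author_keywords}

    def find(author):
        # Follow parent links to the group's representative (with path halving).
        while parent[author] != author:
            parent[author] = parent[parent[author]]
            author = parent[author]
        return author

    for a, b in combinations(author_keywords, 2):
        if jaccard(author_keywords[a], author_keywords[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for author in author_keywords:
        groups[find(author)].append(author)
    return sorted(sorted(group) for group in groups.values())


clusters = cluster_authors(author_keywords)
print(clusters)
```

In this sketch the threshold takes the place of the number of clusters as the parameter to tune, and an author with no sufficiently similar neighbour (here `erin`) ends up in a singleton group.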