K-Means Based on Neighbors with Latent Semantic Indexing for Documents Clustering

  • Gede Aditra Pradnyana
  • I Putu Gede Hendra Suputra
  • Tri Adhi Wijaya

Abstract

Using the concept of neighbors in the K-means algorithm can improve the quality of
document clustering results. However, the neighbor-based K-means algorithm still has
several weaknesses: it ignores the meaning, or semantic proximity, between documents; it
has high computational complexity; and it ignores the influence of document length during
document weighting, so long documents tend to receive higher weights. In this study we
propose a new method, K-means based on neighbors with Latent Semantic Indexing (LSI), to
optimize K-means for document clustering. The method uses LSI to determine the neighbors
of a document during the K-means process, so that word meaning and the semantic proximity
between documents are taken into account. A weighting method and a feature selection
algorithm are also used to obtain better clustering results. Tests on 850
Indonesian-language news documents show that K-means with LSI-based neighbors achieves
better F-measure and purity values than the initial algorithm, namely 0.68 and 0.60,
respectively. From these results it can be concluded that applying the neighbors method
with LSI to the K-means algorithm can improve the quality of document clustering results.
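
The idea of choosing neighbors in an LSI space can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small term-document matrix, a truncated SVD as the LSI step, and cosine similarity with a threshold as the neighbor test. The function name `lsi_neighbor_matrix`, the parameters `rank` and `theta`, their values, and the toy data are all hypothetical.

```python
import numpy as np


def lsi_neighbor_matrix(term_doc, rank=2, theta=0.8):
    """Return a boolean document-by-document neighbor matrix.

    term_doc: (n_terms, n_docs) weighted term-document matrix.
    rank:     number of latent dimensions kept (assumed value).
    theta:    cosine-similarity threshold for neighbors (assumed value).
    """
    # LSI step: truncated SVD, term_doc ~ U_k S_k V_k^T.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(rank, len(s))
    # Document coordinates in the latent space: columns of S_k V_k^T.
    docs = (vt[:k, :] * s[:k, None]).T
    # Cosine similarity between documents in the latent space.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = docs / norms
    similarity = unit @ unit.T
    # Two documents are neighbors when their similarity reaches theta.
    return similarity >= theta


# Toy data (hypothetical). Rows are the terms
# car, automobile, engine, apple, banana, fruit; columns are documents.
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 0],  # apple
    [0, 0, 0, 1],  # banana
    [0, 0, 1, 1],  # fruit
], dtype=float)

neighbors = lsi_neighbor_matrix(term_doc, rank=2, theta=0.8)
print(neighbors.astype(int))
```

In this toy example the first two documents have a raw cosine similarity of only 0.5, because "car" and "automobile" are different terms, yet they become neighbors once projected into the two-dimensional latent space. The resulting matrix could then feed a neighbor-based K-means step; how the paper uses it in detail is not stated in the abstract.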

Published
2015-11-19
How to Cite
PRADNYANA, Gede Aditra; SUPUTRA, I Putu Gede Hendra; WIJAYA, Tri Adhi. K-Means Based on Neighbors with Latent Semantic Indexing for Documents Clustering. Proceeding ICIRAD, [S.l.], v. 1, n. 1, nov. 2015. Available at: <https://eproceeding.undiksha.ac.id/index.php/icirad/article/view/50>. Date accessed: 21 oct. 2020.