Automatic Indexing of Digital Books using RAKE and Word2Vec

Authors

  • Arhan Windu Rizki Putra Budianto, Politeknik Negeri Malang (Author)
  • Dr. Ulla Delfana Rosiani, ST., MT., Politeknik Negeri Malang (Author)
  • Vit Zuraida, S.Kom., M.Kom., Politeknik Negeri Malang (Author)
  • Rizki Putri Ramadhani, Politeknik Negeri Malang (Translator)
  • Mohammad Alfarizi Abdullah, Politeknik Negeri Malang (Author)

DOI:

https://doi.org/10.20961/joive.v9i1.3050

Keywords:

Automatic Indexing; Digital Book; RAKE; Word2Vec; PDF

Abstract

Manual indexing of digital books is time-consuming and prone to inconsistency. To address this, this study developed an automatic indexing system using the RAKE (Rapid Automatic Keyword Extraction) method and Word2Vec. The system accepts PDF files as input, performs text preprocessing, and extracts key phrases using RAKE. These phrases are then filtered by their semantic relevance to a specified topic using an Indonesian-language Word2Vec model. Users can also add phrases manually and select which ones to include in the final index. The resulting index lists each phrase with its page numbers and relevance score, and it is appended as an additional page at the end of the PDF document. Evaluation compared the system-generated index with the author's manual index using precision, recall, and cosine similarity. Although precision and recall were very low, the cosine similarity score of 0.69 suggests that the system output is semantically similar to the author's index.
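The RAKE step can be illustrated with a minimal Python sketch of the standard algorithm: the text is split into candidate phrases at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores. This is not the authors' code. The function names `candidate_phrases` and `rake` and the small English stopword set are hypothetical and chosen for readability; the described system works on Indonesian text and would need an Indonesian stopword list.

```python
import re
from collections import defaultdict

# Hypothetical placeholder stopword list (assumption for this sketch).
# The described system processes Indonesian text and would use an Indonesian list.
STOPWORDS = {"the", "of", "and", "a", "is", "to", "for", "in", "are", "on", "by"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text at punctuation, then at stopwords, into candidate phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                # A stopword ends the current candidate phrase.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Repeated phrases collapse to one entry with the same score.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rake("Automatic indexing of digital books is slow. Automatic keyword extraction supports indexing.")` ranks "automatic keyword extraction supports indexing" first with a score of 22.0, ahead of "automatic indexing" (7.0). The degree/frequency ratio rewards words that occur in long candidate phrases, which is why RAKE tends to favor longer phrases.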
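The semantic filtering step could look roughly like the following sketch, assuming each phrase is represented by the mean of its word vectors and compared with the topic vector by cosine similarity. The abstract does not say how phrase vectors or the relevance cut-off are computed, so the averaging, the `threshold=0.5` default, the helper names, and the two-dimensional `TOY_EMBEDDINGS` table (a hypothetical stand-in for the Indonesian Word2Vec model) are all assumptions.

```python
import math

# Hypothetical toy vectors standing in for a trained Word2Vec model (assumption).
TOY_EMBEDDINGS = {
    "neural": [1.0, 0.0],
    "network": [1.0, 0.0],
    "deep": [0.6, 0.8],
    "cooking": [0.0, 1.0],
    "recipe": [0.0, 1.0],
}


def mean_vector(words, embeddings):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_by_topic(phrases, topic, embeddings, threshold=0.5):
    """Keep phrases whose mean vector is at least `threshold` similar to the topic."""
    topic_vec = mean_vector(topic.lower().split(), embeddings)
    if topic_vec is None:
        return []
    kept = []
    for phrase in phrases:
        vec = mean_vector(phrase.lower().split(), embeddings)
        if vec is None:
            continue  # fully out-of-vocabulary phrase: nothing to score
        score = cosine(vec, topic_vec)
        if score >= threshold:
            kept.append((phrase, round(score, 3)))
    return sorted(kept, key=lambda kv: kv[1], reverse=True)
```

With the toy table, filtering `["neural network", "deep network", "cooking recipe"]` against the topic "neural network" keeps the first two (scores 1.0 and about 0.894) and drops "cooking recipe" (score 0.0). With gensim, a trained `KeyedVectors` object could likely replace the toy dictionary, since it supports both `word in kv` and `kv[word]`.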
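Once phrases are selected, index entries pairing each phrase with its page numbers and relevance score can be assembled from per-page text. This sketch assumes the PDF text has already been extracted into a list of page strings (page 1 first) and only produces the entry lines; appending them to the PDF as a new page is omitted. The `build_index` function and the entry format are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages, phrases, scores):
    """Map each phrase to the pages it appears on and format one entry per phrase.

    pages:   list of page texts, page 1 first (assumed already extracted from the PDF)
    phrases: phrases selected for the index
    scores:  phrase -> relevance score
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        # Lowercase and collapse whitespace so line breaks do not split phrases.
        page = " ".join(text.lower().split())
        for phrase in phrases:
            needle = " ".join(phrase.lower().split())
            # Whole-word match so "index" does not match inside "indexing".
            if re.search(r"\b" + re.escape(needle) + r"\b", page):
                index[phrase].add(page_no)
    entries = []
    for phrase in sorted(index, key=str.lower):
        page_list = ", ".join(str(p) for p in sorted(index[phrase]))
        entries.append(f"{phrase}: {page_list} (relevance {scores.get(phrase, 0):.2f})")
    return entries
```

Phrases that never occur on any page are left out, and matching is case-insensitive and whole-word after collapsing whitespace; this is one reasonable choice among several.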
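How precision and recall can be very low while a cosine score stays higher can be seen in a small sketch. Precision and recall here use exact matching of normalized phrases, so "digital book" and "digital books" count as a miss. For the similarity score, the sketch substitutes a simple bag-of-words (term-frequency) cosine for illustration. The study's 0.69 may have been computed differently, for example from Word2Vec vectors, so this is not a reproduction of its metric; the function names are hypothetical.

```python
import math
from collections import Counter


def normalize(phrase):
    """Lowercase and collapse whitespace."""
    return " ".join(phrase.lower().split())


def precision_recall(system, manual):
    """Exact-match precision and recall between two phrase lists."""
    sys_set = {normalize(p) for p in system}
    man_set = {normalize(p) for p in manual}
    hits = len(sys_set & man_set)
    precision = hits / len(sys_set) if sys_set else 0.0
    recall = hits / len(man_set) if man_set else 0.0
    return precision, recall


def bow_cosine(system, manual):
    """Cosine similarity of word-frequency vectors built from each phrase list."""
    a = Counter(w for p in system for w in normalize(p).split())
    b = Counter(w for p in manual for w in normalize(p).split())
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
```

With the system index `["keyword extraction", "digital book", "word embedding"]` and the manual index `["extraction of keywords", "digital books", "word embedding", "indexing"]`, only one phrase matches exactly (precision 1/3, recall 1/4), yet the word-level cosine is about 0.58 because the two lists share most of their vocabulary.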


Published

2026-01-27

How to Cite

Budianto, A. W. R. P., Rosiani, U. D., Zuraida, V., & Abdullah, M. A. (2026). Automatic Indexing of Digital Books using RAKE and Word2Vec. Journal of Informatics and Vocational Education, 9(1). https://doi.org/10.20961/joive.v9i1.3050