Automatic Indexing of Digital Books using RAKE and Word2Vec
DOI:
https://doi.org/10.20961/joive.v9i1.3050
Keywords:
Automatic Indexing; Digital Book; RAKE; Word2Vec; PDF
Abstract
Manual indexing of digital books is time-consuming and prone to inconsistency. This study addresses the problem by developing an automatic indexing system that combines the RAKE (Rapid Automatic Keyword Extraction) method with Word2Vec. The system accepts a PDF file as input, preprocesses the text, and extracts key phrases using RAKE. These phrases are then filtered by their semantic relevance to a specified topic using an Indonesian-language Word2Vec model. Users can add phrases manually and select which phrases to include in the final index. The resulting index lists each phrase with its page numbers and relevance score, and is inserted as an additional page at the end of the PDF document. The system was evaluated by comparing the system-generated index with the author’s manual index using precision, recall, and cosine similarity. The results indicate that although precision and recall were very low, a cosine similarity score of 0.69 suggests that the system output is semantically similar to the author’s index.
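The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a tiny English stopword list in place of a full Indonesian one, a phrase vector computed as the mean of its word vectors, and a relevance threshold of 0.5; the function names, the toy vectors, and the threshold are all hypothetical. PDF parsing and page-number tracking are left out.

```python
import math
import re
from collections import defaultdict

# Illustrative stopword list (assumption); a real system would load a full
# Indonesian stopword list.
STOPWORDS = {"the", "of", "and", "a", "is", "in", "to", "for", "are", "on", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(phrases):
    """Score each phrase as the sum of degree/frequency over its words."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_by_topic(phrases, vectors, topic_vector, threshold=0.5):
    """Keep phrases whose mean word vector is close to the topic vector.

    `vectors` maps word -> embedding; in practice it would come from a trained
    Word2Vec model. The threshold value is an assumption.
    """
    kept = {}
    for phrase in phrases:
        word_vecs = [vectors[w] for w in phrase.split() if w in vectors]
        if not word_vecs:
            continue  # no known words, so relevance cannot be scored
        mean = [sum(column) / len(word_vecs) for column in zip(*word_vecs)]
        score = cosine_similarity(mean, topic_vector)
        if score >= threshold:
            kept[phrase] = score
    return kept


def precision_recall(system_index, reference_index):
    """Exact-match precision and recall of a system index against a reference."""
    system, reference = set(system_index), set(reference_index)
    matches = len(system & reference)
    precision = matches / len(system) if system else 0.0
    recall = matches / len(reference) if reference else 0.0
    return precision, recall
```

Exact-match precision and recall only reward phrases that appear verbatim in the reference index, whereas cosine similarity over embeddings can credit near-synonyms. That difference is one plausible reading of how low precision and recall can coexist with a moderate similarity score.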
License
Copyright (c) 2026 Journal of Informatics and Vocational Education

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







