AI Based Digital Book Indexing System Using YAKE and WORD2VEC Methods
DOI:
https://doi.org/10.20961/joive.v9i1.3026
Keywords:
Automatic Indexing, YAKE, Word2Vec, Artificial Intelligence, Natural Language Processing
Abstract
Polinema Press, the publishing unit of the State Polytechnic of Malang (Polinema), needs an efficient way to generate book indexes automatically, because its current manual indexing process is slow and labor-intensive. This research develops an AI-based automatic indexing system that uses YAKE (Yet Another Keyword Extractor) and Word2Vec to improve the accuracy and efficiency of index generation. The system processes digital books in PDF format in four stages: (1) text preprocessing (text extraction, stopword removal, tokenization), (2) keyword extraction with YAKE, based on statistical features such as word frequency and position, (3) final keyword selection by measuring semantic similarity with Word2Vec, and (4) compilation of an alphabetical index with the page numbers on which each keyword appears. The generated indexes are evaluated against manual indexes using cosine similarity. The system was tested on 37 digital books. The best configuration combined YAKE and Word2Vec with phrases of 2 to 3 words, reaching cosine similarity values of up to 0.91, precision of up to 0.38, and an average processing time of less than 4 seconds per document. These results indicate that the system produces relevant, contextual indexes quickly when compared with manual indexes. It is expected to reduce the manual workload at Polinema Press and to serve as a reference for applying natural language processing (NLP) technology to Indonesian-language documents.
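Stage (4) and the evaluation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the keywords have already been selected by the YAKE and Word2Vec stages, that each page is available as a plain string, that cosine similarity is computed over simple word-count vectors of the index terms, and that precision is the share of generated terms that exactly match a manual term. The function names `build_index`, `cosine_similarity`, and `precision` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def build_index(pages, keywords):
    """Return an alphabetical index: keyword -> pages where it appears.

    `pages` is a list of page texts (page 1 first). Matching is
    case-insensitive and on whole words or phrases (an assumption).
    """
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        lowered = text.lower()
        for kw in keywords:
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            if re.search(pattern, lowered):
                index[kw].append(page_no)
    # Sort entries alphabetically, ignoring case.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))


def _word_counts(terms):
    """Bag-of-words vector over all words in a list of index terms."""
    return Counter(word for term in terms for word in term.lower().split())


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between word-count vectors of two term lists."""
    vec_a, vec_b = _word_counts(terms_a), _word_counts(terms_b)
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision(generated, manual):
    """Fraction of generated terms that also occur in the manual index."""
    if not generated:
        return 0.0
    manual_set = {term.lower() for term in manual}
    hits = sum(1 for term in generated if term.lower() in manual_set)
    return hits / len(generated)


if __name__ == "__main__":
    # Made-up pages and keywords, for illustration only.
    pages = [
        "Neural network training uses data.",
        "A decision tree splits data.",
        "Neural network layers and decision tree depth.",
    ]
    keywords = ["neural network", "decision tree", "data"]
    for term, page_numbers in build_index(pages, keywords).items():
        print(term, page_numbers)
```

A real pipeline would feed `build_index` with text extracted per page from the PDF and with the keyword list produced by the earlier stages. The two metrics answer different questions, which may help explain how a high cosine similarity can coexist with a much lower exact-match precision.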
License
Copyright (c) 2026 Journal of Informatics and Vocational Education

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







