AI Based Digital Book Indexing System Using YAKE and WORD2VEC Methods

Mohammad Alfarizi Abdullah; Ulla Delfana Rosiani; Vit Zuraida; Arhan Windu Rizki Putra Budianto; Rizki Putri Ramadhani

doi:10.20961/joive.v9i1.3026

Authors

Mohammad Alfarizi Abdullah, Politeknik Negeri Malang
Ulla Delfana Rosiani, Politeknik Negeri Malang
Vit Zuraida, Politeknik Negeri Malang
Arhan Windu Rizki Putra Budianto, Politeknik Negeri Malang
Rizki Putri Ramadhani, Politeknik Negeri Malang

DOI:

https://doi.org/10.20961/joive.v9i1.3026

Keywords:

Automatic Indexing, YAKE, Word2Vec, Artificial Intelligence, Natural Language Processing

Abstract

Polinema Press, the publishing unit of the State Polytechnic of Malang (Polinema), needs an efficient way to generate book indexes automatically, because its current manual indexing process is slow and labor-intensive. This research develops an AI-based automatic indexing system that uses the YAKE (Yet Another Keyword Extractor) and Word2Vec methods to improve the accuracy and efficiency of index generation. The system processes digital books in PDF format in four stages: (1) text preprocessing (text extraction, stopword removal, tokenization), (2) keyword extraction using YAKE, based on statistical features such as word frequency and position, (3) final keyword selection by measuring semantic similarity using Word2Vec, and (4) compilation of an alphabetical index with the page numbers on which each keyword appears. The generated indexes are evaluated against manual indexes using cosine similarity. The system was tested on 37 digital books. The best configuration combined YAKE and Word2Vec with phrases of 2-3 words, achieving cosine similarity of up to 0.91, precision of up to 0.38, and an average processing time of less than 4 seconds per document. These results indicate that the system produces relevant, contextual indexes quickly when compared with manual indexes. It is expected to reduce the manual workload at Polinema Press and to serve as a reference for applying natural language processing (NLP) technology to Indonesian-language documents.
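Stage (4) and the evaluation step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes the keyword list has already been produced by the YAKE and Word2Vec stages, that each page is available as a separate string, and that cosine similarity is computed over simple word-count vectors, which the abstract does not specify. The function names `build_index`, `cosine_similarity`, and `precision` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def build_index(pages, keywords):
    """Map each keyword to the page numbers on which it appears.

    pages: list of page texts; pages[0] is treated as page 1 (assumption).
    keywords: phrases already selected by the earlier stages (assumption).
    """
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        lowered = text.lower()
        for kw in keywords:
            # Whole-word, case-insensitive match of the phrase on this page.
            if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered):
                index[kw].append(page_no)
    # Alphabetical order, ignoring case; keywords never found are omitted.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity of two term lists using word-count vectors.

    The vector representation is an assumption made for this sketch.
    """
    vec_a = Counter(w for t in terms_a for w in t.lower().split())
    vec_b = Counter(w for t in terms_b for w in t.lower().split())
    dot = sum(count * vec_b[w] for w, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision(generated, manual):
    """Share of generated terms that also occur in the manual index."""
    if not generated:
        return 0.0
    manual_set = {t.lower() for t in manual}
    hits = sum(1 for t in generated if t.lower() in manual_set)
    return hits / len(generated)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    pages = [
        "Neural network basics and data mining.",
        "Data mining with a neural network.",
        "Conclusion.",
    ]
    keywords = ["neural network", "data mining", "regression"]
    index = build_index(pages, keywords)
    for term, page_numbers in index.items():
        print(term, ", ".join(str(p) for p in page_numbers))

    manual = ["data mining", "neural network", "clustering"]
    print(cosine_similarity(list(index), manual))
    print(precision(list(index), manual))
```

Exact phrase matching of this kind explains how a generated index can score high on a vector-based similarity while exact-match precision stays comparatively low: partial word overlap raises the cosine value, but only identical terms count as hits.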


Journal of Informatics and Vocational Education