Article
Open Access

A co-training model based on transfer learning for the classification of research papers

Abstract

A multitude of scholarly papers can be accessed online, and their continual growth poses challenges for categorization. Organizing these documents across diverse academic fields is important, as it helps institutions, journals, and scholars structure their content and improve the visibility of research. In this study, we propose a co-training model based on transfer learning to classify papers according to institutional research lines. We use co-training text-processing techniques to enhance model learning through transformers, enabling the identification of trends and patterns in document texts. The model is structured with two views (titles and abstracts) for data preprocessing and training. Each view employs a different document representation technique that augments its training using BERT's pre-trained scheme. To evaluate the proposed model, a dataset of 898 institutional papers was compiled. These documents are classified into five or eleven classes, and the model's performance is compared with individually trained models from each view using the BART pre-trained scheme, as well as with combined models. The best precision achieved was 0.87, compared to 0.78 for the BERT pre-trained model (five classes). These findings suggest that co-training can be a valuable approach to improving the predictive performance of text classification.
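The following is a minimal sketch of the two-view co-training loop the abstract describes, assuming title and abstract embeddings (e.g., from a pre-trained BERT encoder) are precomputed as X_title and X_abs and that labels live in a NumPy array y. Logistic regression stands in for the classifier heads; the function names, confidence threshold, and round counts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X_title, X_abs, y, labeled_idx, unlabeled_idx,
             rounds=10, per_round=20, conf_threshold=0.9):
    """Each view's classifier pseudo-labels its most confident unlabeled
    examples, growing the shared labeled pool used by the other view."""
    y = y.copy()                    # pseudo-labels are written into this copy
    labeled = list(labeled_idx)
    unlabeled = list(unlabeled_idx)
    clf_title = LogisticRegression(max_iter=1000)
    clf_abs = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf_title.fit(X_title[labeled], y[labeled])
        clf_abs.fit(X_abs[labeled], y[labeled])
        for clf, X in ((clf_title, X_title), (clf_abs, X_abs)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            conf = proba.max(axis=1)
            preds = clf.classes_[proba.argmax(axis=1)]
            # Keep only the most confident predictions above the threshold.
            order = np.argsort(-conf)[:per_round]
            picked = [j for j in order if conf[j] >= conf_threshold]
            for j in picked:
                y[unlabeled[j]] = preds[j]
                labeled.append(unlabeled[j])
            chosen = set(picked)
            unlabeled = [u for k, u in enumerate(unlabeled)
                         if k not in chosen]
    return clf_title, clf_abs

def predict_proba(clf_title, clf_abs, X_title, X_abs):
    # Combine the two views by averaging class probabilities (assumes both
    # classifiers saw every class in the labeled seed, so classes_ align).
    return (clf_title.predict_proba(X_title)
            + clf_abs.predict_proba(X_abs)) / 2
```

Averaging the two views' probabilities at inference is one simple combination scheme; the paper's exact method of combining the views may differ.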

Keywords
text classification
co-training
transformer
pre-trained
http://creativecommons.org/licenses/by-nc-sa/4.0/

This work is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (BY-NC-SA 4.0) license.
