Pubmed Journal Recommendation System dataset

  1. Jiayun Liu 1
  2. Manuel Castillo Cara 2
  3. Raúl García Castro 1
  1. 1 Universidad Politécnica de Madrid
    info

    Universidad Politécnica de Madrid

    Madrid, España

    ROR https://ror.org/03n6nwv02

  2. 2 Universidad Nacional de Educación a Distancia
    info

    Universidad Nacional de Educación a Distancia

    Madrid, España

    ROR https://ror.org/02msb5n36

Editor: Zenodo

Año de publicación: 2023

Tipo: Dataset

CC BY 4.0

Resumen

Dataset for Journal recommendation, includes title, abstract, keywords, and journal. We extracted the journals and more information of: Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817. Dataset Components: data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'. data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022—the latest date in the dataset. Additionally, we have exclusively retained journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals. data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets—training, validation, and test—using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.