July 30, 2019
How can researchers boost the number of citations their papers receive? Mohamed Elgendi, a senior postdoctoral fellow in UBC’s Department of Obstetrics and Gynecology and an adjunct professor in UBC’s Department of Electrical and Computer Engineering, has used machine learning technology to find an answer.
Examining eight “formal” publication features in 200 highly and lowly cited papers, Elgendi discovered that highly cited papers have several things in common: their titles are between 7 and 13 words long; they are authored by six or more people; they include at least six figures and two tables; and they have a minimum of 33,600 characters (including references and excluding spaces).
He also observed that the number of mathematical equations included in a paper does not influence its citation rate, so authors should feel comfortable including as many as necessary.
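The formal thresholds reported for highly cited papers can be written as a simple checklist. The Python below is a minimal sketch, not a tool from the study: the `PaperStats` fields and function names are hypothetical. Title words are counted by splitting on whitespace, and "excluding spaces" is read as skipping all whitespace. Both are assumptions, not the study's documented counting rules. Equations are left out because the study found they did not affect citation rates. Passing every check only matches patterns observed in one sample of 200 papers; it does not guarantee citations.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PaperStats:
    # Hypothetical container for a manuscript's "formal" features
    title: str
    num_authors: int
    num_figures: int
    num_tables: int
    full_text: str  # main text plus references


def char_count_no_spaces(text: str) -> int:
    """Count characters, skipping all whitespace (assumed reading of 'excluding spaces')."""
    return sum(1 for ch in text if not ch.isspace())


def formal_checklist(p: PaperStats) -> Dict[str, bool]:
    """Compare a paper against the thresholds reported for highly cited papers.

    Equations are deliberately omitted: the study found no effect on citations.
    """
    title_words = len(p.title.split())  # assumption: words are whitespace-separated tokens
    return {
        "title has 7-13 words": 7 <= title_words <= 13,
        "6 or more authors": p.num_authors >= 6,
        "6 or more figures": p.num_figures >= 6,
        "2 or more tables": p.num_tables >= 2,
        "33,600+ characters": char_count_no_spaces(p.full_text) >= 33_600,
    }
```

For a hypothetical draft with a 10-word title but only four authors, `formal_checklist` would return `False` for the author check and `True` for any other thresholds the draft meets.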
“Non-formal publication factors like the importance and originality of the research, the reputation of the paper’s authors and the prestige of the journal where the paper was published clearly influence a paper’s citation count — probably far more than the features I examined do,” said Elgendi. “But the results of this study suggest that if a researcher has written a high-quality article, then they may be able to increase the number of times it is cited just by taking a few formal measures.”
To produce his results, Elgendi used “principal component analysis,” a machine learning method that explores correlations between features in an unsupervised manner, that is, without requiring training on labeled examples. The dataset consisted of 100 lowly cited and 100 highly cited papers published by the Multidisciplinary Digital Publishing Institute (MDPI) in 2017. The papers were drawn from MDPI's 202 peer-reviewed, open-access journals spanning a broad range of disciplines and were selected solely on the basis of their citation counts.
Elgendi’s study is the first to analyze the relationships between multiple publication features — here, the numbers of citations, views, characters, figures, tables, equations, authors and title words — simultaneously. He found that while longer titles were correlated with fewer citations, citation rates increased along with the numbers of views, characters, figures, tables and authors. The three most important features were the number of views, the number of characters and the title length.
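To show what principal component analysis does with a table of per-paper features (rows are papers, columns are counts such as views, characters or title words), here is a minimal pure-Python sketch. It is not Elgendi's code, and his preprocessing and software are not described here. The sketch standardizes each column, builds the correlation matrix, and uses power iteration to find only the first principal component. A full analysis would extract every component, for example with an eigendecomposition or SVD (`numpy.linalg.eigh` or scikit-learn's `PCA`). All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Z-score each column so features on different scales
    (e.g. character counts vs. number of tables) are comparable."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(z: Matrix) -> Matrix:
    """Sample covariance of standardized data, i.e. the correlation matrix."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(cov: Matrix, iters: int = 500) -> Tuple[List[float], float]:
    """Power iteration: repeatedly apply the covariance matrix and renormalize.

    Converges to the eigenvector with the largest eigenvalue (the first
    principal component) when that eigenvalue is unique and the start
    vector is not orthogonal to it.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: all-zero covariance
        v = [x / norm for x in w]
    # Rayleigh quotient: variance of the data along the unit vector v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def explained_variance_ratio(cov: Matrix, eigenvalue: float) -> float:
    """Share of total variance (the trace of cov) captured by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

The entries of the returned vector are the loadings. Features whose loadings share a sign on a strong component tend to rise together, while opposite signs suggest an inverse relationship, like the one the article reports between title length and citations. No labels are supplied at any step, which is what makes the method unsupervised.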
Elgendi also used Google Scholar, Web of Science and Altmetric to investigate which words appeared in titles most often, noting that the appearance of the following words in an article title correlated with more views and citations: review, cancer, new, association, analysis, method, monitoring, therapeutic, applications, protein, DNA and health, among others.
For researchers, an increased citation count may lead to professional benefits associated with wider exposure and a higher “h-index” (a measure of an author’s productivity and citation impact), including more collaboration and funding opportunities.
Elgendi’s study, “Characteristics of a highly cited article: A machine learning perspective,” appeared recently in IEEE Access.
Image: “College Math Papers” by Loty is licensed under CC BY 2.0