Biomedical research is one of the largest areas of present-day science and carries the hope of improving the lives of the general public. To understand how individual scientists choose their research questions, we study why some genes are well studied while others are not. Although it has previously been observed that most research on human genes concentrates on only approximately 2,000 of the 19,000 genes of the human genome, the reasons for this neglect are largely unknown. We systematically test explanations for this observation by compiling an extensive resource that characterizes biomedical research, including, but not limited to, hundreds of chemical and biological properties of gene-encoded proteins and the published scientific literature on individual genes. Using machine learning methods, we can predict the number of publications on individual genes, the year of the first publication about them, the extent of funding by the National Institutes of Health, and the existence of related medical drugs. We find that biomedical research is primarily guided by a handful of generic chemical and biological characteristics of genes, which facilitated experimentation during the 1980s and 1990s, rather than by the physiological importance of individual genes or their relevance to human disease.
Stoeger T, Gerlach M, Morimoto RI, Nunes Amaral LA (2018) Large-scale investigation of the reasons why potentially important genes are ignored. PLoS Biol 16(9): e2006643. https://doi.org/10.1371/journal.pbio.2006643