pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature

Author(s): Ding, Ruoyao
Author(s): Arighi, Cecilia N.
Author(s): Lee, Jung-Youn
Author(s): Wu, Cathy H.
Author(s): Vijay-Shanker, K.
Ordered Author: Ruoyao Ding, Cecilia N. Arighi, Jung-Youn Lee, Cathy H. Wu, K. Vijay-Shanker
Date Accessioned: 2015-11-19T20:19:36Z
Date Available: 2015-11-19T20:19:36Z
Copyright Date: Copyright © 2015 Ding et al.
Publication Date: 2015-08-10
Description: Final published version.
Abstract:
BACKGROUND: Automatically detecting gene/protein names in the literature and linking them to database records, a task known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving annotation coverage in databases, and it is an essential component of many text mining systems and database curation pipelines.
METHODS: In this manuscript, we describe a gene normalization system specifically tailored to plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. We have developed new heuristics to improve each of these phases.
RESULTS: We evaluated the performance of pGenN on an in-house, expertly annotated corpus of 104 plant-relevant abstracts. Our system achieved an F-value of 88.9% (precision 90.9%, recall 87.2%) on this corpus, outperforming the state-of-the-art systems presented in BioCreative III. We have processed over 440,000 plant-related MEDLINE abstracts using pGenN. The gene normalization results are stored in a local database that can be queried directly from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).
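The three-step design described in the abstract (dictionary-based mention detection, species assignment, intra-species normalization) can be illustrated with a minimal Python sketch. This is not pGenN's implementation: the toy lexicons, the placeholder record IDs (GENE:0001 and so on), the function names, and the "nearest species mention" rule used for species assignment are all hypothetical assumptions chosen for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy dictionaries for illustration only; not pGenN's resources.
GENE_LEXICON = {"FT", "CONSTANS", "Hd3a"}
SPECIES_LEXICON = {"arabidopsis": "Arabidopsis thaliana", "rice": "Oryza sativa"}
# Hypothetical (gene name, species) -> database record table with placeholder IDs.
RECORD_TABLE = {
    ("FT", "Arabidopsis thaliana"): "GENE:0001",
    ("CONSTANS", "Arabidopsis thaliana"): "GENE:0002",
    ("Hd3a", "Oryza sativa"): "GENE:0003",
}

TOKEN = re.compile(r"[A-Za-z0-9-]+")


@dataclass
class Mention:
    name: str
    start: int
    species: Optional[str] = None
    record: Optional[str] = None


def detect_gene_mentions(text: str) -> list[Mention]:
    """Step 1: dictionary-based mention detection (exact, case-sensitive token lookup)."""
    return [Mention(m.group(), m.start()) for m in TOKEN.finditer(text)
            if m.group() in GENE_LEXICON]


def detect_species_mentions(text: str) -> list[tuple[int, str]]:
    """Find species cue words (case-insensitive) and map them to species names."""
    return [(m.start(), SPECIES_LEXICON[m.group().lower()])
            for m in TOKEN.finditer(text)
            if m.group().lower() in SPECIES_LEXICON]


def assign_species(mentions, species_hits, default=None):
    """Step 2 (assumed heuristic): give each gene the species mentioned closest to it."""
    for mention in mentions:
        if species_hits:
            _, species = min(species_hits, key=lambda hit: abs(hit[0] - mention.start))
            mention.species = species
        else:
            mention.species = default
    return mentions


def normalize(mentions):
    """Step 3: intra-species normalization via a (name, species) lookup."""
    for mention in mentions:
        mention.record = RECORD_TABLE.get((mention.name, mention.species))
    return mentions


def run_pipeline(text: str):
    mentions = detect_gene_mentions(text)
    mentions = assign_species(mentions, detect_species_mentions(text))
    return [(m.name, m.species, m.record) for m in normalize(mentions)]


if __name__ == "__main__":
    sample = "In rice, Hd3a promotes flowering, while in Arabidopsis FT is regulated by CONSTANS."
    for row in run_pipeline(sample):
        print(row)
```

On the sample sentence, the sketch maps Hd3a to rice and maps FT and CONSTANS to Arabidopsis, because each of those species words is the closest species mention to the respective gene. The key point is that resolving a name to a record depends on the species: the same symbol can denote different genes in different plants, which is why species assignment comes before the final lookup. A production system would also need to handle name variants, abbreviations, and ambiguous species cues, none of which this toy version attempts.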
Department: University of Delaware. Department of Computer and Information Sciences.
Department: University of Delaware. Center for Bioinformatics & Computational Biology.
Department: University of Delaware. Department of Plant and Soil Sciences.
Citation: Ding R, Arighi CN, Lee J-Y, Wu CH, Vijay-Shanker K (2015) pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature. PLoS ONE 10(8): e0135305. doi:10.1371/journal.pone.0135305
DOI: 10.1371/journal.pone.0135305
ISSN: 1932-6203
URL: http://udspace.udel.edu/handle/19716/17237
Language: en_US
Publisher: Public Library of Science
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
dc.rights: Article is made available in accordance with the University of Delaware Faculty Policy on Open Access and the publisher's policy.
dc.source: PLoS ONE
dc.source.uri: http://www.plosone.org/
Type: Article
Files:
Original bundle: pGenN_1442595806T6552.pdf (2.39 MB, Adobe Portable Document Format)
License bundle: license.txt (2.22 KB, item-specific license agreed upon to submission)