Enhancing model generalization for relation extraction in biomedical domain

Date
2021
Publisher
University of Delaware
Abstract
Significant progress has been made recently in applying deep learning to natural language processing tasks. However, deep learning models typically require large amounts of annotated training data and easily overfit when the training datasets are small, which usually leads to poor generalization. In this work, we consider four different methods to help deep learning models generalize better on relation extraction tasks in the biomedical domain. Each of our methods improves on the state-of-the-art BERT-based model on benchmark datasets from the biomedical literature.

First, we investigate how to enhance the generalization of the transformer-based BERT model on relation extraction tasks. While there have been several adaptations of the BERT model to the biomedical domain, those models generalize differently on downstream tasks. Motivated by this observation, we investigate the impact of an additional level of adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. We show that further adaptation on task-specific sub-domains improves the results of the leading system on different benchmark sets.

Second, we refine the fine-tuning process of BERT-based models. We show that the traditional architecture used in relation extraction fails to utilize all of the knowledge embedded in the BERT model. By using summarized information from all of the outputs in the final layer, we can improve the performance of relation extraction models.

Third, we improve model generalization by augmenting the manually annotated data using distant supervision. Distant supervision provides an inexpensive way to obtain annotated data: using knowledge bases related to the relation extraction tasks, we can create large amounts of annotated data with no human effort. After investigating multiple methods to reduce noise in the automatically created training sets, we find that simply combining human-labeled and automatically generated data does not necessarily improve performance. However, our experiments show that by applying a transfer learning technique, we can obtain significant gains over models trained on only the human-labeled sets.

Finally, we investigate contrastive learning for improving the text representation produced by the BERT model for relation extraction tasks. Contrastive learning can yield a better representation by comparing the similarity and dissimilarity of real data and augmented data. In our framework, we utilize a contrastive pre-training step tailored to relation extraction tasks by seamlessly integrating linguistic knowledge into the data augmentation. We also investigate how large-scale data constructed from external knowledge bases can enhance the generality of contrastive pre-training of BERT. Experimental results on three relation extraction benchmark datasets demonstrate that our method improves the BERT model representation. In addition, we explore the interpretability of models by showing that BERT with contrastive pre-training relies more on rationales for prediction.
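As a minimal, illustrative sketch of the first idea (a further level of domain adaptation on sub-domain data), the snippet below continues masked-language-model pretraining of a biomedical BERT checkpoint on a sub-domain corpus using the Hugging Face transformers library. The checkpoint name, the file "subdomain_corpus.txt", and all hyperparameters are placeholders for illustration, not the specific resources used in the dissertation.

```python
# Hypothetical sketch: continued masked-language-model (MLM) pretraining of a
# biomedical BERT checkpoint on a task-specific sub-domain corpus.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

checkpoint = "dmis-lab/biobert-base-cased-v1.1"   # example biomedical checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Sub-domain corpus: e.g., sentences drawn from literature close to the target task.
corpus = load_dataset("text", data_files={"train": "subdomain_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-subdomain", num_train_epochs=3,
                         per_device_train_batch_size=32)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```

The adapted weights would then be loaded as the encoder for task-specific fine-tuning.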
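For the second idea (using information from all final-layer outputs rather than only the [CLS] vector), here is a small PyTorch sketch of one way to summarize the final layer: attention-mask-aware mean pooling feeding a relation classification head. The class name and pooling choice are illustrative assumptions, not the exact architecture reported in the work.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class RelationClassifier(nn.Module):
    """Relation classifier that pools every final-layer token vector
    (mask-aware mean pooling) instead of relying only on the [CLS] output."""
    def __init__(self, checkpoint: str, num_relations: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_relations)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()        # (batch, seq, 1)
        summed = (hidden * mask).sum(dim=1)                # ignore padding tokens
        pooled = summed / mask.sum(dim=1).clamp(min=1e-9)  # mean over real tokens
        return self.classifier(pooled)                     # (batch, num_relations)
```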
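The third idea contrasts simple data concatenation with a transfer-learning schedule: first train on the large, noisy distantly supervised set, then fine-tune the same weights on the small human-annotated set. The sketch below assumes a classifier like the one above (taking input_ids and attention_mask and returning logits) and datasets yielding (input_ids, attention_mask, labels) tuples; all names are placeholders.

```python
import torch
from torch.utils.data import DataLoader

def train_stage(model, dataset, epochs, lr, device="cuda"):
    """One training stage: plain cross-entropy fine-tuning on a labelled dataset."""
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in loader:
            optimizer.zero_grad()
            logits = model(input_ids.to(device), attention_mask.to(device))
            loss = loss_fn(logits, labels.to(device))
            loss.backward()
            optimizer.step()
    return model

# Stage 1: train on the large, noisy distantly supervised data.
# Stage 2: fine-tune the same weights on the small human-annotated set
# (typically with a smaller learning rate), instead of concatenating the two sources.
# `model`, `distant_dataset`, and `gold_dataset` are placeholders.
# model = train_stage(model, distant_dataset, epochs=1, lr=2e-5)
# model = train_stage(model, gold_dataset, epochs=3, lr=1e-5)
```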
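Finally, the contrastive pre-training step pulls the representation of a sentence toward that of its augmented view and pushes it away from the other examples in the batch. A generic InfoNCE-style loss of this kind is sketched below; the temperature value and in-batch-negative setup are common defaults assumed for illustration, not the exact objective of the dissertation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(orig_emb: torch.Tensor, aug_emb: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss: each sentence embedding should be most
    similar to the embedding of its own augmented view; every other example
    in the batch acts as a negative."""
    orig = F.normalize(orig_emb, dim=-1)      # (batch, dim)
    aug = F.normalize(aug_emb, dim=-1)        # (batch, dim)
    logits = orig @ aug.t() / temperature     # (batch, batch) similarity matrix
    targets = torch.arange(orig.size(0), device=orig.device)
    return F.cross_entropy(logits, targets)
```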
Keywords
BERT, Biomedical text mining, Deep learning, Relation extraction, Transformer