Automating the Compilation of Potential Core-Outcomes for Clinical Trials
Abstract
Due to increased access to clinical trial outcomes and analysis, researchers and scientists are able to iterate or improve upon relevant approaches more effectively. However, the metrics and related results of clinical trials typically do not follow any standardization in their reports, making it more difficult for researchers to parse the results of different trials. The objective of this paper is to describe an automated method utilizing natural language processing in order to describe the probable core outcomes of clinical trials, in order to alleviate the issues around disparate clinical trial outcomes. As the nature of this process is domain specific, BioBERT was employed in order to conduct a multi-class entity normalization task. In addition to BioBERT, an unsupervised feature-based approach making use of only the encoder output embedding representations for the outcomes and labels was utilized. Finally, cosine similarity was calculated across the vectors to obtain the semantic similarity. This method was able to both harness the domain-specific context of each of the tokens from the learned embeddings of the BioBERT model as well as a more stable metric of sentence similarity. Some common outcomes identified using the Jaccard similarity in each of the classifications were compiled, and while some are untenable, a pipeline for which this automation process could be conducted was established.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2021
- DOI:
- 10.48550/arXiv.2101.04076
- arXiv:
- arXiv:2101.04076
- Bibcode:
- 2021arXiv210104076B
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Machine Learning