Decoding Patterns of Data Generation Teams for Clinical and Scientific Success: Insights from the Bridge2AI Talent Knowledge Graph
Abstract
High-quality biomedical datasets are essential for medical research and disease treatment innovation. The NIH-funded Bridge2AI project strives to facilitate such innovations by uniting top-tier, diverse teams to curate datasets designed for AI-driven biomedical research. We examined 1,699 dataset papers from the Nucleic Acids Research (NAR) database issues and the Bridge2AI Talent Knowledge Graph. By treating each paper's authors as a team, we explored the relationship between team attributes (team power and fairness) and dataset paper quality, measured by scientific impact (Relative Citation Ratio percentile) and clinical translation power (APT, likelihood of citation by clinical trials and guidelines). Utilizing the SHAP explainable AI framework, we identified correlations between team attributes and the success of dataset papers in both citation impact and clinical translation. Key findings reveal that (1) PI (Principal Investigator) leadership and team academic prowess are strong predictors of dataset success; (2) team size and career age are positively correlated with scientific impact but show inverse patterns for clinical translation; and (3) higher female representation correlates with greater dataset success. Although our results are correlational, they offer valuable insights into forming high-performing data generation teams. Future research should incorporate causal frameworks to deepen understanding of these relationships.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.09897
- Bibcode:
- 2025arXiv250109897X
- Keywords:
-
- Computer Science - Digital Libraries
- E-Print:
- Accepted by JCDL 2024