The Implications of Open Generative Models in Human-Centered Data Science Work: A Case Study with Fact-Checking Organizations
Abstract
Calls to use open generative language models in academic research have highlighted the need for reproducibility and transparency in scientific research. However, the impact of generative AI extends well beyond academia, as corporations and public interest organizations have begun integrating these models into their data science pipelines. We expand this lens to include the impact of open models on organizations, focusing specifically on fact-checking organizations, which use AI to observe and analyze large volumes of circulating misinformation, yet must also ensure the reproducibility and impartiality of their work. We wanted to understand where fact-checking organizations use open models in their data science pipelines; what motivates their use of open models or proprietary models; and how their use of open or proprietary models can inform research on the societal impact of generative AI. To answer these questions, we conducted an interview study with N=24 professionals at 20 fact-checking organizations on six continents. Based on these interviews, we offer a five-component conceptual model of where fact-checking organizations employ generative AI to support or automate parts of their data science pipeline, including Data Ingestion, Data Analysis, Data Retrieval, Data Delivery, and Data Sharing. We then provide taxonomies of fact-checking organizations' motivations for using open models and the limitations that prevent them for further adopting open models, finding that they prefer open models for Organizational Autonomy, Data Privacy and Ownership, Application Specificity, and Capability Transparency. However, they nonetheless use proprietary models due to perceived advantages in Performance, Usability, and Safety, as well as Opportunity Costs related to participation in emerging generative AI ecosystems. Our work provides novel perspective on open models in data-driven organizations.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2024
- DOI:
- arXiv:
- arXiv:2408.01962
- Bibcode:
- 2024arXiv240801962W
- Keywords:
-
- Computer Science - Human-Computer Interaction;
- Computer Science - Artificial Intelligence;
- Computer Science - Computation and Language;
- Computer Science - Computers and Society;
- Computer Science - Emerging Technologies
- E-Print:
- Accepted at Artificial Intelligence, Ethics, and Society 2024