Workshop on Information Extraction from Scientific Publications

The number of scientific papers published each year has grown sharply in recent years, reinforcing the role of the literature as one of the main drivers of scientific progress. In astronomy alone, more than 41,000 new articles are published every year, and the vast majority are available either via an open-access model or via pre-print services. Indexing the full text of articles in search engines helps researchers discover and retrieve vital scientific information, so that they can keep building on the shoulders of giants, inform policy, and make evidence-based decisions. Nevertheless, this ocean of data is difficult to navigate: finding articles relies heavily on string-matching searches and on following citations and references. New approaches are needed to separate the signal from the noise more easily (e.g., finding the key articles about a medical condition of interest).

Simple string matching has substantial limitations: human language is inherently ambiguous, context matters, and the same words and acronyms are frequently used to represent many different meanings. Extracting structured and semantically relevant information from scientific publications (e.g., named-entity recognition, summarization, citation intention, linkage to knowledge graphs) allows articles to be selected and filtered more effectively.

The Workshop on Information Extraction from Scientific Publications (WIESP) is a forum to foster discussion and research using Natural Language Processing and Machine Learning. In this space, leading professionals, organizations, early career researchers and students can cooperate towards building the algorithms, models, and tools that will pave the way for machine comprehension of science in the future.

Topics of interest include:


  • Scientific document parsing
  • Scientific named-entity recognition
  • Scientific article summarization
  • Question-answering on scientific articles
  • Citation context/span extraction
  • Structured information extraction from full-text, tables, figures, bibliography
  • Novel datasets curated from scientific publications
  • Argument extraction and mining
  • Challenges in information extraction from scientific articles
  • Building knowledge graphs via mining scientific literature; querying scientific knowledge graphs
  • Novel tools for information extraction (IE) on scientific literature and interaction with users
  • Mathematical information extraction
  • Extraction of scientific concepts and facts
  • Visualizing scientific knowledge