Thirty years ago, Paul Ginsparg – professor of information science and physics at Cornell – created the preeminent open-access, pre-print repository, arXiv, and in so doing revolutionized how scientific communities disseminate research. To date, nearly 2 million scholarly articles are housed on arXiv, and total downloads of research papers posted to the repository exceed 2 billion.

In a guest commentary published in Nature Reviews Physics, Ginsparg reflects on lessons learned from arXiv’s three decades of information sharing and how those lessons can inform solutions to problems like “freely flowing misinformation.” On this subject, he describes the machine learning framework he created to flag questionable research submissions to arXiv. Where once a group of active scientists served as quality control, an algorithm now checks new submissions against the entire database in milliseconds.

“Much of the internal human effort is now directed to mediating and adjudicating the various human and robotic oversights at scale,” he wrote.

Elsewhere, Ginsparg conveys how vital pre-print repositories like bioRxiv and medRxiv were as forums for expert open commentary on COVID-19-related research.

“A preprint reporting results of a rigorous clinical study on the drug dexamethasone led to its deployment in the half-year prior to the study’s appearance as a journal publication, potentially saving many lives,” he wrote. “And it was a preprint that pushed back against an actual health hazard, by correcting misconceptions behind the long-assumed 5 μm boundary between (falling) droplets and (airborne) aerosols, and signalling the need for more effective revised health precautions against COVID-19 spread.”