DataCite DOI prevalence in the published literature by country

In a previous post, I dug a little deeper into the findings of the State of Open Data to try to understand whether there were any regional trends in data sharing and re-use, using the Make Data Count Data Citation Corpus.

Are we ready to start rewarding researchers for Open data?
The State of Open Data 2024: Special Report Bridging policy and practice in data sharing has gone live today. In it, we explored global trends in data sharing practices, focusing on bridging the gap between policy and practice. For the first time, the report incorporates not only survey data but

The Make Data Count Data Citation Corpus looks for both DataCite DOIs and accession numbers in the published literature. As the types of datasets that use DataCite DOIs and accession numbers vary, I thought it would be a good idea to simplify this and normalise the data further. With Dimensions.ai and its Google BigQuery datasets, we can count just the links to DataCite DOIs from anywhere in the full text of the published literature. As bibliometric data can take a while to filter through, I decided to look at 2022 for this analysis.

To do this, I looked at "Country based data linkage count.xlsx" in the Data behind State of Open Data 2024 Special Report: Bridging policy and practice in data sharing.

Data behind State of Open Data 2024 Special Report: Bridging policy and practice in data sharing - Country, Funder and Affiliation Datasets
This Dataset contains 3 datasets behind graphs generated in the “State of Open Data 2024 Special Report: Bridging policy and practice in data sharing”. The datasets include counts and percentages for papers that link to datasets filtered by Country, Funder and Affiliation Datasets. The datasets were generated by combining the DataCite Data Citation Corpus (https://corpus.datacite.org/dashboard) with Dimensions (https://www.dimensions.ai/) in Google BigQuery.

By creating choropleth maps, we can see where the percentage of papers with links to DataCite DOIs is highest.

We can see some obvious outliers here, and so can filter further. Below I have generated maps for

"The percentage of papers that link to DataCite DOIs in countries that publish more than 1,000 papers per year"

and "The percentage of papers that link to DataCite DOIs in countries that publish more than 10,000 papers per year"
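For anyone who wants to reproduce the filtering step, here is a minimal sketch in Python. It is not the code used to build the maps. The field names (`papers`, `linked_papers`) are my assumptions rather than the actual column headers in the spreadsheet, and the rows are made-up placeholders, not values from the dataset.

```python
from dataclasses import dataclass


@dataclass
class CountryRow:
    country: str
    papers: int          # total papers published in the year (assumed field)
    linked_papers: int   # papers linking to a DataCite DOI (assumed field)


def percent_linked(row: CountryRow) -> float:
    """Percentage of a country's papers that link to a DataCite DOI."""
    return 100.0 * row.linked_papers / row.papers if row.papers else 0.0


def filter_by_volume(rows, min_papers):
    """Keep countries publishing more than min_papers, highest percentage first."""
    kept = [r for r in rows if r.papers > min_papers]
    return sorted(kept, key=percent_linked, reverse=True)


# Made-up placeholder rows for illustration only, not real figures.
rows = [
    CountryRow("Country A", 500, 50),
    CountryRow("Country B", 5_000, 100),
    CountryRow("Country C", 50_000, 500),
    CountryRow("Country D", 20_000, 800),
]

for threshold in (1_000, 10_000):
    print(f"Countries publishing more than {threshold:,} papers:")
    for r in filter_by_volume(rows, threshold):
        print(f"  {r.country}: {percent_linked(r):.2f}%")
```

In this toy data, "Country A" has the highest percentage but only a few hundred papers, which is the kind of small-volume outlier the thresholds are meant to remove.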

This last map highlights where work needs to be done when it comes to data publishing and reuse in academia. If you are a funder in a country that publishes large volumes of peer-reviewed academic publications, and you are not encouraging data publication, you are behind. You are not realising the full value of the research you fund. The research you pay for will not be as exposed to the AI models generating new findings based on the research that has already been paid for. Do more. Or in some cases, just do something.
