Are we ready to start rewarding researchers for Open data?
The State of Open Data 2024: Special Report Bridging policy and practice in data sharing has gone live today. In it, we explore global trends in data sharing practices, focusing on the gap between policy and practice.
For the first time, the report incorporates not only survey data but also analyses of actual data-sharing behaviours, leveraging sources like Dimensions, the Springer Nature Data Availability Statements (DAS), and the Chan Zuckerberg Initiative Data Citation Corpus (CZI DCC). By combining these datasets, the report provides insights into how researchers actively share data, the impact of mandates, and the effectiveness of policies in fostering open data practices. This evolution from understanding researcher attitudes to tracking their actions aims to address barriers to open data, such as lack of incentives and resource disparities.
The findings highlight significant regional, institutional, and funder-level differences in data sharing. Here are some of them, along with some extra analysis that I found interesting whilst playing with the Chan Zuckerberg Initiative Data Citation Corpus (CZI DCC) data, combined with Dimensions disambiguation.
When it comes to the percentage of papers linking to datasets, Sweden are winning at scale. Scandinavia in general does well.
If we look at the same data but lower the threshold from countries that produce >50,000 publications per year to those that produce >1,000 publications per year, African countries perform better.
I'm excited to see what effect the NIH data publishing mandate has on their already impressive increase in links to datasets.
More important than the actual counts is the percentage of papers that link to a dataset.
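As a rough illustration of the two ideas above (filtering by a publication threshold, then comparing percentages rather than raw counts), here is a minimal Python sketch. The record fields, function name, and numbers are hypothetical placeholders for illustration only; they are not the report's schema, method, or data.

```python
from dataclasses import dataclass


@dataclass
class CountryStats:
    # Hypothetical record; field names are assumptions, not the report's schema.
    country: str
    publications: int         # papers published in the period
    papers_with_dataset: int  # papers with at least one dataset link


def dataset_link_rates(stats, min_publications):
    """Return (country, percent) pairs for countries whose publication count
    exceeds the threshold, sorted from highest to lowest percentage."""
    rates = [
        (s.country, 100.0 * s.papers_with_dataset / s.publications)
        for s in stats
        if s.publications > min_publications
    ]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Made-up placeholder numbers, NOT taken from the report.
example = [
    CountryStats("A", 60_000, 3_000),
    CountryStats("B", 200_000, 6_000),
    CountryStats("C", 2_000, 200),
]

# High threshold: only large producers are compared.
large_only = dataset_link_rates(example, min_publications=50_000)
# Lower threshold: smaller producers enter the ranking too.
with_smaller = dataset_link_rates(example, min_publications=1_000)
```

In this toy data, country B has the most linked papers in absolute terms, yet ranks last by percentage, and the smaller producer C only appears once the threshold is lowered.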
In-country funder differences are also important when trying to reverse engineer what is working. Of course, subject-specific differences will play a role here, as different funders specialise in different areas.
The same approach can be applied to institutions located in the same country. Librarians at these organisations can follow the best practices of those with better data sharing rates.
The hugely successful data sharing practices at the Francis Crick Institute not only re-emphasise that subject focus may be introducing bias, but also that legacy institutions come with a lot of legacy baggage, and cannot devote as much budget or as many people to new initiatives like data sharing.
In order to drive societal change in academia, we need to use both carrots and sticks. To succeed, we need to move the process through the following steps:
- Policy
- Mandate
- Compliance
- Measurement
In the nine years that the State of Open Data has been running, we have seen policies forming and becoming mandates. This has driven a lot of the compliance we see today. Researchers are sharing because they are told they have to as a requirement of funding. The survey continues to tell us that the main thing stopping them from engaging more seriously with open data publishing is a lack of credit for their open data. Researchers cannot get credit unless there is a way to consistently measure data metrics across platforms and repositories. The CZI DCC allows us to do this.
The NIH Data Sharing Index (S-Index) Challenge seeks innovative approaches to quantify and evaluate data sharing practices by biomedical researchers. The challenge aims to develop an “S-Index” to measure the extent and effectiveness of data sharing, encouraging transparency and accessibility in research. Submissions are judged on the originality, feasibility, impact, and scalability of the proposed metrics. The goal is to foster more widespread and high-quality sharing of scientific data.
The DataWorks! Prize, organized by the Federation of American Societies for Experimental Biology (FASEB) and NIH, is a challenge that encourages researchers to propose impactful secondary data analysis projects using existing biomedical data.
Being able to measure the impact of researchers sharing their data, and ultimately reward them for doing so, means that we are on the verge of having both carrots and sticks to enable open data sharing.
The methods behind these figures, and the raw data itself, can be found on Figshare at:
Hahnel, Mark; Smith, Graham; Campbell, Ann (2024). The State of Open Data 2024: Special Report Bridging policy and practice in data sharing. Digital Science. Report. https://doi.org/10.6084/m9.figshare.27337476.v1