Universiteitsbibliotheek – LibGuides

Searching quantitative data for historians: 1. How to cite datasets

Is citing data different from citing secondary sources?

A dataset you use for your research should be cited in the same way as scholarly literature. Repositories often provide instructions on how to cite a specific dataset. The order of the elements in a citation can differ per platform. In general, use the citation style given by the platform from which you cite the dataset.

A data citation has seven components. Five are human-readable: the author(s), title, year, data repository (or distributor), and version number. Two are machine-readable: the unique global identifier and the universal numerical fingerprint.
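The seven components can be sketched as a small Python record. This is only an illustration: the class name, the field names, and the `format` method are assumptions made for this example and do not belong to any repository's software. The element order follows the Dataverse example cited on this page, and the values are taken from that example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """The seven components of a data citation (names are illustrative)."""
    authors: Tuple[str, ...]  # human-readable
    year: int                 # human-readable
    title: str                # human-readable
    repository: str           # human-readable: repository or distributor
    version: str              # human-readable
    identifier: str           # machine-readable: unique global identifier
    unf: str                  # machine-readable: universal numerical fingerprint

    def format(self) -> str:
        # Order used in the Dataverse example; other platforms may differ.
        parts = [
            "; ".join(self.authors),
            str(self.year),
            f'"{self.title}"',
            self.identifier,
            self.repository,
            self.version,
            self.unf,
        ]
        return ", ".join(parts)


example = DataCitation(
    authors=("Bosker, Maarten", "Buringh, Eltjo", "Van Zanden, Jan Luiten"),
    year=2014,
    title=("Replication data for: From Baghdad to London: Unraveling Urban "
           "Development in Europe, the Middle East, and North Africa, 800-1800"),
    repository="Harvard Dataverse",
    version="V1",
    identifier="https://doi.org/10.7910/DVN/24747",
    unf="UNF:5:TX7wXbgNZmsEMMpxQhLvQg==",
)
print(example.format())
```

Keeping the components as separate fields makes it easy to reorder them when another platform prescribes a different citation style.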

Concepts

Identifier
An identifier is an association between a character string and an object. Objects can be files, parts of files, names of persons or organizations, abstractions, etc. Objects can be online or offline. Character strings include URLs, serial numbers, names, addresses, etc.

Persistent identifier
A "persistent identifier" is an identifier that is available and managed over time; it will not change if the item is moved or renamed. This means that an item can be reliably referenced for future access by humans and software (Source: https://www.force11.org/node/4770).

Unique global identifier
A unique global identifier begins with either “hdl” (referring to the international HANDLE.NET system) or “doi” (referring to the Digital Object Identifier (DOI) system). The identifier is designed to persist even if URLs, or the web itself, are replaced with something else. When the citation appears online, the identifier is hot-linked to the URL that resolves it, which works in browsers available today. In print, the URL is also included in the citation.
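The universal numerical fingerprint (UNF) is also a kind of identifier: a short string calculated from the content of the data, so that a later reader can check that the data retrieved are the same as the data that were cited.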
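As a minimal sketch of how an identifier with one of these two prefixes becomes a link that works in a browser, the Python function below maps it to the public resolver for that system (https://doi.org/ for DOIs, https://hdl.handle.net/ for handles). The function name and the `RESOLVERS` table are assumptions made for this illustration; the DOI is the one from the Dataverse example on this page.

```python
# Public resolver services for the two identifier systems.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def identifier_to_url(identifier: str) -> str:
    """Turn 'doi:...' or 'hdl:...' into a resolvable URL (illustrative helper)."""
    scheme, sep, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS or not value:
        raise ValueError(f"not a doi: or hdl: identifier: {identifier!r}")
    return RESOLVERS[scheme] + value


print(identifier_to_url("doi:10.7910/DVN/24747"))
```

If the dataset is later moved, the identifier stays the same and only the target registered with the resolver changes, which is why the citation records the identifier and not the address of the landing page.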


Citation examples

This is a description of a data citation from Dataverse that follows the Joint Declaration of Data Citation Principles (2014), a synthesis of all previously existing principles and initiatives on data citation. All principles are explained in the box below.
 
Bosker, Maarten; Buringh, Eltjo; Van Zanden, Jan Luiten, 2014, "Replication data for: From Baghdad to London: Unraveling Urban Development in Europe, the Middle East, and North Africa, 800-1800", https://doi.org/10.7910/DVN/24747, Harvard Dataverse, V1, UNF:5:TX7wXbgNZmsEMMpxQhLvQg== [fileUNF]
[Figure: citation chart]
Source: https://dataverse.org/best-practices/data-citation
 
Examples of data citations from other repositories are:

Data citation principles

1. Importance
Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence
In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access
Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence
Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.

7. Specificity and Verifiability
Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.


Source: Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [https://www.force11.org/group/joint-declaration-data-citation-principles-final].