Historical data can be searched for in several ways, depending on your research question, your knowledge of the topic and the data available. This LibGuide provides several starting points for a data search. Publishing and archiving historical data online is a fairly recent development, and the infrastructure that supports it is still taking shape. For that reason, one of the most successful strategies is to begin a data search with secondary sources, although this LibGuide also provides other starting points.
Data can be found through a direct or an indirect search (see the table below). A direct search often takes fewer steps to reach relevant data, but it can also mean that relevant data is missed or that important data archives are never examined. It is therefore important to use both methods, and to know at which level (data archive or dataset) the search engine you are using is searching.
Data
Data may be thought of as unprocessed atomic statements of fact. The term very often refers to systematic collections of numerical information in tables of numbers such as spreadsheets or databases. When data is structured and presented so as to be useful and relevant for a particular purpose, it becomes information available for human apprehension.
Dataset (also: data set)
Any organised collection of data. ‘Dataset’ is a flexible term and may refer to an entire database, a spreadsheet or other data file, or a related collection of data resources.
Data collection
Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc.
Database (also: data base; synonym: databank, also data bank)
1. Any organised collection of data may be considered a database. In this sense the word is synonymous with dataset.
2. A software system for processing and managing data, including features to extend or update, transform and query the data.
Note: In this LibGuide, the word database is used exclusively in the second sense.
Metadata
Information about a dataset such as its title and description, method of collection, author or publisher, area and time period covered, licence, date and frequency of release, etc. It is essential to publish data with adequate metadata to aid both discoverability and usability of the data.
Visualisation
A visual representation of data is often the most compelling way of communicating the data, bringing out its key features, correlations and outliers. Although many tools exist, creating a visualisation for a dataset is not an automatic process. It requires careful attention to the meaning of the variables, the relations between them and the stories inherent in the data, so that the visual representation lets the message of the data shine through.
Source | Direct search: | Indirect search: find datasets through metadata in databases |
1. Via secondary sources | | |
2. Via data archives | | |
3. Via statistical databases | | |
4. Via internet | | |
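The indirect route named in the table, finding datasets through their metadata, can be illustrated with a small sketch. It is not how any particular data archive works: the `MetadataRecord` class, the `search_metadata` function and all catalogue entries are hypothetical and invented for this example. The field names loosely follow the glossary entry for metadata (title, description, publisher, area, period covered, licence).

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Hypothetical metadata record describing one dataset."""
    title: str
    description: str
    publisher: str
    area: str
    start_year: int  # first year covered by the dataset
    end_year: int    # last year covered by the dataset
    licence: str


# Invented catalogue entries, for illustration only.
CATALOGUE = [
    MetadataRecord("Grain prices", "Monthly wheat and rye prices",
                   "Archive A", "Town X", 1700, 1850, "CC BY"),
    MetadataRecord("Occupational census", "Occupation counts per municipality",
                   "Archive B", "Region Y", 1849, 1930, "CC0"),
    MetadataRecord("Baptism registers", "Transcribed parish baptism records",
                   "Archive A", "Town Z", 1650, 1811, "CC BY"),
]


def search_metadata(catalogue, keyword, year_from, year_to):
    """Return records whose title or description mentions the keyword
    and whose coverage overlaps the period year_from..year_to."""
    keyword = keyword.lower()
    hits = []
    for record in catalogue:
        text = f"{record.title} {record.description}".lower()
        overlaps = record.start_year <= year_to and record.end_year >= year_from
        if keyword in text and overlaps:
            hits.append(record)
    return hits


# Look for price data covering the early nineteenth century.
for record in search_metadata(CATALOGUE, "prices", 1800, 1820):
    print(record.title, record.publisher, f"{record.start_year}-{record.end_year}")
```

The search only inspects the descriptions, never the data itself. This is why a dataset with sparse or missing metadata can stay invisible to an indirect search, and why combining it with a direct search is worthwhile.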