You should create and maintain sufficient documentation or metadata (structured information about the data) so that research data can be identified, discovered, associated with its owners and creators, linked to related data or publications, and contextualised in time and space, and so that the quality of the data can be assessed and research results validated.

If your data is poorly documented, it will be difficult (or impossible) to find and manage in the longer term. Even if you (or others, in future) can find the data, its value will be diminished if it is hard to interpret.

Practices will differ depending on your discipline, but you should always ensure that protocols are agreed early in the project and adopted by all researchers consistently.

Choosing a metadata standard

Some common descriptive standards work for many different kinds of material and across disciplines. One widely used metadata standard, Dublin Core, facilitates the finding, sharing and management of data. It includes elements such as Title, Creator, Subject, Date and Type, and can be used to describe many different types of content.
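
As an illustration of how these elements fit together, the following minimal sketch writes a simple Dublin Core record as XML using Python's standard library. The record values, the wrapping "metadata" element and the function name dublin_core_xml are assumptions made for this example; only the element names and the Dublin Core namespace come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(record):
    """Serialise a dict of Dublin Core elements to an XML string.

    A value may be a single string or a list of strings, because
    Dublin Core elements are repeatable (e.g. several creators).
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = item
    return ET.tostring(root, encoding="unicode")


# Invented example record, for illustration only.
record = {
    "title": "Hypothetical river water quality survey",
    "creator": ["Example, A.", "Sample, B."],
    "subject": "water quality",
    "date": "2020-03-15",
    "type": "Dataset",
}
xml_text = dublin_core_xml(record)
print(xml_text)
```

A repository or discipline standard will usually prescribe its own wrapper and required elements, so treat this only as a picture of what a descriptive record contains.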

In many disciplines, you will find an existing standard specifically designed for describing and sharing data for that community. The UK’s Digital Curation Centre provides a useful list of metadata standards suitable for different disciplines that you can search and browse.

File naming for digital files

File names are important for identifying and finding digital files. You should develop file naming conventions early in a research project, and agree on these with colleagues and collaborators before data is created.

Conventions will differ depending on the nature and size of a research project. In all cases, file names should be unique, persistent and consistently applied if they are to be useful for finding and retrieving data.
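
One way to apply a convention consistently is to generate and check file names with a small script rather than typing them by hand. The sketch below assumes a hypothetical convention of project_description_YYYYMMDD_vNN.extension; the pattern, the function names and the example values are illustrative and should be replaced by whatever your project agrees on.

```python
import re
from datetime import date

# Hypothetical convention: project_description_YYYYMMDD_vNN.ext
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_filename(project, description, created, version, extension):
    """Build a file name that follows the assumed convention."""
    # Lower-case the description and replace runs of other characters
    # (spaces, punctuation) with single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    name = (
        f"{project.lower()}_{slug}_{created:%Y%m%d}"
        f"_v{version:02d}.{extension.lower()}"
    )
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def follows_convention(filename):
    """Return True if an existing file name matches the convention."""
    return NAME_PATTERN.match(filename) is not None


example = build_filename(
    "RIVERS", "Site 3 water samples", date(2020, 3, 15), 2, "csv"
)
print(example)  # rivers_site-3-water-samples_20200315_v02.csv
print(follows_convention("final data (2).csv"))  # False
```

Putting the date in year-month-day order and zero-padding the version number means that an alphabetical listing is also a chronological one.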

An identifier is a reference number or name for a data object and forms a key part of your documentation and metadata. To be useful over the long term, identifiers need to be:

  • unique - globally unique if possible, but at the very least unique within your particular systems and processes, and
  • persistent - the identifier should not change over time.
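
The two properties above can be sketched in a few lines of code. This example assumes a hypothetical in-memory registry (the class IdentifierRegistry and the file paths are invented for illustration): identifiers are generated as random UUIDs so that they are unique in practice, and a data object keeps the same identifier when its storage location changes.

```python
import uuid


class IdentifierRegistry:
    """Hypothetical registry mapping identifiers to storage locations."""

    def __init__(self):
        self._locations = {}

    def mint(self, location):
        """Create a new identifier for the object stored at location."""
        # uuid4 values are random 128-bit numbers, so a collision is
        # extremely unlikely even across independent systems.
        identifier = str(uuid.uuid4())
        if identifier in self._locations:
            raise RuntimeError("identifier collision")
        self._locations[identifier] = location
        return identifier

    def relocate(self, identifier, new_location):
        """Record a new location; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._locations[identifier]


registry = IdentifierRegistry()
dataset_id = registry.mint("/projects/rivers/raw/site3.csv")
registry.relocate(dataset_id, "/archive/2021/rivers/site3.csv")
print(dataset_id, "->", registry.resolve(dataset_id))
```

Anything that cites dataset_id continues to work after the move, which is the practical benefit of keeping the identifier separate from the location.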

The emerging identifier standard for publicly available datasets is the Digital Object Identifier (DOI). Although DOIs have traditionally been used for electronically published journal articles, they can now be assigned to datasets. Griffith University can assign a DOI to a collection that you make available through an institutional repository.
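
When you record a DOI in your own documentation, it helps to store it in one consistent form. The sketch below turns a DOI written in several common ways into a link through the public resolver at https://doi.org/. The pattern used is a deliberately loose check assumed for this example, not the full DOI syntax, and the DOI shown is a placeholder rather than a real dataset.

```python
import re

# Loose check, assumed for illustration: "10.", a numeric registrant
# code, a slash, then a suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Return a resolver URL for a DOI given as a bare name or a link."""
    cleaned = doi.strip()
    # Remove a leading resolver address or "doi:" label if present.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not recognised as a DOI: {doi!r}")
    return RESOLVER + cleaned


url = doi_to_url("doi:10.1234/example-dataset")  # placeholder DOI
print(url)  # https://doi.org/10.1234/example-dataset
```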

Controlled vocabularies

A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. It models the concepts in a discipline by applying labels to the concepts and relating the concepts to each other in a formal structure.

Vocabularies take many forms. They include glossaries, dictionaries, gazetteers, code lists, taxonomies, subject headings, thesauri, semantic networks and ontologies.

Wherever possible, you should use an existing controlled vocabulary. Even if you need to adapt or customise an existing standard, this is preferable to creating something from scratch.
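
In practice, using a controlled vocabulary often means checking the keywords people enter against the agreed list and mapping variants to the preferred term. The following sketch assumes a tiny invented vocabulary (the terms, the synonyms and the function name normalise_keywords are hypothetical); a real project would load the terms from the published vocabulary it has adopted.

```python
# Invented vocabulary for illustration only.
PREFERRED_TERMS = {"freshwater", "groundwater", "estuary"}

# Variant spellings mapped to their preferred term.
SYNONYMS = {
    "fresh water": "freshwater",
    "ground water": "groundwater",
    "estuaries": "estuary",
}


def normalise_keywords(keywords):
    """Split keywords into accepted preferred terms and rejected entries."""
    accepted, rejected = [], []
    for keyword in keywords:
        term = keyword.strip().lower()
        term = SYNONYMS.get(term, term)  # map a variant to its preferred term
        if term in PREFERRED_TERMS:
            if term not in accepted:  # avoid duplicates
                accepted.append(term)
        else:
            rejected.append(keyword)  # keep the original for review
    return accepted, rejected


accepted, rejected = normalise_keywords(
    ["Fresh water", "estuaries", "creek", "freshwater"]
)
print(accepted)  # ['freshwater', 'estuary']
print(rejected)  # ['creek']
```

Rejected entries are returned rather than silently dropped, so that someone can decide whether the keyword is a mistake or a concept the vocabulary needs to be adapted to cover.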

