Data Management Plans#

This guide helps you prepare a Data Management Plan (DMP) using the NOAA Omics Data Management Plan Template (v1.2), provided in the noaa-omics-templates GitHub repository. Those with NOAA accounts can access this Google Docs version. DMPs are often required by internal and external funding programs, and will soon be required of all NOAA projects by NOAA Management of Environmental Data and Information directives. Beyond these requirements, DMPs are also valuable tools for developing internal and external data management workflows.

At what organizational level should I write my DMP?#

DMPs can be made at the level of:

  • a single project containing multiple data types
  • a Principal Investigator (PI) for standardized data management of multiple projects with the same data types, or
  • a group, to describe a standard workflow for particular types of data or projects across multiple PIs.

The appropriate level for a DMP depends on how centralized data management is at your institution. For example, if you are a research group with 10 people conducting metabarcoding surveys, and everyone follows the same general workflow for collecting samples, processing the data, storing the data, and sharing data externally, then you may only need one DMP. However, if each person takes a different approach to any of those steps, then separate DMPs would be preferable. DMPs are most useful when they can describe in detail the who, what, where, and how of a data management workflow.

Should my DMP also cover non-omics data associated with a project?#

Often, NOAA Omics projects are only part of a much larger research effort that includes other research groups, for example a research cruise with eDNA sampling and chemical oceanography measurements. In general, we recommend focusing your DMP on the data that you or your group are responsible for from collection to public release. Sometimes measurements collected by other groups are a critical part of an Omics study and are even included as sample metadata during submission to Omics repositories (e.g., NCBI SRA). In this case, describe how you obtain those data from collaborators and associate them with your Omics samples.

If your collaborators also have a DMP (which they should!), you can reference their Point of Contact if needed, or provide details on how your data can be associated with their data through persistent digital identifiers.

Guidance for the NOAA Omics DMP Template#

The NOAA Omics Data Management Plan Template is based on the NOAA Data Governance Committee's DMP Template (V3), but is more concise and offers more guidance and examples geared toward typical NOAA Omics applications. The first time you fill out the template will be the most time-consuming; subsequent DMPs will take much less time because you will have a starting point.

The template includes [bracketed descriptions] which should be overwritten, and highlighted examples which should be replaced with your own text.
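Before circulating a draft, it can help to confirm that no bracketed descriptions were left behind. The following is a minimal sketch of such a check, not part of the NOAA template or its tooling. It assumes the draft has been exported to plain text and that square brackets are used only for template placeholders; the function name `find_unfilled_placeholders` and the sample draft are hypothetical.

```python
import re

# Matches any text enclosed in square brackets on a single line,
# e.g. "[Project Name]". Assumption: brackets only mark placeholders.
PLACEHOLDER = re.compile(r"\[[^\[\]\n]+\]")


def find_unfilled_placeholders(text):
    """Return (line_number, placeholder) pairs for bracketed text left in a draft."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# Hypothetical draft with one section still unfilled.
draft = """1.2 Project or data collection goal
[Short description of the project goals]
2. Point of Contact for this Data Management Plan
Jane Doe, jane.doe@example.org"""

for line_number, placeholder in find_unfilled_placeholders(draft):
    print(f"line {line_number}: {placeholder}")
```

Highlighted example text cannot be detected this way, since highlighting is lost in a plain-text export; those passages still need a manual read-through.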

1. General Description of Data to be Acquired and Managed#

1.2 Project or data collection goal#

Short paragraph description of the goals of the project or data collection program.

1.8 Type(s) of data#

This section is one of the most important in the DMP. We recommend providing a table of the data types that will be generated, along with the format and, if applicable, the metadata standard used. We have provided an example table with a non-exhaustive list of common data types generated by NOAA Omics and our recommended metadata standard and repository options. You can use this table as a starting point for your DMP by removing rows that are not relevant to your data, editing repositories as needed, adding rows if needed, and finally removing the last column ("Common study types") prior to completion, as this column is just for reference.

Example data types table#
| Data Type | Format | Metadata standard (if applicable) | Common study types |
|---|---|---|---|
| Raw DNA sequence data | Raw FASTQ | NA | Amplicon, metagenomic, population genomics, RNA-seq |
| Extracted DNA | Physical | NA | All |
| Environmental measurements | Tabular | MiXS, DwC | Amplicon survey, functional genomics |
| Experimental conditions | Tabular | MiXS | Functional genomics, population genomics |
| Protocols and methods (sampling, molecular, analysis) | Persistent URLs, tabular | MiXS, DwC | All |
| Analysis code | Python/R code, markdown | | All |
| Figure code | R code | | All |
| Taxonomic assignment | Tabular | NCBI, WoRMS | Amplicon, metagenomic, DNA references |
| Amplicon Sequence Variants | FASTA | | Amplicon |
| Functional genomics data (quantitative gene expression, ChIP-Seq, HiC-seq, methylation seq) | Tabular (usually) | | Functional genomics |
| RNA transcript assemblies | FASTA or SQN | | RNA-seq |
| Genome assemblies | FASTA or SQN file, optional AGP file | | Genomics |
| Quantitative PCR data | Tabular | | qPCR surveys |
| Mass spectrometry data | Raw mass spectra, mzML, mzID | | Metabolomics, proteomics |
| Feature observation tables | BIOM (HDF5) format | | Amplicon surveys |

Note: if you are also collecting physical, biological, and/or chemical observations of the environment, we recommend listing these along with the standards and instrumentation used.

There are other types of data or products that may be generated and should be described as well, such as research manuscripts, images, web portals, databases, and natural resource management plans to name a few.

1.9 Are there any restricted designations for acquired data?#

Use this section to list any restrictions or designations associated with the data, such as data ownership, international and national regulations, privacy regulations, etc.

2. Point of Contact for this Data Management Plan#

The point of contact is often the project or program manager or their designee: someone who is familiar with the data management plan and the data being collected. Add as many people as needed, using a table if there is more than one person.

3. Data Management Workflow#

If you already have detailed, publicly accessible documentation on your data management workflow and it has a persistent DOI or URL, you can provide that instead!

The data processing workflow is the other critical portion of the DMP. It should be written in either paragraph format or detailed bullet points, focusing on the methods used for data management. Almost all Omics projects will include:

  • biological and/or environmental samples
  • data about the sample and where/when/how it was obtained (sometimes called sample metadata)
  • processed sample material (e.g., extracted DNA, PCRs)
  • processed data (e.g., raw sequence files), and
  • data about how the samples were processed and analyzed.

A detailed data management workflow should describe how all these elements are being stored, tracked, and made publicly available. Important questions to answer for each of the elements above:

  • Where are [X] physically being stored?
  • What format is [X]? What data standards are being used (if applicable)?
  • How long will [X] be stored there?
  • How are sample IDs associated with metadata and processed data?
  • Will [X] be publicly available? If so, how will access be facilitated and for how long?
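The question of how sample IDs are associated with metadata and data files can often be answered, and checked, programmatically. The sketch below is one possible approach rather than a prescribed NOAA workflow. It assumes a tabular metadata file with a `samp_name` column and a hypothetical file naming convention of `<sample ID>_R1.fastq.gz` / `<sample ID>_R2.fastq.gz`; the sample IDs and file names are invented for illustration.

```python
import csv
import io
import re

# Assumed naming convention (hypothetical): <sample_id>_R1.fastq.gz or _R2.
FASTQ_NAME = re.compile(r"^(?P<sample_id>.+)_R[12]\.fastq(\.gz)?$")


def metadata_sample_ids(csv_text, id_column="samp_name"):
    """Collect sample IDs from the ID column of a metadata table."""
    return {row[id_column] for row in csv.DictReader(io.StringIO(csv_text))}


def fastq_sample_ids(filenames):
    """Extract sample IDs from file names that follow the assumed convention."""
    ids = set()
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:
            ids.add(match.group("sample_id"))
    return ids


def compare_ids(metadata_ids, file_ids):
    """Report samples lacking sequence files and files lacking metadata."""
    return {
        "missing_files": sorted(metadata_ids - file_ids),
        "missing_metadata": sorted(file_ids - metadata_ids),
    }


# Invented example data: S3 has no files, S4 has no metadata row.
metadata_csv = """samp_name,collection_date,depth
CRUISE01_S1,2023-06-01,5
CRUISE01_S2,2023-06-01,50
CRUISE01_S3,2023-06-02,5
"""
files = [
    "CRUISE01_S1_R1.fastq.gz", "CRUISE01_S1_R2.fastq.gz",
    "CRUISE01_S2_R1.fastq.gz", "CRUISE01_S2_R2.fastq.gz",
    "CRUISE01_S4_R1.fastq.gz",
]

report = compare_ids(metadata_sample_ids(metadata_csv), fastq_sample_ids(files))
print(report)
```

Whatever convention you use, stating it explicitly in the DMP (for example, "file names begin with the sample ID used in the metadata table") makes this kind of check reproducible by others.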

We have broken this section up into seven subsections:

3.1. Sample collection, storage, and processing#

Describe the collection and storage of physical samples. In depth details of collection methods are not required.

3.2. Sample metadata storage and processing#

Describe what sample metadata is collected, how it is recorded, where it is stored, and what data standards are used.

3.3. Processed sample storage and accessibility (e.g., extracted DNA, PCRs)#

Describe in brief how samples are processed for omics analysis (e.g., DNA extraction, PCR), how they are physically stored, and how they are associated with Sample IDs and metadata. Also note how metadata on processing is recorded and stored.

Are these materials publicly available?

3.4. Raw data storage and accessibility#

What format is the raw omics data in? Where is it stored? How is it backed up? Where will raw omics data be submitted for public accessibility?
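When describing how raw data is backed up, it can be useful to state how backup copies are verified. One common approach, shown here only as an illustrative sketch and not as a NOAA requirement, is to compare checksums between the primary and backup locations. The directory layout, the `*.fastq.gz` pattern, and the function names are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, pattern="*.fastq.gz"):
    """Map each matching file name in a directory to its checksum."""
    return {p.name: sha256_of_file(p) for p in sorted(Path(directory).glob(pattern))}


def changed_files(primary, backup):
    """List files in the primary manifest that are missing or differ in the backup."""
    return sorted(
        name for name, checksum in primary.items() if backup.get(name) != checksum
    )


# Demonstration with temporary directories standing in for real storage.
with tempfile.TemporaryDirectory() as primary_dir, tempfile.TemporaryDirectory() as backup_dir:
    for directory in (primary_dir, backup_dir):
        Path(directory, "S1_R1.fastq.gz").write_bytes(b"example reads")
    # This file exists only in the primary location (not yet backed up).
    Path(primary_dir, "S2_R1.fastq.gz").write_bytes(b"more reads")
    needs_backup = changed_files(build_manifest(primary_dir), build_manifest(backup_dir))
    print(needs_backup)
```

Storing the manifest alongside the data also gives later users a way to confirm that files downloaded from an archive match what was originally deposited.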

3.5. Processed data storage and accessibility#

What types of processed omics data will be generated? Where will it be stored and how will it be backed up? How will these data be made publicly available? How will these data be associated with raw data and sample data?

3.6 Analysis metadata accessibility#

What format is your analysis metadata (e.g., scripts, details on analysis workflow)? Where will it be stored and how will it be backed up? How will these data be made publicly available? How will these data be associated with raw data and sample data?

3.7 Quality control procedures employed#

Quality control procedures should be applied to the raw data, analyzed results, and the sample metadata prior to making these publicly accessible. Examples of quality control procedures:

  • gross error checks for data that fall outside of physically realistic ranges (e.g., impossible values or GPS coordinates)
  • peer review (e.g., two people will independently inspect sample metadata for errors)
  • comparisons made with other independent sources of the same measurement
  • blanks or controls used during molecular processing to identify contamination
  • filtering of samples or omics data prior to making them publicly accessible
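Gross error checks of the kind listed above are straightforward to automate for sample metadata. The following is a minimal sketch under stated assumptions, not a complete QC procedure: it uses the Darwin Core term names `decimalLatitude`, `decimalLongitude`, and `eventDate` as dictionary keys, plus a hypothetical `depth_m` field, and the function name and messages are invented for this example.

```python
from datetime import date


def check_record(record, today=None):
    """Return a list of gross-error messages for one sample metadata record."""
    today = today or date.today()
    problems = []

    # Coordinates must fall within physically possible ranges.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude outside -90..90")
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude outside -180..180")

    # Dates must parse as real calendar dates and cannot be in the future.
    try:
        collected = date.fromisoformat(str(record.get("eventDate", "")))
    except ValueError:
        problems.append("collection date is not a valid ISO date")
    else:
        if collected > today:
            problems.append("collection date is in the future")

    # Hypothetical depth field in meters; negative values are flagged.
    depth = record.get("depth_m")
    if depth is not None and depth < 0:
        problems.append("negative depth")

    return problems


# Invented records: the second has an impossible latitude and date.
records = [
    {"decimalLatitude": 47.6, "decimalLongitude": -122.3, "eventDate": "2023-06-01", "depth_m": 5},
    {"decimalLatitude": 147.6, "decimalLongitude": -122.3, "eventDate": "2023-02-30"},
]
for index, record in enumerate(records, start=1):
    print(index, check_record(record, today=date(2024, 1, 1)))
```

Automated checks like this complement, rather than replace, the peer review and control-based procedures in the list; the DMP should state which checks are applied and at what stage.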

4. Data Access#

4.1 Are there any restrictions on access or use of the data?#

Restrictions may apply to personally identifiable information (PII) and other sensitive data (such as export-controlled data), as well as to data restricted by contract or other written, binding agreement (permitted to be withheld under the Evidence Act). The latter includes commercial data licensed via contract and data obtained from a third party subject to a restrictive license (international partner, CRADA, etc.).

5. Data Preservation and Protection#

We recommend bullet points or a table with rows for each data type, including the long-term archive location, whether a repository or physical storage location.

6. Resources#

Programs must identify resources within their own budget for managing the data they produce.

Short DMP Template for proposals#

Some NOAA proposals require a DMP that is no more than 2 pages. The full NOAA Omics DMP can be adapted using this template:

The [Project Name] (award #), implemented by [Applicant(s) Name], will generate environmental data and information, including [list type(s) of data that will be collected]. Datasets will provide specifics on [Describe the information collected, and collection dates]. [Data type 1] will be collected by [Person/Group Collecting Data] according to the procedures described in [Name the application, manual, or published article that describes data collection protocols], utilizing [expected metadata standard to be used] metadata standards, and stored [Location or Method of Data Storage, with expected open-access, machine readable data format (e.g., .csv, ASCII, .shp)]. [repeat sentence for each data type as needed]

The data will be available to the public starting on [Date No Later than Two Years After Data Collection]. [Data type 1] will be submitted to [repository name], [Data type 2] will be hosted by [repository name]... Metadata about this project and the resulting data will be publicly available at [location of project metadata file, e.g., NCEI]. Documents and pre-publication manuscripts will be submitted to the NOAA Central Library’s Institutional Repository and made Section 508 compliant.

Contact [Name] at [Phone/Email] for more information on data management. In the past, we have shared similar data at [Describe Past Data Sharing Methods with links, if any]. All future subawardees not identified in this plan will be required, as a condition of their contract, to accept this data sharing plan. Any additional data sharing stipulations for future subawardees may be outlined at that time and described in their contract.

Other DMP template options#

The NOAA DMP Template in the Data Management Directives Handbook has 8 required sections:
1. General Description of Data to be Acquired and Managed
2. Point of Contact for this Data Management Plan (author or maintainer)
3. Responsible Party for Data Management -- Who is responsible for ensuring the proper management of the data produced?
4. Resources -- Programs must identify resources within their own budget for managing the data they produce.
5. Data Lineage and Quality -- Follows NOAA Information Quality Guidelines on the quality, objectivity, utility, and integrity of information which it disseminates.
6. Data Documentation
7. Data Access
8. Data Preservation and Protection

Alternative external DMP template: the NMDC Omics Data Management guide and the accompanying DMPTool, which generates a template with 6 sections:
1. Data Policy Compliance -- Identify any published data policies with which the project will comply, including the Data and Sample Policy of your primary funder as well as other policies that may be relevant if the project is part of a larger coordinated research program.
2. Sample and Data Types -- Describe the data set, including basic identification information, average size, volume, or estimated number of data files produced.
3. Data Standards and Formats -- This section communicates that you are aware of and will abide by community best practices whenever possible.
4. Roles and Responsibilities -- Describes how your data management plan will be executed and ensures that your team’s data management responsibilities are clearly defined.
5. Data Dissemination and Archiving -- Describes what the final data products will be and how you will protect data if applicable.
6. Data and Sample Preservation -- This section communicates the sustainability plan for your data, showing your funder that the data products will last after the completion of the project.