CMIP6 Guidance for Data Users

Link back to guide homepage

Karl E. Taylor, Paul J. Durack, Sasha Ames, Martina Stockhause, …

Document overview:

Experiment design
Model output specifications
Accessing model output
Terms of use and citation requirements
Model and experiment documentation
Reporting suspected errors
Registering published work based on CMIP6
CMIP6 organization and governance

1. Experiment design

The CMIP6 protocol and experiments are described in a special issue of Geoscientific Model Development with an overview of the design and scientific strategy provided in the lead article of that issue by Eyring et al. (2016).

Each model participating in CMIP6 will contribute results from the four DECK experiments (piControl, AMIP, abrupt4xCO2, and 1pctCO2) and the CMIP6 historical simulation. These experiments are the only ones directly overseen by the the CMIP Panel, and together these constitute the ongoing (slowly evolving) “CMIP Activity”. They are described in Eyring et al. (2016).

In addition to the DECK and historical simulations, each modeling group may choose to contribute to any of the CMIP6 endorsed MIPs. See the GMD Special CMIP6 Issue for descriptions of each MIP and its experiment specifications. The official names of the currently endorsed CMIP6 MIP activities are recorded in a “json” file.

When called for by the experiment protocol, standard forcing data sets (e.g. Durack et al. (2018)) have been used. Any deviation from the standard forcing are supposed to be clearly documented.

Further documentation about CMIP6 experiments will be available from ES-DOC, and the reference controlled vocabularies used to define and identify these experiments are available in a “json” file and are also rendered in table form.

2. Model output specifications

The CMIP6 Data Request defines the variables requested from each experiment and specifies the time intervals for which they are supposed to be reported. One option for perusing the lists of variables that should be available from at least some experiments is to display the excel spreadsheet.

CMIP6 model output includes metadata and is structured similar to CMIP5 output, but changes have been made to accommodate the more complex structure of CMIP6 and its data request. Some changes have been made to make it easier for users to find the data they need and to enable new services to be established providing, for example, model and experiment documentation and citation information.

As in CMIP5, all CMIP6 output has been written to netCDF files with one variable stored per file. The data have been “cmorized” (i.e., written in conformance with the CF-conventions and all the CMIP standards). The CMIP6 data requirements are defined and discussed in the following documents:

Definition of CMIP6 netCDF global attributes
Reference “controlled vocabularies” (CV’s) for CMIP6
Specifications for file names, directory structures, and CMIP6 Data Reference Syntax (DRS)
Specifications for output file content, structure, and metadata are available in draft google doc.
Guidance on grid requirements
Information on pressure levels requested
Guidance on time-averaging (with masking)

Note that in the above, controlled vocabularies (CV’s) play a key role in ensuring uniformity in the description of data sets across all models. For all but variable-specific information, reference CV’s are being maintained by PCMDI. These CV’s are relied on in constructing file names and directory structures, and they enable faceted searches of the CMIP6 archive as called for in the search requirements document.

As indicated in the guidance specifications for output grids, weights will be provided to regrid all output to a few standard grids (e.g., 1x1 degree). All regridding information (weights, lats, lons, etc.) will be stored consistent with a standard format approved by the WIP.

3. Accessing model output

CMIP6 model output is available through a distributed data archive developed and operated by the Earth System Grid Federation (ESGF). Balaji et al. (2018) provide an overview of the design of additional infrastructure and the configuring of ESGF in supporting CMIP6. The data are hosted on a collection of nodes located at modeling centers or data centers across the world. The data can be accessed through any of the CMIP6 CoG web interfaces, which enable users to search across the entire distributed archive as if it were all centrally located.

See this summary table to view available experiments and models.

Here are the currently active CMIP6 CoG sites (all data can be accessed via any one of these):

USA, PCMDI/LLNL (California) - https://esgf-node.llnl.gov/projects/cmip6/
Germany, DKRZ - https://esgf-data.dkrz.de/projects/cmip6-dkrz/
UK, CEDA - https://esgf-index1.ceda.ac.uk/projects/cmip6-ceda/
France, IPSL - https://esgf-node.ipsl.upmc.fr/projects/cmip6-ipsl/

To get to the search interface click on “More search options” under the large red text near the center of the page. There are additional options for searching through the web interface (see “More Search Options” near the top right of the page) and there is also an API that can be used to perform searches. Tutorials are available by following the link labeled “Technical Support” near the top right of the page. Expert users may also want to use the ESGF Search RESTful API.

Globus is available for downloading some datasets and will provide much better performance for large data volumes. With the Globus Download option, ESGF will prepare a python script for batch downloads, or you can monitor transfers for a “Web Download”. You can download an entire “data cart” in one step if all datasets in the cart are served by Globus. The Globus option requires you to establish a user account on ESGF (see “create account” at top right of CoG pages). Note also that a second logon with a Globus-enabled credential is required (nb.: Google ids in addition to many institutions are accepted).

4. Terms of use and citation requirements

To enable modeling groups and others who support CMIP6 to demonstrate its impact (and secure ongoing funding), you are required to cite and acknowledge those who have made CMIP6 possible. You also must abide by any licensing restrictions, see below.

Please carefully read and adhere to the CMIP6 Terms of Use.

CMIP6 model output datasets and forcing datasets should, according to the terms of use, be cited by any publication that make use of them (see Data Citation Guidelines). It is important to include the version (latest dataset version or if not available the latest data download date) in the data citation of the evolving CMIP6 data. Further information on the data citation concept for CMIP6 is available at cmip6cite.wdc-climate.de and described in Stockhause and Lautenschlager (2017).

Data references give credit to the data providers and enable the traceability of research findings (see contribution to the CMIP6 Model Analysis Workshop). They are provided on two granuliarities: fine experiment (contribution by one model to one experiment) and coarse model/MIP (contribution by one model to one MIP). Data references can be found:

in the CMIP6 CoG sites (see list above): Brief data citation information can be displayed by clicking on “Show citation” next to each dataset. The provided URL or DOI for the finer data reference for the experiment will direct you to the corresponding “landing page” with detailed information including the coarser model/MIP granularity.
using the Citation Service search
searching DataCite’s catalog
using Google’s Dataset Search
following the global attribute “further_info_url” link in the file via the ES-DOC page to the citation “landing page”.

5. Model and experiment documentation

The controlled vocabularies contain basic information about the models, institutions, and experiments in CMIP6. The CMIP6 results will be fully documented and made accessible via the ES-DOC viewer and comparator interface (https://search.es-doc.org). Each CMIP6 model output file includes a global attribute called “further_info_url” which will link to a signpost web page providing simulation/ensemble information, model configuration details, current contact details, data citation details etc. This link is also selectable next to each dataset returned by the CMIP6 CoG search interface. ES-DOC will include documentation of:

Experiments: The ES-DOC project has already recorded documentation of the CMIP6 experiments including lists of forcings, model configuration, numerical requirements, information about building the ensembles, links to citations and contact information of the principal investigators as well as text descriptions and information about the rationale behind each experiment
Models: Models will be described on a realm-by-realm basis (i.e. atmosphere, ocean, sea ice, etc.) as well as the top level (coupled model configuration). ES-DOC provides a variety of tools (script-based, text-based, and form-based) for gathering this information from modeling groups, allowing for personal/institutional preference in the way in which documents are created
Experimental conformance: Each simulation should conform to a number of specific requirements established by the MIP leaders. For example, an experiment may have the requirement that all simulations must start and end on particular dates. The full set of experimental requirements for each experiment can be viewed at https://search.es-doc.org. Sometimes there could be more than one way to meet the requirements, so modeling groups will record information about how each simulation conforms to the specifications
Individual members of an ensemble:. Some ensemble documentation is harvested by ES-DOC from published netCDF files, but additional information will be available in ES-DOC. In each model output file the “ripf” identifier is used to uniquely distinguish each member of an ensemble, but the differences between members may not always be clearly (or correctly) recorded in the “variant_info” global attribute. ES-DOC will therefore serve as the reference source for understanding differences between ensemble members. As described in more detail elsewhere (Definition of CMIP6 netCDF global attributes and ES-DOC for CMIP6), there are 4 indices defining an ensemble member: “r” for realization, “i” for initialization, “p” for physics, and “f” for forcing. Modeling groups will record in ES-DOC the key to interpreting the differences between simulations identified by different indices. In particular for each forcing index, the list of forcing data sets applied in the simulation will be recorded
Computer hardware performance: Information will also be provided by ES-DOC on the hardware used in running simulations (e.g. the number of cores) and also metrics describing the performance of each simulation on its machine (e.g. the number of simulated years per real day, etc.)

6. Reporting suspected errors

Information about discovered issues of CMIP6 data is captured by the ES-DOCs Errata Service. The Errata Service provides the ability to query modifications and/or corrections applied to CMIP6 data in two ways:

Any ESGF user can report an error to the appropriate modeling group (see “contact” attribute in the netCDF files), or through the ESGF user mailing list. After a report is received, the corresponding data manager can create a new errata entry using an easy and user-friendly form. A command line client is also available. The aim is to clearly and concisely document the issue and through the PID integration, this errata service will include all the datasets/files affected when documentation is completed correctly.

7. Registering published work based on CMIP6

Please register on the CMIP6 publication database any articles you publish that make use of CMIP6 output.

8. CMIP6 organization and governance

The CMIP Panel, which is a standing subcommittee of the WCRP’s Working Group on Climate Modeling provides overall guidance and oversight of CMIP activities. Notably it determines which MIPs will participate in each phase of CMIP using the established selection criteria listed in Table 1 of Eyring et al. (2016). On its webpages the CMIP Panel provides additional information that may be of interest to CMIP6 participants, but only the CMIP6 Guide (this document) provides definitive documentation of CMIP6 technical requirements.

The endorsed MIPs are managed by independent committees, but acceptance of endorsement obligates them to follow CMIP’s technical requirements. Thus across all MIPs, the modeling groups can prepare their model output following a common procedure.

The CMIP Panel has delegated responsibility for most of the technical requirements of CMIP to the WGCM Infrastructure Panel (WIP). The mission, rationale and Terms of Reference for the panel can be found here. The WIP has drafted a number of position papers summarizing CMIP6 requirements and specifications. Among these is the CMIP6 reference specifications for global attributes, filenames, directory structure and Data Reference Syntax (DRS). The WIP has also set up a CMIP Data Node Operations Team (CDNOT) to interface with data node managers responsible for serving CMIP6 data. This team provides a direct link from the panels establishing data node requirements to those implementing the requirements.

Information is under preparation describing the governance of the following:

ESGF & CoG & major replication data centers
CF-conventions
ES-DOC
Data citation
Long-term archival (LTA) and data quality assurance (QA)
Evaluation activities
input4MIPs
obs4MIPs