CMIP6 Participation for modelers

Karl E. Taylor, Paul J. Durack, Michael Lautenschlager and Martina Stockhause

Document overview:

  1. Requirements and expectations
  2. Experiment design
  3. Forcing data sets
  4. Model output fields
  5. Model output requirements
  6. Software for preparing/checking output
  7. Archiving/publishing output
  8. Documentation process
  9. CMIP6 organization and governance

1. Requirements and expectations

Those groups who plan to participate in CMIP6 should (in roughly this order, although model documentation should be provided as early as possible):

2. Experiment design

The CMIP6 protocol and experiments are described in a special issue of Geoscientific Model Development. The overall design and scientific strategy are summarized in the lead article of that issue, by Eyring et al. (2016).

3. Forcing data sets

In CMIP6, it is essential that all models adopt the same forcing datasets (and boundary conditions). Experts contacted by the CMIP Panel have prepared the forcing datasets, and PCMDI has initiated a new “input4MIPs” activity to encourage adherence to many of the same data standards imposed on obs4MIPs and CMIP data. These datasets are being collected into a curated archive at PCMDI. All conforming datasets can be downloaded via the Earth System Grid Federation’s input4MIPs CoG. Any dataset not yet conforming to the input4MIPs specifications can be obtained from the person preparing it, as indicated in the input4MIPs summary sheet.

The input4MIPs summary sheet separately lists the CMIP6 datasets needed for the DECK and historical simulations and the datasets needed for the CMIP6-endorsed MIP experiments. The summary provides contact information, documentation of the data, and citation requirements. Included in the collection are datasets specifying emissions and concentrations of various atmospheric species, sea surface temperatures and sea ice (for AMIP), solar variability, and land cover characteristics.

Some of the endorsed-MIP forcing datasets are still in preparation, but should be available soon. Any changes made to a released dataset will be documented in the summary.

4. Model output fields

The CMIP6 Data Request defines the variables that should be archived for each experiment and specifies the time intervals for which they should be reported. It provides much of the variable-specific metadata that should be stored along with the data. It also provides tools for estimating the data storage requirements for CMIP6.
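The arithmetic behind such storage estimates can be sketched as follows. This is not the Data Request's own tool: the function estimate_bytes is hypothetical, and it assumes uncompressed 4-byte values, so it only gives a rough figure for uncompressed output of a single variable.

```python
def estimate_bytes(nlon, nlat, nlev, nsteps_per_year, nyears, bytes_per_value=4):
    """Uncompressed size in bytes of one variable: grid points x time steps x word size."""
    # Assumption: every value is stored as a 4-byte float unless stated otherwise.
    return nlon * nlat * nlev * nsteps_per_year * nyears * bytes_per_value


if __name__ == "__main__":
    # Hypothetical example: a monthly 2-D field on a 1x1 degree grid
    # (360 longitudes x 180 latitudes) covering 165 years.
    size = estimate_bytes(360, 180, 1, 12, 165)
    print(f"{size / 1e9:.2f} GB")  # prints 0.51 GB
```

Summing such figures over all requested variables, frequencies and experiments gives an order-of-magnitude total; compression in netCDF4 files can reduce the actual volume.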

[Further explanation will be added here.]

5. Model output requirements

CMIP6 model output requirements are similar to those in CMIP5, but changes have been made to accommodate the more complex structure of CMIP6 and its data request. Some changes will make it easier for users to find the data they need and will enable new services to be established providing, for example, model and experiment documentation and citation information.

As in CMIP5, all CMIP6 output will be stored in netCDF files with one variable stored per file. The requested output fields can be determined as described above, and as in CMIP5, the data must be “cmorized” (i.e., written in conformance with all the CMIP standards). The CMIP standards build on the CF-conventions, which define metadata that provide a description of the variables and their spatial and temporal properties. This lets users read and interpret data from all models in the same way, which facilitates analysis.

The CMOR software library can be used to meet most of the CMIP data requirements, but its use is not mandatory. To ensure that a critical subset of the requirements has been met, a CMIP data checker (“PrePARE”) will be applied before data are placed in the CMIP6 data archive. PrePARE currently cannot, however, check a file for full compliance with all the data requirements, so it is recommended that CMOR be used to write CMIP6 model output.

The CMIP6 data requirements are defined and discussed in the following documents:

Additional metadata requirements are imposed on a variable-by-variable basis, as specified in the CMIP6 Data Request. CMOR recognizes many of these (through the CMIP6 CMOR Tables it reads as input) and will ensure compliance with them.

Note that in the above, controlled vocabularies (CVs) play a key role in ensuring uniformity in the description of datasets across all models. For all but variable-specific information, PCMDI maintains reference CVs against which all quality assurance checks will be performed. These CVs will be relied on in constructing file names and directory structures, and they will enable faceted searches of the CMIP6 archive, as called for in the search requirements document. Additional, variable-specific CVs are part of the CMIP6 Data Request. The CVs are structured to make clear the relationships between items appearing in separate CVs. For example, the CV for model names (“source_id”) indicates which institutions are authorized to run each model, and the complete list of institutions is recorded in a CV for “institution_id”.
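The kind of cross-CV consistency check described above can be sketched in Python. The JSON excerpt, the model names MODEL-X and MODEL-Y, the institutions INST-A and INST-B, and the function check_source_institution are all hypothetical; the actual CVs maintained by PCMDI are more detailed and their exact layout may differ.

```python
import json

# Simplified, hypothetical excerpt of two controlled vocabularies. Only the
# relationship described in the text is modeled: each source_id lists the
# institution_id values authorized to run that model.
CV_JSON = """
{
  "institution_id": {"INST-A": "Example Institute A", "INST-B": "Example Institute B"},
  "source_id": {
    "MODEL-X": {"institution_id": ["INST-A"]},
    "MODEL-Y": {"institution_id": ["INST-A", "INST-B"]}
  }
}
"""


def check_source_institution(cv, source_id, institution_id):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if institution_id not in cv["institution_id"]:
        problems.append(f"unknown institution_id {institution_id!r}")
    entry = cv["source_id"].get(source_id)
    if entry is None:
        problems.append(f"unknown source_id {source_id!r}")
    elif institution_id not in entry["institution_id"]:
        problems.append(f"{institution_id!r} is not authorized to run {source_id!r}")
    return problems


cv = json.loads(CV_JSON)
print(check_source_institution(cv, "MODEL-Y", "INST-B"))  # prints []
print(check_source_institution(cv, "MODEL-X", "INST-B"))  # one problem reported
```

Keeping the authorization list inside the source_id entry means a single lookup answers both "does this model exist?" and "may this institution run it?".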

As indicated in the guidance specifications for output grids, weights should be provided to regrid all output to a few standard grids (e.g., 1x1 degree). All regridding information (weights, latitudes, longitudes, etc.) should be stored in a standard format approved by the WIP. Specifications for this format will be forthcoming.
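Regridding weights are commonly stored as a sparse list of (destination cell, source cell, weight) entries, and applying them amounts to a sparse matrix-vector product. The sketch below assumes that simple triple layout, which is not the WIP-approved format (still forthcoming); the functions apply_weights and weight_sums are hypothetical.

```python
def apply_weights(src_values, weights, n_dst):
    """Sparse matrix-vector product: dst[i] = sum of w * src[j] over entries (i, j, w)."""
    dst = [0.0] * n_dst
    for i, j, w in weights:
        dst[i] += w * src_values[j]
    return dst


def weight_sums(weights, n_dst):
    """Sum of weights per destination cell (about 1.0 for fully covered cells
    when the weights are area-conservative and normalized by destination area)."""
    sums = [0.0] * n_dst
    for i, _, w in weights:
        sums[i] += w
    return sums


# Hypothetical example: 4 equal-area source cells merged pairwise into 2 destination cells.
src = [1.0, 3.0, 5.0, 7.0]
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.5), (1, 3, 0.5)]
print(apply_weights(src, weights, 2))  # prints [2.0, 6.0]
print(weight_sums(weights, 2))         # prints [1.0, 1.0]
```

Because the weights are computed once per pair of grids, storing them alongside the output lets users regrid any field on the same native grid without recomputing cell overlaps.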

CMIP6 output requirements that are critical for successful ingestion and access via ESGF will be enforced when publication of the data is initiated. The success of CMIP6, however, also depends on meeting the requirements that cannot be checked by ESGF, and this is the responsibility of anyone preparing model output for CMIP6. A dataset meets the minimum set of requirements for publication of CMIP6 data if it passes the checks performed by the PrePARE software package described in the next section.

6. Software for preparing/checking output

To facilitate the production of model output files that meet the CMIP6 technical standards, a software library called “CMOR” (Climate Model Output Rewriter) has been developed. Version 3 (CMOR3) is now available, together with its code and documentation; read its installation instructions before installing it. This package was first used in CMIP3 and has been generalized and improved for each new CMIP phase. Use of CMOR is not mandatory, but past experience suggests that its use avoids many common errors in model output files.

For those not using CMOR, some checks for compliance with CMIP specifications can be performed using a new code developed in support of CMIP6: the Pre-Publication Attribute Reviewer for ESGF (PrePARE). The tests performed by PrePARE are described in its design requirements. PrePARE is included in the CMOR software suite, and all files produced by CMOR are effectively checked by PrePARE, but PrePARE can also be invoked on files not written with CMOR.
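For groups not using CMOR, a first pass over a file's global attributes can be sketched as below. This is not PrePARE and checks far less: REQUIRED is an illustrative subset of CMIP6 global attribute names assumed for this example (the WIP's global-attributes specification is authoritative), the function missing_attributes is hypothetical, and attributes are passed in as a plain dict rather than read from a netCDF file, to keep the sketch dependency-free.

```python
# Illustrative subset of CMIP6 global attribute names (an assumption for this
# sketch); the WIP's global-attributes specification is authoritative.
REQUIRED = ("mip_era", "activity_id", "institution_id", "source_id",
            "experiment_id", "variant_label", "table_id", "variable_id",
            "grid_label")


def missing_attributes(global_attrs):
    """Return required attribute names that are absent or empty."""
    return [name for name in REQUIRED if not global_attrs.get(name)]


# Hypothetical attributes, as they might be read from a file's global attributes.
attrs = {
    "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "INST-A",
    "source_id": "MODEL-X", "experiment_id": "historical",
    "variant_label": "r1i1p1f1", "table_id": "Amon", "variable_id": "tas",
}
print(missing_attributes(attrs))  # prints ['grid_label']
```

A real pre-publication check would also validate each value against the controlled vocabularies, not merely confirm that it is present.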

In addition to PrePARE, tests for file compliance with the CF-conventions can be made using a tool called the CF-checker. Both PrePARE and the CF-checker will be run as part of the ESGF publication job stream, and only files passing all tests will be published and made available for download.

Note that if data are written using CMOR, additional checks will be performed that will, for example:

Additional codes useful in preparing model output for CMIP6 include:

7. Archiving/publishing output

To be written soon. [What is needed here is simply a few statements introducing ESGF and the data node, with an indication of what modeling centers need to do: either host a data node or find someone else who will serve their data. There will be another guide (for data node managers and operators – see below) that will provide a more complete overview of what is involved in this aspect, but that information probably won’t interest most modelers.]

8. Documentation process

Information will be provided soon about ES-DOCs, which is the project responsible for collecting and making available model and experiment documentation.

9. CMIP6 organization and governance

The CMIP Panel, a standing subcommittee of the WCRP’s Working Group on Coupled Modelling (WGCM), provides overall guidance and oversight of CMIP activities. Notably, it determines which MIPs will participate in each phase of CMIP, using the established selection criteria listed in Table 1 of Eyring et al. (2016). On its webpages, the CMIP Panel provides additional information that may be of interest to CMIP6 participants, but only the CMIP6 Guide (this document) provides definitive documentation of CMIP6 technical requirements.

The endorsed MIPs are managed by independent committees, but by accepting endorsement they are obligated to follow CMIP’s technical requirements. Thus, across all MIPs, modeling groups can prepare their model output following a common procedure.

The CMIP Panel has delegated responsibility for most of the technical requirements of CMIP to the WGCM Infrastructure Panel (WIP), whose mission, rationale and Terms of Reference are published separately. The WIP has drafted a number of position papers summarizing CMIP6 requirements and specifications. Among these are the CMIP6 reference specifications for global attributes, filenames, directory structure and the Data Reference Syntax (DRS). The WIP has also set up a CMIP Data Node Operations Team (CDNOT) to interface with the data node managers responsible for serving CMIP6 data. This team provides a direct link from the panels establishing data node requirements to those implementing them. Section 7 provides further information about data node operational requirements.
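How the DRS turns global attributes into file names and directory paths can be sketched as below. The templates follow our reading of the WIP's CMIP6 DRS specification; consult that document for the authoritative rules (for example, how member_id is formed when a sub-experiment is present, and how the time range is written). The helper names, the attribute values and the version string are hypothetical.

```python
import posixpath


def _components(attrs, keys):
    values = [attrs[k] for k in keys]
    # Underscores separate filename components, so they must not appear inside one.
    for v in values:
        if "_" in v:
            raise ValueError(f"DRS component {v!r} must not contain '_'")
    return values


def drs_filename(attrs, time_range=None):
    """<variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc"""
    parts = _components(attrs, ("variable_id", "table_id", "source_id",
                                "experiment_id", "member_id", "grid_label"))
    if time_range:  # omitted for time-invariant fields
        parts.append(time_range)
    return "_".join(parts) + ".nc"


def drs_directory(attrs, version):
    """<mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>/
    <member_id>/<table_id>/<variable_id>/<grid_label>/<version>"""
    parts = _components(attrs, ("mip_era", "activity_id", "institution_id",
                                "source_id", "experiment_id", "member_id",
                                "table_id", "variable_id", "grid_label"))
    return posixpath.join(*parts, version)


attrs = {
    "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "INST-A",
    "source_id": "MODEL-X", "experiment_id": "historical",
    "member_id": "r1i1p1f1", "table_id": "Amon", "variable_id": "tas",
    "grid_label": "gn",
}
print(drs_filename(attrs, "185001-201412"))  # prints tas_Amon_MODEL-X_historical_r1i1p1f1_gn_185001-201412.nc
print(drs_directory(attrs, "v20170615"))
```

Since every component comes from a controlled vocabulary, file names and paths built this way can be parsed back into search facets, which is what enables faceted search of the archive.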

Information is under preparation describing the governance of the following:

Document version: 6.0.0 (15 June 2017)