file: cdfprofile.oceanography

        OCEANOGRAPHIC CONVENTIONS for NETCDF

        Draft no. 1 - 10/7/92
        Phase 1 - identifying issues

CONTENTS:
        Introduction.
        Guidelines for Creating Profiles.
        Guidelines for Robust Applications.
        Oceanographic Profile Issues.

Introduction.

This file, cdfprofile.oceanography, will document a profile of
conventions to standardize the usage of netCDF for many oceanographic
data applications. The primary goal of this standardization is to
facilitate data interchange.

The document is expected to develop in three phases:

Phase 1 - We will enumerate the issues relevant to oceanographic data
storage with netCDF and provide several alternative resolutions to
each issue. In Phase 1 **all** aspects of this document are open to
comment and revision.

Phase 2 - Through email dialog we will discuss the resolutions of the
issues that have been identified, working towards a consensus on each
issue. In Phase 2 the basic layout and strategy of the document will
be fixed; however, all technical content will remain revisable. New
issues will be added only if there is general agreement that they are
vital to the document. (At this stage the dialog may be moved off of
netcdfgroup.)

Phase 3 - We will edit the document, removing ambiguities and
producing stable, readable text. Issues that become apparent during
implementation must also be resolved at this time.

The entire process of arriving at a standardized profile should be
open and should reflect the views of all members of the "oceanographic
community" (an admittedly ambiguous term) who wish to participate. If
a consensus cannot be reached on some issues, it may be necessary to
formalize a "voting" procedure to resolve them. (These procedures
could be defined in Phase 1 of the document? Volunteers?)

The scope of this work is broader than what we can hope to achieve
fully. Some issues may need to be classified as "Beyond the scope of
this document".
Again, we should try to reach general agreement before classifying an
issue this way.

Producing a final document that unambiguously describes all of the
issues and resolutions will clearly be a significant piece of work.
This can only be accomplished if each of us offers complete and
concise text when making a contribution. I (Steve Hankin) will
volunteer to serve as the document editor, pulling our contributions
together into a single document and making it available via email
and/or anonymous ftp. Since this is to be an entirely open process,
please speak your mind if you know of a preferable candidate for
document editor.

This document presumes the contents of "conventions.info"
(unidata.ucar.edu, anonymous ftp directory pub/netcdf) and will not
duplicate what is already described there. As both conventions.info
and profile.oceanography will be evolving in parallel, we will need to
coordinate the two documents throughout their evolution.

Guidelines for Creating Profiles.

In the process of discussing issues and comparing alternative
resolutions, an explicit set of "guiding principles" would be an
asset. Such principles include (please extend):

  o keep it simple (avoid proliferation of attributes)
  o minimize restrictions (don't reduce functionality)
  o profile-compliant files should remain intelligible to applications
    that know nothing of the profile (where possible)

Guidelines for Robust Applications.

Application programs will in general be far more restrictive in scope
than the conventions described herein. These application programs can
still perform useful work on many netCDF files that observe the
conventions if they follow these "motherhood" rules:

  o meta rule: don't crash; don't give up if possible
  o Uninterpretable attributes should be ignored
  o Variables with unsupported data types should be ignored
  o Applications should not assume that particular units will be
    attached to particular variable names
  o Applications that require recognized variable names should ignore
    variable names they do not recognize
  o Applications should avoid assumptions about the structure of the
    netCDF file:
      - dimensions may be defined which are unused
      - variables may use dimensions which have no corresponding
        coordinate variables defined
      - etc. (expand list)

Oceanographic Profile Issues.

1) Time axis representation

The file "conventions.info" suggests, for example:

    variables:
        double time(nobs);
            time:units = "milliseconds since (1992-9-16 10:09:55.3 -600)";

(This will be implemented shortly in the udunits library.)

Should we impose restrictions on data types (double, float, etc.)?
How should we standardize the format of the date string? (Is this
specified by udunits?)

2) How to determine the orientation of a coordinate variable

The orientation of coordinate axes can be specified through a variety
of mechanisms: agreed-upon names such as "lat", "lon", etc.; implicit
orientations inferred from the ordering of dimension names within a
variable definition; and orientations inferred from the units of the
coordinate variable. None of these mechanisms appears to be adequate
in all cases.

Alternative 1: Minimal restrictions on the naming of coordinate
variables and choice of units. Applications should apply a multi-step
algorithm to identify orientation as follows:

    First, check the units of the coordinate variable: do the units
    imply a unique orientation (e.g. units of time, "degrees
    longitude", "layer", etc.)?

    If not, check the name of the coordinate variable: does the
    variable name match a template (e.g. *depth*, *lon*, *lat*,
    *time*, x*, y*, z*, t*, etc.)?

Is this approach too complex? What about cases where the orientation
remains ambiguous?

Alternative 2: Introduce a variable attribute "orientation" with a
suitable naming convention for orientation strings (e.g. "west-east",
"south-north"). Should this be an optional attribute that is applied
only when the Alternative 1 technique fails?
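The two steps of issue 2, Alternative 1 (units first, then name templates) can be sketched as below. This is a minimal illustration, not a proposed standard: the unit strings, the set of time-unit words, the template list and its order, the case-folding of names, and the function name `orientation` are all hypothetical choices made for the example.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical lookup tables for illustration; the profile has agreed on none of them.
UNIT_ORIENTATIONS = {
    "degrees longitude": "X",
    "degrees latitude": "Y",
    "layer": "Z",
}
TIME_UNITS = {"millisecond", "milliseconds", "second", "seconds", "minute",
              "minutes", "hour", "hours", "day", "days"}
NAME_TEMPLATES = [  # tried in order; the first match wins
    ("*depth*", "Z"), ("*lon*", "X"), ("*lat*", "Y"), ("*time*", "T"),
    ("x*", "X"), ("y*", "Y"), ("z*", "Z"), ("t*", "T"),
]


def orientation(name: str, units: Optional[str] = None) -> Optional[str]:
    """Guess "X", "Y", "Z" or "T" for a coordinate variable, or None if ambiguous."""
    # Step 1: do the units imply a unique orientation?
    if units and units.strip():
        u = " ".join(units.lower().split())  # normalize case and spacing
        if u in UNIT_ORIENTATIONS:
            return UNIT_ORIENTATIONS[u]
        if u.split()[0] in TIME_UNITS:       # e.g. "milliseconds since (...)"
            return "T"
    # Step 2: does the name match a template? (Case-folding the name is an
    # assumption; issue 4 has not settled case sensitivity.)
    lowered = name.lower()
    for pattern, axis in NAME_TEMPLATES:
        if fnmatchcase(lowered, pattern):
            return axis
    return None  # orientation remains ambiguous
```

Note that `orientation("temp")` returns "T" purely because the t* template matches, so a coordinate named for temperature would be misclassified; this is the kind of ambiguous case the questions above are asking about.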
3) Indicating Missing Data

Two attributes for missing data have been suggested: missing_value and
_FillValue. The missing_value attribute has been dropped in netCDF
version 2.0. Is there a need to support both attributes?

4) Case-insensitive Names

Should application programs be case-sensitive with respect to
attribute and variable names? Should variable and attribute names
within a single file be required to be unique when compared without
regard to case? (This refers to the **names** only; the values of
string attributes such as units would remain case-sensitive.)

Alternative 1: Case-insensitive. The peculiarities of Unix and C,
while familiar to programmers, are not necessarily comfortable for
users. Case-sensitive names complicate publication and conversation.

Alternative 2: Case-sensitive. Case-insensitivity would lead to
incompatibilities with non-oceanographic netCDF files. There are
conveniences to the use of, e.g., "time", "Time", and "TIME" within
the same file.

5) Multiple Time Axes in a File

Is there a need for multiple time axes defined within a single netCDF
file? Or is there a reason to limit files to a single time axis?
(Multiple time axes would conflict with some time encodings that have
been discussed that involve global variables.)

Alternative 1: Permit multiple time axes (no conflict with time axes
as suggested in conventions.info).

6) Need a global attribute to indicate profile type and revision

There should be a global attribute informing application programs
explicitly which netCDF profile and revision a file adheres to. This
issue needs to be addressed at a level higher than this oceanographic
profile, but some recommendations would be appropriate.

Alternative 1:

    :profile = "oceanography";
    :profile_version = 1.0;

7) Standardized (Conventional) Variable Names

The meteorological community has suggested a list of standardized
variable names (see conventions.info). Should this list be extended to
include additional oceanographic variables?
How should these names fit into the framework of "resources" as
described in conventions.info? (We need input from folks familiar with
"resources" in this context.)

8) Name String Lengths

Should attribute and variable names be further restricted in length
beyond the limit of `MAX_NC_NAME' described in conventions.info?

Alternative 1: A practical limit of (say) 32 characters should be
imposed. This is consistent with most programming languages and
simplifies the formatting burden on applications. It does not prevent
application programs from supporting longer names.

Alternative 2: Any limit other than the default limit of MAX_NC_NAME
(128) could lead to incompatibilities with non-oceanographic netCDF
files.

9) Multiple coordinate variables of same orientation

Is there a need to support multiple coordinate variables of the same
orientation in a single netCDF file? (Such multiplicity would preclude
the use of strict names such as "lat" to designate geographical
coordinate variables, though templates like *lat* would still be
possible.)

Alternative 1: Yes, there is a need. For example, multiple current
meter arrays may have differing deployment depths; in modelling it is
often desirable to compare results computed on numerous different axes
of the same orientation, so restrictions on the naming of axes could
be very inconvenient.

10) Requiring non-coordinate variables to be 4-dimensional

Is it acceptable to insist that all non-coordinate variables be
represented as 4-dimensional (lat/lon/depth/time) structures? Should
there be other restrictions on the number of axes?

Alternative 1: Dimensionality should not be restricted to exactly 4;
the restriction would preclude some data types and would force
misrepresentation of others. Some restriction on the maximum number of
dimensions for a variable would, however, ease the burden of writing
applications.
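The name restrictions floated in issues 4 and 8 are easy to check mechanically. The sketch below is a minimal illustration assuming the 32-character figure from issue 8, Alternative 1 and Python's `str.casefold` as the "without regard to case" comparison; `check_names` is a hypothetical helper, not part of any netCDF library, and it can be applied to whichever set of names (all variable names, or one variable's attribute names) the profile decides must be distinct.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

MAX_NC_NAME = 128          # default netCDF limit cited in issue 8
PRACTICAL_NAME_LIMIT = 32  # the "(say) 32 characters" of issue 8, Alternative 1


def check_names(names: Iterable[str], limit: int = PRACTICAL_NAME_LIMIT) -> List[str]:
    """Return a list of problems found in a set of names; empty means all pass."""
    problems: List[str] = []
    by_folded: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if len(name) > MAX_NC_NAME:
            problems.append(f"{name[:20]}...: exceeds MAX_NC_NAME ({MAX_NC_NAME})")
        elif len(name) > limit:
            problems.append(f"{name}: longer than {limit} characters")
        # Issue 4: group names that collide when compared without regard to case.
        by_folded[name.casefold()].append(name)
    for group in by_folded.values():
        if len(group) > 1:
            problems.append("names differ only in case: " + ", ".join(group))
    return problems
```

With the defaults, a file using "time", "Time", and "TIME" yields one complaint, which is precisely the convenience that issue 4, Alternative 2 wants to preserve; calling `check_names(names, limit=MAX_NC_NAME)` reduces the length check to the issue 8, Alternative 2 position.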
11) Mandatory ordering of geographical dimensions

Is it acceptable to mandate that, if dimensions with geographical
significance are used in defining a variable, they will be ordered as
lat-lon-depth-time (i.e. with time as the slowest-varying axis)?

Alternative 1: Yes, with reservations: are there serious performance
penalties?

Alternative 2: No; applications require greater flexibility than
this. Perhaps a standard ordering could be defined and an attribute
introduced that would indicate permutations. Example:

    var:permutation = "TXYZ";

12) Coordinate Systems

As mentioned in conventions.info, there is work underway at Unidata on
this subject, presumably leading towards the development of a
collection of conventional attributes and a new Unidata library,
`udgeoref'. Is this work sufficient for oceanographic data? Is this
beyond the (initial) scope of this document?

13) Application-specific attributes

Would it be useful to standardize a collection of attributes that
would coach application programs in areas not directly related to the
data content? For example, attributes that recommend display
techniques, such as

    preferred_display_style = "contour"
    preferred_display_map = "spherical polar"

Candidates? ...

14) Climatological Axes

What is the best method to represent a climatological time axis?

Alternative 1: Attach the (boolean) variable attribute "periodic" to
the time coordinate axis, indicating that the axis ends "join" in
modulo fashion. (This solution is useful for any periodic axis; it is
also applicable to longitude.) What about the base-date string (see
issue 1)?

    time:periodic = " ";

Alternative 2: Like Alternative 1, but the attribute should indicate
the "branch points" of the periodicity:

    time:periodic_values = 0., 365.;

15) Use of Boolean Attributes

Issue 14 raises the general question of the appropriateness of boolean
attributes (whose presence or absence indicates a modal state).
There is no explicit mechanism in netCDF for creating a value-less
attribute (see Issue 14, Alternative 1). Should profile.oceanography
avoid boolean attributes? Or is this largely an aesthetic issue of the
appearance of CDL files? Could CDL be extended in a future revision to
support, e.g.,

    time:periodic;

16) Vertical axis orientation

Oceanographic data are often organized with positive down on vertical
axes. What is the best mechanism to indicate this in a netCDF file? (A
similar question arises for latitude axes, which may be south-positive
or north-positive.)

Alternative 1: Introduce a (boolean) coordinate variable attribute
"reversed".

Alternative 2: Combine this property with others that have been
discussed in a new attribute:

    depth:properties = "reversed, coordinates, vertical";

17) Longitude axis encodings

Longitude encodings are not standardized: they may be continuous
across the dateline or continuous across the prime meridian; either
westward or eastward may be positive; the range may be -180 to 180, 0
to 360, or some other choice. How should netCDF convey this encoding?

Alternative 1: 4 variable attributes applied to the longitude
coordinate variable:

    - "reversed" for X positive westward
    - "discontinuity" = value (always give the minimum value)
    - one of "Greenwich" = value or "dateline" = value

e.g. to define a longitude axis from 0 to 360, positive eastward, with
zero representing Greenwich:

    variables:
        float lon(lon);
            lon:Greenwich = 0.;
            lon:discontinuity = 0.;

Alternative 2: Modify Alternative 1 by replacing the "discontinuity"
attribute with

    lon:periodic_values = 0., 360.;

18) Unequally spaced coordinates

Is the location of grid points sufficient information to fully
describe a coordinate axis with irregularly spaced points? Or do we
need auxiliary machinery to represent the boundaries between points?

Alternative 1: There are cases that require explicit boundaries
between cells on an axis, e.g. data collected in unequal bins.
Is this a special-purpose need beyond the scope of this document?

19) Huge Data Sets / Multiple Files

Should we provide a standardized mechanism for associating multiple
files in a single "project"? How should it function? As a time axis
distributed among files? As multiple variables distributed among
files? Is this beyond the scope of this document?

Alternative 1: A "parent" netCDF file with variables and attributes
suitably defined to point to "child" files.

Alternative 2: A file naming convention, such as

    my_cdf.001, my_cdf.002, my_cdf.003, ...

that will implicitly concatenate netCDF files along their record (or
time?) axis.

20) Representing Sigma Coordinate Systems

How should variables defined on sigma coordinate grids be represented?
Is this question within the scope of this document? Will it be covered
by the `udgeoref' library?

Alternative 1: A variable defined on a sigma coordinate system should
possess an attribute "sigma". The coordinate variable corresponding to
the vertical dimension should exist and have simple enumerated values
1, 2, ..., n. The coordinate variable should further have an attribute
"sigma_positions" (?better name?) which gives the name of a variable
containing the z coordinates. The z coordinate variable should be
defined on the same dimensions as the original variable. e.g.

    variables:
        float u(lat,lon,level,time);     // on sigma coords
            u:sigma = " ";
        int level(level);
            level:sigma_positions = "depths";
        float depths(lat,lon,level);     // time may be a dimension, too
            depths:units = "meters";

**************************

Real-Time and Shipboard data collection?
        What are the special issues?
        How to represent a cruise track? (** a requirement? **)
        How to store variables with differing sampling intervals?
        (Beyond scope?)

Arctic oceanography
        What are the special issues?

Climate research
        What are the special issues?

Chemical oceanography
        What are the special issues?

Biological Oceanography
        What are the special issues?
Compressed data
        Are there special cases where compression can fit a general
        framework?

Other special topic issues?

                 | NOAA/PMEL                 | ph. (206) 526-6080
Steve Hankin     | 7600 Sand Point Way NE    | FAX (206) 526-6744
                 | Seattle, WA 98115-0070    | hankin@noaapmel.gov