Full Explanation
Overview
An iCSV file has three sections. The METADATA section contains all the information one wants to pass on about the location, circumstances and origin of the data. The FIELDS section describes how the data is stored: which column contains which variable, what the columns are called, and more. The DATA section then contains the data itself, as in an ordinary CSV file.
Every line in the header sections, METADATA and FIELDS, must be prefixed with a '#'. These sections contain some required keys, which are specified later, and explain how the data should be read. The DATA section contains only the data values, separated by a delimiter specified in the METADATA section.
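To make the three-part layout concrete, here is a minimal Python sketch that groups the lines of a file by section. It is not a reference implementation: the function name split_sections and the sample values are made up for illustration, and skipping blank lines is our assumption, since the text above does not say how they should be treated.

```python
# Made-up sample file, for illustration only.
SAMPLE = """# iCSV 1.0 UTF-8
# [METADATA]
# field_delimiter = ,
# geometry = POINT(7.5 46.2)
# srid = EPSG:4326
# [FIELDS]
# fields = timestamp,PSUM
# [DATA]
2024-01-01T00:00:00,0.5
2024-01-01T01:00:00,1.2
"""


def split_sections(text):
    """Group the lines of an iCSV document by section.

    Header lines are returned without their leading '#'.
    """
    sections = {"METADATA": [], "FIELDS": [], "DATA": []}
    current = None  # no section marker seen yet
    for line in text.splitlines()[1:]:  # skip the first line of the file
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are ignored
        if not stripped.startswith("#"):
            if current != "DATA":
                raise ValueError("data line found before the [DATA] marker")
            sections["DATA"].append(stripped)
            continue
        if current == "DATA":
            raise ValueError("'#' is not allowed after the [DATA] marker")
        content = stripped[1:].strip()
        if content.startswith("[") and content.endswith("]"):
            name = content[1:-1]
            if name not in sections:
                raise ValueError(f"unknown section: {name}")
            current = name
        elif current is None:
            raise ValueError("header line found before any section marker")
        else:
            sections[current].append(content)
    return sections
```

Calling split_sections(SAMPLE) returns a dictionary with three header lines under 'METADATA', one under 'FIELDS' and two rows under 'DATA'.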
First Line
Every iCSV file must start with the following line (shown here for version 1.0 of the format). The name of an application profile, written in all capital letters, may optionally be appended:
# iCSV 1.0 UTF-8 {APPLICATION PROFILE}
This gives a decoder the information it needs to read the file, and indicates that only UTF-8 characters are supported.
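A decoder could check this line with a regular expression. The following Python sketch assumes that {APPLICATION PROFILE} is a placeholder (the braces are not written literally) and that a profile name consists of capital letters, digits and underscores. Both points are our assumptions, and SNOWPROFILE in the usage note is only an illustrative name.

```python
import re

# Assumed grammar: '#', 'iCSV', a version number, 'UTF-8', optional profile.
FIRST_LINE = re.compile(
    r"^#\s*iCSV\s+(?P<version>\d+\.\d+)\s+UTF-8"
    r"(?:\s+(?P<profile>[A-Z][A-Z0-9_]*))?\s*$"
)


def parse_first_line(line):
    """Return (version, profile); profile is None if none is given."""
    match = FIRST_LINE.match(line)
    if match is None:
        raise ValueError(f"not a valid iCSV first line: {line!r}")
    return match.group("version"), match.group("profile")
```

For example, parse_first_line("# iCSV 1.0 UTF-8 SNOWPROFILE") yields the version string "1.0" and the profile name "SNOWPROFILE", while a line without a profile yields None in the second position.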
Metadata
Next follows the METADATA section. It starts with the section marker:
# [METADATA]
after which all the metadata follows with key/value(s) pairs in the form of:
# key = value
where one key occupies one line.
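Reading such a line amounts to dropping the '#' and splitting at the first '='. A minimal Python sketch, with a function name (parse_header_line) of our own choosing:

```python
def parse_header_line(line):
    """Split a '# key = value' header line into (key, value)."""
    if not line.startswith("#"):
        raise ValueError(f"header lines must start with '#': {line!r}")
    # Split at the first '=' only, so the value itself may contain '='.
    key, separator, value = line[1:].partition("=")
    if not separator or not key.strip():
        raise ValueError(f"expected '# key = value', got: {line!r}")
    return key.strip(), value.strip()
```

The value is returned as a plain string; interpreting it (number, WKT string, list of values) is left to the caller.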
Required Metadata
For a file to be a valid iCSV file, we enforce a few METADATA keys:
field_delimiter-
The delimiter used to separate data values. Must be one of [ ',' '|' '\' '/' ':' ';' ]
geometry-
Information about the location, extent, etc. of the data. Must be one of:
- POINT(float float)
- POINTZ(float float float)
- WKT_string, i.e. any valid WKT string
- column name, i.e. the name of the column in which this information can be found (e.g. for moving sensors)
srid-
The coordinate system in which the geometry is expressed, in the form EPSG:0000 (with the valid EPSG code in place of 0000).
The required metadata may differ between applications, to facilitate interoperability within specific disciplines. Such changes are described under 'Application profiles'.
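The required keys can be checked once the METADATA section has been read into a dictionary. The sketch below is a hedged example, not an official validator: the function name check_required_metadata is ours, the geometry is only checked for presence (a full WKT check would need a geometry library), and any application profile may add further requirements.

```python
import re

ALLOWED_DELIMITERS = {",", "|", "\\", "/", ":", ";"}
REQUIRED_KEYS = ("field_delimiter", "geometry", "srid")
SRID_PATTERN = re.compile(r"^EPSG:\d+$")


def check_required_metadata(metadata):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for key in REQUIRED_KEYS:
        if not metadata.get(key):
            problems.append(f"missing required key: {key}")
    delimiter = metadata.get("field_delimiter")
    if delimiter and delimiter not in ALLOWED_DELIMITERS:
        problems.append(f"field_delimiter not allowed: {delimiter!r}")
    srid = metadata.get("srid")
    if srid and not SRID_PATTERN.match(srid):
        problems.append(f"srid is not of the form EPSG:<code>: {srid!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a tool report everything that is wrong with a file in one pass.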
Recommended Metadata
station_id-
An alphanumeric string identifying the station, satellite, etc. that the data comes from.
nodata-
The value that is used to indicate missing data. (integer or float)
timezone-
The timezone of the time used in the data. (integer, float or tz_string)
doi-
A unique identifier for the data set
timestamp_meaning-
The meaning of timestamps, if present in the data. ('beginning' | 'end' | 'middle' | 'instantaneous' | 'other' | 'undefined')
ACDD-
Any available Metadata from the Attribute Convention for Data Discovery
Any other metadata you wish to add can be included in the same fashion.
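As an illustration of how a reader might use the nodata key, the following Python sketch converts a raw value from a numeric column and maps the nodata value to None. The function name to_number and the choice of None as the marker for missing data are our assumptions; the format itself only defines which value indicates missing data.

```python
def to_number(raw, nodata=None):
    """Convert a raw numeric field to float, or None if it equals nodata.

    Only meant for numeric columns; timestamps and text need other handling.
    """
    value = float(raw)  # float() tolerates surrounding whitespace
    if nodata is not None and value == float(nodata):
        return None
    return value
```

The comparison is done numerically, so a nodata value of -999 also matches a field written as -999.0.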
Fields
After the METADATA comes the FIELDS section, starting with:
# [FIELDS]
After this, each line again holds a key/value pair. Here, however, each key needs one value per data column, with the values separated by the field_delimiter specified in METADATA.
Required Fields
Here we only require one field:
fields-
The list of column names in the data. (e.g. timestamp, PSUM, snow height (assuming "," as field_delimiter))
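A line in the FIELDS section can be read like a METADATA line, with the value additionally split at the field_delimiter. In this Python sketch the name parse_fields_line is ours, and stripping whitespace around each entry is an assumption based on the example above, which has spaces after the commas.

```python
def parse_fields_line(line, delimiter):
    """Split a '# key = v1<delim>v2...' line into (key, [v1, v2, ...])."""
    key, separator, raw = line.lstrip("#").partition("=")
    if not separator:
        raise ValueError(f"expected '# key = values', got: {line!r}")
    # Assumption: whitespace around each entry is not significant.
    return key.strip(), [item.strip() for item in raw.split(delimiter)]
```

The same function works for the other per-column keys, since every line in this section carries one value per data column.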
Recommended Fields
We also advise adding some more information about the data columns, specifically:
units_multiplier-
A multiplier that must be applied to obtain the actual values.
units_offset-
An offset that is used for the values.
units-
The units in each column
long_name-
A long explanatory name of the variable in the column.
standard_name-
The standard name set by (WMO?)
timestamp_meaning-
The meaning of timestamp, if it changes between variables.
Any other information you wish to convey about the columns can be set in the same fashion.
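The text above does not state how units_multiplier and units_offset combine. The Python sketch below assumes the common convention actual = stored * multiplier + offset. This order of operations is our assumption and should be checked against the format specification before relying on it. The function name scale_row is also ours.

```python
def scale_row(values, multipliers, offsets):
    """Apply a per-column multiplier and offset to one row of numbers.

    ASSUMPTION: actual = stored * multiplier + offset.
    """
    if not (len(values) == len(multipliers) == len(offsets)):
        raise ValueError("values, multipliers and offsets must match in length")
    return [v * m + o for v, m, o in zip(values, multipliers, offsets)]
```

Under that assumption, a column stored with a multiplier of 0.5 and an offset of 0 would turn a stored 10.0 into an actual value of 5.0.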
Data
The actual data is stored in the data section, starting with:
# [DATA]
After this, no more '#' characters are allowed. Each following line contains the data values, one per column, separated by the field_delimiter, and the number of values must match the number of 'fields'. Whitespace is allowed between a value and the delimiter.
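These rules translate into a small check per data line. A minimal Python sketch, with a function name (parse_data_line) of our own choosing; values are returned as strings, and converting them to numbers or timestamps is left to the caller:

```python
def parse_data_line(line, delimiter, n_fields):
    """Split one DATA line and check it against the number of fields."""
    if line.lstrip().startswith("#"):
        raise ValueError("'#' is not allowed in the DATA section")
    # Whitespace between a value and the delimiter is allowed, so strip it.
    values = [value.strip() for value in line.split(delimiter)]
    if len(values) != n_fields:
        raise ValueError(f"expected {n_fields} values, got {len(values)}")
    return values
```

Note that a plain split like this would also cut at a delimiter that occurs inside a value (for instance ':' inside a timestamp), so the field_delimiter should be chosen so that it does not occur in the data.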