Getting started with icoscp_core

The examples in this section can be tried on a public Jupyter Hub running Python3 notebooks, where the library is preinstalled, for example https://exploredata.icos-cp.eu/

Please, click here to request access to the Explore-Data service.

If run on a standalone machine rather than an ICOS Carbon Portal Jupyter Hub instance, the data access examples assume that the authentication has been configured as explained in the next section.

General note on metadata

An important background information on ICOS metadata is that all the metadata-represented entities (data objects, data types, documents, collections, measurement stations, people, etc) are identified by URIs. The metadata-access methods usually accept these URIs (or their lists) as input arguments.

Discover data types

Data type is the main "dimension" used to classify the ICOS data objects. It's an "umbrella" term aggregating a number of other metadata properties, such as label, data level, project, theme, object format, etc.

from icoscp_core.icos import meta
# fetches the list of known data types, including metadata associated with them
data_types = meta.list_datatypes()

data_type_names = [dt.label for dt in data_types]
data_type_uris = [dt.uri for dt in data_types]

# data types with data access
previewable_datatypes = [dt for dt in data_types if dt.has_data_access]

Discover stations

All measurement stations in ICOS metadata have a property called station id. However, this id is not guaranteed to be unique to a station, as it is sometimes reused by co-located stations, and sometimes two "incarnations" of a station, ICOS and "pre-ICOS" one, coexist in the metadata for a while. The only true id is a URI.

from icoscp_core.icos import meta, ATMO_STATION

# fetch lists of stations, with basic metadata
icos_stations = meta.list_stations()
atmo_stations = meta.list_stations(ATMO_STATION)
all_known_stations = meta.list_stations(False)

# get fully detailed metadata for a station
htm_uri = 'http://meta.icos-cp.eu/resources/stations/AS_HTM'
htm_station_meta = meta.get_station_meta(htm_uri)

List data objects

To select and filter (using various criteria), sort and list data objects, one can use list_data_objects method. All its arguments are optional, and by default it returns 100 latest (by upload time) data objects.

from icoscp_core.icos import meta
# discovered/chosen data type uri for ICOS ATC CO2 Release
co2_release_dt = 'http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject'
latest_co2_release = meta.list_data_objects(datatype=co2_release_dt)

latest_htm_co2_release = meta.list_data_objects(datatype=co2_release_dt, station=htm_uri)

Batch data access

For lists of uniform data objects of the same data type (or, more generally and exactly, sharing variable metadata), like latest_co2_release from the previous example, the most efficient way of fetching the data is as follows:

from icoscp_core.icos import data
co2_release_data = data.batch_get_columns_as_arrays(latest_co2_release, ['TIMESTAMP', 'co2'])

The result of this call is an iterator ("lazy" sequence) that gets evaluated when used (iterated). Each element of the iterator is a pair, where the first value is an element from latest_co2_release, and the second value is a dictionary mapping variable names to numpy arrays with their values. This output can be used as is for many purposes, but if it is desirable to convert it to pandas DataFrames, it can be done like so (preserving the "lazyness"):

import pandas as pd
co2_release_data_pd = ( (dobj, pd.DataFrame(arrs)) for dobj, arrs in co2_release_data)

Examples

See Examples for more lengthy examples using all of the functionality introduced above.

Accessing documentation

As this library depends on icoscp_core, all the functionality of the latter can be used, not only the examples from above. It is introduced on the PyPi project page, and the source code is available from GitHub.

To discover all the rich possibilities of filtering, sorting and paging the lists of the data objects, it is helpful to read the Python docstring of list_data_objects method:

from icoscp_core.icos import meta
help(meta.list_data_objects)

The method signature is not easily readable due to expansion of type annotations, but the docstring explains the method parameters in detail.

The output from list_data_objects is a list of DataObjectLite instances. Documentation of this class can be accessed as follows:

from icoscp_core.metaclient import DataObjectLite
help(DataObjectLite)

Similarly, the output from list_datatypes is a list of DobjSpecLite instances, whose docstring is accessible like so:

from icoscp_core.metaclient import DobjSpecLite
help(DobjSpecLite)

Naturally, one can also request Python help on the whole meta constant (which is in fact an instance of class MetaClient), and on data constant (which is an instance of DataClient), and on all the methods therein:

from icoscp_core.icos import meta, data
help(meta)
help(meta.get_dobj_meta)
help(meta.get_collection_meta)
help(meta.get_station_meta)
help(data)
help(data.batch_get_columns_as_arrays)
help(data.get_columns_as_arrays)

Finally, MetaClient's methods fetching detailed metadata (e.g. get_dobj_meta, get_collection_meta, get_station_meta) return classes who are (or whose constituents are) defined inside module icoscp_core.metacore. This module is not available in the source code on GitHub because it is autogenerated and auto-imported, but can be very instructive to examine and use as a reference. Due to type annotations, it effectively contains metadata specification for all the entities available from the metadata repository. Standalone library users can find it inside their Python installation folder (can be venv or .venv if using a virtual Python environment) at location lib/icoscp_core/metacore.py. Jupyter users can inspect the classes in this module by calling help on it:

from icoscp_core import metacore
help(metacore)