Legacy modules

The following legacy modules are available in the library to find and access data hosted at the Carbon Portal. After a successful installation into your Python environment, you should be able to load the modules with:

  • from icoscp.dobj import Dobj (recommended)
  • from icoscp.cpb.dobj import Dobj
  • from icoscp.station import station
  • from icoscp.collection import collection
  • from icoscp.sparql.runsparql import RunSparql
  • from icoscp.sparql import sparqls

Dobj

This is the basic module to load a digital object (data set) into memory. You need to know a valid persistent identifier (PID/URL) to access the data. You can either browse the data portal to find PIDs or use the "station" package to find PIDs programmatically (see the Station section).

In essence each data object is linked to a unique and persistent identifier in the form of a URL. Hence, each data object has an online landing page. If you select any data object on https://data.icos-cp.eu and then navigate to the PID link (which looks like 11676/pli1C0sX-HE2KpQQIvuYhX01) you end up on the 'landing' page of the document. If you look at the address bar of your browser, you will see a URL similar to https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01. To access the data you need to know this URL or the last part of the URL (pli1C0sX-HE2KpQQIvuYhX01).

Load the module and initialise the Dobj class with a PID.
The following statements yield the same result:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
dobj = Dobj('11676/pli1C0sX-HE2KpQQIvuYhX01')
dobj = Dobj('pli1C0sX-HE2KpQQIvuYhX01')
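
The three forms above differ only in their prefix. As a rough illustration of why they can identify the same object, the sketch below reduces each form to its last path segment. The helper pid_suffix is hypothetical and not part of icoscp; the library's own parsing may work differently.

```python
def pid_suffix(identifier: str) -> str:
    """Return the last path segment of a PID, handle, or landing-page URL.

    Hypothetical helper for illustration only, not an icoscp function.
    """
    # Drop surrounding whitespace and any trailing slash, then keep
    # whatever follows the final '/' (the whole string if there is none).
    return identifier.strip().rstrip('/').rsplit('/', 1)[-1]


forms = [
    'https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01',
    '11676/pli1C0sX-HE2KpQQIvuYhX01',
    'pli1C0sX-HE2KpQQIvuYhX01',
]
# All three forms collapse to a single suffix.
suffixes = {pid_suffix(form) for form in forms}
print(suffixes)
```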

Properties

Dobj.alt

Retrieve the float value representing the altitude above sea level of the station associated with the Dobj. Be aware, this is not the sampling height for the data. If the station does not have a specified altitude, return None.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
altitude = dobj.alt

Dobj.citation

Return the citation string linked to the Dobj in plain string format.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
citation = dobj.citation

See also class method Dobj.get_citation()

Dobj.colNames

Return a list of available column names for a station-specific time series data object or None if no column names are available. This information is part of the Dobj.meta property, which holds all the available metadata.
Raise a MetaTypeError exception for spatiotemporal data objects.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
column_names = dobj.colNames

Dobj.data

Retrieve the actual data for the PID in Pandas DataFrame format.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
data = dobj.data

Dobj.dobj

Retrieve the PID for the Dobj.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
pid = dobj.dobj

See also Dobj.id

Dobj.elevation

Retrieve the float value representing the elevation above sea level of the station associated with the Dobj. Be aware, this is not the sampling height for the data. If the station does not have a specified elevation, return None.
This property will be deprecated in the next release.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
elevation = dobj.elevation

See also Dobj.alt

Dobj.id

Retrieve the PID for the Dobj.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
pid = dobj.id

See also Dobj.dobj

Dobj.info

Return a dictionary based on the metadata available from the landing page of the ICOS Carbon Portal website.
This property will be deprecated in the next release.

Example:

from icoscp.dobj import Dobj
from pprint import pprint

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
pprint(dobj.info)

See also Dobj.meta

Dobj.lat

Retrieve the float value representing the latitude of the station associated with the Dobj. If the station does not have a specified latitude, return None.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
latitude = dobj.lat

Dobj.licence

Return a dictionary with these keys: 'baseLicence', 'name', 'url', 'webpage', containing information about the dataset's associated license.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
licence = dobj.licence

Dobj.lon

Retrieve the float value representing the longitude of the station associated with the Dobj. If the station does not have a specified longitude, return None.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
longitude = dobj.lon

Dobj.meta

Return a dictionary based on the metadata available from the landing page of the ICOS Carbon Portal website. Every data object has a rich set of metadata available. You can download an example from the data portal. This will then be parsed into a Python dictionary representing the metadata from ICOS. Some of the important key properties, like 'previous', 'next', 'citation', etc., are extracted for easy access and made available as properties.

Example:

from icoscp.dobj import Dobj
from pprint import pprint

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
pprint(dobj.meta)

Dobj.next

Return a landing page in the form of a string, featuring the next version of this data object if it exists.
Return None if a next version does not exist.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
next_version = dobj.next

Dobj.previous

Return a landing page in the form of a string, featuring the previous version of this data object if it exists.
Return None if a previous version does not exist.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
previous_version = dobj.previous
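
Because next and previous each return a landing page (or None), versions form a chain that can be followed one step at a time. The sketch below walks forward until no newer version exists. The function latest_version and its max_steps guard are hypothetical and not part of icoscp; the class to instantiate is passed in as dobj_cls so that the sketch does not itself contact the Carbon Portal.

```python
def latest_version(pid, dobj_cls, max_steps=100):
    """Follow .next links from `pid` until an object has no next version.

    Hypothetical helper: `dobj_cls` is any class that accepts a PID/URL
    and exposes a `.next` attribute (a landing-page string or None).
    """
    current = dobj_cls(pid)
    for _ in range(max_steps):
        next_pid = current.next
        if next_pid is None:
            return current              # no newer version exists
        current = dobj_cls(next_pid)    # load the newer version
    raise RuntimeError('version chain longer than max_steps')


# Intended usage (requires network access, therefore left as a comment):
# from icoscp.dobj import Dobj
# newest = latest_version('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v', Dobj)
```

The max_steps limit is only a safety net against an unexpectedly long or circular chain.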

Dobj.station

Return a dictionary containing metadata associated with the station corresponding to the Dobj. Please be aware that prior to version 0.1.15 this has returned a string with station id, which is now available as station['id'].

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
station_meta = dobj.station

Dobj.valid

Return the validity of a Dobj as a boolean. This is kept for backwards compatibility reasons. From icoscp 0.2.0 and onwards, the Dobj class cannot be instantiated with an invalid PID, thus this will always return True.
This property will be deprecated in the next release.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
validity = dobj.valid

Dobj.variables

Return a Pandas DataFrame providing access to all available variables, including the name, unit, type, and the landing page for the format used (int, float, char, ...).
Raise a MetaValueError exception if no variable information is available.

The following example and its output shows the variables of an atmospheric methane concentration data object:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/zjNZLdVDcVUNwonvJIN5GQ3b')
variables = dobj.variables
print(variables)

Output:

index  name       unit        type                                     format
0      TIMESTAMP  None        time instant, UTC                        http://meta.icos-cp.eu/ontologies/cpmeta/iso8601dateTime
1      Flag       None        quality flag                             http://meta.icos-cp.eu/ontologies/cpmeta/bmpChar
2      NbPoints   None        number of points                         http://meta.icos-cp.eu/ontologies/cpmeta/int32
3      ch4        nmol mol-1  CH4 mixing ratio (dry mole fraction)     http://meta.icos-cp.eu/ontologies/cpmeta/float32
4      Stdev      nmol mol-1  standard deviation of gas mole fraction  http://meta.icos-cp.eu/ontologies/cpmeta/float32

Methods

Dobj.get(columns)

Retrieve the actual data for the PID in Pandas DataFrame format. You have the option to retrieve only selected columns (or variables) using a list of variables as an input argument. Only valid and unique entries will be returned. You can see valid entries with Dobj.colNames or Dobj.variables. If columns are not provided, or if none of the provided variables are valid, or if you work with local data, the default DataFrame (with all columns) will be returned.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
col_names = dobj.colNames
# or
# col_names = dobj.variables['name'].to_list()
data = dobj.get(columns=col_names)
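
The selection rule described above (only valid and unique entries, with a fallback to all columns) can be pictured with plain lists. The function select_columns below is a hypothetical sketch of that rule, not the library's implementation; preserving the order of the request is an assumption.

```python
def select_columns(requested, available):
    """Keep requested names that exist in `available`, without duplicates.

    Hypothetical sketch of the documented rule. Falls back to every
    available column when nothing valid was requested.
    """
    if not requested:
        return list(available)
    seen = set()
    selected = []
    for name in requested:
        if name in available and name not in seen:
            seen.add(name)
            selected.append(name)
    # Nothing valid was requested: return all columns instead.
    return selected or list(available)


available = ['TIMESTAMP', 'Flag', 'NbPoints', 'ch4', 'Stdev']
print(select_columns(['ch4', 'ch4', 'bogus'], available))
print(select_columns(['bogus'], available))
```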

Dobj.getColumns(columns)

Retrieve the actual data for the PID in Pandas DataFrame format.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/j7-Lxlln8_ysi4DEV8qine_v')
col_names = dobj.colNames
# or
# col_names = dobj.variables['name'].to_list()
data = dobj.getColumns(columns=col_names)

See also Dobj.get(columns) and Dobj.data

Dobj.get_citation(format)

Return the citation string in different formats, selected with the format argument. By default, a plain formatted string is returned; the example below requests the 'ris' format.

Example:

from icoscp.dobj import Dobj

dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
ris_citation = dobj.get_citation('ris')

See also Dobj.citation


Original legacy Dobj

The original legacy Dobj class (prior to version 0.2.0) resides in the module icoscp.cpb.dobj. We recommend migrating most code to the new implementation of this class in the module icoscp.dobj. The old class is preserved so that a library update does not break dependent code. There are some known differences between the two versions, documented below.

Dobj initialization

Using from icoscp.cpb.dobj import Dobj, you can initialize the Dobj class in one of the following ways:

from icoscp.cpb.dobj import Dobj

my_dobj = Dobj('https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01')
my_dobj = Dobj('11676/pli1C0sX-HE2KpQQIvuYhX01')
my_dobj = Dobj('pli1C0sX-HE2KpQQIvuYhX01')

or create an 'empty' Dobj instance and set the identifier later:

from icoscp.cpb.dobj import Dobj

dobj = Dobj()
dobj.dobj = 'https://meta.icos-cp.eu/objects/pli1C0sX-HE2KpQQIvuYhX01'

With the new implementation (from icoscp.dobj import Dobj), instantiating an 'empty' Dobj and setting the identifier later results in a TypeError. This functionality was removed to preserve the Dobj class state and prevent unexpected behavior.

Dobj datetime conversion control

(Not available in the new implementation, as all datetime conversions are handled by pandas library in a uniform way)

Dobj.dateTimeConvert = True

The binary data representation provides a UTC timestamp as a Unix timestamp, counted from 1970-01-01 00:00:00. By default, when using from icoscp.cpb.dobj import Dobj, this is converted to a datetime object of type pandas._libs.tslibs.timestamps.Timestamp. If you prefer the raw Unix timestamp (numpy.float64), set Dobj.dateTimeConvert = False before loading the data with .get(), .data or .getColumns().

  • Return BOOL
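
To make the relation between a raw Unix timestamp and a UTC date concrete, here is a minimal sketch using only the Python standard library. It assumes the raw value counts seconds since the epoch; the unit is not stated above, so treat that as an assumption. The helper to_utc_datetime is hypothetical and does not reproduce the pandas-based conversion the legacy class performs.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds: float) -> datetime:
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime.

    Hypothetical helper; assumes the timestamp is expressed in seconds.
    """
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


print(to_utc_datetime(0.0))            # the epoch itself
print(to_utc_datetime(1514764800.0))   # a timestamp at the start of 2018
```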

Dobj.size()

(Not available in the new implementation. To get data object size in bytes from the new Dobj class, call Dobj.metadata.size or Dobj.meta['size'])

The real size of the dobj in [bytes, KB, MB, TB]. Since this object may contain the data, it is no longer just a pointer to data.

  • Return TUPLE (int32, STR), where int32 represents the size and STR the unit. Example output looks like: (4.353, 'MB')
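
A (size, unit) tuple of this kind can be derived from a byte count as sketched below. The function readable_size is hypothetical and not the library's code: the factor of 1024 per step, the inclusion of a GB step, and rounding to three decimals are all assumptions made for illustration.

```python
def readable_size(num_bytes: int) -> tuple:
    """Scale a byte count to the largest unit that keeps the value >= 1.

    Hypothetical sketch; the 1024 step size, the GB unit and the
    three-decimal rounding are assumptions.
    """
    units = ['bytes', 'KB', 'MB', 'GB', 'TB']
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return (round(size, 3), units[index])


print(readable_size(512))
print(readable_size(5 * 1024 ** 2))
```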

Station

The station module provides a search facility to explore ICOS stations and find associated data objects and data products. There is a lot of information available about the ICOS stations, partner countries, measured variables and much more in the ICOS Handbook. Load the module with:

from icoscp.station import station

classmethod station.Station()
The station object is primarily a data structure to store the associated metadata. The metadata is provided by specific and complex SPARQL queries. It is possible to instantiate this class on its own, but we recommend using the convenience functions station.getIdList(), station.get('StationID') and station.getList(), as described further below, to create the station object. Once you have created a valid station object, the following attributes are available:

Attributes:


Station.country

Country code

  • Return STR

Station.data(level=None)

All data objects associated with the station are returned. ICOS distinguishes data by how processed they are.

- Data level 1: Near Real Time Data (NRT) or Internal Work data (IW).
- Data level 2: The final quality checked ICOS RI data set, published by the CFs,
                to be distributed through the Carbon Portal.
                This level is the ICOS data product and is freely available to users.
- Data level 3: All kinds of elaborated products by scientific communities
                that rely on ICOS data products are called Level 3 data.
  • Return Pandas DataFrame

Station.eas

Elevation above sea level in meters.

  • Return FLOAT

Station.icosclass

Classification for certified ICOS stations. Please consult the ICOS Handbook for further information about the Class 1&2 certificate.

  • Return STR

Station.firstName

PI (Principal Investigator) First Name.

  • Return STR

Station.lastName

PI (Principal Investigator) last name

  • Return STR

Station.email

PI (Principal Investigator) email address

  • Return STR

Station.lat

Latitude for the station.

  • Return FLOAT

Station.lon

Longitude for the station.

  • Return FLOAT

Station.name

Returns the full name for the station.

  • Return STR

Station.project

ICOS Carbon Portal is a data portal from and for the ICOS community. However, the data portal does host more than ICOS data. The station association is listed here (if available)

  • Return LIST

Station.stationId

Set or retrieve the StationId

  • Return STR

Station.theme

For ICOS stations a 'theme' is provided. Please note that a station may belong to more than one theme, but with a different stationId for each theme. For example, the stationId "NOR" (Norunda, Sweden) will give you access to the atmospheric data products, whereas the stationId "SE-Nor" will return the ecosystem data products.

AS for Atmospheric Stations
ES for Ecosystem Stations
OS for Ocean Stations
  • Return STR

Station.uri

Link to the landing page for the station. Because a station ID may be associated with more than one 'project', this returns a list of URIs.

  • Return LIST

Station.valid

True if stationId is found.

  • Return BOOL

Convenience functions

The following three functions are recommended for getting information about the stations available at the Carbon Portal and for obtaining a valid station object (or a list of station objects):

station.getIdList()

The station.getIdList takes several parameters:

project: str = 'ICOS' 
theme: list = None 
sort: str = 'name'
outfmt: str = 'pandas'
icon=None

The default call station.getIdList(), which is the same as

station.getIdList(project='ICOS', sort='name')

returns a Pandas DataFrame with columns:

['uri', 'id', 'name', 'icosClass', 'country', 'lat', 'lon', 'elevation', 'stationTheme', 'firstName', 'lastName', 'email', 'siteType', 'project', 'theme']

By default, ICOS certified stations are returned. If project is set to 'all', all stations known to the Carbon Portal are returned. By default, the DataFrame is sorted by the column 'name'. You can provide any column name as the sorting parameter. The 'id' of a record can be used to instantiate a station. Hence, it is easy to adjust and filter these records and use the column 'id' as input for station.get().

The theme parameter can be set to either 'AS', 'ES' or 'OS', or a list with a combination of these strings. Here 'AS', 'ES' and 'OS' are short for atmospheric, ecosystem and ocean stations.

Thus,

station.getIdList(theme='AS')

will return a DataFrame with all atmospheric ICOS stations.

If the optional argument outfmt='map' is provided

station.getIdList(project='ALL', outfmt='map', icon=None)

a folium map is created with, in this case, all stations (since we use project='ALL'). Stations without a fixed location (like measurements collected from instrumented Ships of Opportunity) will not be included in the map. Each marker in the map represents a station and contains station related information. A user can further customize the style of the map by providing the icon argument [None, 'flag', 'path/to/image.png'].

  • Return Folium Map

station.get()

station.get('stationID')

Provide a valid station id (see getIdList()) to create a Station object. NOTE: stationId is case-sensitive.

  • Return Station Object

station.getList()

station.getList(theme=['AS','ES','OS'], ids=None)

This is the easiest way to get a list of ICOS stations. By default, a full list of all certified ICOS stations is returned. You can filter the output by providing a list of themes OR a list of station ids. NOTE: If you provide a list of ids, the theme filter is ignored.

station.getList(['as', 'os'])

list with ICOS atmospheric and ocean stations

station.getList(ids=['NOR', 'HTM', 'HUN'])

list with stations NOR (Norunda), HTM (Hyltemossa), HUN (Hegyhatsal)

  • Return LIST[Station Objects]
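
The precedence between the two filters (ids win over theme) can be illustrated on plain dictionaries. The function filter_stations and the sample records below are hypothetical and only mimic the documented behavior; they are not the library's implementation. Matching themes without regard to case is an assumption drawn from the ['as', 'os'] call above.

```python
def filter_stations(stations, theme=None, ids=None):
    """Filter station records by id list or, failing that, by theme.

    Hypothetical sketch: when `ids` is given, `theme` is ignored.
    """
    if ids:
        wanted = set(ids)
        return [s for s in stations if s['id'] in wanted]
    if theme:
        # Assumption: theme codes are compared case-insensitively.
        themes = {t.upper() for t in theme}
        return [s for s in stations if s['theme'] in themes]
    return list(stations)


# Illustrative records only, not real query output.
catalogue = [
    {'id': 'NOR', 'theme': 'AS'},
    {'id': 'SE-Nor', 'theme': 'ES'},
    {'id': 'HTM', 'theme': 'AS'},
]
print([s['id'] for s in filter_stations(catalogue, theme=['as'])])
print([s['id'] for s in filter_stations(catalogue, theme=['ES'], ids=['NOR'])])
```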

Collection

This module supports loading a collection of digital objects. Data products (https://www.icos-cp.eu/data-products) or collections are assembled for a specific theme or project. For example, the ICOS community assembled data to provide a basis for studying the drought anomaly in 2018. This dataset was then used to study the impact of this extreme event, which ultimately led to a series of publications available as a 'theme issue' in The Royal Society. The data sets are now publicly available at the ICOS Carbon Portal (Drought-2018 ecosystem eddy covariance flux product for 52 stations and Drought-2018 atmospheric CO2 Mole Fraction product for 48 stations (96 sample heights)).

Load the module with:
from icoscp.collection import collection

classmethod Collection(coll)
(where coll represents a pandas.DataFrame, similar, but only similar, to the output from .getIdList()). We do not recommend instantiating this class directly. Please use the function .get(CollectionId). The purpose of this class documentation is to list the attributes available after .get(CollectionId) returns a collection object.


Attributes:


Collection.id

This is the ICOS URI (PID). A link to the landing page on the ICOS data portal.

  • Return STR

Collection.doi

If available, the official DOI in form of '10.18160/ry7n-3r04'.

  • Return STR

Collection.citation

For convenience, the citation string provided by https://citation.crosscite.org/ is stored in this attribute. If you would like a different format, please have a look at the .getCitation description below.

  • Return STR

Collection.title

  • Return STR

Collection.description

  • Return STR

Collection.info()

For convenience, this returns all the attributes above (id, doi, citation, title, description). You can choose the output format with fmt=["dict" | "pandas" | "html"]. The default is "dict".

info(self, fmt='dict')
  • Return FMT, default DICT

This returns a list of PID/URI of digital objects associated with the collection.

  • Return LIST[STR]

Collection.data

This returns a list of Dobj instances associated with the collection. Please refer to the Dobj section above.

  • Return LIST[Dobj]

Collection.getCitation()

Collection.getCitation(format='apa', lang='en-GB')

If the collection has a DOI, you will get a citation string from https://citation.crosscite.org/. You may provide any style and language parameters found on that website. The default style is apa and the default language is en-GB; the resulting string is stored in the attribute collection.citation. Use the function getCitation() if you need a specific format and language adaptation. Example to get a BibTeX-styled citation: .getCitation('bibtex','de-CH')

Convenience functions

The following functions are recommended to get information about the available collections as well as creating an instance of a collection.

collection.getIdList()

collection.getIdList()

This will return a pandas.DataFrame listing all available collections at the data portal. The DataFrame contains the following columns: ['collection', 'doi', 'title', 'description', 'dobj', 'count']. We recommend that you pay close attention to the count, because some collections have many associated data objects. If you just want to play around, select a collection with fewer than 10 objects.

  • collection contains the PID/URI for the collection. This is the ID you need to provide for the .get(CollectionId) function. Please be aware that you need to provide the full URI.
    Example: .get('https://meta.icos-cp.eu/collections/n7cIMHIyqHJKBeF_3jjgptHP')
  • dobj contains a list (LIST[STR]) of all PID/URI associated data objects.
  • count tells you how many data objects are associated with this collection.

  • Returns a pandas DataFrame

collection.get()

collection.get(CollectionId)

Create a collection object. See the class method above for the attributes available in the collection object. The CollectionId must be either the full ICOS URI of the collection landing page or the DOI (if one is available). Not all collections have a DOI. Both pieces of information can be extracted with the function .getIdList(). The following two lines to create 'myCollection' yield the same result:

myCollection = collection.get('https://meta.icos-cp.eu/collections/n7cIMHIyqHJKBeF_3jjgptHP')
myCollection = collection.get('10.18160/ry7n-3r04')
  • Returns Collection

Sparql

At the ICOS Carbon Portal we store all data and metadata as linked data in a triple store. For more information about this approach refer to Semantic Web, Resource Description Framework (RDF), and Triple Stores.

This module is a simple interface to the SPARQL endpoint at the Carbon Portal. You can write your own queries and use the module to query the database or use some of the provided built-in queries.

Load the module with:
from icoscp.sparql.runsparql import RunSparql

classmethod RunSparql(sparql_query='', output_format='txt')
sparql_query needs to be a valid query. You can test a query directly at the online SPARQL endpoint at https://meta.icos-cp.eu/sparqlclient/?type=CSV. The default output format is (txt/json), but you can select any of the following formats: ['json', 'csv', 'dict', 'pandas', 'array', 'html'].

Attributes:


RunSparql.data

If a query is set and the method .run() was executed, this attribute holds the result from the SPARQL endpoint. If no data is available, it is False (BOOL).

  • Return BOOL | STR

RunSparql.query = 'query'

Retrieve or set the query.

  • Return STR

RunSparql.format = 'fmt'

Retrieve or set the output format.

fmt = 'json', 'csv', 'dict', 'pandas', 'array', 'html'
  • Return STR

RunSparql.run()

This method executes the query and formats the result according to the output format. If the SPARQL query cannot be executed, for example because of syntax errors, a TUPLE (False, 'Bad Request') is returned.

  • Return TUPLE | FMT
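
Since .run() can return either a formatted result or the failure tuple described above, calling code may want to separate the two cases explicitly. The helper unwrap_result below is a hypothetical sketch and not part of icoscp; it assumes a failure always has the shape (False, message), which may not hold for every output format.

```python
def unwrap_result(result):
    """Return `result` unchanged, or raise if it is a failure tuple.

    Hypothetical helper; assumes failures look like (False, message).
    """
    if isinstance(result, tuple) and len(result) == 2 and result[0] is False:
        raise ValueError(f'SPARQL query failed: {result[1]}')
    return result


# Intended usage (requires network access, therefore left as a comment):
# from icoscp.sparql.runsparql import RunSparql
# query = 'select * where { ?s ?p ?o } limit 5'
# rows = unwrap_result(RunSparql(sparql_query=query, output_format='pandas').run())

print(unwrap_result('some formatted result'))
```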