All of ods_explore’s functionality can be accessed with an instance of opendatasoft.Opendatasoft.
ods_explore.opendatasoft.Opendatasoft(subdomain='data', base_url=None, session=None, api_key=None, lang='en', timezone='UTC')
subdomain - A subdomain used to create the base API URL, useful if the data portal being accessed is hosted on opendatasoft.com, e.g. https://{subdomain}.opendatasoft.com.
base_url - A custom base API URL.
session - A requests.Session object with which to make API calls.
api_key - An Opendatasoft API key (to be attached to the session object), for accessing private datasets. Read more on generating API keys.
lang - The language used to format strings. One of: en, fr, nl, pt, it, ar, de, es, ca, eu, sv.
timezone - The timezone applied to datetime fields, as defined by the Unicode CLDR project.
The following are attributes of Opendatasoft instances.
base_url
The resolved base API URL.
catalog
An instance of query.CatalogQuery, the top-level querying interface, as described in Making queries below.
session
The session object.
All queries are created using the catalog attribute of an instance of opendatasoft.Opendatasoft. Four key endpoints in the Catalog API and Dataset API are supported:
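from ods_explore.opendatasoft import Opendatasoft
# connect to the default data portal, https://data.opendatasoft.com
ods = Opendatasoft()
# query datasets in a catalog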
ods.catalog.datasets
# read one dataset
ods.catalog.dataset(dataset_id='doc-geonames-cities-5000')
# query records in a dataset, given its `dataset_id`
ods.catalog.dataset(dataset_id='doc-geonames-cities-5000').records
# read one record, given its `dataset_id` and `record_id`
(
ods
.catalog
.dataset(dataset_id='doc-geonames-cities-5000')
.record(record_id='24eec8bff4f5b55afdeeeacb326167ed6b1e933a')
)
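Each of the four query chains above corresponds to one endpoint. The sketch below shows how such a mapping might look; the base URL `https://data.opendatasoft.com/api/v2` and the helper `endpoint_url` are assumptions for illustration, not part of ods_explore.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: the default subdomain 'data' resolves to this Explore API v2 base URL.
BASE_URL = "https://data.opendatasoft.com/api/v2"


def endpoint_url(
    base_url: str,
    dataset_id: Optional[str] = None,
    records: bool = False,
    record_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: build the endpoint URL behind each of the four query types."""
    url = f"{base_url}/catalog/datasets"
    if dataset_id is None:
        return url  # ods.catalog.datasets
    url = f"{url}/{quote(dataset_id, safe='')}"
    if record_id is not None:
        return f"{url}/records/{quote(record_id, safe='')}"  # .dataset(...).record(...)
    if records:
        return f"{url}/records"  # .dataset(...).records
    return url  # .dataset(...)
```

Calling `url()` on a real query is the reliable way to see the URL ods_explore actually builds.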
The datasets/records attributes and the dataset()/record() methods all return new instances of query.DatasetQuery or query.RecordQuery. With these, you can refine your search using any number of chainable methods, or retrieve results by calling a query evaluation method.
Since the methods below return new Queries, they’re chainable:
import ods_explore.language as lang
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(country_code='CA', timezone='America/Vancouver')
.exclude(population__gt=lang.avg('population'))
.order_by('name')
)
This query adds a filter, exclusion, and ordering to the cities in the doc-geonames-cities-5000
dataset. The final result contains all Canadian cities in the America/Vancouver
timezone, except for those whose population is greater than the average population, in alphabetical order by name.
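Chaining works because every refining method returns a new query instead of modifying the existing one. A minimal sketch of that immutable-builder pattern, assuming a hypothetical `SketchQuery` class that stores raw ODSQL-style condition strings (the real classes accept field lookups and are more elaborate):

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical immutable query: every refining method returns a new instance."""

    where: Tuple[str, ...] = ()
    order: Tuple[str, ...] = ()

    def filter(self, condition: str) -> "SketchQuery":
        return replace(self, where=self.where + (condition,))

    def exclude(self, condition: str) -> "SketchQuery":
        # Exclusion is modelled as a negated condition.
        return replace(self, where=self.where + (f"NOT ({condition})",))

    def order_by(self, *fields: str) -> "SketchQuery":
        return replace(self, order=self.order + fields)

    def params(self) -> Dict[str, str]:
        """Collapse the accumulated state into querystring-style parameters."""
        params: Dict[str, str] = {}
        if self.where:
            params["where"] = " AND ".join(self.where)
        if self.order:
            params["order_by"] = ", ".join(self.order)
        return params


base = SketchQuery()
canadian = base.filter('country_code = "CA"')
query = canadian.exclude("population > 10000").order_by("name")
```

Because `base` and `canadian` are never mutated, either can be reused as the starting point for other queries.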
filter(*args, **kwargs)
Returns a new Query containing objects that match the given lookup parameters. The lookup parameters (**kwargs) should be in the format described in Field lookups below. Multiple parameters are joined via AND in the underlying ODSQL expression.
If you need to execute more complex queries (such as parameters joined with OR), you can use Q() objects or raw ODSQL (*args).
exclude(*args, **kwargs)
Like filter(), but returns a new Query containing objects that do not match the given lookup parameters.
select(*args, **kwargs)
Returns a new Query containing objects whose fields are limited to the given expressions. Expressions (*args) can be field names, aggregation functions, scalar functions, or F() objects, and can be combined with arithmetic operators.
To specify custom labels, use named expressions (**kwargs).
import ods_explore.language as lang
from ods_explore.query import F
# the name and population
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.select('name', 'population')
)
# double the population, labelled 'double_population'
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.select(double_population=F('population') * 2)
)
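To illustrate how positional and named expressions differ, here is a sketch of how they might be rendered into an ODSQL-style select clause. The helper `select_clause` is hypothetical, takes plain strings only, and the `expression as label` form is an assumption about the generated clause:

```python
def select_clause(*args: str, **kwargs: str) -> str:
    """Hypothetical sketch: keep positional expressions, label keyword ones with 'as'."""
    parts = list(args)
    # Each keyword argument becomes "<expression> as <label>".
    parts.extend(f"{expression} as {label}" for label, expression in kwargs.items())
    return ", ".join(parts)
```

For example, `select_clause("name", double_population="population * 2")` yields `name, population * 2 as double_population`.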
order_by(*args)
Returns a new Query with the given ordering. To indicate descending order, prefix a field name with a hyphen (-). To order randomly, pass a question mark (?).
# cities in alphabetical order by name
ods.catalog.dataset('doc-geonames-cities-5000').records.order_by('name')
# cities in descending order by population
ods.catalog.dataset('doc-geonames-cities-5000').records.order_by('-population')
# cities in a random order
ods.catalog.dataset('doc-geonames-cities-5000').records.order_by('?')
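The ordering conventions can be sketched as a small translation step. This is a hypothetical helper, not ods_explore's code; in particular, the `random()` spelling used for the `?` case is an assumption:

```python
# Assumed spelling of a random ordering in the generated clause; not confirmed by this page.
RANDOM_ORDER = "random()"


def order_by_clause(*fields: str) -> str:
    """Hypothetical: '-field' sorts descending, '?' sorts randomly, anything else ascending."""
    parts = []
    for field in fields:
        if field == "?":
            parts.append(RANDOM_ORDER)
        elif field.startswith("-"):
            parts.append(f"{field[1:]} DESC")  # strip the leading hyphen
        else:
            parts.append(f"{field} ASC")
    return ", ".join(parts)
```

Passing several fields, e.g. `order_by_clause("country_code", "-population")`, sorts by country and then by descending population within each country.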
refine(**kwargs)
Returns a new Query containing objects that match the given facet values (**kwargs).
A catalog’s available facets, and the possible values for each facet, can be listed by calling the List facet values endpoint of the Explore API v2 directly. ods_explore does not currently provide an interface for this endpoint.
ignore(**kwargs)
Like refine(), but returns a new Query containing objects that do not match the given facet values (**kwargs).
Here, **kwargs is compatible with the in field lookup, so you may ignore multiple facet values at once.
get(as_json=False, **kwargs)
Returns results matched by the query as objects, or as dictionaries if as_json is True. For Queries that read one dataset or one record, a single object is returned; otherwise, a list of objects is returned.
Custom querystring parameters (such as limit or offset) can be added to the underlying API call with **kwargs.
count()
Returns the number of results matched by the query.
exists()
Returns True if the query contains any results, and False if not.
iterator(batch_size=100, as_json=False)
Returns an iterator over results matched by the query as objects, or as dictionaries if as_json is True.
The number of results to retrieve per API call is adjustable with batch_size.
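Batch-wise retrieval of this kind can be sketched as a generator that pages through results. This sketch assumes batches are requested with the limit and offset querystring parameters mentioned under get(); `fetch` is a hypothetical stand-in for the actual API call:

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict]


def iterate(fetch: Callable[[int, int], Page], batch_size: int = 100) -> Iterator[Dict]:
    """Yield results one by one, requesting `batch_size` at a time via limit/offset."""
    offset = 0
    while True:
        page = fetch(batch_size, offset)  # one API call per batch
        yield from page
        if len(page) < batch_size:
            return  # a short (or empty) page means there is nothing left
        offset += batch_size
```

A larger batch_size means fewer API calls at the cost of bigger responses; because results are yielded lazily, memory use stays bounded by one batch.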
all(batch_size=100)
Returns all results matched by the query as a list of objects.
The number of results to retrieve per API call is adjustable with batch_size.
dataframe(batch_size=100, **kwargs)
Returns results as a pandas DataFrame, passing **kwargs to the underlying pandas.json_normalize() call.
The number of results to retrieve per API call is adjustable with batch_size.
first()
Returns the first object matched by the query.
last()
Returns the last object matched by the query.
aggregate(*args, **kwargs)
Returns a dictionary of aggregate values. Expressions (*args) are aggregation functions that specify a value to be included in the output. To specify custom labels, use named expressions (**kwargs).
The following are attributes and methods of Query instances.
url(**kwargs)
Returns the URL of the underlying API call that the query would make, useful for debugging ods-explore library code.
Custom querystring parameters (such as limit or offset) can be added to the underlying API call with **kwargs.
decoded_url
Like url(), but returns the decoded URL, with plus signs replaced with spaces and %xx escapes replaced with their single-character equivalents.
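The decoding described above is exactly what the standard library's `urllib.parse.unquote_plus` does; whether ods_explore calls it internally is an assumption. The URL below is illustrative and its query parameters are made up for the example:

```python
from urllib.parse import unquote_plus, urlencode

# An illustrative encoded URL, similar in shape to what url() returns.
encoded = "https://data.opendatasoft.com/api/v2/catalog/datasets?" + urlencode(
    {"where": 'population > 500 AND country_code = "CA"', "limit": 10}
)

# decoded_url-style output: '+' becomes a space and %xx escapes are decoded.
decoded = unquote_plus(encoded)
```

The decoded form is easier to read when debugging, but only the encoded form is a valid URL to send.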
Field lookups are how you specify the core of an ODSQL where clause. Using the key format <field name>__<field lookup>, they’re passed as keyword arguments to the Query methods filter() and exclude(), and to Q() objects.
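A sketch of how lookup keywords might be split and compiled into a where clause. Everything here is hypothetical: the operator mapping, the double-quoted string literals, and the expansion of `in` into OR-ed equalities are assumptions, and the contains, geographical, and null lookups are left out:

```python
from typing import Any, Tuple

# Assumed ODSQL-style operators for the comparison lookups.
OPERATORS = {"exact": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
KNOWN_LOOKUPS = set(OPERATORS) | {"in"}


def split_lookup(key: str) -> Tuple[str, str]:
    """Split '<field name>__<field lookup>'; a bare field name means 'exact'."""
    field, sep, lookup = key.rpartition("__")
    if sep and lookup in KNOWN_LOOKUPS:
        return field, lookup
    return key, "exact"


def literal(value: Any) -> str:
    """Double-quote strings (no escaping in this sketch); leave numbers as-is."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def condition(key: str, value: Any) -> str:
    field, lookup = split_lookup(key)
    if lookup == "in":
        # Membership is expanded into an OR of equality tests.
        options = " OR ".join(f"{field} = {literal(v)}" for v in value)
        return f"({options})"
    return f"{field} {OPERATORS[lookup]} {literal(value)}"


def where_clause(**kwargs: Any) -> str:
    """Join every lookup with AND, as filter() is documented to do."""
    return " AND ".join(condition(key, value) for key, value in kwargs.items())
```

For instance, `where_clause(country_code__in=["CA", "FR"], population__gte=5000)` produces `(country_code = "CA" OR country_code = "FR") AND population >= 5000`.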
contains
Case-insensitive word containment.
# matches 'La Lima', 'Palos de la Frontera', and 'Shangri-La', but not 'Las Vegas'
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(name__contains='la')
)
# matches 'Santiago de la Peña', 'La Puebla de Almoradiel', and 'Saint-Jean-de-la-Ruelle'
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(name__contains='de la')
)
exact
Exact match (the default lookup behaviour when no field lookup is used).
ods.catalog.datasets.filter(dataset_id='doc-geonames-cities-5000')
# is equivalent to
ods.catalog.datasets.filter(dataset_id__exact='doc-geonames-cities-5000')
gt
Greater than.
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(population__gt=500)
)
gte
Greater than or equal to.
lt
Less than.
lte
Less than or equal to.
in
In a given iterable, usually a list or tuple.
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(country_code__in=['CA', 'FR'])
)
In a geographical area (for geo_point fields only).
The literals, helpers, filter functions, and enums described below are provided in the ods_explore.language module.
This field lookup should be used with one of the following filter functions. In each case, the first argument is a Geometry literal that describes a geographical area. This literal is created with the geom(geometry) helper, where geometry is a WKT/WKB or GeoJSON geometry expression as a string or dictionary.
polygon(area)
Limit results to a geographical area.
geometry(area, mode=Set.Within)
Limit results to a geographical area, based on a given set mode.
circle(center, radius, unit=Unit.METERS)
Limit results to a geographical area defined by a circle.
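Because geometry may be given either as a string or as a dictionary, a helper like geom() has to normalise both into a single literal. A sketch of that normalisation, using a hypothetical `geom_literal` function (not the library's geom()) and assuming an ODSQL-style `geom'...'` literal syntax:

```python
import json
from typing import Union


def geom_literal(geometry: Union[str, dict]) -> str:
    """Hypothetical: wrap a WKT/WKB string or GeoJSON dict in a geom'...' literal."""
    if isinstance(geometry, dict):
        # Serialise GeoJSON compactly so the literal stays on one line.
        text = json.dumps(geometry, separators=(",", ":"))
    else:
        text = geometry
    return f"geom'{text}'"
```

Both `geom_literal("POINT(2.35 48.85)")` and the equivalent GeoJSON dictionary produce a literal that a filter function could take as its area or center argument.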
Is null (accepts True or False).
ods-explore provides the following aggregation functions in the ods_explore.language module, which can be provided as arguments to the aggregate() query evaluation method.
avg(field)
Returns the average value of a numeric field.
count(field=None)
Returns the number of non-null values of a field, or the total number of results matched by the query if no field is provided.
envelope(field)
Returns the convex hull (envelope) of a geo_point field.
max(field)
Returns the maximum value of a numeric or date field.
medium(field)
Returns the median value (50th percentile) of a numeric field.
min(field)
Returns the minimum value of a numeric or date field.
percentile(field, percentile)
Returns the value of a numeric field at the given percentile.
sum(field)
Returns the sum of all values of a numeric field.
ods-explore provides the following tools in the ods_explore.query module.
A Q() object represents an ODSQL condition that can be used in filter() and exclude(). Q() objects make it possible to define and reuse conditions, and to perform complex queries when combined with the logical operators & (AND), | (OR), and ~ (NOT).
from ods_explore.query import Q
# cities whose population is less than 6000 or greater than 7000
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(Q(population__lt=6000) | Q(population__gt=7000))
)
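The logical operators work through Python's operator overloading. A minimal sketch of the idea, using a hypothetical `SketchQ` class that wraps a raw condition string (the real Q() takes field lookups instead):

```python
class SketchQ:
    """Hypothetical minimal Q: wraps a condition string and composes with &, |, ~."""

    def __init__(self, condition: str) -> None:
        self.condition = condition

    def __and__(self, other: "SketchQ") -> "SketchQ":
        return SketchQ(f"({self.condition}) AND ({other.condition})")

    def __or__(self, other: "SketchQ") -> "SketchQ":
        return SketchQ(f"({self.condition}) OR ({other.condition})")

    def __invert__(self) -> "SketchQ":
        return SketchQ(f"NOT ({self.condition})")

    def __str__(self) -> str:
        return self.condition


either = SketchQ("population < 6000") | SketchQ("population > 7000")
```

Parenthesising every operand keeps the intended grouping no matter how deeply conditions are nested. Note that in Python `~` binds more tightly than `&` and `|`, so `~a & b` means `(~a) & b`.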
An F() object represents the value of an object field, and makes it possible to refer to its value without having to retrieve it from the catalog. F() objects make it possible to define conditions based on field values, and can be combined with the arithmetic operators +, -, *, and /.
from ods_explore.query import F
# select the average elevation (digital elevation model = dem) in meters (the default unit) and in kilometers
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.select(elevation_m='dem', elevation_km=F('dem') / 1000)
)
# cities whose population is less than their average elevation
(
ods
.catalog
.dataset('doc-geonames-cities-5000')
.records
.filter(population__lt=F('dem'))
)
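Combining F() objects with arithmetic operators builds an expression rather than computing a value locally. A sketch of that mechanism with a hypothetical `SketchF` class (not the library's F(); reflected operations such as `2 * F(...)` are not handled here):

```python
class SketchF:
    """Hypothetical F-like expression: a field reference that builds arithmetic text."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def _combine(self, op: str, other: object) -> "SketchF":
        # Other SketchF objects contribute their expression; numbers their repr.
        rhs = other.expr if isinstance(other, SketchF) else repr(other)
        return SketchF(f"({self.expr} {op} {rhs})")

    def __add__(self, other: object) -> "SketchF":
        return self._combine("+", other)

    def __sub__(self, other: object) -> "SketchF":
        return self._combine("-", other)

    def __mul__(self, other: object) -> "SketchF":
        return self._combine("*", other)

    def __truediv__(self, other: object) -> "SketchF":
        return self._combine("/", other)

    def __str__(self) -> str:
        return self.expr
```

For example, `SketchF("dem") / 1000` renders as `(dem / 1000)`, which the server, not the client, would evaluate for each record.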
The following are object representations of Opendatasoft entities, implemented as typing.NamedTuples, that many query evaluation methods return by default.
ods_explore.models.Dataset(attachments, data_visible, dataset_id, dataset_uid, features, fields, has_records, metas, visibility)
attachments - A list of dictionaries of available file attachments for the dataset.
data_visible - True if the caller is authorized to view this dataset, and False otherwise.
dataset_id - The dataset id.
dataset_uid - The unique dataset id.
features - A list of available features for the dataset.
fields - A list of dictionaries of field names and associated metadata.
has_records - True if the dataset has at least one record, and False otherwise.
metas - Metadata about the dataset, as a dictionary.
visibility - A string indicating whether the dataset is public (domain) or private (restricted).
ods_explore.models.Record(id, fields, size, timestamp)
id - The record id.
fields - The record data fields, as a dictionary.
size - The record size in bytes.
timestamp - The record’s creation time.
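Since these models are named tuples, a JSON payload with matching keys can be unpacked straight into one, and `_asdict()` converts it back. The sketch below uses a local stand-in, `RecordSketch`, mirroring the documented Record fields; its field types are assumptions and the payload values are placeholders, not real data:

```python
from typing import Any, Dict, NamedTuple


class RecordSketch(NamedTuple):
    """Local stand-in mirroring the documented Record fields; the types are assumptions."""

    id: str
    fields: Dict[str, Any]
    size: int
    timestamp: str


# Placeholder payload for illustration only; not real data from any dataset.
payload = {
    "id": "example-record-id",
    "fields": {"name": "Example City", "population": 12345},
    "size": 64,
    "timestamp": "2021-01-01T00:00:00Z",
}

record = RecordSketch(**payload)  # attribute access, e.g. record.fields["name"]
as_json = record._asdict()  # back to a plain dictionary
```

Named tuples are immutable and lightweight, which suits read-only API results; `_replace()` returns a modified copy when one is needed.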