Query API
The toshi_hazard_store.query package provides the main public interface for querying hazard data from parquet datasets.
Quick Start
from toshi_hazard_store import query

# Query hazard curves
curves = query.get_hazard_curves(
    location_codes=["-41.300~174.800"],
    vs30s=[400],
    hazard_model="NSHM_v1.0.4",
    imts=["PGA"],
    aggs=["mean"],
    strategy="d1",  # or "d2", "naive"
)
for curve in curves:
    print(f"{curve.imt} at {curve.nloc_001}: {curve.values}")

# Query gridded hazard
gridded = query.get_gridded_hazard(
    location_grid_id="NZ_0_1_NB_1_1",
    hazard_model_ids=["NSHM_v1.0.4"],
    vs30s=[400.0],
    imts=["PGA"],
    aggs=["mean"],
    poes=[0.02, 0.1],
)
for grid in gridded:
    print(f"Grid {grid.location_grid_id} at POE {grid.poe}: {len(grid.accel_levels)} locations")
toshi_hazard_store.query package
Query package for hazard data retrieval.
This package provides the main public interface for querying hazard data from parquet datasets. It includes:
- Main query functions: get_hazard_curves, get_gridded_hazard
- Data models: AggregatedHazard, IMTValue
- Location utilities: downsample_code, get_hashes
AggregatedHazard dataclass
Represents an aggregated hazard dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| compatible_calc_id | str | the ID of a compatible calculation for PSHA engines interoperability. |
| hazard_model_id | str | the model that these curves represent. |
| nloc_001 | str | the location string to three places e.g. "-38.330~17.550". |
| nloc_0 | str | the location string to zero places e.g. "-38.0~17.0" (used for partitioning). |
| imt | str | the intensity measure type label e.g. 'PGA', 'SA(5.0)'. |
| vs30 | int | the VS30 integer. |
| agg | str | the aggregation type. |
| values | list[Union[float, IMTValue]] | a list of 44 IMTL values. |
Notes
This class is designed to match the table schema for aggregated hazard datasets.
Source code in toshi_hazard_store/query/models.py
to_imt_values()
Converts the IMTL values in this object's values attribute from a list of floats to a list of IMTValue objects.
Returns:
AggregatedHazard: this object itself.
Source code in toshi_hazard_store/query/models.py
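The conversion described above can be illustrated with a minimal sketch. It does not import the library: HazardCurveSketch is a hypothetical stand-in that carries only two of the documented attributes. Unlike the documented method, which takes no arguments, this version takes the levels as a parameter, because where the real levels come from is not stated here. The level values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class IMTValue:
    lvl: float  # intensity measure level
    val: float  # value of the curve at that level


@dataclass
class HazardCurveSketch:
    """Hypothetical stand-in carrying only the fields needed here."""

    imt: str
    values: List[Union[float, IMTValue]]

    def to_imt_values(self, levels: List[float]) -> "HazardCurveSketch":
        # Pair each level with its value, replace the float list in place,
        # and return self so that calls can be chained.
        if len(levels) != len(self.values):
            raise ValueError("levels and values must have the same length")
        self.values = [
            IMTValue(lvl=lvl, val=float(val)) for lvl, val in zip(levels, self.values)
        ]
        return self


# Illustrative data only: three made-up levels and values.
curve = HazardCurveSketch(imt="PGA", values=[0.9, 0.5, 0.1])
result = curve.to_imt_values(levels=[0.01, 0.1, 1.0])
```

Returning the object itself, as the documented method does, means the converted curve can be used directly in an expression without a separate assignment.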
IMTValue dataclass
Represents an intensity measure type (IMT) value.
Attributes:
| Name | Type | Description |
|---|---|---|
| lvl | float | The level of the IMT value. |
| val | float | The value of the IMT at that level. |
Source code in toshi_hazard_store/query/models.py
downsample_code(loc_code, res)
Get a CodedLocation.code at the chosen resolution from the given location code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| loc_code | str | The location code in format 'latitude~longitude'. | required |
| res | float | Resolution in grid degrees to downsample to. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The downsampled location code. |
Examples:
>>> downsample_code('37.7749~-122.4194', 0.1)
'37.8~-122.4'
Source code in toshi_hazard_store/query/hazard_query.py
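As a rough illustration of what downsampling a location code involves, the sketch below snaps each coordinate to the nearest multiple of the resolution. It is a standalone approximation, not the library's implementation, and the function name downsample_code_sketch is hypothetical. Rounding half up and using Decimal arithmetic are assumptions made here to avoid binary floating-point surprises.

```python
from decimal import ROUND_HALF_UP, Decimal


def downsample_code_sketch(loc_code: str, res: float) -> str:
    """Snap a 'latitude~longitude' code to the nearest multiple of res."""
    step = Decimal(str(res))
    # Number of decimal places implied by the resolution (0.1 -> 1).
    places = max(0, -step.as_tuple().exponent)

    def snap(text: str) -> str:
        # Count whole steps, rounding half up, then scale back to degrees.
        steps = (Decimal(text) / step).quantize(Decimal(1), rounding=ROUND_HALF_UP)
        return f"{steps * step:.{places}f}"

    lat, lon = loc_code.split("~")
    return f"{snap(lat)}~{snap(lon)}"


code = downsample_code_sketch("37.7749~-122.4194", 0.1)
print(code)  # 37.8~-122.4, the same result as the documented example
```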
get_gridded_hazard(location_grid_id, hazard_model_ids, vs30s, imts, aggs, poes, dataset_uri=None)
Retrieves gridded hazard from the parquet dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| location_grid_id | str | the grid identifier to query. | required |
| hazard_model_ids | list | List of hazard model identifiers. | required |
| vs30s | list | List of VS30 values. | required |
| imts | list | List of intensity measure types (e.g. 'PGA', 'SA(5.0)'). | required |
| aggs | list | List of aggregation types. | required |
| poes | list | List of probability of exceedance values. | required |
| dataset_uri | Optional[str] | optional URI for the dataset. Defaults to the THS_DATASET_GRIDDED_URI env var. | None |
Yields:
| Name | Type | Description |
|---|---|---|
| GriddedHazardPoeLevels | GriddedHazardPoeLevels | An object containing the gridded hazard data. |
Source code in toshi_hazard_store/query/datasets.py
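The dataset_uri fallback described in the parameter table can be sketched as follows. The helper resolve_dataset_uri is hypothetical and only illustrates the "explicit argument, otherwise environment variable" pattern; how the library actually reads its configuration is not shown in this page. The paths are placeholders.

```python
import os
from typing import Optional


def resolve_dataset_uri(dataset_uri: Optional[str], env_var: str) -> Optional[str]:
    """Hypothetical helper: prefer the explicit URI, else read env_var."""
    if dataset_uri is not None:
        return dataset_uri
    return os.environ.get(env_var)


# Placeholder value, set here only so that the example is self-contained.
os.environ["THS_DATASET_GRIDDED_URI"] = "/tmp/example/gridded"

from_env = resolve_dataset_uri(None, "THS_DATASET_GRIDDED_URI")
explicit = resolve_dataset_uri("/data/other", "THS_DATASET_GRIDDED_URI")
```

Passing dataset_uri explicitly is useful in tests or when querying more than one dataset from the same process.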
get_hashes(locs, resolution=0.1)
Compute a set of hashes for the given locations at the specified resolution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| locs | Iterable[str] | A collection of location codes in the format 'latitude~longitude'. | required |
| resolution | float | The resolution to compute hashes at (in grid degrees). Defaults to 0.1. | 0.1 |
Returns:
| Name | Type | Description |
|---|---|---|
| list | Iterable[str] | A sorted list of unique location codes, downsampled to the specified resolution. |
Source code in toshi_hazard_store/query/hazard_query.py
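The documented behaviour (downsample, de-duplicate, sort) can be sketched in a few lines. This is an illustration rather than the library code: get_hashes_sketch takes the downsampling step as a callable, and to_one_decimal is a hypothetical stand-in for downsampling at 0.1 degrees.

```python
from typing import Callable, Iterable, List


def get_hashes_sketch(
    locs: Iterable[str], downsample: Callable[[str], str]
) -> List[str]:
    """Downsample every code, drop duplicates, and return them sorted."""
    return sorted({downsample(loc) for loc in locs})


def to_one_decimal(loc_code: str) -> str:
    # Hypothetical stand-in for downsampling at a 0.1 degree resolution.
    lat, lon = (float(part) for part in loc_code.split("~"))
    return f"{lat:.1f}~{lon:.1f}"


hashes = get_hashes_sketch(
    ["-41.300~174.800", "-41.310~174.790", "-36.870~174.770"], to_one_decimal
)
print(hashes)  # ['-36.9~174.8', '-41.3~174.8']
```

The first two inputs collapse to the same downsampled code, so only two entries remain. Note that the sort in this sketch is a plain string sort, not a numeric sort on the coordinates.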
get_hazard_curves(location_codes, vs30s, hazard_model, imts, aggs, strategy='naive', dataset_uri=None)
Retrieves aggregated hazard curves from the dataset.
The optional strategy argument can be used to control how the query behaves:
- 'naive' (the default) lets pyarrow do its normal thing.
- 'd1' assumes the dataset is partitioned on vs30, generating multiple pyarrow queries from the user args.
- 'd2' assumes the dataset is partitioned on vs30, nloc_0 and acts accordingly.
These overriding strategies allow the user to tune the query to suit the size of the datasets and the
compute resources available. For example, for the full NSHM queried from an AWS Lambda function, the 'd2' option is optimal.
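One way to picture the difference between the strategies is to look at how many partition-level queries each one would issue. The sketch below is purely illustrative: plan_partition_filters is a hypothetical helper, and the filter dictionaries are not the library's internal representation. It assumes the nloc_0 values for the requested locations are already known.

```python
from itertools import product
from typing import Dict, List, Sequence


def plan_partition_filters(
    vs30s: Sequence[int], nloc_0s: Sequence[str], strategy: str = "naive"
) -> List[Dict[str, object]]:
    """Hypothetical planner: one filter dict per query the strategy would issue."""
    if strategy == "d1":
        # One query per vs30 partition.
        return [{"vs30": vs30} for vs30 in vs30s]
    if strategy == "d2":
        # One query per (vs30, nloc_0) partition.
        unique_nloc_0s = sorted(set(nloc_0s))
        return [
            {"vs30": vs30, "nloc_0": nloc_0}
            for vs30, nloc_0 in product(vs30s, unique_nloc_0s)
        ]
    # 'naive': a single query with no partition-specific filter.
    return [{}]


nloc_0s = ["-41.0~175.0", "-37.0~175.0"]
d1_plan = plan_partition_filters([400, 750], nloc_0s, "d1")
d2_plan = plan_partition_filters([400, 750], nloc_0s, "d2")
print(len(d1_plan), len(d2_plan))  # 2 4
```

Under these assumptions, 'd2' issues more queries than 'd1', but each one touches a smaller slice of the dataset, which fits the note above about constrained compute environments.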
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| location_codes | list | List of location codes. | required |
| vs30s | list | List of VS30 values. | required |
| hazard_model | str | the hazard model id. | required |
| imts | list | List of intensity measure types (e.g. 'PGA', 'SA(5.0)'). | required |
| aggs | list | List of aggregation types. | required |
| strategy | str | which query strategy to use (options are 'naive', 'd1', 'd2'). | 'naive' |
| dataset_uri | Optional[str] | optional URI for the dataset. Defaults to the THS_DATASET_AGGR_URI env var. | None |
Yields:
| Name | Type | Description |
|---|---|---|
| AggregatedHazard | AggregatedHazard | An object containing the aggregated hazard curve data. |
Source code in toshi_hazard_store/query/datasets.py