# Query API
The toshi_hazard_store.query package provides the main public interface for querying hazard data from parquet datasets.
## Quick Start
```python
from toshi_hazard_store import query
from toshi_hazard_store.model.constraints import ProbabilityEnum

# Query hazard curves
curves = query.get_hazard_curves(
    location_codes=["-41.300~174.800"],
    vs30s=[400],
    hazard_model="NSHM_v1.0.4",
    imts=["PGA"],
    aggs=["mean"],
    strategy="d1",  # or "d2", "naive"
)
for curve in curves:
    print(f"{curve.imt} at {curve.nloc_001}: {curve.values}")

# Query disaggregation aggregates
disaggs = query.get_disagg_aggregates(
    location_codes=["-41.300~174.800"],
    vs30s=[400],
    hazard_model="NSHM_v1.0.4",
    imts=["PGA"],
    aggs=["mean"],
    target_aggrs=["mean"],
    probabilities=[ProbabilityEnum._10_PCT_IN_50YRS],
    disagg_bins={
        "mag": ["5.5", "6.5", "7.5"],
        "dist": ["10.0", "50.0", "100.0", "200.0"],
        "eps": ["-1.0", "0.0", "1.0"],
    },
    strategy="d2",  # or "d1", "naive"
)
for disagg in disaggs:
    print(f"{disagg.imt} at {disagg.nloc_001}, prob={disagg.probability.name}")
    arr = disagg.to_ndarray()  # reshape to an N-D array over the disagg_bins axes
    print(f"  shape: {arr.shape}")

# Query gridded hazard
gridded = query.get_gridded_hazard(
    location_grid_id="NZ_0_1_NB_1_1",
    hazard_model_ids=["NSHM_v1.0.4"],
    vs30s=[400.0],
    imts=["PGA"],
    aggs=["mean"],
    poes=[0.02, 0.1],
)
for grid in gridded:
    print(f"Grid {grid.location_grid_id} at POE {grid.poe}: {len(grid.accel_levels)} locations")
```
## toshi_hazard_store.query package
Query package for hazard data retrieval.
This package provides the main public interface for querying hazard data from parquet datasets. It includes:
- Main query functions: get_hazard_curves, get_gridded_hazard, get_disagg_aggregates
- Data models: AggregatedHazard, DisaggregationAggregate, IMTValue
- Location utilities: downsample_code, get_hashes
### AggregatedHazard (dataclass)
Represents an aggregated hazard dataset.
Attributes:

| Name | Type | Description |
|---|---|---|
| compatible_calc_id | str | the ID of a compatible calculation, for interoperability between PSHA engines. |
| hazard_model_id | str | the model that these curves represent. |
| nloc_001 | str | the location string to three decimal places, e.g. "-38.330~17.550". |
| nloc_0 | str | the location string at 1.0° resolution, e.g. "-38.0~17.0" (used for partitioning). |
| imt | str | the intensity measure type label, e.g. 'PGA', 'SA(5.0)'. |
| vs30 | int | the VS30 value in m/s. |
| agg | str | the aggregation type. |
| values | list[Union[float, IMTValue]] | a list of 44 IMTL values. |
Notes
This class is designed to match the table schema for aggregated hazard datasets.
Source code in toshi_hazard_store/query/models.py
#### to_imt_values()
Converts the IMTL values in this object's values attribute from a list of floats to a list of IMTValue objects.
Returns:
AggregatedHazard: this object itself.
Source code in toshi_hazard_store/query/models.py
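A minimal sketch of what to_imt_values does, assuming each float in values is paired positionally with a list of IMT levels. The CurveSketch class and the shortened EXAMPLE_LEVELS tuple are hypothetical stand-ins (the real dataset uses 44 levels, which this page does not list); only IMTValue's lvl and val fields come from the documentation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class IMTValue:
    lvl: float  # the level of the IMT value
    val: float  # the value of the IMT at that level


# Hypothetical, shortened level list for illustration only.
EXAMPLE_LEVELS = (0.01, 0.1, 0.5, 1.0)


@dataclass
class CurveSketch:
    imt: str
    values: List[Union[float, IMTValue]] = field(default_factory=list)

    def to_imt_values(self, levels: Sequence[float] = EXAMPLE_LEVELS) -> "CurveSketch":
        # Pair every float with its level and replace the list in place.
        if len(levels) != len(self.values):
            raise ValueError("levels and values must have the same length")
        self.values = [IMTValue(lvl, val) for lvl, val in zip(levels, self.values)]
        return self  # returning self allows call chaining


curve = CurveSketch(imt="PGA", values=[0.5, 0.1, 0.01, 0.001]).to_imt_values()
print(curve.values[0])  # IMTValue(lvl=0.01, val=0.5)
```

Returning the object itself, as the documented method does, lets a caller convert and use the result in a single expression.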
### DisaggregationAggregate
Bases: BaseModel
Aggregated disaggregation arrays across realisations.
Attributes:

| Name | Type | Description |
|---|---|---|
| compatible_calc_id | str | foreign key (FK) for hazard-calculation equivalence. |
| hazard_model_id | str | NSHM hazard model identifier, e.g. "NSHM_v1.0.4" (caller-supplied). |
| bins_digest | str | sha256[:16] over the sorted axes and sorted bin centres (the compatibility key). |
| nloc_001 | str | location string at 0.001° resolution, e.g. "-38.330~175.550". |
| nloc_0 | str | location string at 1.0° resolution (used for partitioning). |
| vs30 | int | VS30 value in m/s. |
| imt | str | intensity measure type label, e.g. "PGA", "SA(1.0)". |
| target_aggr | str | hazard-curve aggregation on which the disaggregation was conditioned, e.g. "mean", "0.5". |
| probability | ProbabilityEnum | ProbabilityEnum member supplied by the caller, e.g. _10_PCT_IN_50YRS. |
| imtl | float | intensity measure level (IML) at which the disaggregation was computed. |
| aggr | str | aggregation type applied across realisations, e.g. "mean", "0.1". |
| disagg_bins | dict[str, list[str]] | ordered map of axis name to bin centres; its key order defines the axis order. |
| disagg_values | List[float] | flattened disaggregation array over the disagg_bins axes. |
Source code in toshi_hazard_store/model/hazard_models_pydantic.py
#### pyarrow_schema() (staticmethod)
A pyarrow schema for aggregate disaggregation datasets.
Source code in toshi_hazard_store/model/hazard_models_pydantic.py
#### to_ndarray()
Reshape disagg_values into an N-D array with axes ordered by disagg_bins keys.
Source code in toshi_hazard_store/model/hazard_models_pydantic.py
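to_ndarray can be pictured as a reshape of the flat disagg_values list, with one axis per disagg_bins key in insertion order. The sketch below is a stand-alone approximation using NumPy; disagg_to_ndarray is a hypothetical helper, and the row-major (C) flattening order is an assumption rather than something this page documents.

```python
from typing import Dict, List, Sequence

import numpy as np


def disagg_to_ndarray(disagg_bins: Dict[str, List[str]], disagg_values: Sequence[float]) -> np.ndarray:
    """Hypothetical helper: one axis per disagg_bins key, in the dict's insertion order."""
    shape = tuple(len(centres) for centres in disagg_bins.values())
    expected = int(np.prod(shape))
    if len(disagg_values) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(disagg_values)}")
    # Assumes the flat list was written in row-major (C) order.
    return np.asarray(disagg_values, dtype=float).reshape(shape)


bins = {
    "mag": ["5.5", "6.5", "7.5"],
    "dist": ["10.0", "50.0", "100.0", "200.0"],
    "eps": ["-1.0", "0.0", "1.0"],
}
arr = disagg_to_ndarray(bins, [1 / 36] * 36)  # uniform toy values
print(arr.shape)  # (3, 4, 3)
mag_marginal = arr.sum(axis=(1, 2))  # sum out distance and epsilon
```

Summing over the trailing axes gives one value per magnitude bin, which is a common way to inspect a reshaped disaggregation.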
### IMTValue (dataclass)
Represents an intensity measure type (IMT) value.
Attributes:

| Name | Type | Description |
|---|---|---|
| lvl | float | The level of the IMT value. |
| val | float | The value of the IMT at that level. |
Source code in toshi_hazard_store/query/models.py
### downsample_code(loc_code, res)
Get a CodedLocation.code at the chosen resolution from the given location code.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| loc_code | str | The location code in format 'latitude~longitude'. | required |
| res | float | Resolution in grid degrees to downsample to. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| str | str | The downsampled location code. |
Examples:
>>> downsample_code('37.7749~-122.4194', 0.1)
'37.8~-122.4'
Source code in toshi_hazard_store/query/hazard_query.py
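Assuming downsample_code snaps each coordinate to the nearest multiple of res and prints as many decimals as res has (with a minimum of one), a stand-alone approximation looks like this. sketch_downsample_code is a hypothetical name; the documented function returns a CodedLocation.code, whose rounding rules may differ from this sketch at half-way points.

```python
from decimal import Decimal


def sketch_downsample_code(loc_code: str, res: float) -> str:
    """Hypothetical approximation of downsample_code (nearest-grid rounding assumed)."""
    lat, lon = (float(part) for part in loc_code.split("~"))
    # Decimal places to display: those of the resolution, but at least one.
    places = max(-Decimal(str(res)).normalize().as_tuple().exponent, 1)

    def snap(value: float) -> str:
        # Round to the nearest multiple of res; Python's round() sends exact ties to even.
        return f"{round(value / res) * res:.{places}f}"

    return f"{snap(lat)}~{snap(lon)}"


print(sketch_downsample_code("37.7749~-122.4194", 0.1))  # 37.8~-122.4
```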
### get_disagg_aggregates(location_codes, vs30s, hazard_model, imts, aggs, target_aggrs, probabilities, disagg_bins, strategy='naive', dataset_uri=None)
Retrieves aggregated disaggregations from the dataset.
The optional strategy argument controls how the query behaves:
- 'naive' (the default) lets pyarrow apply its normal dataset filtering.
- 'd1' assumes the dataset is partitioned on bins_digest and vs30.
- 'd2' assumes the dataset is partitioned on bins_digest, vs30 and nloc_0, and acts accordingly.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| location_codes | list | List of location codes. | required |
| vs30s | list | List of VS30 values. | required |
| hazard_model | str | the hazard model id. | required |
| imts | list | List of intensity measure types (e.g. 'PGA', 'SA(5.0)'). | required |
| aggs | list | List of aggregation types. | required |
| target_aggrs | list | List of target hazard-curve aggregation types (e.g. 'mean', '0.5'). | required |
| probabilities | list | List of ProbabilityEnum members. | required |
| disagg_bins | dict[str, list[str]] | the bins dict defining the disaggregation topology. The bins compatibility digest is computed internally from this dict. | required |
| strategy | str | which query strategy to use (options are 'd1', 'd2', 'naive'). | 'naive' |
| dataset_uri | Optional[str] | optional URI for the dataset. Defaults to the THS_DATASET_DISAGG_AGGR_URI env var. | None |
Yields:

| Name | Type | Description |
|---|---|---|
| DisaggregationAggregate | DisaggregationAggregate | An object containing the disaggregation aggregate data. |
Source code in toshi_hazard_store/query/datasets.py
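The bins_digest compatibility key is described as sha256[:16] over the sorted axes and sorted bin centres, computed internally from disagg_bins. The sketch below illustrates why sorting makes the key independent of dict and list ordering. The JSON serialisation and the string sort of bin centres are assumptions, so sketch_bins_digest (a hypothetical helper) will not reproduce digests stored in a real dataset.

```python
import hashlib
import json
from typing import Dict, List


def sketch_bins_digest(disagg_bins: Dict[str, List[str]]) -> str:
    """Hypothetical key: SHA-256 over sorted axes and sorted centres, truncated to 16 hex chars."""
    # Sort axis names and (as strings) each axis's bin centres so input order cannot change the key.
    canonical = [[axis, sorted(disagg_bins[axis])] for axis in sorted(disagg_bins)]
    payload = json.dumps(canonical, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


bins_a = {"mag": ["5.5", "6.5", "7.5"], "eps": ["-1.0", "0.0", "1.0"]}
bins_b = {"eps": ["1.0", "0.0", "-1.0"], "mag": ["7.5", "5.5", "6.5"]}
print(sketch_bins_digest(bins_a) == sketch_bins_digest(bins_b))  # True
```

Because both axes and centres are sorted before hashing, two dicts holding the same bins in a different order share a digest; the axis order of disagg_values, by contrast, still follows the key order of disagg_bins.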
### get_gridded_hazard(location_grid_id, hazard_model_ids, vs30s, imts, aggs, poes, dataset_uri=None)
Retrieves gridded hazard from the parquet dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| location_grid_id | str | the grid identifier to query. | required |
| hazard_model_ids | list | List of hazard model identifiers. | required |
| vs30s | list | List of VS30 values. | required |
| imts | list | List of intensity measure types (e.g. 'PGA', 'SA(5.0)'). | required |
| aggs | list | List of aggregation types. | required |
| poes | list | List of probability of exceedance values. | required |
| dataset_uri | Optional[str] | optional URI for the dataset. Defaults to the THS_DATASET_GRIDDED_URI env var. | None |
Yields:

| Name | Type | Description |
|---|---|---|
| GriddedHazardPoeLevels | GriddedHazardPoeLevels | An object containing the gridded hazard data. |
Source code in toshi_hazard_store/query/datasets.py
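Both the poes argument and ProbabilityEnum members such as _10_PCT_IN_50YRS express a probability of exceedance over an investigation time. Assuming a 50-year investigation time and Poisson occurrence, a POE p corresponds to a return period T = -t / ln(1 - p). The return_period helper below is illustrative and not part of toshi_hazard_store; this page does not state the investigation time behind the poes stored in the gridded dataset.

```python
import math


def return_period(poe: float, investigation_time: float = 50.0) -> float:
    """Return period (years) for a POE over investigation_time, assuming Poisson occurrence."""
    if not 0.0 < poe < 1.0:
        raise ValueError("poe must be strictly between 0 and 1")
    # T = -t / ln(1 - p); log1p keeps precision for small p.
    return -investigation_time / math.log1p(-poe)


for poe in (0.02, 0.1):
    print(f"POE {poe} in 50 years -> ~{return_period(poe):.0f} year return period")
# POE 0.02 in 50 years -> ~2475 year return period
# POE 0.1 in 50 years -> ~475 year return period
```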
### get_hashes(locs, resolution=0.1)
Compute a set of hashes for the given locations at the specified resolution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| locs | Iterable[str] | A collection of location codes in the format 'latitude~longitude'. | required |
| resolution | float | The resolution to compute hashes at (in grid degrees). Defaults to 0.1. | 0.1 |
Returns:

| Name | Type | Description |
|---|---|---|
| list | Iterable[str] | A sorted list of unique location codes, downsampled to the specified resolution. |
Source code in toshi_hazard_store/query/hazard_query.py
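A stand-alone approximation of get_hashes: downsample every code, drop duplicates, and sort. sketch_get_hashes and its _snap_code helper are hypothetical and assume nearest-grid rounding; the result is sorted as plain strings, which this page does not confirm.

```python
from decimal import Decimal
from typing import Iterable, List


def _snap_code(loc_code: str, res: float) -> str:
    # Hypothetical helper: nearest-grid rounding, at least one decimal place shown.
    lat, lon = (float(part) for part in loc_code.split("~"))
    places = max(-Decimal(str(res)).normalize().as_tuple().exponent, 1)
    return "~".join(f"{round(v / res) * res:.{places}f}" for v in (lat, lon))


def sketch_get_hashes(locs: Iterable[str], resolution: float = 0.1) -> List[str]:
    """Downsample every code, de-duplicate with a set, then sort the strings."""
    return sorted({_snap_code(loc, resolution) for loc in locs})


codes = ["-41.300~174.800", "-41.310~174.780", "-36.870~174.770"]
print(sketch_get_hashes(codes))  # ['-36.9~174.8', '-41.3~174.8']
```

The two Wellington-area codes collapse onto one 0.1° cell, so three inputs yield two hashes.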
### get_hazard_curves(location_codes, vs30s, hazard_model, imts, aggs, strategy='naive', dataset_uri=None)
Retrieves aggregated hazard curves from the dataset.
The optional strategy argument controls how the query behaves:
- 'naive' (the default) lets pyarrow apply its normal dataset filtering.
- 'd1' assumes the dataset is partitioned on vs30, generating multiple pyarrow queries from the user args.
- 'd2' assumes the dataset is partitioned on vs30 and nloc_0, and acts accordingly.
These overriding strategies let the user tune the query to the size of the dataset and the compute resources available. For example, when querying the full NSHM dataset from an AWS Lambda function, the d2 option is optimal.
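One way to picture the strategies is as different ways of splitting a single request into pyarrow scans. The sketch below only enumerates the partition keys each strategy would target; plan_queries is hypothetical, runs no pyarrow code, and derives nloc_0 by nearest rounding to 1.0°, which is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def nloc_0(loc_code: str) -> str:
    """Snap a 'lat~lon' code to the 1.0 degree grid (assumed nearest rounding)."""
    lat, lon = (float(part) for part in loc_code.split("~"))
    return f"{round(lat):.1f}~{round(lon):.1f}"


def plan_queries(location_codes: Iterable[str], vs30s: Iterable[int], strategy: str = "naive") -> List[Dict]:
    """Hypothetical: list the filters each strategy would issue."""
    codes = sorted(location_codes)
    vs30_list = list(vs30s)
    if strategy == "naive":
        # One combined filter; pyarrow decides which partitions to read.
        return [{"vs30": vs30_list, "locations": codes}]
    if strategy == "d1":
        # One query per vs30 partition.
        return [{"vs30": vs30, "locations": codes} for vs30 in vs30_list]
    if strategy == "d2":
        # One query per (vs30, nloc_0) partition, each carrying only its own locations.
        groups: Dict[str, List[str]] = defaultdict(list)
        for code in codes:
            groups[nloc_0(code)].append(code)
        return [
            {"vs30": vs30, "nloc_0": key, "locations": members}
            for vs30 in vs30_list
            for key, members in sorted(groups.items())
        ]
    raise ValueError(f"unknown strategy: {strategy!r}")


codes = ["-41.300~174.800", "-36.870~174.770"]
for strategy in ("naive", "d1", "d2"):
    print(strategy, len(plan_queries(codes, [400, 750], strategy)))
# naive 1
# d1 2
# d2 4
```

Under d2 each planned query touches a single (vs30, nloc_0) partition, which keeps the data scanned per query small; that is plausibly why d2 suits compute-limited environments such as AWS Lambda. For get_disagg_aggregates the partition keys would also include bins_digest.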
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| location_codes | list | List of location codes. | required |
| vs30s | list | List of VS30 values. | required |
| hazard_model | str | the hazard model id. | required |
| imts | list | List of intensity measure types (e.g. 'PGA', 'SA(5.0)'). | required |
| aggs | list | List of aggregation types. | required |
| strategy | str | which query strategy to use (options are 'd1', 'd2', 'naive'). | 'naive' |
| dataset_uri | Optional[str] | optional URI for the dataset. Defaults to the THS_DATASET_AGGR_URI env var. | None |
Yields:

| Name | Type | Description |
|---|---|---|
| AggregatedHazard | AggregatedHazard | An object containing the aggregated hazard curve data. |
Source code in toshi_hazard_store/query/datasets.py