Disaggregation
These models represent aggregated disaggregation arrays derived from OpenQuake PSHA engine outputs.
Disaggregation Aggregate
The core model for aggregated disaggregation data, stored as PyArrow parquet datasets.
Bases: BaseModel
Aggregated disaggregation arrays across realisations.
Attributes:
| Name | Type | Description |
|---|---|---|
| compatible_calc_id | str | FK for hazard-calc equivalence. |
| hazard_model_id | str | NSHM hazard model identifier e.g. "NSHM_v1.0.4" (caller-supplied). |
| bins_digest | str | sha256[:16] over sorted axes + sorted bin centres (compatibility key). |
| nloc_001 | str | location string at 0.001° resolution e.g. "-38.330~175.550". |
| nloc_0 | str | location string at 1.0° resolution (used for partitioning). |
| vs30 | int | VS30 value in m/s. |
| imt | str | intensity measure type label e.g. "PGA", "SA(1.0)". |
| target_aggr | str | hazard-curve aggregation the disagg was conditioned on e.g. "mean", "0.5". |
| probability | ProbabilityEnum | ProbabilityEnum name supplied by caller e.g. "_10_PCT_IN_50YRS". |
| imtl | float | IML at which the disagg was computed. |
| aggr | str | aggregation type applied across realisations e.g. "mean", "0.1". |
| disagg_bins | dict[str, list[str]] | ordered map of axis name to bin-centre strings; key order defines the axis order of disagg_values. |
| disagg_values | List[float] | flattened C-order disaggregation array over the disagg_bins axes. |
Source code in toshi_hazard_store/model/hazard_models_pydantic.py
PyArrow Schema
The DisaggregationAggregate model can be converted to a PyArrow schema for dataset I/O:
from toshi_hazard_store.model.hazard_models_pydantic import DisaggregationAggregate
schema = DisaggregationAggregate.pyarrow_schema()
The schema includes:
- compatible_calc_id (string) - Compatible calculation identifier
- hazard_model_id (string) - Model identifier (e.g., "NSHM_v1.0.4")
- bins_digest (string) - sha256[:16] compatibility key over sorted axes + bin centres
- nloc_001 (string) - Location to 3 decimal places (e.g., "-41.300~174.800")
- nloc_0 (string) - Location to 0 decimal places (e.g., "-41.0~174.0") for partitioning
- vs30 (int32) - VS30 value
- imt (string) - Intensity measure type (e.g., "PGA", "SA(1.0)")
- target_aggr (string) - Hazard-curve aggregation the disagg was conditioned on (e.g., "mean")
- probability (string) - Return-period probability as a ProbabilityEnum name (e.g., "_10_PCT_IN_50YRS")
- imtl (float) - Intensity measure level at which the disagg was computed
- aggr (string) - Aggregation type across realisations (e.g., "mean", "0.1")
- disagg_bins (map of string → list of string) - Ordered map of axis name to bin-centre strings; key order defines the axis order of disagg_values
- disagg_values (list of float32) - Flattened C-order disaggregation array over disagg_bins axes
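The relationship between disagg_bins and disagg_values implies a simple consistency rule: the flat list should hold one value per combination of bin centres. The following is a minimal sketch of such a check using only the standard library. The helper name check_disagg_row is hypothetical and is not part of toshi_hazard_store; whether the library performs an equivalent validation is not stated here.

```python
import math


def check_disagg_row(disagg_bins: dict, disagg_values: list) -> tuple:
    """Hypothetical helper: return the implied N-D shape, or raise if lengths disagree."""
    # Key order of disagg_bins defines the axis order, so dict order matters here.
    shape = tuple(len(centres) for centres in disagg_bins.values())
    expected = math.prod(shape)
    if len(disagg_values) != expected:
        raise ValueError(
            f"disagg_values has {len(disagg_values)} entries, expected {expected} for shape {shape}"
        )
    return shape


example_bins = {
    "mag": ["5.5", "6.5", "7.5"],
    "dist": ["10.0", "50.0", "100.0", "200.0"],
    "eps": ["-1.0", "0.0", "1.0"],
}
example_shape = check_disagg_row(example_bins, [0.0] * 36)
print(example_shape)
```

With three magnitude, four distance, and three epsilon bins, the flat list must contain 3 × 4 × 3 = 36 values.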
Dataset Partitioning
Disaggregation aggregate datasets use Hive-style partitioning on bins_digest / vs30 / nloc_0:
<dataset_root>/
├── bins_digest=6028db096c3a9e62/
│   ├── vs30=400/
│   │   └── nloc_0=-41.0~174.0/
│   │       └── <uuid>-part-0.parquet
│   └── vs30=1500/
│       └── nloc_0=-41.0~174.0/
│           └── <uuid>-part-0.parquet
The bins_digest partition groups rows with identical bin topology, enabling efficient filtering when querying a specific disaggregation configuration. Use the d2 query strategy for large datasets to exploit all three partition levels. The bins_digest can be obtained from the disagg_bins with toshi_hazard_store.model.revision_4.extract_disagg_hdf5.compute_bins_digest.
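To illustrate the idea behind the compatibility key (a sha256 digest, truncated to 16 hex characters, over sorted axes and sorted bin centres), here is a rough sketch. The function sketch_bins_digest is hypothetical, and its serialisation (JSON, lexicographic string sorting) is an assumption, so its output should not be expected to match compute_bins_digest. Use the library function for real datasets.

```python
import hashlib
import json


def sketch_bins_digest(disagg_bins: dict) -> str:
    """Hypothetical illustration only; the real serialisation may differ."""
    # Sort axis names and each axis's bin centres so the digest ignores ordering.
    # Sorting centres as strings is an assumption made for this sketch.
    canonical = [[axis, sorted(disagg_bins[axis])] for axis in sorted(disagg_bins)]
    payload = json.dumps(canonical, separators=(",", ":")).encode("utf-8")
    # Keep the first 16 hex characters of the sha256 digest.
    return hashlib.sha256(payload).hexdigest()[:16]


bins_a = {"mag": ["5.5", "6.5"], "dist": ["10.0", "50.0"]}
bins_b = {"dist": ["50.0", "10.0"], "mag": ["6.5", "5.5"]}  # same topology, reordered
bins_c = {"mag": ["5.5", "6.5"], "dist": ["10.0", "100.0"]}  # different bin centres

print(sketch_bins_digest(bins_a) == sketch_bins_digest(bins_b))
print(sketch_bins_digest(bins_a) == sketch_bins_digest(bins_c))
```

The point of the sketch is the property, not the exact bytes: rows with the same bin topology share a key, and any change to the bins produces a different one.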
Note that append_models_to_dataset does not enforce this partitioning. The partitioning is left to the user, either at write time or (more usually) after running ths_ds_defrag.
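As a sketch of what the Hive-style layout means in practice, the snippet below builds the partition directory for a single row from its bins_digest, vs30, and nloc_0 values. The helper partition_dir and the root name dataset_root are hypothetical and only illustrate the key=value directory convention; they are not part of toshi_hazard_store.

```python
from pathlib import PurePosixPath


def partition_dir(root: str, bins_digest: str, vs30: int, nloc_0: str) -> PurePosixPath:
    """Hypothetical helper: Hive-style directory for one (bins_digest, vs30, nloc_0) group."""
    # Each partition level is a directory named <column>=<value>.
    return (
        PurePosixPath(root)
        / f"bins_digest={bins_digest}"
        / f"vs30={vs30}"
        / f"nloc_0={nloc_0}"
    )


example_dir = partition_dir("dataset_root", "6028db096c3a9e62", 400, "-41.0~174.0")
print(example_dir)
```

A reader that understands Hive partitioning can skip whole directories when a filter fixes any of the three columns.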
Reshaping disagg_values
disagg_values is stored as a flat list. Use to_ndarray() to reshape it into an N-D array with axes ordered by disagg_bins:
from toshi_hazard_store import query
from toshi_hazard_store.model.constraints import ProbabilityEnum

# Axis order follows the key order of this mapping: mag, dist, eps.
bins = {
    "mag": ["5.5", "6.5", "7.5"],
    "dist": ["10.0", "50.0", "100.0", "200.0"],
    "eps": ["-1.0", "0.0", "1.0"],
}

for disagg in query.get_disagg_aggregates(
    location_codes=["-41.300~174.800"],
    vs30s=[400],
    hazard_model="NSHM_v1.0.4",
    imts=["PGA"],
    aggs=["mean"],
    target_aggrs=["mean"],
    probabilities=[ProbabilityEnum._10_PCT_IN_50YRS],
    disagg_bins=bins,
    strategy="d2",
):
    arr = disagg.to_ndarray()  # shape: (3, 4, 3) for mag × dist × eps
    print(arr.shape)
The flat storage form is preserved as the canonical representation; reshaping is opt-in and allocates a numpy array on demand.
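When only a few cells are needed, the flat list can also be indexed directly. The sketch below computes the C-order (row-major) position of one cell from its bin-centre labels, using the key order of disagg_bins as the axis order. The helper flat_index and the sample values are hypothetical and use only the standard library; they are not part of toshi_hazard_store.

```python
def flat_index(disagg_bins: dict, labels: dict) -> int:
    """Hypothetical helper: C-order position of the cell identified by one label per axis."""
    index = 0
    # Row-major order: the last axis varies fastest, so fold the axes left to right.
    for axis, centres in disagg_bins.items():
        index = index * len(centres) + centres.index(labels[axis])
    return index


sample_bins = {
    "mag": ["5.5", "6.5", "7.5"],
    "dist": ["10.0", "50.0", "100.0", "200.0"],
    "eps": ["-1.0", "0.0", "1.0"],
}
sample_values = [float(i) for i in range(36)]  # stand-in for disagg_values

# mag index 1, dist index 2, eps index 0 -> (1 * 4 + 2) * 3 + 0
position = flat_index(sample_bins, {"mag": "6.5", "dist": "100.0", "eps": "-1.0"})
print(position, sample_values[position])
```

This is the same cell that arr[1, 2, 0] would address after reshaping a C-order array of shape (3, 4, 3).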
Constraint Enums
Probability Enum
Bases: Enum
Defines the values available for probabilities.
Values are stored as floats representing the probability in 1 year.
Source code in toshi_hazard_store/model/constraints.py
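For orientation, a name such as "_10_PCT_IN_50YRS" can be related to a probability in 1 year with a standard conversion that treats each year as independent. The sketch below shows that conversion; the function annual_probability is hypothetical, and whether the enum's stored values are derived with exactly this formula is an assumption that should be checked against constraints.py.

```python
def annual_probability(prob_in_period: float, years: float) -> float:
    """Hypothetical helper: 1-year probability equivalent to prob_in_period over `years`.

    Assumes independent years, so (1 - p_annual) ** years == 1 - prob_in_period.
    """
    return 1.0 - (1.0 - prob_in_period) ** (1.0 / years)


p_10_in_50 = annual_probability(0.10, 50)
print(f"{p_10_in_50:.6f}")
```

Under this assumption, 10% in 50 years corresponds to roughly 0.0021 per year.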