Configuration

The toshi_hazard_store library uses PyArrow parquet datasets for storing and retrieving hazard data.

Run-time options are configured using environment variables and/or a local .env (dotenv) file; see python-dotenv.

The .env file should be created in the folder from where the Python interpreter is invoked - typically the root folder of your project.
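Whether the variables are set in the shell or loaded from a .env file by python-dotenv, the library sees them as ordinary environment variables. A minimal standard-library sketch of reading them (the defaults shown mirror the tables below):

```python
import os

# Variable names are those documented below; defaults follow the tables.
stage = os.environ.get("NZSHM22_HAZARD_STORE_STAGE", "LOCAL")
num_workers = int(os.environ.get("NZSHM22_HAZARD_STORE_NUM_WORKERS", "1"))
aggr_enabled = os.environ.get("THS_DATASET_AGGR_ENABLED", "FALSE").upper() == "TRUE"
```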

Dataset Configuration

The library requires at least one dataset to be configured:

Environment Variable       Default    Description
THS_DATASET_AGGR_URI       (none)     URI for the aggregated hazard curves dataset (e.g. s3://bucket/path or /local/path)
THS_DATASET_GRIDDED_URI    (none)     URI for the gridded hazard dataset (optional)
THS_DATASET_AGGR_ENABLED   FALSE      Set to TRUE to enable aggregated dataset features

For public NSHM data access, contact nshm@gns.cri.nz for credentials.

General Settings

Environment Variable               Default    Description
NZSHM22_HAZARD_STORE_STAGE         LOCAL      Deployment stage discriminator (e.g. TEST, PROD)
NZSHM22_HAZARD_STORE_NUM_WORKERS   1          Number of parallel workers for batch operations
NZSHM22_HAZARD_STORE_REGION        us-east-1  AWS region

Cloud Settings (AWS)

If you are accessing NSHM data from AWS, set the following:

Environment Variable    Description
AWS_PROFILE             Name of your AWS credentials profile
AWS_ACCESS_KEY_ID       AWS access key (for short-term credentials)
AWS_SECRET_ACCESS_KEY   AWS secret key
AWS_SESSION_TOKEN       Session token (required for short-term credentials)

For obtaining short-term credentials, see the AWS documentation.

Example .env File

# Dataset configuration (required)
THS_DATASET_AGGR_URI=s3://your-bucket/path/to/hazard-curves
THS_DATASET_GRIDDED_URI=s3://your-bucket/path/to/gridded-hazard
THS_DATASET_AGGR_ENABLED=TRUE

# General settings
NZSHM22_HAZARD_STORE_STAGE=TEST
NZSHM22_HAZARD_STORE_NUM_WORKERS=4
NZSHM22_HAZARD_STORE_REGION=us-east-1

# AWS credentials (if using cloud storage)
AWS_PROFILE=your-profile-name

Dataset URIs

The library supports various URI formats:

  • Local path: /path/to/local/dataset
  • S3: s3://bucket/path/to/dataset
  • Other PyArrow-compatible filesystems supported by pyarrow.dataset

Datasets should be organized as PyArrow partitioned datasets with the following partitioning:

  • Aggregated curves: vs30 and/or vs30/nloc_0
  • Gridded hazard: standard partitioning