Beehive’s Climate Risk Modeling Documentation

This document describes Beehive’s physical risk models, reflects on how we build and maintain them today, and indicates how we plan to keep improving them.

Table of Contents:

Model Overview

Cyclone Model Methodology

Flood Model Methodology

Wildfire Model Methodology

Heat Wave Model Methodology

FAQs

Model Overview

Beehive assesses physical climate risk to support regulatory reporting and risk management. Risk assessments are built using government, academic, and open-source datasets, rather than proprietary data.  Currently, physical risks from four climate hazards are assessed: cyclones, floods, heat waves, and wildfires.  Additional hazards including sea level rise and drought are planned for release in the fall of 2025.  Assessments cover the geographic regions of Asia, Europe, the Middle East, North and South America, and Oceania.

Data Sources and Climate Projections
Data underlying the assessments is sourced from government and academic organizations including NASA Earthdata, FEMA, the European Commission's Joint Research Centre, and scientific journals.  Climate forecasting is conducted using CMIP6 simulation results, with a 20+ member simulation ensemble assembled to forecast each scenario and time horizon of interest.  The use of a matrix of scenarios and time horizons provides a framework for understanding the risk associated with different emissions pathways.

Scenario and Time Horizon Selection
Risk is assessed across three of the scenarios defined by the 6th Assessment Report of the Intergovernmental Panel on Climate Change (IPCC).  These scenarios specify both a socioeconomic narrative and a radiative forcing level that is used for scenario simulations.  The radiative forcing describes the climate impact of the scenario’s greenhouse gas emissions profile.  The scenarios used here are: SSP1-2.6 (low emissions), SSP3-7.0 (medium-high emissions), and SSP5-8.5 (very high emissions), where SSP stands for Shared Socioeconomic Pathway, and where the number after the hyphen gives the radiative forcing level in watts per square meter.  The forcing levels roughly correspond to the Representative Concentration Pathway (RCP) scenarios used in earlier IPCC reports.  These scenarios represent a range of future emissions trajectories commonly used in climate risk assessment and regulatory frameworks.  The assessments cover three time horizons: 1 year, 10 years, and 30 years forward.  The horizon choices provide both near-term operational risk insights and longer-term strategic planning perspectives.

Methodology Framework
Assessments are conducted at varying spatial resolutions, ranging from 400 meters in densely populated areas to 50 kilometers in unpopulated regions.  Each assessed region (e.g. North America) is meshed with cells sized within this range, and risk scores are assigned to each mesh cell.  For most hazard types, risk is quantified via decomposition into three components: (1) the expected frequency of hazard events, (2) the extent to which assets are exposed to those events, and (3) the expected loss of asset value when assets are affected.  All three components are handled on a cell-by-cell basis.  Decomposing risk into these components provides a natural mechanism for integrating physical climate data and socioeconomic impact data.
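
As a minimal illustration of this decomposition, the sketch below shows how the three components might combine into an annual loss rate for a single mesh cell.  The field names and example values are purely illustrative; they are not Beehive's internal schema.

```python
# Minimal sketch of the three-component risk decomposition for one mesh cell.
# Field names and values are illustrative, not Beehive's internal schema.
from dataclasses import dataclass

@dataclass
class CellHazardInputs:
    events_per_year: float        # (1) expected hazard event frequency
    exposed_fraction: float       # (2) fraction of the cell's assets exposed per event
    loss_given_affected: float    # (3) expected loss per unit of affected asset value

def expected_annual_loss_rate(cell: CellHazardInputs) -> float:
    """Intensive loss expectation: annual loss per unit of asset value."""
    return cell.events_per_year * cell.exposed_fraction * cell.loss_given_affected

# Example: 0.2 events/year, 60% of assets exposed, 10% loss when affected -> 1.2%/year
print(expected_annual_loss_rate(CellHazardInputs(0.2, 0.6, 0.10)))  # 0.012
```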

For most of the hazard types considered here, risk is treated as an intensive quantity (loss per unit of asset value) rather than an extensive quantity (absolute monetary loss).  This distinction is important for results interpretation.  An extensive measure might indicate that a location faces $500,000 in expected annual flood damage, while an intensive measure would express the risk as a 2% expected annual loss, relative to total asset value.  Intensive measures enable meaningful comparison across regions with different asset concentrations, and allow organizations to apply risk scores regardless of their specific asset values at each location.  For example, a score indicating a 2% annual loss rate applies equally whether the underlying assets are worth $10 million or $100 million, whereas an absolute loss figure of $500,000 would represent a significantly different risk profile for these two cases.  The intensive approach makes scores directly applicable for portfolio-level risk assessment and regulatory reporting across diverse asset types and geographies.  The nature of this approach should be kept in mind, however, when considering why regions with lower asset or population densities, such as wilderness areas, might receive higher scores than suburban or urban areas.

Final risk scores range from 1 to 7, and represent relative risk compared to all scored locations from around the globe, for each hazard type.  A score of 1 describes below-average risk (typically below the 40th percentile of the global distribution), while a score of 7 describes top-tier risk (typically above the 97th percentile).
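
The sketch below illustrates how metric values from all scored cells might be mapped onto the 1 through 7 scale.  Only the 40th-percentile (score 1) and 97th-percentile (score 7) cutoffs are specified in this document; the intermediate cutoffs in the example are placeholder assumptions.

```python
import numpy as np

def score_cells(metric_values, cutoffs_pct=(40, 60, 75, 85, 93, 97)):
    """Map a global distribution of hazard metrics onto 1-7 relative scores.

    Only the 40th (score 1) and 97th (score 7) percentile cutoffs are stated in
    the methodology; the intermediate cutoffs here are illustrative placeholders.
    """
    metric_values = np.asarray(metric_values, dtype=float)
    thresholds = np.percentile(metric_values, cutoffs_pct)
    # searchsorted counts how many thresholds sit below each value -> scores 1..7
    return 1 + np.searchsorted(thresholds, metric_values, side="right")

scores = score_cells(np.random.lognormal(mean=-4, sigma=1.0, size=10_000))
print(np.bincount(scores)[1:])  # counts of cells assigned scores 1 through 7
```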

Cyclone Model Methodology

Following the decomposition approach described above, cyclone risk is calculated as the product of three factors: cyclone event frequency, asset exposure when those events occur, and expected losses to assets when they are affected.  Event frequency data is sourced from observed cyclone records in the IBTrACS archive, and those frequencies are forecast to future times using a composite of trend line forecasts from scientific literature.  Asset exposure is calculated by separating economic impact into wind-based and flood-based components, with assets assumed to be fully exposed to wind effects while flood exposure is determined using flood plain data matched to cyclone occurrence rates.  Asset loss expectations are calculated using established wind damage functions and flood damage methodologies, with damage parameters calibrated for each geopolitical region and cyclone basin.  Calculation and data source details are provided below.

Event Frequency

Historic cyclone data is sourced from the IBTrACS archive,

  • J. Gahtan et al. International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4r01. NOAA National Centers for Environmental Information (2024).  https://doi.org/10.25921/82ty-9e16

This archive contains one of the more comprehensive records of global cyclone activity, compiled from international meteorological organizations. IBTrACS data from 1980 through the present is used for risk assessment, and individual cyclone events are aggregated to build spatial density kernels describing the probability of cyclones of any category occurring at any global location.
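
As an illustration of the density-kernel idea, the sketch below fits a two-dimensional kernel density estimate to a handful of hypothetical track fixes.  The kernel choice, bandwidth, and coordinates are assumptions made for illustration, not production settings.

```python
# Illustrative sketch of building a spatial occurrence-density surface from
# cyclone track points (e.g., IBTrACS latitude/longitude fixes).
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical track fixes: columns are (longitude, latitude)
track_points = np.array([
    [-75.2, 24.1], [-74.8, 25.0], [-74.1, 26.2],   # storm A
    [-80.5, 22.3], [-81.0, 23.4], [-81.6, 24.8],   # storm B
]).T

density = gaussian_kde(track_points)               # 2-D kernel density estimate

# Evaluate relative occurrence density on a coarse lon/lat grid
lons, lats = np.meshgrid(np.linspace(-85, -70, 50), np.linspace(20, 30, 40))
grid = np.vstack([lons.ravel(), lats.ravel()])
relative_density = density(grid).reshape(lats.shape)
```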

Forecasting these cyclone occurrence rates forward through time and across SSP scenarios is particularly challenging, and carries greater uncertainty than the other hazard types.  CMIP-type global climate models have historically been run at resolutions that cannot capture the convective dynamics of cyclones.  This makes it difficult for global climate simulations to accurately reproduce historical cyclone occurrence rates, much less forecast how rates will change under different emissions scenarios.  More targeted approaches, such as higher-resolution regional climate models or reduced-order cyclone-specific models, can however offer insight into how cyclone frequency and intensity may evolve.  Discussions of the uncertainty ranges associated with cyclone frequency forecasting can be found in:

Beehive addresses this model uncertainty by using an ensemble of data sources to produce composite cyclone frequency and intensity forecasts.  The individual forecasts are usually organized by cyclone basin (e.g., the North Atlantic, or the South Pacific), and country-by-country differences are not considered.  Rather, single trend lines are applied to forecast how occurrence rates evolve for all countries in the basin.  In addition to data from the two Knutson references above, Beehive’s forecasting ensemble includes data from,

A composite of the trend lines from these papers is combined with IBTrACS historic occurrence rates to project cyclone occurrence rates, category by category (i.e. cyclone category 1 through category 5), out to each time horizon.
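
A minimal sketch of this projection step is shown below, assuming illustrative baseline rates and a hypothetical composite trend factor for a single basin, scenario, and time horizon.

```python
# Sketch of projecting baseline (IBTrACS-derived) occurrence rates to a future
# horizon with an ensemble-mean trend factor.  All numbers are illustrative.
baseline_rate = {1: 0.40, 2: 0.25, 3: 0.15, 4: 0.08, 5: 0.02}  # events/yr by category

# Hypothetical composite trend factors for one basin, scenario, and horizon,
# e.g. a mean of per-study trend lines (values are placeholders).
trend_factor = {1: 0.95, 2: 0.97, 3: 1.02, 4: 1.10, 5: 1.18}

projected_rate = {cat: baseline_rate[cat] * trend_factor[cat] for cat in baseline_rate}
```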

Asset Exposure and Loss Rates

The cyclone asset exposure calculations begin by separating damage into wind-based and flood-based components.  Assets within each mesh cell are assumed to be fully exposed to wind losses.  No consideration is given to building-by-building characteristics or wind hardening strategies.  Flood loss exposure utilizes the same flood plain hazard maps described in the Flood Model Methodology section below, with flood rasters selected so that the occurrence frequency associated with the flood map matches the expected cyclone occurrence frequency, category by category.  This frequency matching approach tethers the flood exposure calculations to the forecast cyclone behavior.
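
A minimal sketch of the frequency-matching step, assuming an illustrative set of available flood-map return periods, follows.

```python
# Sketch of the frequency-matching step: pick the flood hazard raster whose
# return period best matches the expected cyclone occurrence rate for a
# category.  Return periods and rates are illustrative.
available_return_periods = [10, 25, 50, 100, 250, 500]   # years, from flood maps

def matched_return_period(cyclone_events_per_year: float) -> int:
    target = 1.0 / cyclone_events_per_year                # implied return period
    return min(available_return_periods, key=lambda rp: abs(rp - target))

print(matched_return_period(0.02))   # 0.02 events/yr -> ~50-year flood raster
```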

Similar to the exposure calculations, the loss expectation calculations are distinct for the wind and flood damage components.  Wind damage losses are estimated using a wind damage function, where damage scales non-linearly as wind speed increases beyond a threshold value,

Damage parameters for this function are calculated for each geopolitical region and cyclone occurrence zone, using values from,

Wind damage estimates are calculated for each cyclone category at the median wind speed observed within that category.
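
The calibrated damage function and its regional parameters are not reproduced in this document.  For illustration only, the sketch below uses one widely cited functional form from the literature, in which damage rises with the cube of wind speed above a threshold and saturates at total loss; both the form and the parameter values are assumptions, not Beehive's calibration.

```python
# Illustrative wind damage function of the general type described above: zero
# damage below a threshold wind speed, rising non-linearly and saturating at
# total loss.  The cubic form and parameter values are assumptions.
def wind_damage_fraction(v_ms: float, v_thresh: float = 25.0, v_half: float = 60.0) -> float:
    """Fraction of exposed asset value lost at sustained wind speed v_ms (m/s)."""
    vn = max(0.0, v_ms - v_thresh) / (v_half - v_thresh)
    return vn**3 / (1.0 + vn**3)

# Evaluate at a hypothetical median wind speed for each cyclone category
median_wind_ms = {1: 38, 2: 45, 3: 53, 4: 63, 5: 75}
damage_by_category = {cat: round(wind_damage_fraction(v), 3) for cat, v in median_wind_ms.items()}
```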

Calculations for the flood component of cyclone damage closely follow the methodology described in the Flood Model Methodology section of this document, with two important differences.  First, only a single flood return period (frequency matched to the cyclone occurrence rate) is used for each cyclone category.  Second, flood damage results are scaled with the cyclone occurrence frequency before being combined with wind damage results.  This ensures that only flooding from cyclones is considered.

Properly weighting and combining the wind and flood components is a probabilistic exercise, since the exact contribution of each damage component varies from storm to storm, and since cyclone damage forecasting remains an active area of academic research, 

  • K. M. Wilson et al. Estimating Tropical Cyclone Vulnerability: A Review of Different Open-Source Approaches. In: Collins, J.M., Done, J.M. (eds) Hurricane Risk in a Changing Climate. Hurricane Risk, vol 2. Springer, Cham. (2022) https://doi.org/10.1007/978-3-031-08568-0_11

Here, the flood damage contribution increases with decreasing distance to the coast, and as the cyclone category decreases (i.e., as the wind speed decreases).  However, the wind contribution is not allowed to drop below 30% of the value from the standalone wind damage function.  The mechanics of these calculations are applicable to cyclones of arbitrary occurrence frequency and intensity.  Consequently, they can be applied to the occurrence rates and intensities forecast for each scenario and time horizon of interest.
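
The sketch below illustrates one combination rule with this qualitative behavior.  The specific flood-weight formula is a placeholder assumption; only the 30% wind floor is taken from the description above.

```python
# Sketch of combining the wind and flood damage components for one cell and
# cyclone category.  The weighting function is a placeholder that follows the
# qualitative behavior described above (more flood weight near the coast and
# for weaker storms); the exact weights are not specified in this document.
def combined_cyclone_damage(wind_damage: float, flood_damage: float,
                            coast_distance_km: float, category: int) -> float:
    # Hypothetical flood weight: larger near the coast, smaller for stronger storms
    flood_weight = max(0.0, 1.0 - coast_distance_km / 50.0) * (6 - category) / 5.0
    flood_weight = min(flood_weight, 0.7)   # keep wind at >= 30% of its standalone value
    return (1.0 - flood_weight) * wind_damage + flood_weight * flood_damage
```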

The combined wind and flood damage rates are multiplied by the exposure rate and cyclone frequency, category by category, and then aggregated to determine a total intensive (loss per unit of asset value) damage expectation.  The expectations are collected from all global locations to determine the 1 through 7 scoring thresholds for cyclones, where an estimate below the 40th percentile of the global distribution is given a score of 1, and an estimate above the 97th percentile is given a score of 7.  The details of the scoring thresholds are shown in the table below.

Flood Model Methodology

Following the decomposition approach described in the overview, flood risk is calculated as the product of three inputs: flood event frequency, the extent of asset exposure when a flood occurs, and the expected loss of value when an asset is flooded. Event frequency data is sourced from the return periods associated with flood hazard maps, and projected forward in time using precipitation data from CMIP6 simulations.  Asset exposure is determined at each flood frequency by analyzing the spatial extent of flooding within each mesh cell.  Asset loss expectations are calculated using empirical flood damage functions that relate the depth of a flood event to the economic damage it causes, per unit of asset value.  Details of the calculations and data sources used for each risk component are below.  

Event Frequency

Flood hazard maps are sourced from academic, research, and government organizations, and combined into a composite hazard map.  Beehive does not perform proprietary calculations to identify flood plains, and makes no in-house contributions to the composite map.  Taking a composite of data sources partially mitigates the uncertainty associated with any particular source.  For the most part, the individual hazard maps that are sourced provide water coverage and water depth estimates for discrete return periods (i.e., discrete flood occurrence frequencies).  The maps used in Beehive's assessments, which consider fluvial and coastal flooding, are:

Between 4 and 7 return periods (i.e. flood frequencies) are considered within each mesh cell.  For each return period, flood coverage and flood depth statistics are assigned to the cell by taking a composite of statistics from the component flood maps. 

After baseline flood frequencies and depths are established, they are forecast forward under each climate scenario using CMIP6 data.  First, the precipitation rates associated with each flood frequency are calculated from the baseline time period data.  Next, CMIP data from future time periods is analyzed to forecast how the frequency of those critical precipitation events will evolve.  Finally, the forecast precipitation frequencies are used to project the baseline flood frequencies forward, to each future time horizon and scenario of interest.
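
A minimal sketch of this precipitation frequency-matching idea is shown below, assuming hypothetical daily precipitation series for the baseline and future periods.

```python
# Sketch of projecting a baseline flood frequency forward using CMIP6
# precipitation output.  The idea: find the precipitation intensity that occurs
# with the baseline flood's frequency today, then count how often that
# intensity is exceeded in the future period.  Data here is synthetic.
import numpy as np

def projected_flood_frequency(baseline_events_per_year: float,
                              baseline_precip: np.ndarray,
                              future_precip: np.ndarray,
                              days_per_year: float = 365.25) -> float:
    # Precipitation intensity with the same exceedance frequency as the flood today
    baseline_exceed_prob = baseline_events_per_year / days_per_year
    critical_intensity = np.quantile(baseline_precip, 1.0 - baseline_exceed_prob)
    # How often that intensity is exceeded in the future simulation period
    future_exceed_prob = np.mean(future_precip > critical_intensity)
    return future_exceed_prob * days_per_year

rng = np.random.default_rng(1)
baseline = rng.gamma(0.5, 4.0, size=30 * 365)   # ~30 years of daily precipitation (mm)
future = baseline * 1.08                        # crude stand-in for a wetter future period
print(projected_flood_frequency(0.1, baseline, future))
```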

Asset Exposure and Loss Rates

Asset exposure is modeled using the spatial extent of flooding within each mesh cell, and the extent of flooding is determined from the composite flood plain map.  When the extent of the flooding in a cell nears 100% for a particular return period, all assets within that cell are treated as being exposed to floods associated with that return period.

Economic losses per unit of exposed asset value are calculated using flood damage functions.  These functions relate flood water depth to value loss.  A representative damage curve dataset is,

Additional flood exposure and loss rate data is sourced from FEMA's National Risk Index:

After losses have been calculated for each flood frequency, the data is aggregated over all frequencies to determine an overall expected annual loss rate.  This overall rate is scaled by a regional constant (i.e. a single constant is used for all of North America, or all of Europe), to ensure approximate consistency with reported cumulative annual flood losses from the region.  The loss rates are globally aggregated to determine thresholds for assigning the 1 through 7 risk scores.  A loss rate below the 40th percentile of the global distribution is assigned a score of 1, and a rate above the 97th percentile is assigned a score of 7.  The details of the scoring thresholds are shown in the table below.
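
The sketch below illustrates the aggregation over return periods and the regional scaling step.  The return periods, damage values, exposure fractions, and scaling constant are illustrative placeholders rather than values from the composite map.

```python
# Sketch of aggregating per-return-period losses into an expected annual loss
# rate, then applying a regional calibration constant.  Values are illustrative.
import numpy as np

return_periods = np.array([10.0, 25.0, 50.0, 100.0, 250.0])   # years
loss_given_flood = np.array([0.01, 0.03, 0.06, 0.10, 0.18])   # loss per unit asset value
exposed_fraction = np.array([0.2, 0.4, 0.6, 0.8, 0.9])        # from flood extent in the cell

exceedance_freq = 1.0 / return_periods                         # events per year
per_event_loss = exposed_fraction * loss_given_flood

# Trapezoidal integration of loss over the exceedance-frequency curve, a common
# way to turn discrete return-period losses into an annual expectation.
order = np.argsort(exceedance_freq)
freq_sorted, loss_sorted = exceedance_freq[order], per_event_loss[order]
expected_annual_loss = 0.5 * np.sum((loss_sorted[1:] + loss_sorted[:-1]) * np.diff(freq_sorted))

regional_scaling = 1.3                                         # placeholder calibration constant
print(expected_annual_loss * regional_scaling)
```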

Wildfire Model Methodology

The wildfire model follows the approach of decomposing risk calculations into three components, as described at the beginning of the document.  For this hazard type, the three components are: wildfire frequency, the fraction of assets that are exposed to a wildfire when one occurs, and the expected rate of asset loss when an asset is affected. Event frequency data is generated using machine learning models trained on wildfire occurrence data from historical satellite-based observations, and from wildfire simulation campaigns.  Once a model has been trained and can represent historical phenomena, climate data from CMIP6 simulations is used to run the model and generate results for each scenario and future time horizon of interest.  Asset exposure is calculated for each mesh cell by considering the typical size of wildfires that occur in the cell.  Baseline asset loss rates are derived from FEMA data, and are then scaled region-by-region to ensure consistency with the reported economic impact of wildfires within the region.  Details of the calculations and data sources used for each of the three risk components are below.

Event Frequency

Models that attempt to forecast wildfire occurrence rates on a global scale are typically developed using a reference database of observed fires, as well as parameterization data such as: 

  • Landcover type (e.g. urban, deciduous forest, desert, …)

  • Climate characteristics such as the expected number of heat waves per year, the maximum expected time between precipitation events, or average soil moisture

  • Human factors such as proximity to population centers

  • For shorter time horizon models, weather data such as wind or humidity forecasts

Such data is openly available, and Machine Learning (ML) frameworks can be trained on the data and used for wildfire forecasting at arbitrary global locations.  These sorts of ML approaches have been widely studied and applied in academic research, and examples include,

Beehive forecasts wildfire occurrence rates by training and then applying a similar ML model.  The quality of an ML wildfire approach depends on the quality and scope of the training data describing both fire occurrence and the environmental parameters that influence it.  The two wildfire data sources used here for training are,  

The FEMA dataset contains occurrence rates for the US that come from a simulation campaign, while the Global Fire Atlas dataset provides empirical wildfire occurrence rates derived from MODIS instruments on NASA's Terra and Aqua satellites.

Landcover input data for the model is sourced from the MODIS MCD12Q1 product:

Climate input data is sourced from the CMIP6 ensembles described in the modeling overview.  A wide variety of climate statistics might reasonably be chosen to describe the relationships between climate and fire occurrence, and Yu et al. provide a reference point for understanding some of these options, 

  • G. Yu et al.  Performance of Fire Danger Indices and Their Utility in Predicting Future Wildfire Danger Over the Conterminous United States.  Earth’s Future 11 (11) (2023).  https://doi.org/10.1029/2023EF003823

Here, risk is assessed over years and decades, rather than over the days and weeks associated with weather forecasts.  Consequently, the wildfire model is parameterized with annualized climate statistics.  The chosen statistics include,

  • arid periods per year where the daily cumulative precipitation is less than 0.1 millimeters for at least 20 days in a row

  • heat waves per year where the max daily temperature rises above 86F for at least 10 days in a row

  • average annual precipitation

  • average annual temperature

  • maximum consecutive days per year where the daily cumulative precipitation is less than 0.1 millimeters

  • minimum of the 2-month rolling average of relative humidity

  • minimum of the 2-month rolling cumulative precipitation
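
As a minimal sketch, the snippet below computes a few of the statistics above from a hypothetical daily series; the synthetic data is illustrative, while the thresholds follow the definitions in the list.

```python
# Minimal sketch of computing annualized statistics from one year of daily data.
import numpy as np

def max_consecutive(condition: np.ndarray) -> int:
    """Longest run of consecutive True values."""
    best = run = 0
    for flag in condition:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def count_runs_at_least(condition: np.ndarray, min_length: int) -> int:
    """Number of distinct runs of True values lasting at least `min_length` days."""
    count = run = 0
    for flag in np.append(condition, False):      # sentinel closes a trailing run
        if flag:
            run += 1
        else:
            count += run >= min_length
            run = 0
    return count

daily_precip_mm = np.random.gamma(shape=0.5, scale=4.0, size=365)
daily_tmax_f = 70 + 25 * np.sin(np.linspace(0, np.pi, 365)) + np.random.normal(0, 3, 365)

max_dry_spell = max_consecutive(daily_precip_mm < 0.1)           # longest dry run (days)
heat_waves = count_runs_at_least(daily_tmax_f > 86.0, 10)        # runs of >=10 hot days
arid_periods = count_runs_at_least(daily_precip_mm < 0.1, 20)    # runs of >=20 dry days
```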

After training on climate, landcover, human proximity, and wildfire occurrence data, the ML model can forecast wildfire occurrence frequencies for all SSP scenarios and time periods of interest.  

In addition to the frequency forecasting model, a second model with identical input parameters is trained to forecast average fire size.  The fire size training data is taken from the global fire atlas referenced above, and this second model is used during the calculation of asset exposures. 
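
The sketch below outlines how the paired frequency and fire-size models might be trained.  The feature set, synthetic training table, and choice of a gradient-boosting regressor are illustrative assumptions, not Beehive's production configuration.

```python
# Sketch of the paired frequency / fire-size models, using a generic
# gradient-boosting regressor on a synthetic per-cell feature table.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_cells = 5_000

# Hypothetical per-cell training table: landcover class, climate statistics,
# and proximity to population centers.
X = np.column_stack([
    rng.integers(0, 17, n_cells),          # MODIS-style landcover class id
    rng.normal(800, 300, n_cells),         # average annual precipitation (mm)
    rng.normal(15, 8, n_cells),            # average annual temperature (C)
    rng.integers(0, 60, n_cells),          # max consecutive dry days
    rng.exponential(40, n_cells),          # distance to population center (km)
])
fires_per_year = rng.poisson(0.1, n_cells)           # target 1: occurrence frequency
mean_fire_size_km2 = rng.exponential(2.0, n_cells)   # target 2: average fire size

frequency_model = GradientBoostingRegressor().fit(X, fires_per_year)
size_model = GradientBoostingRegressor().fit(X, mean_fire_size_km2)

# The trained models are then evaluated with CMIP6-derived features for each
# scenario and time horizon to produce forward-looking frequencies and sizes.
```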

Asset Exposure and Loss Rates

Asset exposure calculations are made by forecasting the expected average size of fires occurring within a mesh cell.  The expected fire size is used to estimate the fraction of assets within the cell that would be affected by a wildfire event.  Because intensive rather than extensive losses are considered (see the discussion in the overview), the cumulative monetary value of assets within a mesh cell is not part of the exposure calculation.

A baseline loss rate per affected asset is sourced from the FEMA National Risk Index, under the assumption that rates remain relatively consistent globally.  However, regional scaling (i.e. a single scaling constant is used for all of North America, or all of Europe) is applied to ensure consistency with empirically observed annual monetary losses from wildfire within the region.  This scaling is a simple but important way of accounting for the fact that local economic and building characteristics, including building or zoning codes, can influence loss rates per affected asset.

The overall expected wildfire loss rate for each cell is the product of these exposure and per-asset loss rates, and the forecast annual wildfire frequency.  The overall rates (i.e. the annual expected loss per unit of asset value) are aggregated globally to determine the thresholds for assigning 1 through 7 risk scores.  As with most other risk types, an estimate below the 40th percentile of the global distribution of loss rates is given a score of 1, and an estimate above the 97th percentile is given a score of 7, as shown in the table below.
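
A minimal sketch of this final product for one mesh cell, with illustrative values, is shown below.

```python
# Sketch of combining the three wildfire components for one mesh cell.
# All values are illustrative.
cell_area_km2 = 25.0
expected_fire_size_km2 = 3.0                      # from the fire-size model
exposed_fraction = min(1.0, expected_fire_size_km2 / cell_area_km2)

fires_per_year = 0.08                             # from the frequency model
loss_given_affected = 0.35                        # FEMA-derived baseline loss rate
regional_scaling = 0.9                            # calibration to reported regional losses

annual_loss_rate = fires_per_year * exposed_fraction * loss_given_affected * regional_scaling
```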

Heat Wave Model Methodology

Assessment of heat risk differs from the three-component economic impact approach used for the other hazard types.  The choice to use a different framework is driven by the variety of mechanisms by which heat can impact business models and operations.  For example, an organization with members who perform significant outdoor work might have fundamentally different economic exposure to heat risk than an organization whose members work largely indoors.  In addition, heat exposure is not typically priced in insurance markets at the volumes seen for cyclone, flood, and wildfire exposures.  Instead of considering the economic impact of higher temperatures, then, the heat assessment provides information about the likelihood that assets will experience physical heat stress.

While heat risk is assessed here using only physical climate information, some important context for this risk type is that a variety of researchers and practitioners do go a step further and connect temperature changes to economic impact.  Representative studies include, 

A further complication is that the details of how heat affects human physiology make it challenging to define a one-size-fits-all heat metric for use in assessments.  Consequently, researchers have proposed a wide variety of heat stress variables.  Examples include,

Many definitions of heat risk or heat stress can be reasonable.  The approach here, in an effort to capture generally applicable rather than precision-fit information, is to use a relatively straightforward composite definition, and to assess how that stress evolves under the chosen scenario and time horizon matrix.

Physical Heat Stress Assessment

Unlike other hazard types where CMIP data provides a major but only partial component of the metric used for quantitative scoring, heat risk forecasting relies exclusively on CMIP6 simulation results.  Beehive implements a composite metric built from heat and humidity statistics that include:

  • days per year with a max wetbulb temperature over 75F

  • days per year with a max temperature over 96F

  • days per year with a max temperature over 99F

  • heat waves per year, where the max daily temperature rises 5% or more above the average max temperature from the previous month for at least 4 days in a row

  • average annual temperature

  • maximum of the 1-month rolling average of temperature

  • maximum of the 1-month rolling average of the max daily temperature

  • maximum of the 1-month rolling average of the wetbulb temperature

A weighted average of these statistics is calculated at each mesh cell. The approach captures multiple dimensions of heat stress, from absolute temperature thresholds to sustained heat wave conditions and humidity effects that influence human and infrastructure tolerance.
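
The sketch below illustrates the composite calculation for a single cell.  The normalization scales and equal weights are assumptions made for illustration; the production weights are not published in this document.

```python
# Sketch of the composite heat metric for one mesh cell: a weighted average of
# the statistics listed above, normalized onto comparable scales.
heat_statistics = {                      # raw annual statistics for one cell (illustrative)
    "days_wetbulb_over_75F": 12,
    "days_tmax_over_96F": 20,
    "days_tmax_over_99F": 6,
    "heat_waves_per_year": 3,
    "avg_annual_temp_F": 68.0,
    "max_1mo_avg_temp_F": 84.0,
    "max_1mo_avg_tmax_F": 95.0,
    "max_1mo_avg_wetbulb_F": 74.0,
}
reference_scale = {                      # assumed scales to put statistics on a common footing
    "days_wetbulb_over_75F": 365, "days_tmax_over_96F": 365, "days_tmax_over_99F": 365,
    "heat_waves_per_year": 10, "avg_annual_temp_F": 100.0,
    "max_1mo_avg_temp_F": 100.0, "max_1mo_avg_tmax_F": 100.0, "max_1mo_avg_wetbulb_F": 100.0,
}
weights = {name: 1.0 / len(heat_statistics) for name in heat_statistics}   # equal weights assumed

heat_metric = sum(weights[k] * heat_statistics[k] / reference_scale[k] for k in heat_statistics)
```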

Risk Scoring

The heat metric thresholds used to assign 1 through 7 scores to mesh cells are more uniformly distributed than the thresholds for wildfires, floods, or cyclones. A heat risk score of 1 indicates the heat metric is in the bottom 10% of all global cells that have been scored, while a heat score of 7 indicates the heat metric is in the top 16% of all cells that have been scored. The remaining thresholds are distributed roughly uniformly between these bounds, as shown in the table.

FAQs

Modeling Approach

Which climate scenarios and time horizons does Beehive consider?

Beehive considers the following Shared Socioeconomic Pathways (SSPs) and time horizons:

  • SSP1-2.6 (low emissions), SSP3-7.0 (medium-high emissions), and SSP5-8.5 (very high emissions)

  • 1 year, 10 years, and 30 years from the year 2025, when the assessments were most recently updated

What are Shared Socioeconomic Pathways (SSPs), and how does Beehive use them?

SSPs are climate change scenarios: conditional versions of the future, each tied to a specific societal greenhouse gas emissions trajectory, that are used to understand how risk might vary across those trajectories.  Beehive uses three SSPs to model how hazard risk evolves under different emissions futures, with SSP1 representing a scenario with relatively low emissions, and SSP5 representing a scenario with very high emissions.

What modeling techniques does Beehive use?

Beehive applies machine learning, statistical analytics, and ensemble climate modeling, depending on the hazard type.

How does Beehive estimate hazard frequency and severity?

Hazard frequencies are projected using CMIP6 simulations, supplemented by historical records and published trend analyses as described in the hazard sections above. Severity is assessed using damage functions (for cyclones, floods, and wildfires) and physical conditions (for heat).

How are Beehive’s physical risk scores or financial impact estimates calculated?

Quantitative and intensive economic losses (meaning, loss per unit of asset value rather than net loss) are estimated on a cell-by-cell basis, for risk types other than heat.  These economic loss estimates are a function of modeled hazard frequency, asset exposure, and asset vulnerability, as described in the model descriptions above.  For all risk types, the appropriate quantitative economic or physical metric is converted to a global relative score of 1 through 7 by sorting all metric values for each hazard type, and then setting cutoff percentiles within the sorted distribution which define the 1 through 7 scoring buckets.

What are the key assumptions behind Beehive’s physical risk models?

  • Climate variables influence hazard likelihood

  • Precipitation trends influence flood probabilities

  • Asset exposure is modeled in an aggregate fashion, rather than building-by-building

  • SSPs are an appropriate framework for evaluating climate-related risk ranges

What climate-specific datasets does Beehive use, and how does Beehive handle uncertainty in these climate projections?

Beehive uses Coupled Model Intercomparison Project Phase 6 (CMIP6) climate projection data, hosted by Google Cloud Public Datasets and accessed via zarr-consolidated stores (https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv).  The uncertainty associated with individual simulations and specific time periods is reduced by using an ensemble of 20 to 25 CMIP6 simulations, and by using a time window of plus or minus two years around the time horizon of interest, when forecasting climate statistics.  Even with this treatment, the climate projections are inherently statistical and probabilistic in nature.  They are useful for forecasting and understanding trends and risk ranges, but not for deterministically predicting the future.
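
For reference, the sketch below shows one common way to read this public catalog and open a single store.  The variable and experiment selected are just an example, and the column names reflect the catalog's published schema (worth verifying against the current CSV).

```python
# Sketch of accessing the CMIP6 archive referenced above (Google Cloud Public
# Datasets, zarr-consolidated stores).
import pandas as pd
import gcsfs
import xarray as xr

catalog = pd.read_csv("https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv")

# Daily precipitation for one model under SSP5-8.5 (illustrative selection)
subset = catalog[(catalog.variable_id == "pr")
                 & (catalog.experiment_id == "ssp585")
                 & (catalog.table_id == "day")]

fs = gcsfs.GCSFileSystem(token="anon")
store = fs.get_mapper(subset.zstore.iloc[0])        # 'zstore' column holds the bucket path
ds = xr.open_zarr(store, consolidated=True)
print(ds)
```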

What are the known limitations of Beehive’s current modeling approach?

  • Not yet building-specific

  • No economic correlation model for heat risk

  • Flash flooding not yet modeled

  • U.S.-derived loss data is extrapolated globally, when necessary

Validation and Accuracy

How does Beehive validate its models?

Model results are compared against historical records (e.g., FEMA, Global Fire Atlas) and academic benchmarks.

Does Beehive compare model results to historical disaster data or financial losses?

Yes—loss ratios and historical event data are used for backtesting and calibration.

How does Beehive ensure its methodology aligns with industry and scientific standards?

Beehive builds on established data sources, government and academic frameworks (e.g., CMIP, JRC, FEMA), and peer-reviewed methods.

How does Beehive plan to improve model accuracy over time?

Planned improvements include:

  • Flash flood and drought modeling

  • Higher spatial resolution

  • Global loss function coverage

  • Building-level modeling

  • Enhanced CMIP6 post-processing

Updates and Governance

How often does Beehive update its physical risk models?

Annually, with major updates released semiannually.

Who at Beehive is responsible for model governance and quality assurance?

The Chief Data Officer, supported by Beehive’s data science and climate modeling team.

Usage and Integration

How do customers typically use Beehive’s physical risk insights?

  • Climate disclosures

  • Enterprise risk management

  • Real estate, insurance, or supplier screening

  • Employee safety planning

How does Beehive support climate disclosure requirements under TCFD, CSRD, or California SB 261?

By providing time-bound, scenario-based physical risk data and audit trails suitable for reporting. Beehive also provides a transition risk assessment product and a reporting tool to generate a TCFD-aligned report compliant with global climate risk regulations.

What outputs do Beehive’s models generate?

  • 1–7 risk scores by region, hazard, time, and scenario

  • Audit trails showing the underlying data used in the scoring calculations

  • Graphs, charts, and other visuals to understand a company’s risk exposure

What is included in Beehive’s audit trail, and why is it important?

Audit trails show the drivers of risk scores, including climate trends, hazard frequency, and exposure assumptions. This transparency supports customer trust, precise adaptation planning, and external disclosures.

Can users filter results?

Yes—by hazard, geography, scenario, and time horizon.

Does Beehive offer APIs or data exports for further analysis?

While Beehive does not offer an API, customers can access raw data through file downloads.

Legal, Privacy, and Disclaimers

Do Beehive’s models comply with major regulatory frameworks?

Yes. They are designed to support TCFD, CSRD, and SB 261 requirements.

Are model outputs legally binding or provided for decision support?

Outputs are decision-support tools, not legal guarantees. They are based on best-available science and data sources. The climate risk assessments and projections provided by Beehive are based on complex models and data analysis techniques that attempt to predict future climate-related events and risks. These projections are inherently uncertain and subject to numerous variables beyond our control.

Beehive does not guarantee the accuracy of any climate risk projections or assessments. The risk categorizations provided are best estimates based on available data and modeling techniques at the time of the assessment.