
Models & Analytic Components
At the core of our value proposition is a unique model development and lifecycle management process that produces robust analytics which are easy to maintain, govern, use, and integrate.
The Modelry framework captures and parametrizes the best development practices into a consistent yet flexible process. The core building blocks are Analytic Components, which are individually validated and documented in our Modeling Standards document. Users can deploy them out of the box with our pre-defined parameter rules or adapt them as needed.
Internal and external data inputs are used in the development process. Examples include:
- Historical transaction level attributes as well as losses and/or credit migrations
- Macroeconomic data (including regulatory scenarios as applicable)
- Simulation scenarios
- Operational assumptions, etc.
For a post-COVID model assessment, some additional considerations apply:
- Data needs to be segregated into pre- and post-pandemic cohorts (a minimal example follows this list)
- Additional external / industry data generally has to be incorporated since the post-pandemic period is quite short
- In many cases, structural scenarios and synthetic data points need to be used as supplements
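As an illustration of the cohort segregation above, the sketch below splits a development dataset into pre- and post-pandemic cohorts. It assumes a pandas DataFrame with an observation-date column; the column name and the cutoff date are placeholder assumptions, not part of our actual data model.

```python
# Minimal sketch: tag observations and split them into pre- and post-pandemic
# cohorts. Column name and cutoff date are illustrative assumptions.
import numpy as np
import pandas as pd

PANDEMIC_CUTOFF = pd.Timestamp("2020-03-01")  # assumed cutoff for illustration

def split_cohorts(history: pd.DataFrame, date_col: str = "obs_date"):
    """Return (pre_pandemic, post_pandemic) subsets with a persistent cohort tag."""
    history = history.copy()
    history["cohort"] = np.where(
        history[date_col] >= PANDEMIC_CUTOFF, "post_pandemic", "pre_pandemic"
    )
    pre = history[history["cohort"] == "pre_pandemic"]
    post = history[history["cohort"] == "post_pandemic"]
    return pre, post
```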
Structured Modeling Framework
Input data for the execution of the models in the suite. This typically includes portfolios in canonical form (prepared in the data pre-processors), along with scenarios for forecast models, application data for origination models, transaction history for behavioral models, etc.
Data preprocessing components include a suite of tools to homogenize and prepare various types of datasets for modeling. Some of the standardized items included in this step are:
- Transforming raw input data into canonical data structures that can be processed by the analytic and reporting engines
- Patching gaps using a variety of smoothing techniques (e.g. arithmetic and geometric attribution, exponential and weighted smoothing)
- Removing outliers and other data defects
- De-trending time-series data (using spectral analysis, ARIMA, GARCH, etc.)
- Attaching data tags that are persisted throughout the process and facilitate pivoting and aggregation at the end
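To make the gap-patching and tagging steps concrete, here is a minimal sketch assuming pandas time series and DataFrames; the function names, smoothing span, and tag keys are illustrative rather than the actual component interfaces.

```python
# Illustrative pre-processing helpers: patch gaps with an exponentially
# weighted estimate and attach tags that persist through to reporting.
import pandas as pd

def patch_gaps_ewm(series: pd.Series, span: int = 6) -> pd.Series:
    """Fill missing observations with an exponentially weighted moving average."""
    smoothed = series.ewm(span=span, min_periods=1).mean()
    return series.fillna(smoothed)

def attach_tags(df: pd.DataFrame, tags: dict) -> pd.DataFrame:
    """Persist data tags as columns so results can be pivoted and aggregated later."""
    tagged = df.copy()
    for key, value in tags.items():
        tagged[key] = value
    return tagged
```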
Additional functionality specific to a COVID-19 suitability assessment includes:
- Inclusion of external data where previously none was used
- New blending schemes for internal and external data
- More sophisticated weighting schemes that can handle multiple distinct phase transitions
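The sketch below shows one way a phase-aware weighting scheme might look: observation weights change at distinct phase transitions. The phase boundaries and weights are placeholder assumptions for illustration.

```python
# Illustrative weighting scheme with distinct phases (pre-pandemic, lockdown,
# recovery). Boundaries and weights below are assumptions, not calibrated values.
import pandas as pd

PHASES = [
    # (label, phase end date (exclusive), observation weight)
    ("pre_pandemic", pd.Timestamp("2020-03-01"), 1.00),
    ("lockdown",     pd.Timestamp("2021-06-01"), 0.25),
    ("recovery",     pd.Timestamp.max,           0.75),
]

def phase_weights(dates: pd.Series) -> pd.Series:
    """Map each observation date to the weight of the phase it falls in."""
    weights = pd.Series(index=dates.index, dtype=float)
    lower = pd.Timestamp.min
    for _, upper, weight in PHASES:
        weights[(dates >= lower) & (dates < upper)] = weight
        lower = upper
    return weights
```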
Our modeling framework is designed to incorporate business assumptions as parameterized constraints in the development phase, rather than the usual ad-hoc overlays.
In addition to making all related models consistent with one another, these assumptions are fully captured in the validation and governance process and automatically inserted into the documentation.
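As a minimal illustration of an assumption captured as a parameterized constraint (the field names and the example floor are hypothetical):

```python
# Sketch of a business assumption expressed as a parameterized constraint that
# can be checked during development and echoed into documentation. Names and
# the example LGD floor are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Constraint:
    name: str
    description: str
    lower: float = float("-inf")
    upper: float = float("inf")

    def check(self, value: float) -> bool:
        """True when the fitted quantity respects the assumed bounds."""
        return self.lower <= value <= self.upper

lgd_floor = Constraint("lgd_floor", "Downturn LGD must not fall below 10%", lower=0.10)
assert lgd_floor.check(0.35)   # a fitted downturn LGD of 35% satisfies the constraint
```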
A comprehensive review of models post-COVID requires an assessment of the suitability of most business rules and assumptions, separating them into the following two groups:
- Temporary changes during the lockdown that are expected to revert back to normal quickly (e.g. spikes in credit-card late payments became a poor indicator of subsequent defaults during the pandemic but the correlation will almost certainly revert to previous levels)
- Secular changes that will take a long time to unwind, if ever (e.g. usage patterns of office real estate)
A model suite is a set of integrated models and model components that cover a particular analytical space, usually a product.
A banking example would be a Commercial and Industrial (C&I) suite that includes:
- Credit scorecards (PD, LGD, EAD)
- Behavioral scorecards
- Loss forecasts (baseline, stress, lifetime)
- Balance dynamics & origination volumes
- Pricing
Pre-processed "data components" are stored in production databases and are used by any modeling component that needs them. Most of the items stored here are:
- Product specific data that will be used by a particular suite (e.g. historical portfolio characteristics)
- Macroeconomic data such as stress test time series that have been homogenized and transformed, and are used by any component that runs scenarios
Statistical analysis of the pre- and post-COVID-19 data cohorts creates the first decision point (see the sketch after this list). If both data sets are sufficiently similar, then proceed to running the model under various parameter combinations. Otherwise, the following choices apply:
- Go back and re-process the input data
- This will generally work if the differences are not extreme. A sample approach here would be to change the blending algorithm for internal and external data, change the weighting schemes, etc. Note that this would usually mean inserting new rules into the "context-specific" hooks of the modeling machinery
- Attempt to run anyway
- If data re-processing does not produce better results, running the existing models under a wide set of parameters may lead to the right solution
- Stop and consider alternatives
- If the extent of the difference is such that the existing model clearly cannot work in the new regime, then an entirely new model or approach needs to be considered
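One simple way to operationalize the decision point is a two-sample distribution test on a key driver across the two cohorts; the sketch below uses a Kolmogorov-Smirnov test with an assumed significance level.

```python
# Sketch of the first decision point: test whether pre- and post-COVID cohorts
# of a driver variable look like draws from the same distribution. The alpha
# threshold and the follow-up actions are placeholder assumptions.
from scipy.stats import ks_2samp

def cohorts_similar(pre_values, post_values, alpha: float = 0.05) -> bool:
    """True when the KS test cannot distinguish the two cohorts at level alpha."""
    statistic, p_value = ks_2samp(pre_values, post_values)
    return p_value > alpha

# True  -> run the model under various parameter combinations
# False -> re-process inputs, attempt a wide parameter sweep, or stop and
#          consider an alternative approach
```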
Components that include all primary data science techniques:
- Ordinary least squares
- Weighted OLS
- Kalman filter
- Log-Linear models
- Multivariate (Logit / Probit)
- Poisson Models
- Neural networks
- Deep learning
- Decision trees
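A parameterized framework can treat these techniques as interchangeable building blocks. The registry below is a hypothetical sketch using standard statsmodels and scikit-learn classes, not our internal implementation (Kalman filter and deep-learning components are omitted for brevity).

```python
# Hypothetical registry mapping technique names to off-the-shelf estimators so
# a component can select its estimation method by parameter.
import statsmodels.api as sm
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

TECHNIQUES = {
    "ols": sm.OLS,                       # ordinary least squares
    "wls": sm.WLS,                       # weighted OLS
    "logit": sm.Logit,
    "probit": sm.Probit,
    "poisson": sm.Poisson,
    "decision_tree": DecisionTreeClassifier,
    "neural_net": MLPClassifier,
}

# A parameterized component would look up its technique by name, e.g.:
estimator_cls = TECHNIQUES["logit"]
```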
Statistical tests with defined parameters and soft and hard passing thresholds control the modeling "loop" through a scoring and weighting scheme. Typical examples include:
- Residual analysis
- Goodness of fit (R-squared)
- Discriminatory power (Kolmogorov-Smirnov, Gini)
- Correlation / auto-correlation analysis
- Stationarity tests (Dickey-Fuller (DF), modified DF, KPSS)
Additional tests check against constraints imposed by business- or context-specific assumptions.
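The sketch below illustrates how soft and hard thresholds might be combined into a single score that drives the modeling loop; the specific tests, weights, and thresholds are placeholder assumptions.

```python
# Illustrative scoring of a candidate model: hard-threshold breaches reject it,
# soft thresholds shade a weighted score. All numbers below are assumptions.
TESTS = {
    # name: (weight, soft_threshold, hard_threshold); higher metric is better
    "r_squared":    (0.4, 0.60, 0.40),
    "ks_statistic": (0.4, 0.30, 0.20),
    "gini":         (0.2, 0.40, 0.25),
}

def score_candidate(metrics: dict) -> tuple:
    """Return (weighted score, accept flag) for one pass through the loop."""
    score, accept = 0.0, True
    for name, (weight, soft, hard) in TESTS.items():
        value = metrics[name]
        if value < hard:
            accept = False                     # hard fail: reject this candidate
        score += weight * min(value / soft, 1.0)
    return score, accept
```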
Because of this structured AI development approach, we are able to insert any number of additional tests, or fine-tune existing ones, specifically for the COVID-19 assessment exercise.
Preliminary model outputs are processed through a series of performance tests, which can be quantitative or in some instances qualitative rules. If all tests are successful, the outputs are stored in a production database. If not, any of the following may happen, depending on the nature of the test and the issue:
- Model goes back for redevelopment or recalibration
- Overlays are applied to results
- Outputs are accepted with conditions and/or limitations on their use
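The three outcomes above can be summarized as a simple routing rule; the labels below are illustrative.

```python
# Sketch of routing preliminary outputs based on performance-test results.
def route_outputs(hard_failures: list, soft_failures: list) -> str:
    """Decide what happens to a set of preliminary model outputs."""
    if hard_failures:
        return "redevelop_or_recalibrate"      # back to development
    if soft_failures:
        return "accept_with_conditions"        # or apply documented overlays
    return "store_in_production"

assert route_outputs([], []) == "store_in_production"
assert route_outputs([], ["weak_gini"]) == "accept_with_conditions"
```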
A critical aspect of our analytical framework is the ability to incorporate context-specific components into the core of the development machinery. This enables our "tree trunk" approach to building model suites: a common core "trunk" that incorporates the risk drivers for that particular product, and "branches" that are then adapted for specific uses.
A common example is a data weighting component that can, for instance, increase the weight on stress periods for stress test models, while emphasizing recency for the models used for day-to-day business decisions
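A minimal sketch of such a weighting hook is shown below, assuming a stress-period flag on the development data; the half-life and weights are illustrative assumptions.

```python
# Illustrative context-specific weighting: stress-test contexts up-weight
# stressed periods, business-as-usual contexts emphasize recency.
import pandas as pd

def context_weights(dates: pd.Series, stress_flag: pd.Series, context: str) -> pd.Series:
    """Observation weights for a given modeling context."""
    if context == "stress_test":
        # Double the weight of observations from stressed periods (assumed factor).
        return stress_flag.map({True: 2.0, False: 1.0})
    # Business-as-usual: exponential decay with an assumed 24-month half-life.
    age_months = (dates.max() - dates).dt.days / 30.44
    return 0.5 ** (age_months / 24.0)
```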
For the purpose of assessing models' post-COVID suitability and remediating deficiencies, we use additional contexts that represent pre- and post-pandemic environments as well as stable long-term blends.
The Modeling Engine

All modeling process components are parameterized code representations of documented standards, enforcing consistency by construction.
They are pre-validated and can be used out of the box.
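A hypothetical sketch of what such a parameterized, pre-validated component descriptor might look like (field names and example values are assumptions):

```python
# Illustrative descriptor for a pre-validated analytic component that points
# back to the documented standard it implements.
from dataclasses import dataclass, field

@dataclass
class AnalyticComponent:
    name: str
    standard_ref: str                       # section of the Modeling Standards document
    parameters: dict = field(default_factory=dict)
    validated: bool = True                  # pre-validated, usable out of the box

gap_patcher = AnalyticComponent(
    name="gap_patching",
    standard_ref="MS-4.2 (hypothetical reference)",
    parameters={"method": "exponential", "span": 6},
)
```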
Examples of Supported Model Suites
Commercial Real Estate (CRE) portfolios are typically subdivided into three main categories:
- Income Producing (also known as Stabilized):
  - Repayment of the mortgage on the property is supported by the stable operating income (NOI) it generates
- Construction:
  - These loans are used to fund the construction or major enhancement of properties which have no or reduced NOI. As in other types of project finance, drawdowns are scheduled to match project requirements and repayments are based on the income expected to be generated after completion
- Land:
  - Usually a small portion of any portfolio, these loans carry the highest levels of risk and are the least amenable to statistical modeling
Additionally, the residential sub-segment (e.g. multifamily) and in particular the residential construction portfolios tend to present unique data and modeling challenges.
While first lien mortgages are typically the largest component of a bank's residential loan portfolio, other products may include second lien mortgages, home equity loans and lines of credit (HELOCs), as well as reverse mortgages. An additional complication is the splitting of servicing, and occasionally sub-servicing, into separate product lines, often owned by an institution other than either the originator or the owner of the loan.
In designing The Modelry's residential suite we also took into consideration that mortgage portfolios are very frequently securitized and we therefore ensure that the underlying credit components of our suite can be directly fed into structuring and waterfall layers used to model structured products.
The data used to develop CRE models encompasses a large number of different drivers that vary substantially by segment. The list below is a small subset of some of the key attributes for commercial properties:
- Property data:
  - Geography (state, MSA, ZIP)
  - Type (office, retail, etc.)
  - Projected income, key tenants
- Loan & Obligor:
  - Sponsor history / strength
  - LTV, collateral, DCR
  - Facility term, price
- Macroeconomic:
  - Interest rates forecast
  - GDP, unemployment at state level
  - Cap rates at MSA or ZIP code level
Multifamily and residential construction segments have additional unique measures such as projected rentals and absorption rates.
Historical mortgage data tends to be some of the most consistent among bank products, and numerous industry datasets are available as well. Some of the key attributes are:
- Property data:
  - Location (state, MSA, ZIP)
  - Appraisal values
  - Use (owner occupied, primary)
- Loan & Borrower:
  - Income, ability to pay, FICO
  - LTV, balances, delinquencies
  - Loan age / time to maturity
- Macroeconomic:
  - Interest rate projections
  - GDP, unemployment at state level
  - Yield curve slope
Methodologies
Our CRE suite is composed of multiple segmented sets, all consistent with each other (e.g. risk on construction loans approaches its stabilized version as the project nears completion). All segments share the following features:
- All commercial models are underpinned by a full canonical data model for parties (sponsors, guarantors, etc..), facilities, collateral (property), and covenants, which enables them to handle complex, structured, multi-layered transactions
- Because internal CRE data within banks tends to be some of the most unreliable (poor collection practices, lots of "one-off" features, few defaults), the use of some external data is recommended. Our suite includes a set of tools and processes to optimally "cut" external data (including data from CMBS) and blend it with internal data
- CRE models tend to rely on a large number of expert-judgement assumptions, which in our framework are baked into the data that goes into the statistical machinery, rather than ad-hoc overlays
- The core modeling process involves the computation of losses first, which are then split into the regulatory-required PD and LGD components, an approach that we have found far superior over decades of experience (a toy illustration follows this list)
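As a toy illustration of the decomposition in the last bullet: an estimated loss rate is split into PD and LGD given an anchoring assumption for one of the two components (the numbers below are made up).

```python
# Toy "losses first" split: back out PD from an estimated loss rate and an
# assumed LGD. EL = PD * LGD per unit of exposure; values are illustrative.
def split_loss_rate(loss_rate: float, lgd: float) -> tuple:
    """Return (implied PD, LGD) consistent with the estimated loss rate."""
    return loss_rate / lgd, lgd

pd_est, lgd_est = split_loss_rate(loss_rate=0.012, lgd=0.40)
print(f"PD = {pd_est:.2%}, LGD = {lgd_est:.0%}")   # PD = 3.00%, LGD = 40%
```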
At the core of our residential suite is a parametric multinomial logistic regression that computes transition probabilities among various states (current, progressive levels of delinquency, REO), also known as roll rates, and includes prepayments as a competing risk.
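The sketch below shows the general shape of such a roll-rate model using a standard multinomial logistic regression with prepayment as a competing outcome; the states, features, and synthetic data are assumptions for illustration only.

```python
# Minimal roll-rate sketch: multinomial logistic regression over next-period
# states, with "prepaid" included as a competing outcome. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

STATES = ["current", "dpd30", "dpd60", "dpd90", "reo", "prepaid"]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))   # e.g. LTV, FICO, rate incentive, loan age (assumed)
y = rng.choice(STATES, size=1000, p=[0.85, 0.06, 0.02, 0.01, 0.01, 0.05])

roll_rate_model = LogisticRegression(max_iter=1000).fit(X, y)
transition_probs = roll_rate_model.predict_proba(X[:5])   # rows sum to 1 across states
```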
Additional key components:
- Generation of critical secondary variables used in models, such as burnout and media effect
- Documented process for blending internal and external data
- Differential weighting schemes applied to development data that enable the same models to be used for base and stress scenarios
- Portfolio generators that allow easy creation of simulated portfolios for testing