
Taming the Giants: Large-Scale Modelling and How Surrogate Models Can Be the Right Move

09 Jul 2025

Large-scale models can take ages to run, slowing down decision-makers and frustrating users. Surrogate models offer a solution to this challenge: they are simplified, faster alternatives trained on input-output data from the original model. While they aren’t physics-based, they can mimic complex models closely and deliver results far more quickly.

Large-scale modelling means developing and using computational models to simulate highly complex systems. It’s all about managing complexity: there are many variables and many scenarios, often with very time-consuming computations. These models typically process large amounts of data (or many variables) across wide ranges to represent real-world systems at large temporal or spatial scales, so they need substantial computational power or time to solve.

There are two types of large-scale models. The first type involves machine learning or statistical models trained on vast datasets: think of predictive models trained on millions of data points or on high-dimensional data. Such models are used in many fields, from finance and marketing to bioinformatics. The second type is complex mechanistic or first-principles (physics-based) models, which are based on physical or chemical laws and are often formulated as systems of differential equations. These are used in engineering, environmental modelling, climate science, fluid dynamics and food process simulation.


Take climate modelling, for instance. Climate models simulate the Earth's atmosphere, oceans, land surfaces and ice, using fundamental laws of physics to predict how climate variables such as temperature, rainfall or wind patterns change over time. Since they must cover the entire globe over decades or even centuries, they require huge computational resources.

Another example is Computational Fluid Dynamics (CFD) in food processing, used when designing processes such as spray drying or extrusion in food manufacturing. CFD models simulate how fluids, such as air, steam or liquids, move and transfer heat or mass. These models are based on the Navier-Stokes equations and require fine-grained spatial and temporal resolution to capture key details. Running a single scenario can take hours or days, especially if the geometry or chemistry is complex, or if the material properties vary with conditions such as temperature or pressure.


So, if you've ever worked with large-scale modelling, whether that's handling vast datasets or complex, physics-based models, you’ll know that solving or training these models can take anywhere from a few minutes to several weeks, if not more. This time lag can be frustrating, especially for end users who may not fully understand why the results take so long. Often, this becomes a barrier to adoption.


The good news, however, is that there may be a way around this!


Surrogate models are simplified mathematical versions of your original model, constructed using the outcomes of simulations from that full-scale model. By running the original model under a variety of starting conditions or inputs, you collect a range of outputs. Provided the underlying model is robust, these input-output pairs can be used to train a new, much faster model that mimics the behaviour of the original. While this surrogate model won't be rooted in physical laws, it will be built on sound data generated from a model that is.
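To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `expensive_model` is a hypothetical stand-in for a slow physics-based simulation (a simple cooling equation solved in many small steps), and the surrogate is a piecewise-linear interpolant, one of the simplest possible choices. Real surrogates are often polynomial response surfaces, Gaussian processes or neural networks, but the workflow is the same: run the full model a handful of times, fit something fast to the input-output pairs, then check it on inputs it has not seen.

```python
import bisect


def expensive_model(inlet_temp_c, n_steps=50_000):
    """Hypothetical stand-in for a slow physics-based model.

    Integrates a cooling law with a temperature-dependent rate,
    dT/dt = -(k0 + k1 * T) * (T - T_ambient), using many small explicit
    Euler steps, and returns the final temperature in degrees C.
    """
    k0, k1, ambient_c, total_time_s = 0.02, 0.0002, 20.0, 60.0
    dt = total_time_s / n_steps
    temp = inlet_temp_c
    for _ in range(n_steps):
        temp += -(k0 + k1 * temp) * (temp - ambient_c) * dt
    return temp


def build_surrogate(model, lower, upper, n_runs):
    """Run the full model at evenly spaced inputs and return a fast interpolant."""
    xs = [lower + i * (upper - lower) / (n_runs - 1) for i in range(n_runs)]
    ys = [model(x) for x in xs]  # the only expensive step

    def surrogate(x):
        if not lower <= x <= upper:
            raise ValueError("input outside the sampled range")
        # Locate the sampled interval containing x, then interpolate linearly.
        i = min(max(bisect.bisect_right(xs, x), 1), n_runs - 1)
        weight = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + weight * (ys[i] - ys[i - 1])

    return surrogate


# Eight full-model runs are used to build the surrogate.
surrogate = build_surrogate(expensive_model, lower=60.0, upper=200.0, n_runs=8)

# Check the surrogate against the full model at inputs it was not built from.
held_out = [75.0, 123.4, 188.8]
max_error = max(abs(surrogate(x) - expensive_model(x)) for x in held_out)
print(f"largest held-out error: {max_error:.4f} degrees C")
```

Once built, each surrogate call is a lookup plus a few arithmetic operations, whereas each full-model call loops through tens of thousands of time steps. Note that the surrogate refuses inputs outside the range it was built on; a data-driven model has no physics to fall back on there.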


That said, two critical questions arise: how many original simulations do you need to run, and will running them take so long that building the surrogate model is no longer practical? The answer depends on several factors, mainly the complexity of your model and the breadth of the input space that you want to explore. If you're dealing with many variables across wide ranges, the effort required might be substantial.
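A rough back-of-the-envelope calculation shows why the number of variables matters so much. The sketch below, in Python, counts the runs needed to try every combination of levels on a full grid, and then shows one common alternative for spreading a fixed budget of runs across the input space: Latin hypercube sampling. The two-hour run time, the three input ranges and the function names are all hypothetical, chosen only for illustration.

```python
import random


def full_grid_runs(n_variables, levels_per_variable):
    """Runs needed to try every combination of levels on a full grid."""
    return levels_per_variable ** n_variables


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so each variable's range is covered evenly.

    bounds is a list of (low, high) pairs, one per input variable.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One random point inside each of n_samples equal-width strata...
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        rng.shuffle(points)  # ...then shuffle so variables are paired at random
        columns.append(points)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


hours_per_run = 2.0  # assumed cost of one full simulation
for n_variables in (2, 4, 6):
    runs = full_grid_runs(n_variables, levels_per_variable=10)
    print(f"{n_variables} variables: {runs} runs, {runs * hours_per_run:.0f} hours")

# A fixed budget of 50 runs spread over three hypothetical inputs.
design = latin_hypercube(50, bounds=[(60.0, 200.0), (0.1, 2.0), (1.0, 5.0)])
print(f"{len(design)} sampled input combinations, first one: {design[0]}")
```

With ten levels per variable, a full grid grows tenfold with every extra variable, which quickly becomes impractical for a model that takes hours per run. A space-filling design keeps the number of runs fixed and under your control, at the cost of sampling each region of the input space more sparsely.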


Still, it could be worthwhile. Surrogate models can offer results orders of magnitude faster than the full models. Building one isn’t always straightforward, but if it makes your work more accessible and widely used, it might just be the right move.
