Introduction

Climate forcing associated with the atmospheric aerosol has been singled out by Intergovernmental Panel on Climate Change (IPCC) assessments as contributing the largest uncertainty to total radiative forcing estimates (e.g., [18]). Particularly stubborn is the problem of aerosol effects on the amount and brightness of warm, shallow clouds that cover large areas of the oceans and provide a cooling effect that partially offsets the warming associated with greenhouse gases. The uncertainty in the radiative forcing associated with aerosol–cloud interactions (“indirect effects”) is compounded by uncertainty in how these warm clouds will respond to a warming climate (the “cloud feedback” problem) and in how they relate to climate sensitivity.

In spite of abundant evidence of aerosol influences on the amount and radiative properties of shallow liquid water clouds, the quantification of these effects has proven elusive. The goal of this paper is to outline reasons why the task is so difficult, and to review recent progress in the understanding of aerosol–cloud interactions and quantification of their radiative forcing. We will organize our discussion around estimates of the climate forcing of the anthropogenic aerosol, conventionally expressed as an effective radiative forcing (ERF) relative to the preindustrial climate state. We will restrict our discussion to aerosol–cloud interactions in warm (liquid-phase) clouds, given their tight connection to other unresolved climate change questions such as cloud feedbacks and climate sensitivity; we refer the reader to [78] for a comprehensive discussion of mixed- and ice-phase aerosol–cloud interactions (ACI). We will not cover questions associated with the abundance, size distribution, composition, and anthropogenic fraction of the atmospheric aerosol, since many other excellent papers have already done so (e.g., [106, 108]). We will also not deal with forcing associated with aerosol–radiation interactions that act on large spatiotemporal scales by changing regional circulation patterns (e.g., [17, 115, 133]). In addition to reviewing progress, we will discuss fundamental challenges inherent to quantification of ERFaci and suggest new conceptual ways of thinking about aerosol–cloud–climate forcing. Our hope is that these might facilitate progress towards reducing, or at the very least understanding, sources of uncertainty so that modeling and observational programs can be designed accordingly.

State of the Field

We begin our review of the current state of the field with a summary of why the ACI problem is so persistent in the face of currently available methods of estimating ERFaci, what those methods are, and what progress has been made since the Fifth IPCC assessment (AR5).

Why Are ERFaci Estimates so Challenging?

A Problem of Scales

Like dragons of yore, the ERFaci problem uses scales to resist its attackers. The first scale is energetic: ERFaci is approximately two orders of magnitude smaller than the shortwave cloud radiative effect (e.g., [153]); an ERFaci of order −0.5 to −1 W m−2, set against a global-mean shortwave cloud radiative effect of approximately −50 W m−2, makes this a ~1% effect. The path to this 1% effect runs partly through large perturbations that occur rarely or over limited areas (shiptracks, closing of open cells; [47]), and partly through small perturbations that occur frequently; both pose challenges for observability. For example, [122] indicate that shiptracks, the most eminently observable manifestation of ACI, exert a paltry 0.5 mW m−2 of forcing globally. The challenge is therefore to determine the meteorological conditions under which aerosol perturbations become energetically significant, along with their geographical coverage and frequency of occurrence.

The second set of scales is spatiotemporal: the scales relevant for ACI range from the microscale (e.g., droplet activation) through the cloud-process scales of cloud-top turbulent entrainment and cloud updrafts. Aerosol perturbations at the cloud scale in turn affect the regional and global circulation, and these regional- to global-scale changes feed back as meteorological influences on cloud processes [112, 133, 150, 151]. Constraining ERFaci therefore requires understanding the microscale, the cloud-process scale, and the global scale, as well as the interactions between scales.

Tools

General circulation models (GCMs), our primary tools for quantifying the radiative forcing associated with aerosol–cloud interactions, are particularly ill-suited to resolving the level of detail of aerosol and cloud processes that matters for quantification of forcing. Process models that focus on the details of aerosol–cloud interactions at fine scales (order tens of meters) have proven very useful for elucidating the underlying physical processes within this highly coupled system, but they cannot span the large range of spatiotemporal scales required to assess these effects regionally and globally. The community thus navigates between Scylla and Charybdis: models that cover the appropriate spatiotemporal scales but lack the necessary process-level detail, and models that resolve the details but are unable to provide quantification at climate-relevant spatiotemporal scales.

Data from airborne, space-based, and ground-based platforms are key to improving models of all scales. Observing systems have progressed significantly in recent decades, yielding copious amounts of the data that are crucial to advancing our understanding. Observations are not without problems; here too the question of measurement scale rears its head. In situ measurements, while providing great detail, typically represent very small volumes of the atmosphere. Ground-based remote sensing, using instrument arrays that simultaneously measure cloud and aerosol properties, provides long-term records of larger volumes, but with the drawback of uncertainties inherent to remote sensing (e.g., [31]). At larger scales, satellite-based observing systems provide global coverage of key aerosol and cloud properties, but at reduced spatial resolution and with the added complication that it is difficult to retrieve aerosol and cloud properties simultaneously in the same column (e.g., [28, 62]).

Inferring Forcing from Present-Day Conditions

A significant challenge to evaluating ERFaci is that, by definition, ERFaci addresses the present-day (PD) forcing relative to a preindustrial (PI) base state. The lack of observations, and hence of knowledge, of the PI aerosol leaves much uncertainty (e.g., [24]). For example, assumptions about PI drop concentration \(N_d\) or aerosol concentration \(N_a\) contribute strongly to uncertainty in forcing estimates [23, 57]. Recent and planned field experiments are focusing on the remote, clean marine environment in search of proxies for PI conditions (e.g., [92]).

Penner et al. [105] and Ghan et al. [42] have explained the difference between satellite-based and GCM estimates of ERFaci as resulting from the satellite-based analyses’ use of PD variability as a proxy for the PD minus PI change in aerosol, which they argue results in a significant low bias in ACI metrics. As shown by [52], this bias can be mitigated by using the aerosol index (the product of aerosol optical depth and the Ångström exponent), which is a better proxy for anthropogenic \(N_a\) than the particularly problematic aerosol optical depth (AOD); satellite–model ERFaci differences can be further reduced by avoiding numerical issues related to the small changes in \(N_d\) in relatively pristine areas.
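
As a concrete illustration of this proxy, the following minimal Python sketch computes the aerosol index from AOD retrieved at two wavelengths via the Ångström exponent; the wavelength pair and AOD values are illustrative assumptions, not values from [52].

```python
import numpy as np

def angstrom_exponent(aod_1, aod_2, wl_1, wl_2):
    """Angstrom exponent from AOD at two wavelengths (in micrometers)."""
    return -np.log(aod_1 / aod_2) / np.log(wl_1 / wl_2)

def aerosol_index(aod_1, aod_2, wl_1=0.55, wl_2=0.865):
    """Aerosol index: AOD times the Angstrom exponent.
    Larger values favor small (accumulation-mode) particles, which is
    why AI is a better N_a proxy than AOD alone."""
    alpha = angstrom_exponent(aod_1, aod_2, wl_1, wl_2)
    return aod_1 * alpha

# Two hypothetical columns with equal AOD but different particle size:
print(aerosol_index(0.15, 0.05))   # small particles -> high AI (~0.36)
print(aerosol_index(0.15, 0.13))   # coarse particles (dust/sea salt) -> low AI (~0.05)
```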

Equifinality

The climate system (or parts thereof) is an open system and therefore characterized by “equifinality”, i.e., similar outcomes may be achieved with different representations of model processes [12].

It has been demonstrated that models with equally plausible but different process representations can match observations of the global-mean temperature record, or in the case of fine-scale models, observations of basic cloud field properties (e.g., [13]). The question that follows is whether equifinality could present an obstacle to our ability to constrain the system through observations of the atmospheric state and processes. And, in the face of equifinality, how can we determine which models will have the most predictive power? This theme will be discussed further in “Improving Observational Constraints on GCMs” and “Blending Modeling Approaches”.

Abstraction of Physical Processes

To represent ACI in GCMs, a specific parameterization has to be implemented for each indirect effect (for warm clouds, an aerosol activation scheme for the “first indirect effect” and an \(N_d\)-dependent precipitation scheme for the “second indirect effect”); as the tools available shape the science and analysis, this association between parameterizations and aerosol effects has shaped the thinking about ACI in GCMs, leading to the “n-th indirect effect” paradigm often criticized by the process-scale community. The first indirect effect (or albedo effect) is conceptually well defined and observationally based, although not straightforward to quantify in GCMs because they do not resolve updrafts (e.g., [152]); the second indirect effect (or lifetime effect), however, encompasses numerous hypotheses, none of which is directly linked to cloud lifecycles, nor well-rooted in observations [138]. It is largely a GCM construct that removes condensate from the atmosphere based on the degree to which “autoconversion” of cloud water to rain water is sensitive to the aerosol. Real clouds respond subtly to aerosol perturbations as a result of adjustments, at a range of scales, of the cloud system to the aerosol and to its constantly adjusting environment. In some cases, internal adjustments absorb or “buffer” the system against aerosol perturbations, typically by reducing the cloud amount (e.g., [19, 138]). In other cases, the system is very sensitive to the aerosol, e.g., in stratocumulus clouds, where the absence or presence of aerosol can determine whether the cloud field takes on an open or closed cellular structure (e.g., [130]).

The aforementioned adjustments occur on timescales of hours and are difficult to observe because of a fundamental problem of attribution: the change in cloud amount must be traced to an external agent in a system that adjusts to its environment and, at the same time, modifies that environment. The typical approach to quantifying the lifetime effect is via observations of the response of cloud liquid water path (\(\mathcal {L}\)) to a change in \(N_a\) (\(d\mathcal {L}/dN_{a}\)), based on regression of highly aggregated data (e.g., [26]). Whether this approach adequately reflects the true responses, averaged up to the observational scale, is unknown.
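
To make the regression approach explicit, here is a minimal sketch, using synthetic (entirely hypothetical) data, of how the susceptibility is typically estimated as an ordinary least-squares slope in log–log space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic aggregated data (hypothetical): liquid water path L and
# aerosol concentration N_a, with a weak imposed response buried in
# meteorological "noise" that dominates the variance.
ln_na = rng.normal(np.log(100.0), 0.6, size=2000)   # N_a in cm^-3
true_slope = -0.1                                   # imposed dlnL/dlnN_a
ln_lwp = np.log(80.0) + true_slope * ln_na + rng.normal(0.0, 0.5, 2000)

# Susceptibility as the ordinary least-squares slope in log-log space
slope, intercept = np.polyfit(ln_na, ln_lwp, 1)
print(f"estimated dlnL/dlnN_a = {slope:.3f} (imposed: {true_slope})")
```

Whether such a slope, estimated from snapshots of a continually adjusting system, reflects the time-integrated response is precisely the open question raised above.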

A consequence of categorizing indirect effects into the albedo and lifetime effects is that the community often focuses on processes that are amenable to parameterization at the expense of others. For example, cloud-scale mixing with the environment receives little attention, in part because it is difficult to parameterize in GCMs (e.g., [95, 114, 162]), and yet the details of how mixing is represented could be 2–3 times more important for cloud brightening than the aerosol perturbation itself [59].

Inferring Process from Snapshots

Observations of the aerosol–cloud system, be they in situ, satellite-based, or ground-based, undoubtedly provide important constraints on the realism of model simulations and, with well-quantified measurement errors, a means to improve models. But a somewhat neglected aspect of the model–observation comparison is that observations provide snapshots of the system at various stages of its evolution. From these fragmentary (in time) pieces of information, bolstered by very large statistics that permit binning in what we believe to be the cloud-controlling variables, we attempt to infer process understanding of the evolving aerosol–shallow-cloud system. Studies have used revisits of a scene by the same or different satellites (e.g., [49, 50, 86, 93]), but the revisit intervals are coarse compared with cloud-adjustment timescales. [118] have used a combination of polar-orbiting and geostationary satellites, but for many aerosol–cloud studies, geostationary satellites had insufficient spatial and spectral resolution until recently.

Thus, a temporally evolving system with inherent memory is studied with a Markovian, “snapshot-in-time” methodology, which assumes that processes are related only to the current state of the system and carry no memory of past states. Observations have been demonstrated to be useful for inferring knowledge of aerosol–cloud microphysical processes (e.g., [76, 77, 83, 100, 134–136]). An important question is whether the non-Markovian nature of the system presents some fundamental limitation to the extent to which one can develop a deeper understanding and quantification of aerosol–cloud processes with snapshots. While this is not the topic of the current paper, we advocate delving deeper into this issue and adapting our measurement approaches so that we not only constrain models, but also maximize our understanding of physical processes. New geostationary satellites such as Himawari and GOES-16, with their multi-spectral capabilities and much improved spatiotemporal resolution, will surely be illuminating.

Current ERFaci Estimates

Broadly, the tools available for studying ACI are in situ observations, space- and ground-based remote sensing, and a range of models spanning fine-scale models to GCMs. Seinfeld et al. [128] provide a thorough summary of the specific strengths as well as shortcomings of each tool at particular spatiotemporal scales. One of the promising developments in the recent past has been the increasing use of combinations of tools to mitigate these shortcomings and maximize strengths in ERFaci estimates. Promising progress has also been made towards understanding the systematic discrepancy between GCM-based and observationally based ERFaci estimates and on more rigorously understanding the sensitivity of GCM ERFaci to modeling assumptions. In this section, we summarize studies from these promising categories. We do not attempt to be exhaustive in summarizing GCM-based estimates; [78] and [79] provide an excellent compilation of such studies since AR5.

Combinations of multiple tools can take the form of constraining model behavior to observations (e.g., [52, 110, 148]) or of using a combination of observations and reanalysis to address the problem of concurrently observing aerosol and cloud from satellite (e.g., [26, 90, 91]). (We discuss [110] and [148] in greater detail in the context of emergent constraints; see “Improving Observational Constraints on GCMs”.) Chen et al. [26] and McCoy et al. [90, 91] have used aerosol fields from chemical transport models to avoid the problems of collocated aerosol and cloud retrievals. Chen et al. [26] separate their forcing estimate into an “extrinsic” (due to adjustments in cloud fraction) ERFaci contribution of −0.46 ± 0.31 W m−2 and an “intrinsic” (due to changes in \(N_d\) and \(\mathcal {L}\)) ERFaci contribution of −0.49 ± 0.33 W m−2; [90, 91] report an RFaci (i.e., the forcing due to the \(N_d\) change only) of −0.97 W m−2.

The above-cited studies only partially address the difficult problems of causality and of meteorological confounding by small-scale variations in humidity that are correlated with both cloudiness and AOD [28, 62, 109]. Gryspeerdt et al. [51] take a significant step forward by using \(N_d\) as an intermediary variable to reduce the meteorological component of the causal relationship between aerosol and cloud fraction. Their estimate of the contribution to ERFaci due to the cloud fraction response of liquid clouds is −0.48 (−0.10 to −0.64) W m−2.
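
The intermediary-variable logic can be illustrated with a toy calculation (this is not a reproduction of the method of [51]; all numbers are invented): a confounder, here humidity, inflates both the aerosol proxy and cloud fraction, and routing the estimate through \(N_d\) reduces, though does not eliminate, the confounded component.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical synthetic scenes: humidity (rh) confounds the aerosol
# proxy (ln_ai) and cloud fraction (cf) without acting through N_d.
rh    = rng.normal(0.0, 1.0, n)
ln_ai = 0.5 * rh + rng.normal(0.0, 1.0, n)          # humidity swells aerosol
ln_nd = 0.6 * ln_ai + rng.normal(0.0, 0.5, n)        # activation link
cf    = 0.4 + 0.05 * ln_nd + 0.10 * rh + rng.normal(0.0, 0.05, n)

# Naive regression of cf on the aerosol proxy inherits the confounding
naive = np.polyfit(ln_ai, cf, 1)[0]

# Two-step estimate routed through the mediator N_d
dcf_dlnnd   = np.polyfit(ln_nd, cf, 1)[0]
dlnnd_dlnai = np.polyfit(ln_ai, ln_nd, 1)[0]
print(f"naive dcf/dlnAI   = {naive:.3f}")            # ~0.07
print(f"mediated estimate = {dcf_dlnnd * dlnnd_dlnai:.3f} "
      f"(true: {0.05 * 0.6:.3f})")                   # closer to 0.030
```

The mediated estimate is closer to the true causal slope but retains a residual bias because the mediator is itself correlated with the confounder; this is one reason such approaches reduce, rather than remove, meteorological covariance.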

As discussed in “Why Are ERFaci Estimates so Challenging?”, progress is being made on understanding the discrepancy between GCM and observational estimates of ERFaci, which was large in AR5 (\(\text{ERF}_{\text{ari+aci}} = -0.93\) to \(-0.45~\text{W m}^{-2}\) with a median of −0.85 W m−2 for studies using the satellite record, compared with \(-1.68\) to \(-0.81~\text{W m}^{-2}\) with a median of −1.38 W m−2 for GCM studies; [18]). Gryspeerdt et al. [52] show that the choice of \(N_a\) proxy can significantly reduce the discrepancy; their best estimate of RFaci based on a GCM–observation combination is −0.4 W m−2. Christensen et al. [28] and Neubauer et al. [102] take a different approach, investigating the effects of reducing near-cloud biases in satellite aerosol observations consistently between observations and modeling. This simplification of ACI, in which the effect of clouds on aerosols is reduced, succeeds in bringing the GCM and observations into agreement and reduces the intrinsic ERFaci to −0.28 ± 0.26 W m−2 from −0.49 ± 0.18 W m−2 when no removal of near-cloud aerosol observations is performed. However, the distant aerosol field can also be expected to have a weaker causal connection with the aerosol that perturbed the cloud; the resulting forcing estimate should probably be considered an upper (i.e., least negative) bound.

Considering the difficulties arising in the interpretation of multimodel intercomparisons (see “On the Suitability of ERFaci as a Global Synopsis of ACI”), we believe modeling studies that carefully analyze the sensitivity of model projections to assumptions underlying the parameterized processes to be useful. Notable studies of this type have characterized the GCM ERFaci sensitivity to parameterized precipitation [40, 116], turbulence [38, 101], and aerosol processing by cloud [101, 102]. The value of such studies is to point out the sources of uncertainty in model projections, perhaps leading towards tighter constraints; we discuss this possibility in greater detail in “Improving Observational Constraints on GCMs”. Nevertheless, results from such exercises should not be overinterpreted: if the models do not resolve important aerosol and cloud processes adequately, they likely do not exhibit the correct susceptibility to aerosol perturbations. [101] and [82] have investigated the effect of increasing model resolution on ACI; [82] do not provide ERFaci estimates, but do indicate that higher model resolutions favor a stronger droplet-number response and a weaker precipitation response, in line with process-scale evidence (see “Improving Observational Constraints on GCMs”) and our sentiment that susceptibilities will be incorrect if processes are not resolved adequately. Confidence in these sensitivity studies may be enhanced through the increased use of subgrid-scale representations and unified convection/turbulence/cloud schemes [40].

The range of reported ERFaci estimates remains large, as does the uncertainty in those studies that provide an uncertainty estimate. Different methodologies are used for estimating ERFaci, all of which have known deficiencies; definitions of uncertainty abound (statistical versus systematic, with different systematic effects considered by each study, and different methods used to estimate each systematic effect). This makes it highly nontrivial to synthesize the various results into a combined ERFaci estimate. In AR5, the final estimate of ERFaci was based on subjective expert judgment. We advocate against this subjective method, which obscures the assumptions and hypotheses that enter the estimate, making them hard to falsify (e.g., [55]). For future assessments, we hope that a more formal, traceable approach such as the Bayesian formalism in [139] will be adopted (see “Hypothesis Refutation”).

On the Suitability of ERFaci as a Global Synopsis of ACI

Tight constraints on aerosol ERFaci are desirable primarily for two reasons: first, to aid estimates of climate sensitivity from the historical record; and second, to provide insights into a future where aerosol emissions will be drastically reduced. These two “customers” of the ERFaci “product” have slightly different requirements, so it is worthwhile thinking carefully about whether a single scalar ERFaci relative to PI aerosol, which is so often taken as the holy grail of the aerosol–cloud community, is in fact the most suitable output to provide.

Because of the historical entanglement of aerosol and greenhouse gas forcing, estimating the climate sensitivity from the observational record requires knowledge of the historical aerosol forcing (e.g., [4, 64]). To quantify the uncertainty in climate sensitivity resulting from a given aerosol forcing estimate requires knowledge not only of the forcing but also of its uncertainty. This raises the question of what specific form the ERFaci uncertainty estimate should take: do confidence intervals suffice, or do we need to constrain the shape of the ERFaci probability density function (PDF) as well? In light of the importance of the uncertainty estimate, we also need to think carefully about the methods by which it is derived. One source is GCM intermodel spread (e.g., [36, 42, 52, 145, 161]), but this source is plagued by problems of errors versus uncertainty, representativeness of the model diversity [66], and common lineages of model components ([6], and references therein); further, model spread may be biased by requiring the model ERFaci to lie within the consensus range [119]. Attempts to set bounds with more rigorous methods are discussed in “Hypothesis Refutation”. A second way in which forcing and feedbacks are entangled is through aerosol feedbacks, where a warmer climate system produces higher natural aerosol emissions and thus potentially increased aerosol radiative forcing [41, 72] or where the faster hydrological cycle in a warmer climate leads to increased aerosol scavenging and thus potentially decreased aerosol radiative forcing (e.g., [97]).

The PD aerosol forcing relative to PI is often used for insights into the aerosol forcing in a clean-air, but not necessarily a decarbonized, future. These insights are imperfect for two reasons. Even with stringent emissions controls, the anthropogenic fraction of total aerosol may still be large compared to preindustrial conditions [8]. Furthermore, in the future climate, the perturbations to cloud induced by greenhouse gas forcing and global warming will be even larger than in the present, and so (in the language of [38]) will the importance of the “C” relative to “A” in ACI. As pointed out by [98] and [79], ACI modulate cloud fields that are also affected by greenhouse gas-warming-induced changes; these changes will become increasingly large in the future climate and are neglected in the current method of diagnosing aerosol forcing in GCMs.

Thus, we should keep in mind the intended use of the aerosol forcing estimate and that the aerosol forcing problem and the feedback problem cannot be neatly compartmentalized.

Reducing the ERFaci Uncertainty

In light of the continuing large uncertainty on ERFaci, we now discuss specific (and published) ideas that we believe have the potential to reduce that uncertainty. We then offer a conceptual framework that might assist in achieving this goal.

The Scale Gap in GCMs

Recognizing that ERFaci is determined on scales from the cloud-process scale to the general-circulation scale is essential to grasping the magnitude of the ACI problem. Conceptually, the simplest method would be to increase GCM resolution sufficiently to represent cloud-process scales of \(\mathcal{O}(10~\text{m})\); assuming exponential increases in computing power continue at the current rate, computational capability will have caught up with this resolution in half a century, by which time aerosol forcing may well be irrelevant and climate sensitivity can be determined from direct temperature measurements [96, 121]. Furthermore, it is likely that, even then, uncertainty due to parameterized microphysics will continue to be an issue, much as in fine-scale models currently.

In the interim, GCMs increasingly include subgrid-scale representations of clouds using Eddy Diffusivity/Mass Flux [131] or higher-order closure schemes [15, 74] that have demonstrated success in improving shallow convection. When coupled with microphysical models, they have the potential to provide better representation of ACI. In addition, the multiscale modeling framework (an embedded cloud-resolving model within a GCM grid box) [48, 104, 111, 117] provides a way forward. Vertical resolution is a persistent problem for shallow clouds, particularly stratocumulus, and new ideas have been proposed to locally apply an adaptive vertical mesh [85, 159] when stratocumulus conditions are expected.

A central question is what resolution is required to capture the climatically relevant aspects of aerosol–cloud interactions. At what scale do we need to resolve clouds and aerosol–cloud interactions to quantify ERFaci? How important is resolution for quantifying closely related aspects such as cloud feedbacks? The answers may only emerge through concentrated efforts to improve the representation of clouds and aerosol–cloud interactions in GCMs.

Ensuring Scale Awareness in Model–Observation Intercomparisons

Scale has consequences for model–observation comparisons. Too frequently, little attention is paid to this issue, rendering comparisons of questionable value. An exercise simulating a comparison between perfect model output (200-km scale) and perfect observations (10-km scale) shows biases of between 30 and 160% due entirely to the different spatial sampling of models and observations; such errors are often larger than the measurement errors in real observations [124]. Similarly large biases can occur when care is not taken to ensure temporal synchronicity in the data [125]. Furthermore, a lack of consideration of the diurnally biased sampling associated with polar-orbiting satellites can bias results significantly [93, 123]. The spatial heterogeneity of observations also needs careful consideration: data are often discontinuous in space and time, requiring geospatial statistical methods to address data heterogeneity when comparing to regularly gridded model output [123].
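
The origin of such aggregation biases is easy to demonstrate: whenever a diagnostic is nonlinear in the retrieved quantity, averaging before or after applying the nonlinearity gives different answers. The sketch below, using an assumed lognormal optical-depth distribution and a simple two-stream-style albedo approximation (both illustrative, not tied to any cited study), shows the effect:

```python
import numpy as np

rng = np.random.default_rng(2)

def albedo(tau):
    """Simple two-stream-style albedo approximation for a
    conservatively scattering cloud (illustrative only)."""
    return tau / (tau + 7.7)

# Hypothetical heterogeneous cloud field: lognormal optical depths at
# "observation" scale (fine pixels) within one model grid box.
tau_fine = rng.lognormal(mean=np.log(8.0), sigma=1.0, size=10_000)

fine_then_average = albedo(tau_fine).mean()   # aggregate after the nonlinearity
coarse = albedo(tau_fine.mean())              # nonlinearity on the grid mean

print(f"mean of albedos : {fine_then_average:.3f}")
print(f"albedo of mean  : {coarse:.3f}")
# Because albedo is concave in tau, the grid-mean calculation is biased
# high relative to the aggregated fine-scale calculation (Jensen's inequality).
```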

Scale also influences the calculation of susceptibilities, whether they are calculated based on observations or model output. Harmonizing the analysis scales between observations and modeling, while recognizing the relevant process scales, is important if one is to conduct meaningful comparisons of ACI metrics (susceptibilities) [89].

The unequivocal conclusion of these studies is that observational studies, and attempts to constrain models observationally, must pay careful attention to scale for their conclusions to be of value.

Improving Observational Constraints on GCMs

An aspect of equifinality that GCMs must contend with is that the model parameters that most strongly control the PD state of a model are not always the ones that determine its sensitivity to anthropogenic greenhouse gas or aerosol forcing: the PD state tends to be controlled by overall process rate scale factors, whereas sensitivity to forcing tends to be controlled, inter alia, by preindustrial aerosol properties and process rate sensitivities to aerosol [23, 57, 80]. In our opinion, it is an open question whether this implies a fundamental limitation on the ability of PD observational constraints to constrain climate projections. Our hope is that a more suitable choice of observations that constrain model ERFaci can be identified, presumably ones that constrain the sensitivity of models to PD variability in aerosol. This section offers suggestions of what such observational constraints might be.

Emergent Constraints

One potential way forward is the use of “emergent constraints” (ECs), which relate an unobservable sensitivity of the climate system to a modeled state of the system that is also observable. The idea is that if a physically understandable and robust relationship exists between sensitivity and system state in the model world, then observations of the system state can be used to quantify the sensitivity. The first EC to be considered for ERFaci [110] looked at cloud macroscale property responses to aerosol perturbations and their relationship to ERFaci. Another proposed EC relates the susceptibility of the probability of precipitation (\(S_{pop}\)) to ERFaci [148]. [65] have formulated a set of guidelines for identifying robust ECs: a strong physical basis, robustness to the choice of model ensemble, no obvious multiple influences, and reasonably high correlation between predictor and predictand. Whether the ECs of [148] and [110] meet these criteria is the subject of debate [42, 75]: the ECs are not based on well-resolved clouds, so their applicability may be questionable; in addition, they are predicated on the “lifetime effect”, which in GCMs presupposes an increase in cloud lifetime with increasing aerosol. The debate highlights that effort to address the shortcomings of the \(S_{pop}\)-based EC is well spent and that the inadequate treatment of subgrid-scale clouds and precipitation in GCMs requires urgent attention. Thus, a partial answer to the question posed in “The Scale Gap in GCMs”, “what resolution is required to capture the climatically relevant aspects of aerosol–cloud interactions?”, might lie in the robustness of an EC to an increase in model and process resolution.
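
The mechanics of an EC can be sketched generically (all numbers below are invented; this is not the EC of [110] or [148]): regress the unobservable predictand on the observable predictor across a model ensemble, then propagate an observation of the predictor, with its uncertainty, through the regression.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical model ensemble: observable predictor x (e.g., some
# precipitation susceptibility) and unobservable predictand y (ERFaci).
x_models = rng.normal(0.3, 0.1, 15)
y_models = -3.0 * x_models + rng.normal(0.0, 0.15, 15)   # assumed EC relation

slope, intercept = np.polyfit(x_models, y_models, 1)

# Observed predictor with measurement uncertainty (assumed values)
x_obs, x_obs_err = 0.25, 0.05

# Propagate the observation through the ensemble regression by sampling,
# including the scatter of models about the regression line.
x_samples = rng.normal(x_obs, x_obs_err, 100_000)
resid_sd = np.std(y_models - (slope * x_models + intercept))
y_samples = (slope * x_samples + intercept
             + rng.normal(0.0, resid_sd, x_samples.size))

lo, hi = np.percentile(y_samples, [5, 95])
print(f"constrained ERFaci: {y_samples.mean():.2f} W m^-2 "
      f"(90% CI: {lo:.2f} to {hi:.2f})")
```

The arithmetic is trivial; the entire burden rests on whether the ensemble regression is physically grounded and robust in the sense of [65].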

Process-Based Observational Constraints

GCM estimates of ACI adjustments are highly sensitive to the parameterization of warm-rain collection processes (e.g., [38, 94, 107]); these detailed drop–drop interaction processes are lumped into “autoconversion”, representing the collision of cloud droplets to form rain, and “accretion”, representing the collection of cloud droplets by rain embryos. Autoconversion schemes use power-law fits in liquid-water mixing ratio and \(N_d\) to observational or process-scale modeling datasets [63, 155], with “enhancement” corrections applied to account for the discrepancy between the available GCM gridbox-mean mixing ratio and the high-liquid part of the subgrid variability that drives warm rain in reality; at the same time, the autoconversion scheme, in its capacity as a sink for low cloud, is a popular mechanism for tuning the top-of-atmosphere (TOA) shortwave flux [45, 58, 80, 88, 119].
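
A widely cited example of such a power law is the Khairoutdinov and Kogan (2000) scheme; the minimal sketch below (with illustrative input values) shows the strong \(N_d\) dependence that makes this process so consequential for ACI adjustments.

```python
def autoconversion_kk2000(qc, nd, enhancement=1.0):
    """Khairoutdinov & Kogan (2000) warm-rain autoconversion rate.

    qc : cloud liquid water mixing ratio [kg/kg]
    nd : cloud droplet number concentration [cm^-3]
    enhancement : subgrid-variability correction factor (GCMs often
                  apply E > 1 to compensate for using the grid-mean qc)

    Returns the rain production rate [kg/kg/s].
    """
    return enhancement * 1350.0 * qc**2.47 * nd**(-1.79)

# Strong N_d sensitivity: doubling droplet number cuts the rate by ~71%
clean    = autoconversion_kk2000(qc=5e-4, nd=50.0)
polluted = autoconversion_kk2000(qc=5e-4, nd=100.0)
print(f"rate ratio (polluted/clean): {polluted / clean:.2f}")  # 2**-1.79 ~ 0.29
```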

In addition to the autoconversion scheme itself, the partitioning of precipitation production between autoconversion and accretion also strongly affects ACI adjustments [40, 94, 116]. Conventional precipitation observations (total precipitation amounts) are not sensitive to this partitioning; more process-oriented observations are needed to constrain the ERFaci sensitivity to precipitation parameterizations. Because it resolves the vertical structure of precipitating clouds and can detect drizzle, CloudSat is particularly suitable for providing such observational constraints, for example through the “Contoured Frequency of Optical Depth Diagram” (CFODD) method [141–143].

Observational Record as Constraint

Using the observational record of global-mean surface temperature over the 20th century as a constraint on aerosol radiative forcing has been attempted in many ways but is fraught with problems. The most immediate problem is that the observational record is a superposition of internal variability, greenhouse gas ERF, and aerosol ERF (e.g., [4, 64, 126]); hence, a constraint on each component individually cannot be extracted from the observational record of a single variable. More advanced methods attempt to use episodes of faster or slower warming in the historical record to derive multiple constraints from a single variable (e.g., [137]), exploiting the hemispheric asymmetry in aerosol forcing and the logarithmic increase in ACI with aerosol concentration. Examination of the zero-dimensional ACI model underlying the [137] ERFaci limits shows that GCMs predict a much more linear relationship between global-mean ERFaci and global-mean aerosol emissions [73, 113], because the PD climate system still contains large regions where the ACI is far from saturated. With a linear relationship, the [137] constraint weakens to \(\text{ERF}_{\text{ari+aci}} > -1.6~\text{W m}^{-2}\), which is not a meaningful narrowing of the consensus range. This point is also illustrated by the experience that models with a historical aerosol forcing sufficiently strong to produce a counterfactual cooling during the second half of the 20th century lie far outside the accepted range of ERFaci [46], again providing no more stringent a constraint than \(\text{ERF}_{\text{ari+aci}} > -1.6~\text{W m}^{-2}\).
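
The role of the assumed forcing–emission relationship can be made concrete with a toy calculation (all numbers hypothetical): scale a saturating (logarithmic) relation and a linear relation to the same PD forcing and compare how much forcing each implies was already realized at mid-century emissions.

```python
import numpy as np

# Toy emission history (hypothetical, normalized): PI = 0, PD = 1
emissions = {"1850": 0.0, "1950": 0.4, "2000": 1.0}

def forcing_log(e, f_pd=-1.0, e0=0.1):
    """Saturating (logarithmic) forcing-emission relation, scaled to f_pd at e=1."""
    return f_pd * np.log(1 + e / e0) / np.log(1 + 1 / e0)

def forcing_lin(e, f_pd=-1.0):
    """Linear relation with the same PD forcing."""
    return f_pd * e

for year, e in emissions.items():
    print(f"{year}: log {forcing_log(e):+.2f}  linear {forcing_lin(e):+.2f} W m^-2")
# Under the saturating relation, most of the PD forcing is already realized
# by mid-century; under the linear relation, it accrues with emissions,
# weakening constraints based on mid-century warming or cooling episodes.
```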

Translating Insights from Process-Scale Studies into Constraints on Global ERFaci

We now address the problem of parameterizing subgrid-scale cloud processes based on the information available at the GCM grid scale, an approach that has been pursued by the cloud feedback community [20]. To enable it, it would first be necessary to establish a community consensus on “robust” aerosol effects based on observations and process-scale modeling, as well as the GCM-scale conditions under which they apply. We propose the following list of cloud responses to increases in aerosol amount that, based on a synthesis of observations and fine-scale modeling, exhibit some robustness:

  1.

    Increases in aerosol result in brighter clouds, ceteris paribus [144]. Questions still remain regarding the degree of brightening, which is influenced by aerosol size and composition.

  2.

    Under clean conditions, cloud amount increases in response to increasing aerosol; this includes deepening of clouds and an increase in cloud fraction [3, 27, 30, 54, 68, 127, 160]. The aerosol helps stabilize a system likely to precipitate (a colloidally unstable system). The corollary is that systems increasingly approaching colloidal instability tend to move rapidly towards a low cloudiness state [7, 35].

  3.

    Counteracting effects often create a buffered system; evaporation–entrainment–sedimentation feedbacks offset the deepening and increased cloud amount. This typically occurs in more polluted conditions [2, 21, 56, 149, 158]. Analysis of very large data sets sometimes shows no clear aerosol signal in the cloud radiative effect [129], and even massive aerosol perturbations from effusive volcanoes may not result in \(\mathcal {L}\) and cloud fraction responses large enough to be detected above the meteorological noise over a few months [84].

  4.

    Absorbing aerosol above cloud increases marine stratocumulus cloudiness [22, 60, 154]; absorbing aerosol in the boundary layer decreases cloudiness via reduction in surface fluxes over land and stabilization [1, 34, 70, 71, 132].

Rather than applying one-size-fits-all rules, it might be possible to translate the above responses into improved constraints on ERFaci by using the GCM-resolved fields to identify situations where we expect certain effects to dominate over opposing effects. For example, when aerosol concentrations are low and the boundary layer is deep (both conditions that GCMs might be trusted to diagnose reasonably well), process-scale modeling predicts a large susceptibility of cloud fraction and liquid water path via precipitation suppression; conversely, in shallow, polluted boundary layers, process-scale modeling predicts negative susceptibilities due to enhanced cloud-top evaporation. We can envision different ways of feeding this process-scale knowledge back into a GCM; none is entirely satisfying, but at the very least, one could enforce a qualitatively correct GCM response to aerosol perturbations, increasing confidence in the model projections.
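
As a sketch of what such regime-based rules might look like in practice, consider the following hypothetical decision function; the thresholds, and indeed the function itself, are invented for illustration and would need to be established from process-scale modeling and observations.

```python
def expected_lwp_susceptibility_sign(n_a_cm3, bl_depth_m,
                                     clean_thresh=50.0, deep_thresh=1500.0):
    """Hypothetical regime rule distilled from the robust responses above.

    Clean, deep (likely precipitating) boundary layers: precipitation
    suppression dominates -> positive LWP/cloud-fraction susceptibility.
    Polluted, shallow boundary layers: evaporation-entrainment feedbacks
    dominate -> negative susceptibility. Thresholds are illustrative only.
    """
    if n_a_cm3 < clean_thresh and bl_depth_m > deep_thresh:
        return +1   # expect increases in cloud amount with aerosol
    if n_a_cm3 > clean_thresh and bl_depth_m < deep_thresh:
        return -1   # expect buffered or reduced cloud amount
    return 0        # indeterminate: opposing effects of similar magnitude

print(expected_lwp_susceptibility_sign(30.0, 2000.0))   # +1
print(expected_lwp_susceptibility_sign(300.0, 800.0))   # -1
```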

Untangling Versus Embracing Co-variability of Aerosol and Meteorology

Because meteorology is a primary driver of cloudiness, and small changes in meteorology can have a large influence, aerosol effects on the cloud system are difficult to separate from meteorological “noise”. Typically, aerosol effects are examined within subsamples of data sorted by cloud-controlling metrics such as lower-tropospheric stability or pressure vertical velocity at 500 hPa, which tend to sort the data by cloud regime (e.g., [76, 87]). While this is a useful approach, these metrics are imperfect, and small differences matter. An alternative might be to focus more effort on understanding the relationship between meteorology, aerosol, and cloudiness, i.e., to quantify the co-variability of meteorological variables among themselves and with the aerosol. The former would help limit the meteorological parameter space, defining distinct cloud regimes and hence reducing the number of degrees of freedom in the system. (This is the rationale behind metrics like lower-tropospheric stability.) The latter would identify the conditions under which the aerosol is likely to have a more significant impact, opening the way to quantification of the frequency at which these conditions occur. Idealized cloud-resolving models have in fact shown that the detectability of ERFaci depends on the co-variability of meteorological and aerosol conditions [32].

Routine modeling at observational sites that are well equipped to measure key cloud, aerosol, and radiation parameters is an attractive approach to improving model parameterizations (e.g., [99]). An added benefit is that it provides a wealth of data for quantifying aerosol and meteorological co-variability with both observations and observationally validated process-model output. Proposed field experiments to evaluate solar radiation management [157], in conjunction with modeling, would provide valuable data to quantify the aerosol–cloud radiative effect in well-characterized settings.

How might one address susceptibility in this new framework? Practically speaking, a particular cloud type would be defined by key meteorological parameters. For the stratocumulus regime, this might consist of potential temperature \(\theta\) and specific humidity \(q_t\) profiles, their respective jumps at the boundary layer top (\(\Delta\theta\), \(\Delta q_t\)), and the boundary layer depth \(H\). The typical observed range of stratocumulus \(\mathcal {L}\), cloud base height, and cloud top height restricts the range of variability of the meteorological drivers that needs to be considered. Focusing then on the co-variability of aerosol and meteorological perturbations replaces the more traditional susceptibility (e.g., \(d \mathcal {A} / d N_{a}\), where \(\mathcal {A}\) is planetary albedo) by a broader definition \(d\mathcal {A}/dX\), where \(X \in \{\theta, q_t, {\Delta}\theta, {\Delta}q_t, H, N_a\}\). In other words, individual susceptibilities are replaced with the local slope of \(\mathcal {A}\) (or another relevant cloud property) in a six-dimensional parameter space, which should be convolved with the co-variability of meteorological and aerosol properties.
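
Estimating such local slopes is straightforward in principle; the sketch below uses synthetic data and a Gaussian-kernel-weighted linear fit to estimate \(d\mathcal{A}/dX\) around a reference state. The state vector, its covariance, and the imposed slopes are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical dataset: standardized state vector
# X = (theta, q_t, d_theta, d_q_t, H, N_a) and planetary albedo A
# for a stratocumulus regime; correlated drivers by construction.
n = 20_000
cov = 0.5 * np.eye(6) + 0.5 * np.ones((6, 6))
X = rng.multivariate_normal(np.zeros(6), cov, n)
beta_true = np.array([-0.02, 0.01, 0.03, -0.02, 0.01, 0.04])  # assumed slopes
A = 0.45 + X @ beta_true + rng.normal(0.0, 0.02, n)

# Local slopes dA/dX around a reference state x0 via a
# Gaussian-kernel-weighted linear regression
x0 = np.zeros(6)
w = np.exp(-0.5 * np.sum((X - x0) ** 2, axis=1))
Xw = np.hstack([np.ones((n, 1)), X]) * np.sqrt(w)[:, None]
beta_hat = np.linalg.lstsq(Xw, A * np.sqrt(w), rcond=None)[0][1:]

for name, b in zip(["theta", "q_t", "d_theta", "d_q_t", "H", "N_a"], beta_hat):
    print(f"dA/d{name}: {b:+.3f}")
```

Weighting these slopes by the observed joint distribution of the drivers is then what turns local susceptibilities into a regime-level radiative response.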

As with uni-dimensional susceptibility metrics, one still has to admit the possibility that the co-variability of the n parameters of interest changes with time. This topic would incorporate issues of (likely) positive cloud feedbacks and therefore a reduced cloud amount that could be perturbed by the aerosol through microphysical processes.

Simple Models

While the atmospheric science community has spent many decades developing models of ever-increasing complexity, complementary efforts have invested in alternative, and often much simpler, heuristic models, employing empiricism (e.g., [29]), network approaches [14, 44], simple computational frameworks [39, 156], or energy budget considerations ([25, 137], see below).

Pioneering efforts to consider simple dynamical-system analogs to the complex atmospheric system date back to [81], who reduced the Navier–Stokes equations to three coupled differential equations and demonstrated the chaotic nature of weather systems. More recently, models comprising a few coupled differential equations have been applied to microphysical processes [67, 146, 147] and to convection [16, 103], among others. These “simple” models are particularly useful because they have a small number of variables and free parameters, can be run many times, and are amenable to deeper understanding.
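
As a flavor of this class of models, the toy system below couples cloud depth and drop concentration through a precipitation term that grows with depth and shrinks with droplet number. It is loosely patterned after the published few-equation models cited above, not a reproduction of any of them, and all parameter values are invented.

```python
from scipy.integrate import solve_ivp

def rhs(t, y, h_ref=600.0, tau_h=1800.0, tau_n=3600.0, n_src=120.0, c=1e-11):
    """Toy coupled cloud depth (h, m) / drop concentration (n, cm^-3) model:
    precipitation (~ h^3 / n) depletes cloud water and scavenges droplets."""
    h, n = y
    precip = c * h**3 / max(n, 1.0)          # stronger rain when deep and clean
    dh = (h_ref - h) / tau_h - precip * h    # growth toward h_ref minus rain loss
    dn = (n_src - n) / tau_n - precip * n    # aerosol replenishment minus scavenging
    return [dh, dn]

sol = solve_ivp(rhs, t_span=(0.0, 24 * 3600.0), y0=[300.0, 30.0], max_step=60.0)
print(f"final state: h = {sol.y[0, -1]:.0f} m, n = {sol.y[1, -1]:.0f} cm^-3")
```

Because the whole phase space of such a model can be mapped in seconds, its stability and bifurcation structure can be explored exhaustively, which is precisely what makes these models useful companions to full simulations.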

Increasingly, output from models of varying complexity is being analyzed in the dynamical systems language of stability and bifurcation [7, 9, 35, 69, 147], leading to a broader view of system susceptibility, e.g., the concept of “buffering” applied to aerosol–cloud interactions [138] can be interpreted as system stability (see below).

Addressing Model Uncertainty

In attempting to assess the importance of modeled processes, or the uncertainty in parameterizations, it is standard practice to compare simulations that successively withhold individually selected processes. This approach has limited quantitative power because, in coupled systems, it is the interaction between processes that determines model sensitivity. Statistical emulators, essentially sophisticated interpolators of an n-dimensional surface, allow a more robust assessment of the importance of individual processes [23, 61].
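
A minimal sketch of the emulator idea follows, with a cheap stand-in for the expensive model (the function, parameters, and all numbers are assumptions): fit a Gaussian process to a small perturbed-parameter ensemble, then sample the emulator densely across parameter space.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(5)

# Stand-in for an expensive model: an ERFaci-like response to two
# parameters (e.g., an autoconversion scale factor and an entrainment
# rate), with an interaction term that one-at-a-time tests would miss.
def expensive_model(p):
    return -1.0 - 0.5 * p[:, 0] + 0.3 * p[:, 1] + 0.8 * p[:, 0] * p[:, 1]

# A small "perturbed parameter ensemble" as training data
p_train = rng.uniform(0.0, 1.0, size=(30, 2))
y_train = expensive_model(p_train)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.5, 0.5]),
                              normalize_y=True).fit(p_train, y_train)

# The cheap emulator can now be sampled densely, e.g., for
# variance-based sensitivity analysis over the full parameter space.
p_dense = rng.uniform(0.0, 1.0, size=(100_000, 2))
y_dense = gp.predict(p_dense)
print(f"emulated output variance: {y_dense.var():.4f}")
```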

Another approach to assessing the uncertainty of model forecasts is through the use of stochastic representation of processes (e.g., [11]). The system is decomposed into slow (resolved and predictable) and fast (unresolved and unpredictable) scales, where only the statistical properties of the latter need to be represented. Berner et al. [11] argue that stochastic parameterizations “have the potential to trigger noise-induced regime transitions, and modify the response to changes in the external forcing,” linking them closely to the concepts of system stability/bifurcation described above.

Blending Modeling Approaches

Two different philosophical approaches to studying complex systems have been proposed. The Darwinian approach has its roots in ecology and emphasizes system complexity: the complex system is broken down into individual components, each of which exhibits its own complexity. In the case of ACI, this manifests as the calculation of susceptibilities, which, in aggregate, reflect the system-wide behavior. In contrast, the Newtonian approach, which is rooted in physics, places emphasis on simplified equations, system-wide behavior, and emergent patterns; it breaks the system down into as small a number of parts as necessary for the problem at hand [53]. The benefits of merging these approaches have been discussed elsewhere [32, 53], and we reiterate that sentiment here. As an example of how these approaches differ, and how they might be merged, we consider how they apply to the calculation of ERFaci. Figure 1 describes the methodology and conceptual approach in schematic form. The Darwinian approach breaks down aerosol influences on planetary albedo \(\mathcal {A}\) into a long chain of expansions of derivatives (susceptibilities), each term requiring quantification if one is to constrain a model (e.g., [42]). ACI metrics that quantify cloud microphysical responses to changes in the aerosol are one such example (lowest panel in Fig. 1). The Newtonian system-wide view encourages simplified equation sets and meta-analyses that provide valuable context and constraints, e.g., the \(\mathcal {A}\)–cloud fraction relationship [10], or perhaps emergent behavior or system attractors/bifurcations (upper panels in Fig. 1). Between the Darwinian ACI metrics and the Newtonian \(\mathcal {A}\)–cloud fraction analysis lies fertile ground for additional analyses. Examples include (1) the radar reflectivity \(Z\)–cloud optical depth \(\tau_c\) phase diagram [143], which sheds light on microphysical processes such as the balance of condensation versus collision–coalescence growth; (2) measurements that elucidate the radiative properties of a cloud field very directly, such as the probability distribution function of up- or downward shortwave irradiance [120] or the cloud radiative forcing/effect–\(\mathcal {L}\) phase diagram [129]; and (3) cloud field properties such as cloud size distributions. Similar analyses of independent components/parameters would, when compounded, provide confidence in the predictive power of the model.

Fig. 1 Schematic describing an integrated approach to constraining ERFaci. The Newtonian view focuses on the aerosol–cloud system-wide behavior. The Darwinian approach focuses on details of elements of the aerosol–cloud system. A merging of Newtonian and Darwinian approaches can be achieved via a balance between metrics that constrain processes at various levels of detail: (i) traditional aerosol–cloud interaction metrics (e.g., [33, 37, 42]); (ii) (joint) PDFs that illuminate physical processes [120, 143]; (iii) high-level analyses that address key ERFaci-relevant components [10, 129]; and (iv) emergent constraints [110, 148]. Analyses of all components can be derived from both observations and modeling. Comparisons should be done with appropriate attention to spatiotemporal scales. The right-hand side of the column conveys the emergence of a spiral-like motif embedded in a complex pattern.

We refer back to “Equifinality”, where we posed the question, “in the face of equifinality, how can we determine which models will have the most predictive power?” It would appear that analyses of as many orthogonal components of the system as possible, including higher-order statistics, would be the best path. The choice of components will depend on the model resolution and the fields of primary interest, but ideally, they would blend Newtonian and Darwinian approaches.

Hypothesis Refutation

Given that no single tool can represent the full spectrum of scales involved in determining ERFaci, estimates necessarily rely on a synthesis of evidence from different approaches. For the equilibrium climate sensitivity, [139] have proposed a method of “developing and refuting physical storylines (hypotheses) for values outside any proposed range”. The specific recommendation here would be to make an argument for a certain upper bound on \(\left |\text {ERF}_{\text {aci}}\right |\) and then identify the conditions under which that upper bound is no longer justified. The advantage of this approach is that all lines of evidence can be taken into account in the ERFaci bound, and that the contribution of each can be quantified via Bayesian inference [5]. Without this latter capability, it is difficult to falsify, through further research, the assumptions and hypotheses implicit in the ERFaci estimate.
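
In a grid-based sketch of the Bayesian formalism (not the actual implementation of [139] or [5], and with all numbers invented for illustration), each line of evidence enters as a likelihood over ERFaci, and its contribution to the posterior bound can be inspected by including or withholding it.

```python
import numpy as np
from scipy.stats import norm

# ERFaci grid (W m^-2)
erf = np.linspace(-3.0, 0.5, 701)

# Weakly informative prior and independent lines of evidence, each
# expressed as a likelihood over ERFaci (all locations and scales are
# assumed; real likelihoods would come from the respective analyses).
prior = norm.pdf(erf, loc=-1.0, scale=1.0)
evidence = {
    "satellite":     norm.pdf(erf, loc=-0.7, scale=0.4),
    "GCM ensemble":  norm.pdf(erf, loc=-1.2, scale=0.5),
    "energy budget": norm.pdf(erf, loc=-0.9, scale=0.6),
}

posterior = prior.copy()
for like in evidence.values():
    posterior *= like
dx = erf[1] - erf[0]
posterior /= posterior.sum() * dx   # normalize to a proper density

cdf = np.cumsum(posterior) * dx
lo, hi = erf[np.searchsorted(cdf, 0.05)], erf[np.searchsorted(cdf, 0.95)]
print(f"posterior 90% interval: {lo:.2f} to {hi:.2f} W m^-2")
```

The appeal of this formalism is traceability: rerunning the update without one line of evidence shows exactly how much that line contributes to the bound.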

Simple models such as those of [137, 140] and [43] have a prominent place in this line of thinking, as the consequences of their assumptions for ERFaci are traceable and falsifiable. In an illustration of the falsifiability inherent in such models, the assumptions of [137] have been shown to be too simple to capture a key aspect of the physical system [73]; in an illustration of traceability, reasonable parameter assumptions were shown by [43] to lead to an ERFaci spread far wider than the consensus range.

Guiding Principles

Persistent themes in this overview have been careful attention to scale, detail, and model complexity, all of which are linked. We conclude by outlining a number of guiding principles that we suggest be considered when embarking on model improvements and model–observation comparisons.

Attention to Scale

Scale pervades this document in a variety of ways. As a start, we recall the tension between process resolution and climate-relevant spatiotemporal scales, with no modeling system able to provide climate forecasts and at the same time adequately resolve the small-scale processes relevant to ERFaci. Increasing computational power might resolve this issue to some extent, as in the case of global cloud-resolving models or the multiscale modeling framework, but these approaches are not a panacea, and regardless, we still foresee heavy competition for computational resources between earth-system components and process representation within those components. Within a given component, there is a further pervasive tension between the need for high spatial resolution to resolve small-scale motions and the resolution of microphysical processes.

Scale emerges again when considering the mismatch between parameterizations that are often calculated based on theory or fine spatial scale models, and the coarse grid-mean model fields that drive them. This problem is of particular concern when processes are nonlinear with respect to climate model resolved fields (e.g., collision–coalescence sensitivity to cloud water content).

In a similar vein, the use of observations for model evaluation deserves special attention to often neglected spatiotemporal aggregation scales to ensure that comparisons are meaningful.

Balance in Process Representation

Computing advances have favored the inclusion of increasingly detailed process representations in models that lack the infrastructure (resolution of cloud dynamics) to resolve said processes. Such detail is often inconsistently applied, with undue attention to representing processes for which parameterizations exist and insufficient (or perhaps appropriate, given our level of understanding or GCM deficiencies) attention to other, less accessible processes. Detailed cloud microphysics representation should be balanced with similar detail in other coupled components, such as dynamics or radiation, so that the system is well represented by its individual components. Advances in higher-order closure representations of GCM subgrid processes provide such a balance for ACI studies.

Without this balance, there exists the danger that we will overinterpret model susceptibilities to well-studied processes and neglect other important but poorly represented processes. Moving away from traditional first and second indirect effects, and thinking more broadly about the coupled dynamical and microphysical interactions underlying ERFaci, would seem worthy of pursuit. Finding new observational constraints will be equally challenging.

Balance in Darwinian vs. Newtonian Emphasis

The multidisciplinary nature of the climate system calls for a thoughtful balance between disciplinary detail and the broader multidisciplinary view. The “Reducing the ERFaci Uncertainty” section and Fig. 1 outline the different philosophical approaches to studying complex systems: the Darwinian approach, with its emphasis on detailed study of individual system components, and the Newtonian approach, with its emphasis on system-wide behavior. We urge a balance between these approaches for maximum benefit. Processes should be represented with enough complexity to capture their essence, with careful consideration of the level of detail applied to other coupled components. An excessive focus on complexity can be detrimental to understanding how the broader system works, while taking heed of emergence, patterns, stability, or bifurcations provides important context for whether detail matters.

This approach does not detract from the desirability of detailed models to advance understanding of specific processes; nor does it suggest that GCMs be simplified excessively, or that we should not strive to raise the level of detail of coupled system components.

Models of Varying Complexity

The rapid increase in computational power has driven model development towards complexity over simplicity to the degree that model output is increasingly difficult to interpret and causal relationships are nigh impossible to identify, let alone quantify. As with other complex systems that incorporate a very large number of coupled processes, the temptation is to compare simulations with and without a given process to assess the importance of said process. This approach is of little value because it is the combination of processes and their couplings that determines model sensitivity. Statistical emulators are proving useful for addressing this issue.

Models of a range of complexities, including dynamical system models and heuristic models, are sorely needed to filter forcing estimates derived from complex models. While such models are simple and imperfect representations of the complex system, they are accessible and can be easily tested to identify susceptibilities to various inputs. In conjunction with models of different levels of complexity, they provide additional lines of evidence that might lead to a consistent picture and help constrain ERFaci.

Untangling vs. Understanding Aerosol and Meteorological Drivers

Quantification of ERFaci has often been posed as a problem of untangling aerosol-driven from meteorologically driven changes in ERFaci. Models, particularly those that resolve the relevant processes, allow one to change aerosol and meteorological inputs in idealized settings and are therefore useful for untangling these drivers. From the observational perspective, the problem is far more complex, given the sensitivity of the shallow cloud system to small changes in temperature and humidity profiles. The “untangling” is further complicated by the fact that the system is constantly adjusting to these drivers at a range of spatiotemporal scales, and yet, as noted above, observations typically comprise snapshots of an evolving system with revisit times much longer than some of the adjustment timescales.

An alternate approach is to shift the focus from untangling aerosol from meteorological drivers to understanding the co-variability of these drivers; identifying commonly occupied parameter space could reduce the dimensionality of the problem and could affect the strength and even detectability of ERFaci. Routine large eddy modeling in conjunction with routine observations at supersites might prove particularly useful for assessing such co-variability and its impact on the aerosol–cloud radiative effect, particularly if such efforts are focused in key cloud regimes.