2026-06-02

Bayesian vs. Frequentist MMM Is a False Dichotomy

The Bayesian-vs-frequentist MMM debate has gotten religious, and it mostly misses the point. Both methods solve the same structural equation, both regularize, and neither can identify effects from observational data alone. Here's what actually changes — and the three questions to ask any MMM vendor.

I built this whole piece because of one statement that really pissed me off.

I'd been watching the online debate about Bayesian versus frequentist MMM, and to be fair, in statistics this stuff can get incredibly religious. In the MMM world specifically there are some very strong frequentist critiques, and one of them was that there's "no uncertainty" in an MMM. That was the line that made me sit down and do the whole lecture, the simulations, the entire research. Because it's wrong. But it's wrong in an interesting way, and untangling why it's wrong tells you almost everything you need to know about measuring marketing.

So let me share the conclusion up front: it's a false dichotomy. We're both trying to solve the same structural equation. We're both using regularization — we just package it differently. We're both facing the same data constraints. And neither can reliably identify specific effects from observational data alone.

The real question was never "Bayesian or frequentist?" The real question is: how much of this is data, and how much is assumptions?

We're all solving the same equation

If you're not deep in the MMM world, here's the setup. We have Y — our outcome, so sales, revenue, number of orders. Then we have a collection of estimates on channel effectiveness, and a collection of estimates on controls and seasonality. That's the structural equation. Everyone's solving it.
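As a rough sketch of that setup, the structural equation can be written as a plain additive model. The function and argument names below are hypothetical, and real MMMs transform the media columns before this step; this is only the skeleton both camps share.

```python
import numpy as np

def predict_outcome(media, betas, controls, gammas, intercept):
    """Additive structural equation (illustrative skeleton only).

    media:    (n_weeks, n_channels) media activity per channel
    betas:    (n_channels,) channel effectiveness estimates
    controls: (n_weeks, n_controls) controls and seasonality terms
    gammas:   (n_controls,) estimates for those controls
    """
    media = np.asarray(media, dtype=float)
    controls = np.asarray(controls, dtype=float)
    return intercept + media @ np.asarray(betas) + controls @ np.asarray(gammas)

# Toy numbers, made up for illustration: 3 weeks, 2 channels, 1 control.
y_hat = predict_outcome(
    media=[[1, 2], [0, 1], [3, 0]],
    betas=[2.0, 0.5],
    controls=[[1], [0], [1]],
    gammas=[10.0],
    intercept=100.0,
)
print(y_hat)
```

Everything that follows is about how the betas and gammas get estimated, not about a different equation.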

On the Bayesian side, which is most of the industry now (PyMC-Marketing and Google's Meridian are both fully Bayesian), you set up priors on each of these parameters and you sample the posterior. On the frequentist side, Meta is the holdout in open source: Robyn is basically Ridge regression. And here's the thing people forget: it can be mathematically proven that the Ridge estimate is exactly the posterior mode (and, with Gaussian noise, the posterior mean) of a Bayesian regression with a zero-mean Gaussian prior on the coefficients. The Ridge penalty lambda maps directly to the prior width: lambda equals the noise variance divided by the prior variance.

Think about what Ridge actually does. It penalizes large coefficients, so if channels are multicollinear, it shrinks those coefficients back toward zero. On the Bayesian side you're doing exactly the same thing, just in a different way. In my simulations (150 observations, 5 channels), if you set the Ridge lambda and the prior to be equivalent, you get exactly the same point estimates. Identical.

So when the frequentist world says "Bayesians are cheating via priors," my answer is: you can get mathematical equivalence between both methods. At the core, we're both just trying to regularize our parameters.
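The equivalence is easy to check numerically. The sketch below is not my original simulation; it uses freshly simulated data of the same size (150 observations, 5 channels), assumes a known noise standard deviation, and skips the intercept for brevity. The helper names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, illustrative only: 150 weekly observations, 5 channels.
n_obs, n_channels = 150, 5
X = rng.normal(size=(n_obs, n_channels))
true_beta = np.array([0.5, 0.0, 1.2, 0.3, 0.8])
sigma = 1.0  # noise standard deviation, assumed known
y = X @ true_beta + rng.normal(scale=sigma, size=n_obs)

def ridge_estimate(X, y, lam):
    """Closed-form Ridge solution: (X'X + lam * I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def gaussian_posterior_mean(X, y, sigma, tau):
    """Posterior mean under beta ~ N(0, tau^2 I) and y ~ N(X beta, sigma^2 I)."""
    p = X.shape[1]
    precision = X.T @ X / sigma**2 + np.eye(p) / tau**2
    return np.linalg.solve(precision, X.T @ y / sigma**2)

tau = 0.5                  # prior standard deviation on each coefficient
lam = sigma**2 / tau**2    # the Ridge penalty that matches that prior

beta_ridge = ridge_estimate(X, y, lam)
beta_bayes = gaussian_posterior_mean(X, y, sigma, tau)

# Largest absolute difference between the two sets of coefficients.
print(np.max(np.abs(beta_ridge - beta_bayes)))
```

The two estimates agree to floating-point precision. Tightening the prior (smaller tau) is the same move as raising lambda.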

They have assumptions. Just different ones.

This is my most common annoyance with people running frequentist approaches: the claim of "no assumptions." That's wrong. They have assumptions. They just have different assumptions.

The critique of priors is really a critique of regularization: the idea that by setting priors, we're pushing the model in a specific direction. And yes, that's true. That's the point. We have a lot of bias in our data, and we know it for a fact. Most marketing budgets are endogenous: there's a circular relationship between budget and sales, and most channels move in the same direction anyway. That's structural, and it makes unregularized estimates unstable. So I accept a little bias from shrinkage in exchange for a lot less variance. That trade-off is exactly what justifies regularization, i.e., the use of priors.

Robyn does the same thing. That's what the Ridge penalty is for: to shrink the coefficients, accepting a little bias in exchange for lower variance and more stable estimates. The only real difference is that we Bayesians are a little more upfront about our assumptions, because we have to actually express them. The drawback is that it's a little easier for us to cheat if we really want to.

Which brings me to the critiques that do hold up, because I'm not here to pretend the Bayesian side is clean.

The critiques that have a kernel of truth

Half-normal priors can't reach zero. Most channel priors use a half-normal, which forces coefficients positive: intuitively, marketing either has zero effect or a positive effect, never negative. Fine. But the half-normal has a problem that's even admitted in Google Meridian's own documentation: you can never really get an estimate of zero. In my simulation the channel's true effect was zero. The credible intervals reached down to zero, which is good, but the model's point estimates were not zero, because every bit of posterior mass sits on the positive side. This means you can push weak effects upward slightly and get a sense of channel effectiveness that isn't really there. This is exactly why you always need to experiment.
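Here is a minimal sketch of that behavior, under simplifying assumptions: one channel, known noise, and a grid approximation instead of MCMC. The data, prior scale, and grid range are all made up for illustration; this is not the simulation described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# One channel whose true effect is exactly zero.
n_obs = 150
x = rng.normal(size=n_obs)
sigma = 1.0                               # noise sd, assumed known
y = rng.normal(scale=sigma, size=n_obs)   # y does not depend on x at all

# Grid over non-negative beta values, since a half-normal has no negative mass.
beta_grid = np.linspace(0.0, 1.0, 2001)
prior_scale = 0.5                         # half-normal scale, illustrative
log_prior = -0.5 * (beta_grid / prior_scale) ** 2

# Gaussian log-likelihood evaluated at every grid value of beta.
resid = y[None, :] - beta_grid[:, None] * x[None, :]
log_lik = -0.5 * np.sum(resid**2, axis=1) / sigma**2

# Normalize into posterior weights over the grid.
log_post = log_prior + log_lik
weights = np.exp(log_post - log_post.max())
weights /= weights.sum()

posterior_mean = float(np.sum(weights * beta_grid))
upper_95 = float(beta_grid[np.searchsorted(np.cumsum(weights), 0.95)])

print(f"posterior mean: {posterior_mean:.3f}, 95% upper bound: {upper_95:.3f}")
```

The posterior mean comes out strictly positive even though the true effect is zero. The interval hugs zero, but the point estimate cannot land on it.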

Priors can dominate. There's truth to the "priors overwhelm the data" critique. Even Jin et al., one of the seminal Google papers behind Bayesian MMM, admit priors have a big impact on the posterior. And MMM is both a small-data problem and a highly correlated data problem. Do the math: 8 channels × 4 parameters each (ad stock, saturation, beta, standard deviation) is 32, and adding intercept, trend, seasonality, controls, and noise brings my example to 46 parameters. That's 150 observations divided by 46 parameters, or about 3.3 observations per parameter. That's why geo data helps so much; you're adding observations and variability per parameter. But it does mean you should be careful. If Bayesians want to cheat, an overly tight prior is how they do it. My rule: don't set overly tight priors. If the prior has enough wiggle room in its standard deviation, the data will move it.
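To see how prior width decides who wins, here is a sketch using the textbook normal-normal update for a single coefficient. It is a deliberate simplification of a full MMM, and the prior mean, data estimate, and standard errors are invented numbers. The count of 14 non-channel parameters is only implied by the 46 total; the exact split is not something I've listed.

```python
def posterior_normal(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update: a precision-weighted average of prior and data."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / data_se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    prior_weight = prior_prec / (prior_prec + data_prec)
    return post_mean, post_var**0.5, prior_weight

# Parameter count from the text: 8 channels x 4 parameters, 46 in total.
n_channel_params = 8 * 4
n_other_params = 46 - n_channel_params     # 14, implied by the total
obs_per_param = 150 / 46

# Same noisy data estimate (1.0 with standard error 0.8), two different priors.
tight_mean, tight_sd, tight_weight = posterior_normal(3.0, 0.2, 1.0, 0.8)
wide_mean, wide_sd, wide_weight = posterior_normal(3.0, 2.0, 1.0, 0.8)

print(f"observations per parameter: {obs_per_param:.1f}")
print(f"tight prior: posterior mean {tight_mean:.2f}, prior weight {tight_weight:.0%}")
print(f"wide prior:  posterior mean {wide_mean:.2f}, prior weight {wide_weight:.0%}")
```

With the tight prior, the posterior barely leaves the prior mean of 3.0. With the wide one, the data pulls it most of the way to 1.0.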

Attribution data as a prior is circular. This is the one to really watch. The whole point of the MMM is to contrast and potentially contradict last-click attribution and platform ROAS — to be an independent source of estimation. So if you use last-click attribution or platform ROAS as a tight prior, your MMM doesn't give you truth, it launders your attribution data. You've built a very fancy model that justifies your pre-existing numbers. In one contrived example I set a tight prior around platform ROAS and, of course, the model spat back 4.8 — incredibly tight to the prior. You just made a mirror, not a model.

What you should calibrate from instead: geo experiments, lift experiments, regular A/B testing. Causal experiments with enough power that you get tight confidence intervals. And even then, be careful not to make the resulting priors overly tight.

The bane of our existence

So if methodology isn't the binding constraint, what is? Data quality. And specifically: multicollinearity is the bane of our existence.

It doesn't matter whether you're Bayesian or frequentist. Multicollinearity gets you either way — it just manifests differently. Take a contrived case where TV and digital are highly correlated week over week (uncommon for TV maybe, but very common between digital channels — paid search and Meta and affiliates moving together). Both methods react: you can't separate the channel effects cleanly. In the Bayesian world your posterior gets incredibly wide. In the OLS world your confidence intervals blow up and your standard errors inflate. Same problem, different part of the machine. The end result is identical: you can't tell the channels apart.
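The frequentist half of that claim is quick to reproduce. This sketch simulates two channels at low and high correlation and compares OLS standard errors. The correlations, sample size, and true effects are illustrative choices, and the intercept is omitted because the simulated data is centered.

```python
import numpy as np

def ols_standard_errors(X, y):
    """OLS coefficients and their classical standard errors (no intercept)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)          # residual variance estimate
    return beta, np.sqrt(s2 * np.diag(xtx_inv))

def simulate(corr, n_obs=150, seed=0):
    """Two channels with a chosen correlation, both with a true effect of 1."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr], [corr, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n_obs)
    y = X @ np.array([1.0, 1.0]) + rng.normal(size=n_obs)
    return ols_standard_errors(X, y)

_, se_low = simulate(corr=0.1)
_, se_high = simulate(corr=0.95)

print("standard errors at corr=0.10:", se_low)
print("standard errors at corr=0.95:", se_high)
```

In theory the inflation factor on the standard error is the square root of 1 / (1 - corr^2), which is a bit over 3 at a correlation of 0.95. A Bayesian fit of the same data shows the same thing as a wider posterior.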

And there's a related issue both worlds acknowledge: non-identifiability. That's fancy causal-inference wording for "some parameters we can't really estimate, because different combinations produce identical model outputs." In MMM the big one is the interplay of ad stock, saturation, and the beta coefficient. The Google papers explicitly warn about this. Meta handles it in Robyn by generating hundreds of models and looking at how spread out the recommendations are, with a selection criterion that rewards models whose effect shares track spend shares. Notice that this introduces its own assumption, namely that the more you spend on a channel, the more of the outcome it should account for.
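A small sketch makes the saturation and beta trade-off concrete. It assumes a simple saturation curve of the form x / (x + k), which is one common choice and not necessarily what any particular library uses; the parameter values and spend range are made up.

```python
import numpy as np

def saturated_contribution(spend, beta, half_saturation):
    """Channel contribution: beta times a simple saturation curve x / (x + k)."""
    return beta * spend / (spend + half_saturation)

# Observed spend never gets close to the saturation point.
spend = np.linspace(0.0, 10.0, 150)

# Parameter set A.
beta_a, k_a = 50.0, 100.0
contrib_a = saturated_contribution(spend, beta_a, k_a)

# Parameter set B: double the half-saturation point, then pick the beta
# that best reproduces set A's contributions by least squares.
k_b = 200.0
curve_b = spend / (spend + k_b)
beta_b = float(contrib_a @ curve_b / (curve_b @ curve_b))
contrib_b = saturated_contribution(spend, beta_b, k_b)

max_gap = float(np.max(np.abs(contrib_a - contrib_b)))
print(f"beta A = {beta_a:.1f}, beta B = {beta_b:.1f}, max gap = {max_gap:.3f}")
```

The two betas differ by almost a factor of two, yet the fitted contributions differ by far less than typical weekly noise. Without spend variation that reaches into the curved part of the response, the data cannot choose between them.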

The honest takeaway: as modelers, we need to be aware of where our assumptions live, and where we're treating model output as truth when we actually can't identify it. There's no statistical technique — MMM, ML, AI, doesn't matter — that can separate effects if the data can't tell them apart. Sometimes you need strategic spend variation, channel pauses, on/off testing. I know it's hard to convince a marketing manager to pause a channel. Do it anyway.

Prior vs. posterior is THE diagnostic

If you're in the Bayesian world, there's one diagnostic that matters above everything else: prior versus posterior.

You can get it from any Bayesian framework. If your prior and your posterior are overlapping, the model is just echoing your assumptions (or echoing your experiments). If your posterior has moved away from the prior, the data is speaking. On TV data you might see a clear gap between prior and posterior — good, the data is informative. On out-of-home you might see the prior dominating — a warning sign.
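One way to put numbers on that comparison, sketched with simulated draws standing in for real prior and posterior samples. The two summaries used here (shift measured in prior standard deviations, and variance contraction) are my choice of illustration, not a standard output of any specific framework, and the function name is hypothetical.

```python
import numpy as np

def prior_posterior_summary(prior_samples, posterior_samples):
    """Summarize how far the posterior moved and narrowed relative to the prior."""
    prior = np.asarray(prior_samples, dtype=float)
    post = np.asarray(posterior_samples, dtype=float)
    shift = abs(post.mean() - prior.mean()) / prior.std(ddof=1)
    contraction = 1.0 - post.var(ddof=1) / prior.var(ddof=1)
    return {"shift_in_prior_sds": float(shift), "contraction": float(contraction)}

rng = np.random.default_rng(2)
prior_draws = rng.normal(loc=1.0, scale=1.0, size=4000)

# Stand-in for a channel where the data is informative.
tv_posterior = rng.normal(loc=2.5, scale=0.2, size=4000)
# Stand-in for a channel where the posterior just echoes the prior.
ooh_posterior = rng.normal(loc=1.05, scale=0.95, size=4000)

tv = prior_posterior_summary(prior_draws, tv_posterior)
ooh = prior_posterior_summary(prior_draws, ooh_posterior)
print("TV: ", tv)
print("OOH:", ooh)
```

A contraction near 1 with a visible shift means the data is speaking. A contraction near 0 with almost no shift means you are reading your own prior back.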

This is why setting priors needs to be explicit. In Meridian the priors are documented and fairly wide: the ROAS prior is a lognormal with parameters of roughly 0.2 and 0.9, so you've got 95% probability between something like 0.2× and 7×. You're not constraining the model much; the only real constraint is "can't go negative." Pair this with a prior sensitivity analysis: a well-identified channel converges to the same posterior regardless of the prior; a poorly identified one is highly sensitive to it. Google recommends exactly this: change the prior massively and watch what happens to the channel's posterior. And remember, marketing effectiveness shouldn't jump all over the place across a quarter unless the world or your strategy fundamentally changed. Stability is a signal.
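To check how wide a lognormal prior really is, here is a sketch that computes its central interval. It assumes the prior is parameterized by a log-scale location of 0.2 and a log-scale spread of 0.9; that parameterization is an assumption on my part, so verify it against the current Meridian documentation before relying on the numbers.

```python
import math
from statistics import NormalDist

def lognormal_interval(mu, sigma, mass=0.95):
    """Central interval of a lognormal with log-scale location mu and spread sigma."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)

# Assumed parameterization: log-scale location 0.2, log-scale spread 0.9.
low, high = lognormal_interval(mu=0.2, sigma=0.9)
median = math.exp(0.2)

print(f"95% of prior mass between {low:.2f}x and {high:.2f}x, median {median:.2f}x")
```

Under that assumption the prior spans well over an order of magnitude of ROAS. For a sensitivity check, rerun the model with a clearly different location or spread and compare each channel's posterior to the original fit.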

The three questions to ask any MMM vendor

Most of us are working with MMM vendors, and building a proper MMM is hard. So here are the three questions — for any vendor, or for your internal team:

Where did your priors (or your regularization parameters) come from? If the answer is in-platform data or attribution, that's a red flag.
How tight are your priors, and how much did the posterior move? If they're really tight and the posterior barely budged, you have a problem.
What experiments validated the channel estimates — and if none yet, what's your test plan? A lot of vendors implement the model and then forget to hand the client a test plan, even though some coefficient conclusions rest entirely on assumptions.

Show your work

Here's where I land. Picking "the right method" matters far less than being honest about your assumptions and showing the diagnostics, even in a translated version for the non-statisticians in the room. Robyn is a valid, battle-tested model; if you choose it, fine, as long as we're honest about the assumptions and the outcomes.

Because we all have assumptions. Some are valid, some aren't. We sometimes have to make decisions without all the data under our belt, so assumptions are necessary. The job isn't to pretend they don't exist — it's to be explicit about them, and to keep asking the only question that actually matters: how much of this outcome is driven by the data versus our assumptions?

We're statisticians, not magicians. The model can't conjure causal identification out of correlated observational data. But an honest model, with wide-enough priors and a real test plan behind it, beats a confident mirror every single time.

This is the kind of thing I go deep on in my marketing-science course and consulting work — how to actually read MMM diagnostics, design the experiments that calibrate them, and tell the difference between a model and a mirror. If that's useful to you, come find me at marketingscience.dev.