Bad Political Arguments — American Thinker thinks the California(!) election was rigged for Biden

Timothy Bond
10 min read · Jul 2, 2021

Before I even start discussing this, I want to acknowledge that there’s often very little point in digging into people’s elaborate numerical discussions of “evidence for election fraud” — people mostly have their existing beliefs on this stuff and aren’t going to move them, and those beliefs have very little to do with evidence anyway and a lot more to do with political allegiance.

Having said that, I did notice this particular article was apparently getting enough traction to wind up on the front page of Memeorandum at some point this afternoon, which is way, way too much attention for something that’s so on-its-face ridiculous, so I decided to put on the ol’ hazmat suit and go wading into it.

If you didn’t click the link above, the article is in the American Thinker, it’s entitled “Did Biden Really Win California?”, and it goes on to attempt to make the case that there’s statistical evidence of fraudulent results. On a side note, I don’t really know the publication very well — Wikipedia describes it as a “daily online magazine dealing with American politics from a conservative viewpoint”, and helpfully links to other articles with claims about the legitimacy of the election that I probably won’t take the time to discuss today.

Anyway, let’s get into the meat of the argument. The entire piece is based on Benford’s Law, which is a mathematical observation that the leading digit (i.e., the first digit) of values in a large data set tends to follow a certain distribution, where the most common value is 1 (30.1% of the time), followed by 2 (17.6%), followed by 3 (12.5%), and so on, until 9 is the least common (4.6%). If you want to dig into the math here you can follow the link to Wikipedia, but suffice it to say I agree with the author of the piece that Benford’s Law is in fact a valid observation, and people do use it to detect fraud or made-up data — if you make up a bunch of “random numbers”, they tend not to resemble real-world data, and this is certainly one way you can test for that.
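
To make that distribution concrete, here’s a minimal Python sketch (mine, not the article’s) that computes the expected Benford proportions from the formula log10(1 + 1/d); the leading_digit helper is just an illustrative way you might bucket real precinct totals:

```python
# Minimal sketch of the Benford's Law first-digit distribution described above.
# The expected share of leading digit d is log10(1 + 1/d), which gives the
# 30.1%, 17.6%, 12.5%, ..., 4.6% figures quoted in the text.
import math

def benford_expected():
    """Expected proportion of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First (most significant) digit of a positive integer, e.g. 4371 -> 4."""
    return int(str(abs(int(n)))[0])

for d, p in benford_expected().items():
    print(f"{d}: {p:.1%}")  # 1: 30.1%, 2: 17.6%, ..., 9: 4.6%
```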

However, there are — shockingly — some issues in their analysis.

First, the author checks the leading digits of total vote tallies, per precinct, for Biden and Trump, and finds that it’s… basically exactly what Benford’s Law would predict. So, that isn’t any evidence of fraud (which they acknowledge), although they nevertheless point out that the Biden line is slightly lower for the digits 2, 3, and 4 and slightly higher for 6, 7, and 8:

From the original article

Pointing this out is basically worthless — you already admitted the variation here was meaningless, so now you’re just pontificating on the random noise.

Then, they do the same comparison with in-person voting results:

Likewise from the original article

Again, no evidence of fraud, basically exactly what you’d expect. And then of course, they check the vote-by-mail results:

Likewise from the original article

And now we have something weird, where the distribution of Democratic mail-in votes across precincts doesn’t fit as well. This of course goes along with the frequent insistence that heavy voting by mail (during the pandemic) allowed rampant fraud, and the author now feels they’ve basically provided strong evidence for this.

I have several problems with this.

My first problem is that I want to reproduce these graphs from the raw data, and they don’t link to it at all — they just say it was “released by Dr. Shirley N. Weber, Secretary of State (also release by Alex Padilla, Secretary of State through January 28, 2021)”. I spent a while on the California Secretary of State website and so far cannot find the raw data above (the best I’ve found so far is county-specific data that doesn’t break out mail-in ballots). So that’s not great. But I will continue under the assumption that the data they present above is accurate, because it actually doesn’t really matter that much.

The second problem is that they basically don’t explain anything about the chi-squared test or what it indicates, other than that the Democratic vote counts only get a value of 87.1%, which is obviously much lower than the value of 99.7%. But it’s important to specify what that percentage actually is, which the author did not do, probably because it makes their argument look extremely stupid on its face.

As this UPenn tutorial indicates, the % value here is “how likely are you to see results this extreme, if the distribution of the data is due to chance?” In other words, if there’s no voter fraud, and these numbers just reflect actual, real voting totals, and we expect them to follow Benford’s Law, what percentage of the time will they be at least this far from the exact curve we predict? Obviously we should expect a bit of random noise — reality doesn’t perfectly match these sorts of mathematical curves — and this is just one way of saying exactly how far apart the two are. A high number — like the ~100% number given in the first graph — means that there’s absolutely nothing unusual about the real vs. predicted data — (about) 100% of the time, they will be at least that different. The lower the number gets, the weirder it is — if you found data that would only happen by chance 1% of the time, that would seem fairly suspicious.
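
If you want to see roughly how a number like that gets computed, here’s a sketch of a chi-squared goodness-of-fit test against the Benford expectations, using SciPy. The observed digit counts are made-up placeholders (the article never links its raw data), so this illustrates the mechanics of the test rather than reproducing the 87.1% figure:

```python
# Sketch of a chi-squared goodness-of-fit test of leading-digit counts against
# Benford's Law. The observed counts are hypothetical placeholders, since the
# article's underlying precinct data isn't linked anywhere.
import math
from scipy.stats import chisquare

observed = [302, 175, 128, 96, 81, 65, 58, 50, 45]  # hypothetical counts of leading digits 1-9
total = sum(observed)
expected = [total * math.log10(1 + 1 / d) for d in range(1, 10)]  # Benford expectations

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.3f}")
# A p-value close to 1 means deviations at least this large happen routinely by
# chance; the article's 87.1% is squarely in "nothing to see here" territory.
```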

As with most statistical tests, we tend to draw a line around 5% and call that “statistically significant” — meaning that since we would only see results this far out due to chance 5% of the time, that’s probably a good point to start seriously thinking that the results are not due to chance (meaning, in this case, that they could very well be due to fraud).

So with that in mind, let’s glance up again and see that the ‘suspicious’ vote-by-mail data for Democrats had a value of… 87.1%. So, 87.1% of the time, purely random data will be at least this weird-looking, even in the absence of fraud. That means that in more than six elections out of seven, we should expect to see data that differs from the Benford’s Law prediction by at least this much.

That’s utterly meaningless. The statistical test literally just told you that this is not suspicious data. And yet that’s basically the centerpiece of this entire article, which — for reasons passing understanding — apparently lots of people are reading and talking about.

There are more problems, though.

My third problem is that this is even worse because the author performed three tests — one on the totals, one on the in-person data, and one on the mail-in data. So, we took three bites at the apple and only one of them came up… not even really actually suspicious. The more times you run a statistical test, the more likely you are to get some kind of a false positive on it — although again, this isn’t even a “false positive”, it’s a “false still-basically-pretty-much-negative”.

That’s the same point this XKCD is trying to make — if you test if jellybeans cause acne, and find out they don’t, and then you re-test twenty specific colors of jellybeans, probably one of those tests is eventually going to come up with a number that happens “only 5% of the time by chance”, because you took 20 chances. In this case they took three chances and still only came up with one number that looked, again, basically perfectly normal, by their own not-well-explained admission.
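
To put rough numbers on that: assuming independent tests at the usual 5% threshold (the article’s three tests aren’t strictly independent, since the totals are just in-person plus mail-in, but the back-of-the-envelope point stands), the chance of at least one fluke “significant” result looks like this:

```python
# Back-of-the-envelope: probability of at least one false positive when running
# k independent tests, each with a 5% false-positive rate under the null.
def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

print(f"{prob_at_least_one_false_positive(3):.0%}")   # ~14% for three bites at the apple
print(f"{prob_at_least_one_false_positive(20):.0%}")  # ~64% for the twenty jellybean colors
```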

Now, I do have a fourth problem with this, which is that the entire premise of it is incredibly stupid. As the title of the article indicates, the author apparently thinks that Donald Trump might have actually, in reality, won the Presidential Election in California? That’s on-its-face ridiculous. Here’s FiveThirtyEight’s final election forecast — there’s a nice graph halfway down that basically lists every state in order of how red/blue they are, and California is only beaten out by Maryland, Massachusetts, Hawaii, Vermont, and the District of Columbia, with Biden polling at 64.1% compared to Trump’s 34.1%.

The final vote totals, according to the American Thinker article, were 64.9% Biden, 35.1% Trump. So, pretty much exactly what we were expecting.

Why would anybody get involved falsifying vote totals for Biden in California, the apparently-fifth-most-Democrat-leaning state in the nation? (I’m honestly surprised it wasn’t closer to #1, but I guess California and New York have the advantage of being both very liberal and very big and so get more press.) To engage in some kind of elaborate fraud — which, again, the author has helpfully told us there is actually not really any evidence for — would be a huge and totally pointless risk, where if you succeed you definitely won’t accomplish anything because Biden was going to get all of California’s electoral votes anyway, and if you fail and get caught you presumably spend a significant amount of time in prison.

Incidentally, I think the author omitted third-party votes from this, which is why both Trump and Biden have slightly higher numbers than the FiveThirtyEight prediction (and why they sum to 100%). If you make the equivalent adjustment to the FiveThirtyEight predictions, you get an expected 34.7% for Trump and 65.3% for Biden, which means Trump actually outperformed the polling — which was also true nationally, and which is yet another reason all of these arguments about a stolen election are completely bogus. All of the polls suggested that Trump was going to lose, and he lost, and he lost by less than the polls suggested, so if you’re going to be suspicious of anything (which, to be clear, you shouldn’t be), your default suspicion should be that people were stuffing Trump votes, not Biden votes. But again, to be 100% clear, I don’t think that happened (the polls are usually off by about 1–2%).
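
For the record, the two-party arithmetic behind those adjusted numbers is just rescaling the FiveThirtyEight shares so they sum to 100%, the same way the article’s reported totals do:

```python
# Rescale FiveThirtyEight's polling shares to a two-party total so they're
# comparable with the article's reported results (which omit third parties).
biden_poll, trump_poll = 64.1, 34.1
two_party = biden_poll + trump_poll                   # 98.2
print(f"Biden: {100 * biden_poll / two_party:.1f}%")  # ~65.3%
print(f"Trump: {100 * trump_poll / two_party:.1f}%")  # ~34.7%
# Actual two-party result per the article: Biden 64.9%, Trump 35.1%.
```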

Getting back to the article — the whole thing is just a complete abuse of numbers. It actually keeps going for a while, and while I think I’ve basically made my point, I do feel like including another snippet:

As we saw little or no likely fraud in the In-Person chart, let’s look at the candidates’ precinct VBM tallies to answer questions 3 and 4. The following chart, showing voting data by candidates’ precinct tally size, is from official VBM tally data:

Again, from the original article

I ask the reader, does this chart, for which the official statewide ballot total is nearly 2:1 in favor of Biden, look right to you? The blue, Biden, bars are much higher than 2:1 for the higher candidate precinct totals. What happened here? Did someone, somewhere, stuff the Vote-By-Mail ballot tallies with Biden votes generating the much higher Biden VBM totals?

I mean, c’mon, dude, “does this look right to me”? First of all, that’s just an appeal to intuition — I already know there’s nothing statistically weird about this because you yourself accidentally said so earlier without making it clear that that’s what you were saying.

But also, do you imagine I have a version of this graph in my head that I’m comparing this to? I can barely figure out what this graph says at all! The x-axis is “size of precinct” and the y-axis is “total number of votes for Biden/Trump in precincts of that size”, and the scale of the x-axis changes twice. How could I possibly have some reference idea about this?

But okay, let’s see if I can come up with some intuition as to what you’re telling me is suspicious on this incredibly convoluted graph. Precincts with larger numbers of voters in them voted for Biden at a higher rate than precincts with lower numbers of voters in them (by mail, at least), and this distinction only gets more dramatic as you go to even-larger precincts.

So… precincts with higher numbers of voters in them are presumably those in major cities. Precincts with lower numbers of voters in them are probably rural. Here’s the population density of the United States:

Originally posted on Reddit here

Here are the by-county election results (for 2016, just in case you’re worried it’s cheating to use the “fraudulent” ones from 2020):

From Vox

So… yeah. Unsurprisingly, as anybody who follows politics at all (and indeed, certainly anyone who has even a remote chance of running across my obscure, very infrequent blog) knows, city voters tend to lean Democratic and country voters tend to lean Republican. That’s exactly what I would predict. And that’s exactly what happened.

This becomes even more obvious when the author then suggests a complete audit of the election in the precincts with the biggest gaps between Trump and Biden:

Likewise from the original article

Oh, gee, you mean San Francisco and LA and San Diego aren’t hotbeds of Trump support? I am again shocked to learn that. Truly, this is what fraud totally doesn’t look like.

So, to briefly summarize, the American Thinker (truly not a fitting name under the circumstances) is claiming somebody rigged one of the most famously deep-blue states in favor of Biden, even though Trump outperformed the polls there, because one subcategory of the votes shows a completely non-significant statistical difference from what we would expect (so non-significant that we should expect results this extreme 87.1% of the time, or more than six out of seven times), and therefore we should do an in-depth audit to make sure that Trump didn’t actually do way better in such places as Los Angeles and San Francisco.

And for some reason, lots of people are reading and talking about this article. I can only cross my fingers that they, like me, are only talking about how very dumb it is.
