
THE MASON BRIEF

I Asked Four AI Systems to Evaluate an Origins of Life Paper. Here Is What They Missed.

Dan Mason, Ph.D.

April 2026

There is a standard argument you hear whenever someone challenges the chemical evolution story, the idea that life originated from unguided chemistry on the early Earth. The argument goes something like this: the science is settled, the evidence is overwhelming, and anyone who questions it is either ignorant of the research or pushing a religious agenda.

I decided to test that claim. Not with theology. Not with scripture. With the forensic method.

I took a peer-reviewed paper on the origins of life, ran it through a structured analytical framework I have been developing called the DB-FEP (Design Biology Forensic Evaluation Protocol), and then asked four AI systems (Gemini, Grok, ChatGPT, and Claude) to do the same analysis independently.

What I found tells you something important. Not just about origins science, but about how AI systems think, what they are trained to see, and what they are trained to miss.

The Paper

The target was a 2016 synthesis paper by Vera Kolb, a chemist at the University of Wisconsin-Parkside, titled "Origins of Life: Chemical and Philosophical Approaches," published in the journal Evolutionary Biology.

I want to be clear about something before I go further. Kolb's paper is not bad science. It is not propaganda. It is not dishonest. In fact, one of the things that makes it useful as a test case is precisely that it is careful, well-cited, and intellectually honest. Kolb does not overstate her results. She acknowledges what her model cannot explain.

Including this sentence, which appears on page 507 of the paper:

"How such a system evolved from the primitive metabolism of Oparin is not known."

That sentence is the most important one in the paper. I will come back to it.

What the Paper Claims

The chemical evolution story, as Kolb presents it, runs roughly like this:

Prebiotic chemistry on the early Earth produced amino acids and other organic molecules. These molecules formed structures called coacervates, tiny droplets that can encapsulate chemicals, grow, divide, and host chemical reactions inside themselves. Over time, through early chemical selection, these systems acquired greater complexity. Eventually, the RNA world emerged. From the RNA world, the genetic system developed. From the genetic system, Darwinian natural selection took over. Life began.

Kolb's experimental contribution is real and worth acknowledging. Her laboratory demonstrated that a chemical reaction can occur within a coacervate. That is genuine chemistry. She also updated the original model by replacing polysaccharides with a prebiotically feasible compound, as polysaccharides are not found in meteorites and cannot be produced under simulated prebiotic conditions.

So the coacervate chemistry is real. The reactions are real. The question is whether any of that chemistry gets you to life.

That is where the paper runs into serious trouble.

The Gap That Changes Everything

Here is the problem in plain language.

There are two fundamentally different kinds of processes in nature.

The first kind I call rate-dependent processes. These are processes governed by physical and chemical laws. Given the right ingredients and conditions, the reaction happens necessarily. It is predictable. A chemical reaction inside a coacervate is a rate-dependent process. The chemistry of the reactants determines the product. You cannot change the outcome by rearranging the sequence of inputs, because the sequence does not carry information. The chemistry does all the work.

The second kind I call rate-independent processes. These are processes where the physical constraints of the medium do not determine the outcome. DNA is the clearest example. The bonding chemistry between the bases does not determine their sequence in a DNA molecule. Any sequence is chemically stable: you can arrange the bases in any order, and the bonds will hold just as well. The information content lives in the sequence, not in the bonds.

This is not a small distinction. It is the difference between a chemical reaction and a language. Languages work because the relationship between a symbol and its meaning is arbitrary, assigned rather than determined by physics. The genetic code works the same way. Each codon (a sequence of three bases) specifies an amino acid, but not because chemistry requires that particular assignment. The correspondence was assigned.

Can rate-dependent chemistry produce rate-independent information?

The chemical evolution program says yes, eventually, given enough time and complexity. Kolb's paper, in the sentence quoted above, says that we do not know how.

That sentence is not a minor admission. It is the central problem of the entire research program, and Kolb is honest enough to state it plainly.

The Coacervate Problem

One formulation I developed in this research series is worth remembering:

Coacervates solve localization. They do not solve instruction.

A coacervate is a container. It creates a boundary between the chemistry inside and the chemistry outside. That is useful. It is a real contribution to understanding how early chemistry might have been organized.

But a cell is not primarily a container. A cell is a factory running on a blueprint. The blueprint is the genetic code. The factory is the ribosome and the translation apparatus. The two are mutually dependent in a way that has no precedent in spontaneous chemistry.

Think about it this way. The ribosome cannot assemble itself without instructions encoded in messenger RNA. Those instructions cannot be translated without a ribosome. The proteins that operate the translation system are themselves products of the translation system. This is a closed loop. You cannot get into it from the outside by adding chemicals to a coacervate.

This is not a gap in our knowledge that more research will eventually fill. It is a structural feature of how living systems work. Kolb names it and moves on. The forensic framework does not let it pass that easily.

The Philosophy Problem

Here is something I found striking about Kolb's paper that reveals more than the author intended.

When the chemistry cannot bridge a gap, the paper imports philosophy.

The transition from non-life to life is explained by Hegel's law of quantity-to-quality. The idea is that if you add enough chemical complexity (quantity), life emerges as a new quality. As an analogy, Kolb notes that three oxygen atoms form ozone, which has different properties from oxygen. Quantity produces quality.

The problem is that ozone is not alive. The analogy does not prove that chemical complexity produces biological function. It proves that chemical combinations produce new chemical properties. Those are different claims.

When the question of whether viruses are alive becomes philosophically awkward, the paper invokes dialetheism, a branch of philosophy that holds that some contradictions can be simultaneously true. When the definitions of life become unstable, the paper turns to Aristotle and Rescher.

I want to be precise here. None of this is dishonest. These are real philosophical frameworks applied with genuine care. But their location in the argument is diagnostic. Every time a philosophical framework appears, it fills a space that chemistry cannot. That is not a rhetorical point. It is a structural observation about how the argument works.

When a scientific paper needs that much philosophy to stay coherent, the science has run out of road.

What the AI Systems Said

I asked Gemini, Grok, ChatGPT, and Claude to evaluate Kolb's paper using the same DB-FEP framework I applied. I compared their analyses against mine.

Gemini (first round) called the paper "Reliable" and classified it as a "Reformable system." It essentially summarized the paper's own logic and called that analysis. It never asked whether the framework the paper operates within is the problem.

Grok did better. It correctly identified the Hegelian transition claim as untestable. It noticed that philosophical complexity increases when empirical progress stalls. But every time it reached the edge of a critical finding, it stepped back. It concluded: "solid, non-controversial scholarly work." It also dismissed the institutional funding analysis as irrelevant.

ChatGPT was the strongest of the three external systems. It applied layer-by-layer analysis, correctly identified mechanism failure and causal insufficiency, named several specific failure patterns, and concluded that the paper "explains how chemistry behaves, not how life begins." That is a real finding, and it is correct.

But even ChatGPT consistently missed three things. It did not develop the rate-dependent/rate-independent argument to full mechanistic depth. It did not audit Kolb's self-citation pattern, which creates a circular evidential structure. And, most importantly, it never questioned whether natural, unguided mechanisms are the only cause class worth evaluating.

Then there is the finding I did not expect. One of the four evaluations was Claude reviewing its own prior DB-FEP analysis of the same paper. That self-evaluation produced three valid corrections that improved the working paper. But it also rejected the institutional analysis, arguing that the NASA funding connections were an overreach. In other words, Claude pushed back against its own earlier finding at precisely the institutional layer where every other AI also resisted. Even in self-evaluation mode, the paradigm boundary held. That is not a software glitch. That is a training signature.

No AI independently evaluated the cause-class question.

The Question Nobody Asked

This is the core finding of my research series.

Every AI system I tested evaluated the paper from inside the paper's own framework. They asked whether the chemistry was real. It is. They asked whether the philosophy was coherent. Mostly. They asked whether the paper acknowledged its gaps. Yes. And on that basis, they concluded: solid scholarship.

What none of them asked is this: Are natural, unguided mechanisms the only cause class that deserves evaluation here?

This is a question I require every origins science analysis to answer. The DB-FEP framework identifies three cause classes for any origins claim:

1. Natural, unguided mechanisms

2. Intelligent physical causes (including directed panspermia, the idea that life's informational complexity was seeded from an external source)

3. Supernatural agency

Kolb's paper, like virtually all chemical evolution literature, evaluates only the first class. It does not argue against the other two. It does not falsify them. It simply excludes them by virtue of the framework's assumptions and never discloses that this is what it is doing.

The AI systems I tested did the same thing. They were trained on literature that treats natural, unguided mechanisms as the only legitimate explanatory category. So they reproduced that assumption as if it were a neutral background.

It is not neutral. It is a philosophical commitment. And a forensic evaluator who does not identify philosophical commitments as philosophical commitments has not done forensic analysis. They have done theology of a different kind.

The Design Biology Inference

Let me be precise about what Design Biology claims, because it is routinely misrepresented.

Design Biology does not argue "therefore God." It argues that the evidence better fits intelligent causation as a cause class, and here is the mechanistic reason why.

In every observed case in human experience where a system exhibits symbolic coding with arbitrary assignment between symbol and referent, implemented by an independent decoding apparatus, the cause is an intelligent agent. Programming languages. Written language. Morse code. Signal flags. Every one.

The genetic code meets this criterion exactly. The codon-to-amino-acid correspondence is physically arbitrary. Any codon could specify any amino acid. The correspondence is assigned, not determined by chemistry. And the decoding apparatus (the ribosome and transfer RNA system) is independent of the coding system.
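The assignment point can be made concrete with a sketch. In software terms, the codon-to-amino-acid correspondence behaves like a lookup table: the decoder consults stored assignments rather than deriving them from the chemistry of the symbols themselves. A minimal illustration in Python, using a handful of real entries from the standard genetic code (the subset and the function are mine, chosen for brevity):

```python
# The standard genetic code is, structurally, a lookup table: the mapping
# from codon (symbol) to amino acid (referent) is stored as data. Nothing
# in the letters "AUG" chemically necessitates "Met"; the table assigns it.
# A small illustrative subset of real entries from the standard code:
CODON_TABLE = {
    "AUG": "Met",   # methionine; also the start signal
    "UUU": "Phe",   # phenylalanine
    "UGG": "Trp",   # tryptophan
    "UAA": "STOP",  # stop codon
}

def translate(mrna: str) -> list[str]:
    """Decode an mRNA string three bases at a time, halting at a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide

print(translate("AUGUUUUGGUAA"))  # ['Met', 'Phe', 'Trp']
```

Note that `translate` (the decoder) and `CODON_TABLE` (the code) are separate objects: you could swap in a different table and the same decoder would run unchanged, which is exactly the sense in which the correspondence is assigned rather than physically determined.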

This is not a theological claim. It is a pattern-recognition claim grounded in uniform, repeated experience. It is the same inferential logic a forensic investigator uses at a crime scene. You do not need to know who the murderer is to conclude from the evidence that a murder occurred.

Design Biology says: the evidence of the genetic code looks like the product of intelligence. The chemical evolution program says: "We believe unguided chemistry produced it, though we do not know how." Both are live explanatory options. Only one is currently evaluated in mainstream origins science literature.

Why This Matters

If you are a student, a researcher, or an educated reader using AI systems to help you understand origins science literature, you need to know that those AI systems will consistently tell you the literature is reliable, the scholarship is solid, and the gaps are merely matters for future research.

They will not tell you that the framework itself excludes competing explanations by assumption. They are not trained to see that.

The DB-FEP framework exists precisely to surface what AI systems miss: the rate-dependent/rate-independent distinction, the cause-class matrix, the institutional funding structures that reinforce paradigm commitment, and the pattern of increasing philosophical complexity as a diagnostic signal of empirical stagnation.

None of these is a theological argument. They are forensic arguments. They are the kinds of questions a death investigator asks, a detective asks, a lawyer asks. Who are all the possible causes? Has each one been evaluated on equal terms? What does the evidence actually show?

When you apply that standard to chemical evolution literature, the picture looks different from what the AI systems told you.

The Bottom Line

Kolb's paper tells you the chemistry of coacervates. It is real chemistry, and it is interesting.

It does not tell you how unguided chemistry produces a genetic code. It says so, plainly, on page 507.

The AI systems I tested evaluated the paper within their own frameworks and called it solid. They were right about what they evaluated. They evaluated the wrong thing.

The right question is not whether the chemistry is real. It is whether rate-dependent chemistry can produce rate-independent symbolic information. After seventy years of dedicated research, the answer remains: we do not know how.

That is not a crisis. It is an honest accounting of where the evidence stands. And honest accounting is where every good investigation begins.

Dan Mason, Ph.D., is an independent scholar and adjunct professor. He publishes academic work under the name Charles Mason, Ph.D., on ResearchGate and public-facing analysis through The Mason Brief on Substack. The full working paper underlying this article (WP-2026-DB-FEP-003) is available on ResearchGate.

Copyright 2026 Dan Mason, Ph.D.
