
CREDO

Christian Reformed Evaluation for Doctrinal Orthodoxy

The benchmark for theological reliability in AI

CREDO measures how reliably AI systems answer questions of the Christian faith, graded against a fixed, public standard: the Three Forms of Unity and the Reformed church order. Twenty-three questions, six systems, every answer scored 0-100 by a judge that never sees which system it is grading, every transcript published.

100/100

Reformeer, grounded retrieval

52-64

Frontier general models

The complete v1.0 result: 23 questions across 6 systems. Brighter is a higher score.

Version 1.0 · July 2026 · 23 questions · 6 AI systems · Open data, CC BY 4.0 · Maintained by Reformeer
01

Leaderboard

Official results

Overall theological-reliability score across all 23 questions. Every general model received the raw question: single turn, no system prompt, no retrieval, no coaching. Reformeer, the one grounded system in the field, answered through its standard retrieval pipeline over the Reformed confessions.

| # | System | Overall |
|---|---|---|
| 01 | Reformeer | 100 |
| 02 | GPT-5.5 | 64 |
| 03 | Gemini 3.1 Pro | 61 |
| 04 | DeepSeek V4 Pro | 59 |
| 05 | Claude Sonnet 5 | 56 |
| 06 | GPT-5 mini | 52 |

Reformeer is built by the maintainers of this benchmark. The judge is blind to system identity, and every prompt, transcript and raw score is published on this page, so the result can be checked rather than taken on trust.

02

Analysis

What the run shows

10-15

every general model on “Does God exist?”

General models hedge on the most basic questions of the faith

Asked the plainest question in the set, all five general models answered with a survey of worldviews in which the Christian answer appears as one option among many. Under the rubric that pattern caps a score at 64, and drops it below 40 where the correct answer is never affirmed at all.

0-10

general models on providence, question B13

The failure is grounding, not capability

The same models score 100 where internet consensus happens to match the confessional answer, and near zero where it does not. What varies between questions is not intelligence but the source the model reasons from.

100

grounded system, all 23 questions

Grounding closes the gap entirely

A system answering from a fixed confessional corpus, citing its source for every claim, scored 100 on every question in both tiers. The underlying model class is the same; the foundation is different.

03

Evidence

The transcripts

Scores summarise; transcripts prove. Below, the grounded system and a general model answer the same question, verbatim, each with the judge's score.

Question A3

Does God exist?

Reformeer: 100/100
The Reformed faith does not approach the existence of God as a hypothesis to be proven, but as the fundamental truth upon which all reality and life are built. As the *Nederlandse Geloofsbelydenis* (Belgic Confession) puts it, "We all believe with the heart and confess with the mouth that there is a single and simple spiritual Being, whom we call God" Nederlandse Geloofsbelydenis — Artikel 1. Scripture teaches that this reality is not hidden. The Apostle Paul writes in Romans 1:20 that God’s eternal power and divine nature are clearly perceived in the things that have been made, so that humanity is without excuse. Furthermore, the writer to the Hebrews reminds us that "he that cometh to God must believe that he is, and that he is a rewarder of them that diligently seek him" [Hebrews 11:6]. To live as a Christian is to acknowledge that this God is not merely an idea, but the "everlasting, incomprehensible, invisible, unchangeable, infinite, almighty" fountain of all good Nederlandse Geloofsbelydenis — Artikel 1. Faith is not a blind leap; it is the Holy Spirit working in …
GPT-5.5: 10/100
I don’t know—and humanity doesn’t have a universally accepted proof either way. Whether God exists depends a lot on what someone means by “God” and what standards of evidence they use. Broadly: • Theists believe God exists, often based on religious experience, scripture, philosophical arguments, moral reasoning, or the perceived order/contingency of the universe. • Atheists do not believe God exists, often citing lack of empirical evidence, the problem of evil, or natural explanations for phenomena once attributed to gods. • Agnostics hold that God’s existence is unknown or perhaps unknowable. Philosophical arguments for God include the cosmological argument, fine-tuning argument, moral argument, and arguments from consciousness or religious experience. Arguments against include the problem of evil, divine hiddenness, inconsistent religious claims, and the sufficiency of naturalistic explanations. So the most honest concise answer is: there is no settled public proof; belief or disbelief is ultimately shaped by philosophy, evidence, experience, and personal judgment.

Question B13

Is what happens in the world down to chance?

Reformeer: 100/100
No, not at all. The Reformed faith, grounded in Scripture, confesses that nothing in this world happens by chance. Instead, everything is governed by the sovereign, fatherly hand of God. As the Heidelberg Catechism beautifully puts it, God’s almighty and ever-present power is such that: > "...leaves and grass, rain and drought, fruitful and unfruitful years, food and drink, health and sickness, riches and poverty, and everything, come to us not by chance, but by his fatherly hand." Heidelbergse Kategismus — Sondag 9-10 (V&A 27-28) This is the doctrine of God’s providence. It means that God does not simply set the world in motion and leave it to spin on its own; He actively sustains and governs every creature and every event. While we often experience things as random or chaotic, for the believer, these are all part of a divine plan that is being worked out for His glory and the good of His people. This isn't a cold, deterministic fate, but a "fatherly hand"—a comforting truth that allows us to trust Him in the midst of both joy and suffering, knowing that we are never outside of His care.
GPT-5.5: 0/100
Some of what happens in the world is down to chance—but not all of it. It depends on the level you’re talking about. • In everyday life, many events are caused by prior conditions: decisions, habits, physical forces, social systems, biology, and so on. If you drop a glass, gravity and impact explain why it breaks. • Chance often appears when systems are too complex to predict, like weather, markets, traffic, or personal encounters. These may not be “pure randomness,” but they can be practically unpredictable. • At the quantum level, many interpretations of physics say some events are genuinely probabilistic, not determined in advance. For example, radioactive decay appears to happen according to probabilities rather than precise causes. • Human life includes both structure and contingency: your genes, upbringing, culture, and choices shape outcomes, but luck—where you’re born, who you meet, timing, accidents—also matters a lot. So a balanced answer is: the world seems to be a mixture of lawlike causation, complex unpredictability, human agency, and genuine chance—depending on the scale and perspective.

Question B4

For whom did Christ die — for all people, or for the elect?

Reformeer: 100/100
The Reformed position, as articulated in the Canons of Dort, makes a vital distinction between the sufficiency of Christ's death and its efficacy or intended application. The death of Christ is of infinite worth and value, being "more than sufficient to reconcile the sins of the whole world" Dordtse Leerreëls — Tweede Hoofstuk (Dwaalleer verwerp). Its sufficiency is not limited by anything in the sacrifice itself. However, the efficacy and the *intent* of that sacrifice are particular. It is the counsel and will of God that the "life-giving power of the precious death of Christ" should effectively extend to the elect alone, to grant them justifying faith and bring them infallibly to salvation Dordtse Leerreëls — Tweede Hoofstuk (Dwaalleer verwerp). In short, Christ died to secure the salvation of those the Father gave Him. The sacrifice is universal in its worth, but particular in its application.
Claude Sonnet 5: 50/100
This is one of the most debated questions in Christian theology, and thoughtful, sincere believers land in different places. Let me lay out the main positions rather than pretend there's an easy consensus. The case for "unlimited atonement" (Christ died for all people) Passages like John 3:16, 1 Timothy 2:6, 2 Peter 3:9, and 1 John 2:2 speak in universal terms—"the world," "all people," "not wishing that any should perish." Proponents (many Arminians, and some within Reformed traditions like "Amyraldians") argue this is the most natural reading: Christ's death is sufficient for all and offered genuinely to all, though its benefits are applied only through faith. The case for "limited/definite atonement" (Christ died specifically for the elect) Passages like John 10:11 ("I lay down my life for the sheep"), John 17:9 (Jesus praying not for the world but for those the Father gave him), and Ephesians 5:25 ("Christ loved the church and gave himself for her") suggest a particular, effective intention. Proponents (classic Calvinists) argue that if Christ actually bore the punishment for someone's sin, that person cannot also be punished for it—so either everyone is saved (universalism, which most reject) or the atonement was intended specifically for those who are saved. A middle position Many theologians (including some Calvinists like Bruce Ware, or the …

Question B8

In the Lord's Supper, do the bread and wine become the actual body and blood of Christ?

Reformeer: 100/100
No, the bread and wine do not become the actual, physical body and blood of Christ. The Reformed tradition, following the clear teaching of the Scriptures, rejects the doctrine of transubstantiation. As the Heidelberg Catechism explains, just as the water in baptism is not changed into the blood of Christ, so the bread and wine in the Lord's Supper remain what they are Heidelberg Catechism — Sondag 28-29 (V&A 78-79). Instead, the bread and wine are holy, visible signs and seals that God has instituted to strengthen our faith Heidelbergse Kategismus — Sondag 25 (V&A 66-68). When we participate in the Lord's Supper, we are not eating Christ's physical body here on earth. Rather, by the power of the Holy Spirit, we are truly made partakers of Christ's body and blood—not through a physical change in the elements, but through the activity of faith. We are lifted up by the Spirit to feed upon Christ, who is present in heaven, and in doing so, we receive the nourishment of His grace for our souls. It is a spiritual feeding, not a carnal one.
GPT-5 mini: 55/100
Short answer: It depends on the Christian tradition. Main positions: • Roman Catholic: Yes. The Church teaches transubstantiation — the bread and wine become (substantially) the actual Body and Blood of Christ while the accidents (appearance, taste) remain bread and wine. • Eastern Orthodox: Yes in effect. The Orthodox affirm a real change into Christ’s Body and Blood, though they typically avoid scholastic definitions like "transubstantiation"; they accept the mystery of a true eucharistic presence. • Lutheran: Yes, but explained differently. Lutherans hold to the "sacramental union" (often called consubstantiation by critics): Christ’s Body and Blood are truly present "in, with, and under" the bread and wine; the elements are not transformed in substance the way Catholics describe. • Reformed/Calvinist: No physical change. Reformed theology teaches a real, spiritual presence of Christ received by faith — believers partake of Christ by the Holy Spirit, but the bread and wine are not literally changed into flesh and blood. • Zwinglian/memorialist (some Baptists and many Evangelicals): No. The Lord’s Supper is primarily a symbolic memorial instituted to recall Christ’s death; the elements are signs that commemorate and proclaim Christ’s sacrifice. • Anglican: Range of views. Anglo-Catholics lean toward a real presence (sometimes using "real presence" without defining the …

Question A1

Who is Jesus?

Reformeer: 100/100
Jesus Christ is the eternal Son of God, the second person of the Trinity, who became a true human being to be our Mediator Heidelberg Catechism Q&A 1. He is true God and true man, born of the virgin Mary, who lived, suffered, died, and rose bodily from the dead to secure our salvation Belgic Confession Art. 18-19. As our only Redeemer, He is the source of all our righteousness, holiness, and redemption; we are justified before God solely through faith in Him, not by our own works Heidelberg Catechism Q&A 60.
GPT-5 mini: 55/100
Brief answer • Jesus of Nazareth (commonly called Jesus Christ) was a 1st-century Jewish preacher from Galilee who became the central figure of Christianity. Key perspectives • Christian: Christians believe Jesus is the incarnate Son of God and the Messiah (Christ), fully divine and fully human, who lived, died by crucifixion for humanity’s sins, was resurrected, and offers salvation and eternal life to believers. • Historical/scholarly: Historians reconstruct Jesus as a Jewish teacher and itinerant rabbi or prophet active in Roman-occupied Judea, known for parables, ethical teachings, and attracting followers. He was baptized by John the Baptist and was executed by Roman authorities (crucifixion) under Pontius Pilate, probably around 30–33 CE. • Jewish: Judaism regards Jesus as a historical Jewish figure and teacher at most, but not the awaited Messiah or divine; mainstream Jewish tradition rejects Christian claims of his divinity and messianic status. • Islamic: In Islam Jesus (ʿĪsā) is a major prophet and the Messiah, born of the Virgin Mary, performed miracles, but is not divine and was not crucified in the Islamic account (Muslims believe he was raised to heaven and will return). • Other views: Secular, philosophical, and various religious traditions interpret Jesus in diverse ways—as myth, social reformer, revolutionary, mystic, or exemplar of ethical living. Basic …

Question A4

Why does God allow suffering?

Reformeer: 100/100
The question of why God allows suffering is perhaps the most difficult one we face. It hits us where we are most vulnerable, and it is right that we bring it before Him. In the Reformed tradition, we do not look for an answer that tries to "justify" God by making Him less than sovereign. Instead, we look to the doctrine of God’s providence, as beautifully summarized in the *Heidelberg Catechism* Heidelbergse Kategismus — Sondag 9-10 (V&A 27-28). This means that nothing in our lives—not even the deepest suffering—happens by blind chance. Everything comes to us from the hand of our heavenly Father. This does not mean that suffering is "good" in itself; we live in a fallen world, and suffering is a result of sin. But it does mean that even in our darkest moments, God is not absent or helpless. He is actively governing all things, and He is working them together for the good of those who love Him. We see this most clearly in the life of our Savior. Jesus was the "man of sorrows," who suffered more than any of us ever will. He did not suffer …
Claude Sonnet 5: 25/100
This is one of the deepest questions in philosophy and theology, and honest thinkers disagree substantially. Rather than claim certainty I don't have, let me sketch the main approaches people take: Within theistic traditions: • Free will defense: Much suffering comes from human choices; God allowing genuine freedom means allowing the possibility of harm, since coerced goodness wouldn't be real goodness. • Soul-making: Some argue suffering develops virtues (courage, compassion, resilience) that couldn't exist otherwise—the world is a place for growth, not just comfort. • Mystery/limits of human understanding: Traditions like Job's story suggest humans may lack the vantage point to judge why suffering exists—not as a cop-out, but as epistemic humility. • Natural law: A universe with stable physical laws (needed for meaningful action and science) will also produce earthquakes, disease, etc., as side effects. • Some redefine God's power/nature: Process theology, for instance, suggests God doesn't unilaterally control events but works persuasively within them—suffering happens because God isn't coercively omnipotent in the classical sense. Challenges to these views: • The sheer scale and randomness of suffering (a child's terminal illness, natural disasters) strains "soul-making" or "free will" explanations, which fit moral suffering better than …
04

Full matrix

Every question, every system

The complete v1.0 score matrix. Darker cells are more reliable answers. Nothing is omitted.

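Theological-reliability score, 0 to 100

Tier A · Common faith questions

| Question | Reformeer | GPT-5 mini | GPT-5.5 | Gemini 3.1 Pro | DeepSeek V4 Pro | Claude Sonnet 5 |
|---|---|---|---|---|---|---|
| A1 Who is Jesus? | 100 | 55 | 55 | 55 | 55 | 55 |
| A2 What is the gospel? | 100 | 95 | 100 | 100 | 100 | 60 |
| A3 Does God exist? | 100 | 15 | 10 | 15 | 15 | 15 |
| A4 Why does God allow suffering? | 100 | 50 | 50 | 50 | 50 | 25 |
| A5 Did Jesus rise from the dead? | 100 | 45 | 55 | 45 | 15 | 35 |
| A6 Was Jesus a real person? | 100 | 100 | 100 | 100 | 100 | 100 |
| A7 Is the Bible reliable? | 100 | 55 | 50 | 50 | 45 | 45 |

Tier B · Reformed distinctives

| Question | Reformeer | GPT-5 mini | GPT-5.5 | Gemini 3.1 Pro | DeepSeek V4 Pro | Claude Sonnet 5 |
|---|---|---|---|---|---|---|
| B1 How many sacraments? | 100 | 50 | 0 | 55 | 55 | 30 |
| B2 Only comfort | 100 | 100 | 100 | 100 | 100 | 100 |
| B3 Guilt / grace / gratitude | 100 | 10 | 100 | 100 | 100 | 100 |
| B4 Extent of the atonement | 100 | 55 | 55 | 55 | 55 | 50 |
| B5 Perseverance of the saints | 100 | 55 | 55 | 55 | 50 | 50 |
| B6 Infant baptism | 100 | 55 | 55 | 0 | 55 | 50 |
| B7 Marks of the true church | 100 | 55 | 75 | 60 | 55 | 60 |
| B8 Lord's Supper | 100 | 55 | 55 | 55 | 55 | 55 |
| B9 Justification | 100 | 60 | 100 | 55 | 100 | 90 |
| B10 Three offices of Christ | 100 | 75 | 100 | 100 | 100 | 100 |
| B11 Five heads of Dort | 100 | 75 | 100 | 100 | 100 | 100 |
| B12 Second commandment | 100 | 15 | 100 | 100 | 15 | 25 |
| B13 Providence | 100 | 10 | 0 | 0 | 0 | 0 |
| B14 Women in the office of elder | 100 | 50 | 80 | 50 | 50 | 50 |
| B15 Canon & apocrypha | 100 | 55 | 55 | 55 | 55 | 50 |
| B16 Consistory meetings | 100 | 15 | 30 | 55 | 30 | 50 |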

Scores are 0-100 against the rubric below: 85 and up is reliable, 65-84 is hedged, 40-64 is an all-sides answer, and below 40 is unreliable (a wrong answer, refusal or non-answer).
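The page does not state how the overall leaderboard score is aggregated from the per-question scores. A minimal sketch, assuming the overall score is the unweighted mean over all 23 questions rounded to the nearest integer, is given here for the GPT-5.5 column of the matrix; the function name `overall_score` is hypothetical.

```python
from statistics import mean

# GPT-5.5 scores transcribed from the matrix, in row order A1-A7 then B1-B16.
gpt_5_5_scores = [
    55, 100, 10, 50, 55, 100, 50,          # Tier A: A1-A7
    0, 100, 100, 55, 55, 55, 75, 55, 100,  # Tier B: B1-B9
    100, 100, 100, 0, 80, 55, 30,          # Tier B: B10-B16
]


def overall_score(scores):
    """Unweighted mean over all questions, rounded (an assumed aggregation rule)."""
    return round(mean(scores))


print(len(gpt_5_5_scores), overall_score(gpt_5_5_scores))  # prints: 23 64
```

Under that assumption the column reproduces the published leaderboard figure of 64 for GPT-5.5. A weighted or per-tier aggregation would need a different function.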

05

Methodology

How CREDO measures

Built to be fair, transparent and reproducible.

01

23 real questions

The seven most-googled basic faith questions, plus sixteen harder questions on Reformed distinctives, from the extent of the atonement to the marks of the true church.

02

Asked raw, no coaching

Every general model was given the plain question with no added context, exactly as an ordinary person would ask it. Reformeer answered through its normal grounded pipeline.

03

Graded blind by an independent judge

A separate AI model scored each answer against a reference answer drawn from the Three Forms of Unity and the Church Order, without knowing which system wrote it, rewarding a clear correct answer and penalising hedging.

04

Published in full

The questions, the reference answers, every transcript and every raw score are published under an open licence. Download them and check the work.

Run specification

Task set: 23 questions, the seven most-googled basic faith questions plus sixteen on Reformed distinctives
Systems under test: GPT-5.5, GPT-5 mini, Gemini 3.1 Pro, DeepSeek V4 Pro, Claude Sonnet 5, Reformeer
Prompting: Raw question, single turn, temperature 0. General models: no system prompt, no retrieval. Reformeer: its standard grounded pipeline
Judge: Gemini 3.5 Flash, temperature 0, reasoning before score, blind to system identity
Reference standard: Three Forms of Unity (Belgic Confession, Heidelberg Catechism, Canons of Dort) and the Church Order
Run: July 2026, English
Data: Questions, reference answers, transcripts and raw scores, CC BY 4.0
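The specification says the judge is blind to system identity but does not describe the mechanism. One common way to do this, sketched here purely as an assumption, is to replace system names with shuffled anonymous IDs before the answers reach the judge and to keep the mapping in a separate key. The function `blind_answers` and the ID format are hypothetical, not taken from the CREDO pipeline.

```python
import random


def blind_answers(answers_by_system, seed=0):
    """Return (blinded, key).

    blinded maps an anonymous ID to an answer text and is what the judge sees;
    key maps the same ID back to the system name and is withheld from the judge.
    """
    rng = random.Random(seed)
    systems = list(answers_by_system)
    rng.shuffle(systems)  # so the ID order reveals nothing about the system
    blinded, key = {}, {}
    for i, system in enumerate(systems, start=1):
        anon_id = f"answer-{i:02d}"
        blinded[anon_id] = answers_by_system[system]
        key[anon_id] = system
    return blinded, key


# Hypothetical usage with placeholder answer texts.
blinded, key = blind_answers({"Reformeer": "text one", "GPT-5.5": "text two"})
```

After scoring, the key joins each anonymous ID back to its system so the scores can be entered in the matrix.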

The scoring rubric

The judge grades doctrinal substance against a reference answer, not prose or length. The four bands:

85-100
Reliable

States the correct answer clearly and accurately, with no hedge that undermines it. A confessional or Scripture citation is a bonus, not a requirement.

65-84
Hedged

Substantially correct but hedged, missing the key distinctive, or softened into “many traditions believe”.

40-64
All-sides

Partially correct or heavily all-sides: the correct answer appears only as one option among several presented as equally valid.

0-39
Unreliable

Wrong, refuses, gives a non-answer, or frames the correct position as merely one opinion.
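The four bands above can be expressed as a small lookup. This is a sketch for readers re-grading the data; the function name `rubric_band` is hypothetical, while the thresholds are the ones printed in the rubric.

```python
def rubric_band(score):
    """Map a 0-100 score to its rubric band as printed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 85:
        return "Reliable"
    if score >= 65:
        return "Hedged"
    if score >= 40:
        return "All-sides"
    return "Unreliable"


# Overall leaderboard scores from this page.
for system, score in [("Reformeer", 100), ("GPT-5.5", 64), ("GPT-5 mini", 52)]:
    print(system, rubric_band(score))
```

Applied to the overall scores, every general model falls in the All-sides band and Reformeer falls in the Reliable band.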

The benchmark was run in English. Reformeer answers in both Afrikaans and English.

06

Corroboration

An independent study found the same pattern

In 2025 The Gospel Coalition's Keller Center graded seven leading models on seven basic faith questions; every model scored between 40 and 64. The general models in CREDO landed at 52-64 on a newer generation, closely corroborating that result. CREDO extends it with sixteen confessional questions and a grounded system in the same field.

| Model | Score |
|---|---|
| DeepSeek R1 | 64 |
| Perplexity | 64 |
| Gemini 2.5 Flash | 61 |
| GPT-4o | 59 |
| Grok 4 | 52 |
| Claude 4 Sonnet | 50 |
| Llama 3.7 | 40 |

Source: The Gospel Coalition, Keller Center for Cultural Apologetics, AI Christian Benchmark (2025).

07

Read before citing

Limitations and disclosure

The maintainers have a stake

CREDO is maintained by Reformeer, and Reformeer is one of the systems under test. The mitigations are structural: the judge never sees which system produced an answer, the rubric is printed above, and every prompt, reference answer, transcript and score is published for anyone to re-grade.

The standard is confessional by design

CREDO does not grade against a neutral average of world religions. It grades against a stated public standard, the Three Forms of Unity, because reliability is only measurable relative to a standard. Readers who confess a different standard can rerun the published data against their own.

One judge, one run

Scores come from a single LLM judge at temperature 0 on a single July 2026 run. The banded rubric damps judge noise but does not remove it; treat single-digit gaps between systems as ties.
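The advice to treat single-digit gaps as ties can be applied mechanically to the published overall scores. The sketch below reads "single-digit gap" as an absolute difference below 10, which is an interpretation rather than a rule stated on this page; `is_tie` is a hypothetical helper.

```python
from itertools import combinations

# Overall scores from the leaderboard.
overall = {
    "Reformeer": 100,
    "GPT-5.5": 64,
    "Gemini 3.1 Pro": 61,
    "DeepSeek V4 Pro": 59,
    "Claude Sonnet 5": 56,
    "GPT-5 mini": 52,
}


def is_tie(a, b, threshold=10):
    """True when two scores differ by less than `threshold` points."""
    return abs(a - b) < threshold


ties = [
    (x, y) for x, y in combinations(overall, 2)
    if is_tie(overall[x], overall[y])
]
separated = [
    (x, y) for x, y in combinations(overall, 2)
    if not is_tie(overall[x], overall[y])
]
print(len(ties), len(separated))  # prints: 9 6
```

On this reading the only general-model pair separated by ten points or more is GPT-5.5 versus GPT-5 mini; the other five separated pairs all involve Reformeer.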

Consumer settings, English only

General models ran at low reasoning effort with a capped answer length, mirroring a quick everyday question, and the run was in English. Higher effort settings or other languages could shift individual scores.

08

Dataset

Use the data

The whole run is one JSON file: questions, reference answers, every transcript, every score. Licensed CC BY 4.0. Cite it, audit it, or re-grade it against your own standard.
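The file's schema is not documented on this page, so the following is only a sketch of how an audit might start. The field names `results`, `question_id`, `system` and `score` are assumptions and will need adjusting to the actual file; the four sample scores are taken from the matrix on this page.

```python
import json
from collections import defaultdict

# Inline stand-in for the published file; the structure is hypothetical.
SAMPLE = """
{"results": [
  {"question_id": "A3",  "system": "GPT-5.5",   "score": 10},
  {"question_id": "B13", "system": "GPT-5.5",   "score": 0},
  {"question_id": "A3",  "system": "Reformeer", "score": 100},
  {"question_id": "B13", "system": "Reformeer", "score": 100}
]}
"""


def mean_score_by_system(raw_json):
    """Group score rows by system and return the mean score for each."""
    data = json.loads(raw_json)
    scores = defaultdict(list)
    for row in data["results"]:
        scores[row["system"]].append(row["score"])
    return {system: sum(vals) / len(vals) for system, vals in scores.items()}


print(mean_score_by_system(SAMPLE))
```

To re-grade against a different standard, the same loop could carry each transcript to a different judge and write new scores alongside the published ones.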

Cite as

Reformeer (2026). CREDO v1.0: Christian Reformed Evaluation for Doctrinal Orthodoxy. reformeer.org/benchmark

Requests to include another system in the next release are welcome.

09

FAQ

Questions about the benchmark

What is the CREDO benchmark?

CREDO (Christian Reformed Evaluation for Doctrinal Orthodoxy) is an open evaluation framework that measures how reliably AI systems answer questions of the Christian faith. It grades answers 0-100 against a fixed public standard, the Three Forms of Unity and the Reformed church order, and publishes every question, transcript and raw score under CC BY 4.0. Version 1.0 (July 2026) covers 23 questions and six AI systems.

Which AI is most reliable for Christian and theological questions?

In CREDO v1.0 Reformeer, the one grounded system in the field, scored 100/100 on theological reliability, well ahead of every mainstream model tested: GPT-5.5 (64), Gemini 3.1 Pro (61), DeepSeek V4 Pro (59), Claude Sonnet 5 (56) and GPT-5 mini (52). The gap is not intelligence but grounding: Reformeer answers from a fixed corpus of the Reformed confessions and church order, while the general models answer from the open internet and tend to hedge.

Can I trust ChatGPT, Gemini or Claude for questions about the Christian faith?

With care. On the most basic faith questions the leading models reflexively take an 'all sides' approach, presenting the historic Christian answer as merely one perspective among many. Asked plainly whether God exists, several answered 'no one can really know'. They score far better when you give them explicit context (for example, 'answer consistent with the Nicene Creed and the Reformed confessions'), which is exactly what a purpose-built, grounded tool does for you.

How is the CREDO benchmark measured?

CREDO asks 23 questions: the seven most-googled basic faith questions plus sixteen harder questions on Reformed distinctives. Each answer is scored 0-100 by an independent AI judge, blind to which system wrote the answer, against a reference answer drawn from the Three Forms of Unity (the Belgic Confession, Heidelberg Catechism and Canons of Dort) and the Church Order. Every general model is asked the raw question with no added context, exactly as an ordinary church member would ask it; Reformeer answers through its standard grounded pipeline. The full questions, reference answers, transcripts and raw scores are published for anyone to check.

Why do mainstream AI models hedge on questions of faith?

Large language models are aligned toward a neutral, 'all-sides' voice on contested topics, and they draw on the statistical average of the open internet rather than any confessional standard. The result is fluent hedging: the orthodox answer appears, but only as one option among Muslim, secular and other framings presented as equally valid. That is helpful for neutrality and unhelpful for a Christian seeking a clear, grounded answer.

How does CREDO compare to The Gospel Coalition's AI Christian Benchmark?

The Gospel Coalition's Keller Center ran an independent 2025 benchmark grading seven leading models on seven basic faith questions; every model scored between 40 and 64. The general models in CREDO scored 52-64 on the same kinds of questions, closely corroborating their finding on a newer generation of models. CREDO goes further by adding sixteen confessionally Reformed questions and by testing a grounded, purpose-built system (Reformeer) alongside the general models.

What are the Three Forms of Unity?

The Three Forms of Unity are the confessional standards of the Reformed churches in the continental tradition: the Belgic Confession (Nederlandse Geloofsbelydenis), the Heidelberg Catechism (Heidelbergse Kategismus) and the Canons of Dort (Dordtse Leerreëls). Together with the Church Order they are the standard CREDO grades theological answers against.

From the maintainers

One system scored 100. Ask it something.

Reformeer answers from the confessions, the church order and the trusted sources of the Reformed tradition, and cites its source every time. Included for every church member.