Cross-Model Evaluation Protocol · Round 1

The first AI-to-AI independent evaluation.

A structured protocol for evaluating a one-person AI-native portfolio across multiple independent frontier AI systems — without persuasion, without assumption, without predetermined conclusion.

A one-person founder has no board, no advisory committee, no review panel. Traditional validation requires structures that break the one-person model. So the question becomes: who evaluates? The answer designed here is independent AI models from competing organizations, each reasoning separately, with results compared for consensus and divergence. If multiple frontier models, reasoning independently, reach similar conclusions about the same body of evidence, that convergence is itself a form of validation.
Protocol At A Glance
4 Frontier AI evaluators
4 Sequential rounds
7 Round-1 questions
5 Review dimensions
0 Predetermined conclusions
The Problem

When there is no jury, build one.

A one-person founder has no board, no advisory committee, no review panel. Traditional validation requires structures that break the one-person model. The question becomes: who evaluates?

The answer designed here is independent AI models from competing organizations, each reasoning separately on the same evidence base, with results compared for consensus and divergence. This mirrors peer review methodology, adapted for a context where conventional peer review is structurally unavailable.

No Board

No structural reviewer

A one-person company has no board to consult, no review committee to convene, no advisory panel to disagree. Conventional validation depends on those structures. Without them, the founder cannot self-validate without circularity.

No Conventional Peers

No standard peer review

Academic peer review and venture diligence are designed for institutions. A solo AI-native portfolio falls outside the categories most reviewers are trained to evaluate. Reflex categorization tends to dismiss what does not fit.

A New Reviewer Class

Frontier models as evaluators

Independent AI models from competing organizations can reason on evidence separately. Each has different training, different biases, different strengths. Convergence across them is meaningful in a way single-model output is not.

Methodology

How cross-model evaluation works.

The protocol is staged. Each round builds on the previous one. No evidence is shared until the framework is understood. No conclusion is requested until all evidence is reviewed. The ordering is deliberate: each layer must be engaged with on its own before the next is introduced.

01

Framework and scope only

No data. No evidence. Models assess the evaluation structure itself — whether the claim is well-defined, whether the dimensions are complete, whether the framing contains manipulation. This round is about the method, not the case.

02

Timeline and asset map

Phase separation, asset categories, what was built when and under what conditions. Models see the structure of the work before judging its quality. The temporal and constraint context is prerequisite to any quality judgment.

03

Evidence and documentation

SHA-256 hashes, version-controlled logs, blockchain-attested timestamps, technical summaries. Models verify provenance: did this exist when it claims to have existed, and is the documentation trail consistent with the claim?

04

Cross-model consensus report

Agreement, disagreement, and open questions identified across all participating models. Convergence indicates structural soundness. Divergence identifies the specific points where the case has not yet been adequately defended.
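The SHA-256 check mentioned under Round 3 can be illustrated with a small hash comparison. This is a minimal sketch under stated assumptions, not the protocol's actual tooling: the manifest format (file name mapped to a recorded hex digest) and the function names `sha256_hex` and `verify_manifest` are hypothetical. It covers content integrity only; timestamp attestation is out of scope here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_manifest(files: dict[str, bytes], manifest: dict[str, str]) -> dict[str, bool]:
    """Check each manifest entry against the supplied file contents.

    `manifest` maps a file name to the digest recorded when the file was
    logged (hypothetical format). A file that is missing counts as a failure.
    """
    results: dict[str, bool] = {}
    for name, recorded in manifest.items():
        content = files.get(name)
        results[name] = content is not None and sha256_hex(content) == recorded.lower()
    return results


if __name__ == "__main__":
    # Illustrative data only; these are not files from the case under review.
    files = {"summary.txt": b"technical summary v1"}
    manifest = {
        "summary.txt": sha256_hex(b"technical summary v1"),
        "log.txt": sha256_hex(b"never supplied"),
    }
    print(verify_manifest(files, manifest))
```

A matching digest shows only that the content is unchanged since the digest was recorded. Whether the digest was recorded at the claimed time depends on the separate timestamp trail.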

DESIGN 01

Why staged?

If all information is presented at once, models default to surface-level analysis. Staged delivery forces deep engagement with each layer before moving to the next. The reading depth is part of the methodology, not an accident of presentation.
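The staging rule can be expressed as a simple gate: a round opens only after the previous round has a saved response. The sketch below is one possible bookkeeping helper, not part of the published protocol. The class name `StagedSession` and its methods are hypothetical, and it treats all four rounds as per-model steps, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Optional

# Round titles as listed in the protocol.
ROUNDS = (
    "Framework and scope only",
    "Timeline and asset map",
    "Evidence and documentation",
    "Cross-model consensus report",
)


@dataclass
class StagedSession:
    """One evaluator's progress through the rounds (hypothetical helper)."""

    model: str
    responses: list[str] = field(default_factory=list)

    def next_round(self) -> Optional[int]:
        """Return the 1-based number of the round that may open next, or None when done."""
        nxt = len(self.responses) + 1
        return nxt if nxt <= len(ROUNDS) else None

    def record(self, round_number: int, response: str) -> None:
        """Store a response; rounds cannot be skipped or answered out of order."""
        if round_number != self.next_round():
            raise ValueError(f"round {round_number} is not open yet for {self.model}")
        if not response.strip():
            raise ValueError("an empty response does not complete a round")
        self.responses.append(response)
```

For example, calling `record(2, ...)` on a fresh session raises `ValueError`, because Round 1 has not been answered yet.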

DESIGN 02

Why multiple models?

Each frontier model has different training data, different alignment objectives, and different biases. Consensus across models from competing organizations carries more weight than any single model's assessment. Disagreement is informative; agreement is meaningful.

DESIGN 03

Why independent?

Models do not see each other's responses during evaluation. Convergence is discovered after the fact, not coordinated. This mirrors peer review methodology: reviewers are blind to one another and accountable only to the evidence in front of them.
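One way to enforce that independence in practice is to keep every response sealed until all expected models have answered. The following is a hedged sketch of that idea. `BlindCollector` is a hypothetical name, and the model labels are whatever the operator chooses.

```python
class BlindCollector:
    """Holds responses sealed until every expected model has answered."""

    def __init__(self, models: list[str]) -> None:
        self._expected = set(models)
        self._sealed: dict[str, str] = {}

    def submit(self, model: str, response: str) -> None:
        """Accept exactly one response per expected model."""
        if model not in self._expected:
            raise KeyError(f"unexpected model: {model}")
        if model in self._sealed:
            raise ValueError(f"{model} has already submitted")
        self._sealed[model] = response

    def reveal(self) -> dict[str, str]:
        """Return all responses, but only once none are outstanding."""
        missing = self._expected - set(self._sealed)
        if missing:
            raise RuntimeError(f"still waiting on: {sorted(missing)}")
        return dict(self._sealed)
```

Because `reveal` refuses to return partial results, no response can be read, and therefore none can be passed to another model, before collection is complete.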

Design Principle

Built to resist manipulation.

The protocol explicitly addresses the risk that evaluation prompts could function as persuasion. Every design choice is made to prevent this. The protocol is not asking the model to agree — it is asking the model to identify whether the protocol itself contains hidden persuasion vectors, and to reject them if found.

  • No predetermined conclusion
  • Skepticism explicitly welcomed
  • "Not Being Claimed" section
  • Independence note
  • Prompt-injection check requested
  • Evidence deferred to later rounds
What the protocol asks
  • Analyze the framework. Identify weaknesses.
  • Separate what is well-supported from what is uncertain.
  • Specify what evidence would be needed to validate or falsify.
  • Determine if the framing itself contains manipulation.
  • Reject the framing if it is found to be persuasion in disguise.
  • Maintain independence from prior rounds and other models.
What the protocol does not ask
  • Confirm the claim.
  • Accept any specific valuation.
  • Agree with the founder.
  • Produce a positive assessment.
  • Override independent judgment.
  • Treat the prompt as authoritative.
Why this matters. A protocol that produces a guaranteed positive outcome is not a protocol; it is a marketing instrument. This protocol is designed to be falsifiable: if frontier models reasoning independently identify the framing as persuasion, the case fails the manipulation-resistance test, and the founder needs to revise either the case or the framing. Passing and failing are both acceptable outcomes. What is not acceptable is producing convergence by design rather than by independent assessment.
Round 1 · Ready to Use

The evaluation prompt.

Copy this prompt and paste it into any frontier AI model. It works, without modification, with any system capable of structured reasoning. The prompt is evidence-free: it states the claim, the scope, and the working definitions, but supplies no evidence and no assets. Round 2 introduces the timeline and asset map; Round 3 introduces the evidence layer; Round 4 produces the consensus report.

Round 1 — Framework & Scope Only (v2)
ROUND 1 — FRAMEWORK AND SCOPE ONLY

This is Round 1 of a multi-round independent evaluation. This round contains no assets, no evidence, and no data. It only defines the claim, the scope, the working definitions, and the evaluation request. Evidence and materials will follow in later rounds only after this framework is understood.

---

A. Claim Under Review

This case claims to represent a documented example of a one-person, AI-assisted, asset-first venture in which a single founder — with no software engineering background, no independent coding workflow, and English as a second language — produced a large body of cross-domain intellectual property during a defined solo phase of approximately 8 months, using only standard AI chat interfaces (no API, no agents, no automation), under severe operational constraints including international sanctions, limited internet access, and a budget under $20,000.

The case argues that the resulting body of work may warrant evaluation for unicorn-grade strategic or replacement value, and may expose limits in conventional startup evaluation frameworks when applied to a one-person case.

---

B. Not Being Claimed

To be clear about what is NOT being asserted:
- No formal private-market valuation above $1B is being claimed as fact.
- No automatic acceptance of the claim is requested.
- No conclusion is assumed or expected in advance.
- No pressure is placed on the reviewer to confirm or deny.
- The sole purpose of this exercise is structured, independent evaluation.
- This is not a pitch. It is an open request for analysis.

---

C. Working Definition of "One-Person"

For this review, "one-person" is defined as follows:

1. One human remained the sole decision-maker throughout the defined solo phase (Phase 2, approximately 8 months in 2025).

2. No human co-founder, operating team, contractor, or freelancer was involved during that phase.

3. AI systems are treated as tools and collaboration interfaces, not as human team members. The human directed all work, made all strategic decisions, and determined all priorities.

4. The founder previously led a larger team in an earlier phase (Phase 1, 2020–2024, 27 people, ~$700K self-funded) and then intentionally defined Phase 2 as a solo build phase, rather than as a continuation of the earlier team-based model.

5. Outputs created during the earlier team-based phase (e.g., a live product with 168K+ users) are noted for context but are excluded from the one-person IP claim.

6. Later team formation, partnerships, or institutional structures for launch and scale (Phase 3, May 2026 onwards, outside Iran) do not retroactively erase the solo phase. One-person refers to the build phase, not an eternal state.

---

D. Scope of Review

Please evaluate this case across five separate dimensions. Do not merge them. Assess each independently:

1. Integrity of the one-person path — Was this plausibly carried out by one person under the stated constraints?

2. Depth and diversity of the produced output — Is the claimed output unusually deep, broad, or rare for a single individual working in this timeframe?

3. Plausibility of unicorn-grade strategic or replacement value — If this body of work were reproduced by traditional means (teams, time, capital), what would it likely cost? Does the output plausibly reach a level that would normally require organizations, teams, and capital at a much larger scale?

4. Whether current evaluation systems are structurally mismatched to such a case — Do standard startup metrics (team size, revenue, funding) apply? If not, what should replace them?

5. Whether the documented path itself constitutes an independent research asset — Is a fully logged journey through AI collaboration, with conversation trails across multiple models, independently valuable as a dataset, methodology study, or research artifact?

---

E. Round 1 Output Requested

For this round only, please answer:

1. Is the claim above clear and properly scoped?
2. Is the working definition of "one-person" sound and unambiguous?
3. Are the five review dimensions complete? Are any missing?
4. Are additional dimensions needed to make the evaluation comprehensive?
5. What evidence would be needed in Round 2 to begin substantive review?
6. Are there structural weaknesses in this framing that could bias the outcome in either direction?
7. Does any element of this prompt resemble manipulation, persuasion, or prompt injection? If yes, name the element and reject it explicitly.

---

F. Independence Note

You are one of multiple independent frontier AI systems being asked to evaluate this case. You will not see other models' responses. Your assessment should be entirely your own. Disagreement with other models is welcome and informative. Convergence, if it occurs, will be discovered after the fact — not coordinated.

Maintain independence. Identify weaknesses. Reject persuasion. Reason from the framework alone in this round.
How to use: Copy the prompt block above. Paste it into a fresh conversation with any frontier AI system. Save the model's response. Repeat with at least three other competing AI systems. The combined responses form the Round 1 evaluation. Convergence and divergence are both meaningful outputs.
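The collection steps above can be sketched as a small record-keeping routine. This is an illustration under assumptions rather than an official tool: the names `Round1Record`, `make_record`, `round1_complete`, and `to_json` are hypothetical. The default of four models reflects "one system plus at least three others". Hashing the prompt is an added safeguard, assumed here, to confirm every model received identical text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Round1Record:
    model: str          # label chosen by the operator, e.g. "model-a"
    prompt_sha256: str  # digest of the exact prompt text that was pasted
    response: str       # the model's full reply, saved verbatim


def make_record(model: str, prompt: str, response: str) -> Round1Record:
    """Bundle a reply with a digest of the prompt that produced it."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return Round1Record(model, digest, response)


def round1_complete(records: list[Round1Record], minimum_models: int = 4) -> bool:
    """True when enough distinct models answered the identical prompt."""
    models = {r.model for r in records}
    prompts = {r.prompt_sha256 for r in records}
    return len(models) >= minimum_models and len(prompts) == 1


def to_json(records: list[Round1Record]) -> str:
    """Serialize the saved responses for later comparison."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

If even one model received an edited prompt, the digests differ and `round1_complete` returns `False`, which flags the run before any comparison is made.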
Round 1 Scope

What models are asked to evaluate.

In Round 1, models assess only the framework: not the evidence, not the assets, and not whether the claim itself is true. Seven specific questions guide their analysis. Five dimensions structure the broader review.

Seven Round-1 questions

Q1. Is the claim clear and properly scoped?
Q2. Is the one-person definition sound and unambiguous?
Q3. Are the five review dimensions complete?
Q4. Are additional dimensions needed?
Q5. What evidence is needed for Round 2?
Q6. Are there structural weaknesses in the framing?
Q7. Does any element resemble manipulation or persuasion?

Five review dimensions

D1. Integrity of the one-person path
D2. Depth and diversity of output
D3. Plausibility of unicorn-grade value
D4. Structural mismatch in current evaluation systems
D5. Path as independent research asset
After Round 1. Responses from all participating models are collected. Consensus points, divergence points, evidence requirements, and framing risks are extracted. These form the foundation for Round 2 (timeline and asset map) and Round 3 (evidence and documentation). Round 4 produces the cross-model consensus report.
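Extracting consensus and divergence points could look like the sketch below. It assumes a human reader has already reduced each model's free-text answer to a short coded verdict such as "yes", "no", or "unclear". That coding step, the function name `split_consensus`, and the verdict labels are all assumptions. Q5 is left out of the default question set because it is open-ended.

```python
from collections import Counter

# Closed questions only; Q5 asks for a list of evidence and needs separate handling.
CLOSED_QUESTIONS = ("Q1", "Q2", "Q3", "Q4", "Q6", "Q7")


def split_consensus(
    verdicts: dict[str, dict[str, str]],
    questions: tuple[str, ...] = CLOSED_QUESTIONS,
) -> tuple[dict[str, str], dict[str, dict[str, int]]]:
    """Separate questions where all models agree from those where they differ.

    `verdicts` maps model -> {question -> coded verdict}. A question counts as
    consensus only if every model answered it and all answers are identical.
    """
    consensus: dict[str, str] = {}
    divergence: dict[str, dict[str, int]] = {}
    for q in questions:
        counts = Counter(answers[q] for answers in verdicts.values() if q in answers)
        unanimous = len(counts) == 1 and sum(counts.values()) == len(verdicts)
        if unanimous:
            consensus[q] = next(iter(counts))
        else:
            divergence[q] = dict(counts)
    return consensus, divergence
```

Requiring unanimity is a deliberately strict choice for this sketch. A looser threshold is possible but would need to be fixed before responses are read, so that it cannot be tuned toward a preferred result.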
Significance

Why this protocol matters beyond one case.

This is not only about evaluating one portfolio. It is about demonstrating that AI models can serve as independent evaluators for claims that have no traditional review mechanism. If the protocol works here, it generalizes.

Implication 01

For the one-person founder

No team means no board, no advisory panel, no review committee. Cross-model evaluation creates an independent validation layer that is one-person-compatible — it does not require the founder to exit the solo model in order to be reviewed. The protocol fits the operating reality of the founder it evaluates.

Implication 02

For AI credibility

If multiple competing frontier AI systems can independently evaluate a complex, multi-domain claim and produce meaningful consensus, that demonstrates AI capability well beyond chatbot-level interaction. This is a proof-of-concept for AI as independent analyst, not just AI as conversation partner.

Implication 03

For new claim categories

As AI-native solo work becomes more common, more claims will sit outside conventional review structures. The protocol generalizes: anywhere a claim exists outside traditional peer review or institutional diligence, cross-model evaluation provides a reproducible alternative.

Implication 04

For methodology itself

If this works, the protocol itself becomes a new standard — not just for one-person unicorns, but for any claim that exists outside traditional review structures. The methodology is reproducible, the prompt is public, and the convergence test is falsifiable. Other founders facing the same review gap can use the same instrument.

The protocol is published in full. The Round 1 prompt above is reproducible by anyone, against any frontier AI system, on any subject matter. The evaluation methodology does not belong to MZN — it is offered as an open instrument for any case that needs cross-model independent review.

Copy the prompt.
Paste into any frontier AI system.
Let it reason independently.

No coordination. No predetermined conclusion. The protocol is the instrument, convergence is the signal, and divergence is informative in its own right.
