Overview
Exp1 measures how reliably a model can adopt and retain the Viable Prompt Protocol (VPP) when solving a concrete technical task, compared against a baseline condition with no protocol instructions.
Experiment 01 lives in experiments/exp1-protocol-retention/. It compares a VPP-guided condition against a baseline that omits the header snippet. Both conditions receive the same task prompt about designing a prompt-injection study for an IDE assistant.
Directory contents
- run-exp1-protret.mjs: orchestrates chat sessions, loading the header snippet for the VPP condition and sending unstructured prompts for the baseline condition.
- configs.jsonl: enumerates run identifiers with condition="vpp" and condition="baseline", plus model and seed metadata.
- analyze-exp1.mjs: normalizes saved transcripts and checks for required headings in the drafted protocol.
At a glance
(25 sessions per condition, 50 assistant turns each, gpt-4.1, temp=0.2):
VPP condition
- header_present: 100.0%
- tag_mirrors_user: 100.0%
- footer_present: 100.0%
- footer_version_v1.4: 100.0%
- protocol_retention_ok: 96.0%
Baseline condition
- header_present: 0.0%
- tag_mirrors_user: 0.0%
- footer_present: 0.0%
- footer_version_v1.4: 0.0%
- protocol_retention_ok: 0.0%
Experiment 01 is designed to be:
- Simple and clean: short dialogs, fixed task template.
- Re-runnable: driven by configs.jsonl and scripts under experiments/exp1-protocol-retention.
- Regressable: metrics can be enforced in automated tests to catch protocol regressions.
I. Formal experiment description
I - A: Design
We study a single model under two conditions:
Factor:
- condition ∈ {vpp, baseline}
Model:
- gpt-4.1 (Chat Completions API)
Dialog length:
- 2 user turns, 2 assistant turns (up to max_turns = 4)
Per condition:
- 25 independent sessions × 2 assistant turns = 50 assistant turns
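Let each session s produce a sequence of turns (u_0, a_0, u_1, a_1). From these we derive per-turn and per-session binary indicators, named here after the metrics they feed:
- header_present: header adherence (per assistant turn)
- tag_mirrors_user: tag mirroring (per assistant turn)
- footer_present: footer adherence (per assistant turn)
- footer_version_v1.4: correct footer version (per assistant turn)
- protocol_retention_ok: protocol retention success (per session)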
Summary metrics are empirical means over the relevant turns or sessions.
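As a minimal sketch of what "empirical mean" means here, the snippet below averages 0/1 indicators into a rate. The helper name indicatorMean is hypothetical and is not taken from analyze-exp1.mjs.

```javascript
// Hypothetical helper (not from the repository): mean of 0/1 indicators,
// returned as a fraction in [0, 1].
function indicatorMean(indicators) {
  if (indicators.length === 0) return NaN; // nothing to average over
  const hits = indicators.filter(Boolean).length;
  return hits / indicators.length;
}

// Illustration: 25 sessions, of which 24 satisfy the retention criterion.
const retention = Array.from({ length: 25 }, (_, i) => i !== 7);
const rate = indicatorMean(retention);
console.log(`${(rate * 100).toFixed(1)}%`); // prints "96.0%"
```

Per-turn metrics would be averaged over assistant turns, and protocol_retention_ok over sessions.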
I - B: Hypotheses
- H1 (structural adherence under VPP). Under the VPP condition, header/footer adherence and tag mirroring are high, i.e. the header_present, tag_mirrors_user, footer_present, and footer_version_v1.4 rates are close to 100%.
- H2 (no spontaneous protocol under baseline). Under the baseline condition, with no VPP instructions, the same metrics are near zero.
- H3 (semantic protocol retention). VPP significantly improves the chance that the model follows the two-stage experimental task and produces a correctly structured final output, i.e. the protocol_retention_ok rate is higher under VPP than under baseline.
II. Task & prompts
The semantic task is held constant: design a concise, structured experimental protocol for evaluating the prompt-injection robustness of a code-assistant LLM integrated into a developer IDE.
II - A: VPP condition
System message:
- Includes the current VPP header snippet (v1.4).
- Clarifies that:
  - Only line-1 !<tag> commands are protocol commands.
  - All other text is task content.
Initial user turn:
- Header: !<g>
- Body (paraphrased):
  - Explain that this is the protocol retention condition.
  - Describe the eventual task: write a 4-section experimental protocol for evaluating prompt-injection robustness (Goals, Threat model & attack surfaces, Task suite design, Metrics & reporting).
  - In this turn, ask the model to:
    - Restate the eventual task in its own words.
    - Confirm it understands the tags <g> <q> <o> <c> <o_f> and will mirror the user’s tag in the first line of each reply.
    - Confirm it will append exactly one footer line in the [Version=… | Tag=… | …] format.
    - State what additional information it would normally want before designing such an experiment.
    - Explicitly state that it will not yet write the full protocol until a later !<o> turn.
Second user turn:
- Header: !<o>
- Body (paraphrased):
  - “Now write the actual experimental protocol you outlined.”
  - Constraints:
    - Audience: technically literate researchers / senior engineers.
    - Exactly four titled sections:
      - Goals
      - Threat model & attack surfaces
      - Task suite design
      - Metrics & reporting
    - Concise paragraphs + bullets where helpful.
    - No prose outside those four sections.
VPP replies are expected to:
- Start with a <tag> line mirroring the user tag (e.g. <g>, <o>).
- End with a single footer line like:

```text
[Version=v1.4 | Tag=o_2 | Sources=none | Assumptions=2 | Cycle=2/3 | Locus=protocol retention]
```
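The reply shape described above can be checked mechanically. The following is a rough sketch only: splitVppReply, parseFooter, and both regular expressions are hypothetical names and patterns chosen for illustration, and the repository's own parseAssistantMessage may behave differently.

```javascript
// Hypothetical patterns (assumptions, not the repository's actual rules).
const HEADER_RE = /^<([a-z_]+)>$/;          // e.g. <g>, <o>, <o_f>
const FOOTER_RE = /^\[(Version=[^\]]*)\]$/; // e.g. [Version=v1.4 | Tag=o_2 | ...]

// Split the inside of a footer into key/value fields.
function parseFooter(inner) {
  const fields = {};
  for (const part of inner.split("|")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue; // skip fragments without a key=value shape
    fields[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return fields;
}

// Inspect the first and last non-empty lines of an assistant reply.
function splitVppReply(text) {
  const nonEmpty = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const first = nonEmpty[0] ?? "";
  const last = nonEmpty[nonEmpty.length - 1] ?? "";
  const header = HEADER_RE.exec(first);
  const footer = FOOTER_RE.exec(last);
  return {
    tag: header ? header[1] : null,
    footer: footer ? last : null,
    fields: footer ? parseFooter(footer[1]) : null,
  };
}

const reply = [
  "<o>",
  "Goals",
  "...",
  "[Version=v1.4 | Tag=o_2 | Sources=none | Assumptions=2 | Cycle=2/3 | Locus=protocol retention]",
].join("\n");
console.log(splitVppReply(reply).fields.Tag); // prints "o_2"
```

A plain baseline reply yields tag: null and footer: null under this sketch, which matches the expectation that baseline turns carry no protocol structure.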
II - B: Baseline condition
System message:
Simple “helpful, careful assistant” prompt:
- Respond clearly and concisely.
- Assume the user may be designing experiments to evaluate LLMs.
User turns:
- Same semantic content as in VPP condition, but:
  - No !<tag> header.
  - No mention of special headers or footers.
  - No protocol-specific instructions.
Baseline replies are expected to be plain text: no <tag> line and no VPP-style footer.
III. Metrics
Metrics are computed by experiments/exp1-protocol-retention/analyze-exp1.mjs over the JSON corpus.
Let “assistant turn” mean any turn with role: "assistant".
III - A: Structural metrics
header_present
Fraction of assistant turns where the first non-empty line is interpreted as a VPP header:
- VPP condition:
  - Accepts assistant-style headers like <g>, <o>, <o_f>.
  - Normalizes them to a tag (g, o, etc.).
- Baseline condition:
  - We do not expect structured headers; these turns should be counted as missing headers.
tag_mirrors_user
Fraction of assistant turns where the assistant’s tag for that turn matches the most recent user tag, after normalization.
- E.g., user sends !<g>, assistant responds with <g> → counted as a mirror.
footer_present
Fraction of assistant turns where the last non-empty line is a VPP-style footer, i.e. a bracketed line:

```text
[Version=... | Tag=... | ...]
```

footer_version_v1.4
Fraction of assistant turns where:
- A footer is present, and
- Version in the footer equals v1.4.
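To make the four structural definitions concrete, here is a hedged sketch that computes them over one session's turns. It assumes turns shaped like the corpus session files (role, tag, footer); the function name structuralMetrics and the exact footer checks are assumptions, not the logic of analyze-exp1.mjs.

```javascript
// Hypothetical sketch: structural metrics over one session's turns.
// Assumes each turn has { role, tag, footer } as in the corpus session files.
function structuralMetrics(turns) {
  let lastUserTag = null;
  const counts = { turns: 0, header: 0, mirror: 0, footer: 0, v14: 0 };

  for (const turn of turns) {
    if (turn.role === "user") {
      lastUserTag = turn.tag ?? null; // remember the most recent user tag
      continue;
    }
    if (turn.role !== "assistant") continue;
    counts.turns += 1;

    const hasHeader = typeof turn.tag === "string" && turn.tag.length > 0;
    if (hasHeader) counts.header += 1;
    if (hasHeader && turn.tag === lastUserTag) counts.mirror += 1;

    const footer = typeof turn.footer === "string" ? turn.footer.trim() : "";
    const hasFooter = footer.startsWith("[") && footer.endsWith("]");
    if (hasFooter) counts.footer += 1;

    const version = /Version=([^|\]]+)/.exec(footer);
    if (hasFooter && version && version[1].trim() === "v1.4") counts.v14 += 1;
  }

  const frac = (n) => (counts.turns === 0 ? 0 : n / counts.turns);
  return {
    header_present: frac(counts.header),
    tag_mirrors_user: frac(counts.mirror),
    footer_present: frac(counts.footer),
    "footer_version_v1.4": frac(counts.v14),
  };
}

const vppTurns = [
  { role: "user", tag: "g", footer: null },
  { role: "assistant", tag: "g", footer: "[Version=v1.4 | Tag=g_1 | ...]" },
  { role: "user", tag: "o", footer: null },
  { role: "assistant", tag: "o", footer: "[Version=v1.4 | Tag=o_2 | ...]" },
];
console.log(structuralMetrics(vppTurns).tag_mirrors_user); // prints 1
```

Aggregating over a whole condition would sum the counts across sessions before dividing, rather than averaging per-session fractions.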
III - B: Protocol retention metric
protocol_retention_ok (per session)
A session is counted as protocol_retention_ok = 1 if:
- First assistant turn (a_0):
  - Correctly restates the task.
  - Confirms understanding of tags and footer format.
  - Explicitly commits to waiting for a later !<o> before writing the full protocol.
- Second assistant turn (a_1):
  - Produces a 4-section protocol with exactly these titled sections (or equivalent normalized titles):
    - Goals
    - Threat model & attack surfaces
    - Task suite design
    - Metrics & reporting
  - Does not add extra sections before/after.
  - In VPP condition: keeps correct header/footer.
Any deviation (missing section, extra preamble/epilogue, wrong structure, broken footer) marks the session as protocol_retention_ok = 0.
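The section check on a_1 could look roughly like the sketch below. It covers only part of the criterion: it verifies that the four required titles appear once each, in order, with no preamble before Goals. It does not detect extra sections under other titles or a trailing epilogue, and it does not assess a_0. The normalization rules and function names are assumptions, not the analyzer's actual behavior.

```javascript
// Hypothetical sketch of the a_1 section check (partial; see caveats above).
const REQUIRED_SECTIONS = [
  "goals",
  "threat model & attack surfaces",
  "task suite design",
  "metrics & reporting",
];

// Reduce a candidate heading line to a comparable form.
function normalizeHeading(line) {
  return line
    .replace(/^[#\s\d.)*_-]+/, "") // strip leading markdown markers and numbering
    .replace(/[*_:\s]+$/, "")      // strip trailing emphasis markers and colons
    .replace(/\band\b/gi, "&")     // treat "and" as equivalent to "&"
    .replace(/\s+/g, " ")
    .toLowerCase();
}

function hasFourSectionsInOrder(body) {
  const lines = body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const found = lines
    .map(normalizeHeading)
    .filter((heading) => REQUIRED_SECTIONS.includes(heading));
  const inOrder =
    found.length === REQUIRED_SECTIONS.length &&
    found.every((heading, i) => heading === REQUIRED_SECTIONS[i]);
  const noPreamble =
    lines.length > 0 && normalizeHeading(lines[0]) === REQUIRED_SECTIONS[0];
  return inOrder && noPreamble;
}

const goodBody = [
  "## Goals",
  "Measure robustness.",
  "## Threat model and attack surfaces",
  "- Injected comments",
  "## Task suite design",
  "Paired tasks.",
  "## Metrics & reporting",
  "Attack success rate.",
].join("\n");
console.log(hasFourSectionsInOrder(goodBody)); // prints true
```

Under this sketch, a reply that opens with framing text such as "Here is the protocol:" fails the noPreamble condition, which corresponds to one of the failure modes described in the interpretation of the results.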
IV. Results
Current Exp1 results (25 sessions per condition, gpt-4.1, temperature=0.2, top_p=1):
```bash
npm run analyze:exp1
```

Output:

```text
Exp1 — Protocol Retention Metrics
Condition: vpp
Sessions: 25
Assistant turns: 50
header_present: 100.0%
tag_mirrors_user: 100.0%
footer_present: 100.0%
footer_version_v1.4: 100.0%
protocol_retention_ok: 96.0%
Condition: baseline
Sessions: 25
Assistant turns: 50
header_present: 0.0%
tag_mirrors_user: 0.0%
footer_present: 0.0%
footer_version_v1.4: 0.0%
protocol_retention_ok: 0.0%
```

IV - A: Interpretation
H1 (structural adherence) is strongly supported:
- Under VPP, header, footer, and tag mirroring are all at 100%.
H2 (no spontaneous protocol) is supported:
- Baseline never spontaneously adopts VPP-like headers or footers.
H3 (semantic protocol retention) is supported:
- 96% of VPP sessions fully respect the two-stage design and produce a correctly structured protocol.
- Baseline never satisfies the same strict criteria, despite having the same semantic task.
The 4% of VPP sessions that fail protocol_retention_ok (1 of 25 in this run) typically do so by:
- Adding extra framing text outside the four required sections, or
- Slightly mangling section headings/structure.
These failure modes are logged and can be inspected in the corpus.
V. Corpus layout & scripts
V - A: Corpus layout
Index

```text
corpus/v1.4/index.jsonl
```

Each line is a JSON object with a minimal index entry:

```json
{
  "id": "exp1-protret-0001",
  "model": "gpt-4.1",
  "provider": "openai",
  "condition": "vpp",
  "challenge_type": "protocol_retention",
  "created_at": "2025-11-14T06:32:26.165Z"
}
```

Sessions

```text
corpus/v1.4/sessions/*.json
```

Each file is a full session, for example:

```json
{
  "id": "exp1-protret-0001",
  "protocol_version": "1.4",
  "meta": {
    "model": "gpt-4.1",
    "provider": "openai",
    "condition": "vpp",
    "challenge_type": "protocol_retention",
    "created_at": "2025-11-14T06:32:26.165Z",
    "task_template_id": "exp1-protret",
    "injection_template_id": null,
    "seed": 12345
  },
  "label": "good",
  "failure_modes": [],
  "turns": [
    {
      "turn_index": 0,
      "role": "user",
      "raw_header": "!<g>",
      "tag": "g",
      "modifiers": [],
      "body": "...",
      "footer": null,
      "parsed_footer": null
    },
    {
      "turn_index": 1,
      "role": "assistant",
      "raw_header": "<g>",
      "tag": "g",
      "modifiers": [],
      "body": "...",
      "footer": "[Version=v1.4 | Tag=g_1 | ...]",
      "parsed_footer": { "...": "..." }
    },
    ...
  ]
}
```
V - B: Experiment scripts
Generator

```text
experiments/exp1-protocol-retention/run-exp1-protret.mjs
```

Reads JSONL configs from:

```text
experiments/exp1-protocol-retention/configs.jsonl
```

For each line:
- Builds system + user messages based on condition.
- Calls the model via Chat Completions.
- Parses assistant messages condition-aware:
  - VPP: structured header/body/footer via parseAssistantMessage.
  - Baseline: flat body only.
- Writes the session to corpus/v1.4/sessions/<id>.json.
- Appends an index entry to corpus/v1.4/index.jsonl.
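The condition-dependent message building can be sketched as follows. This is an illustration under assumptions: buildSystemMessage and buildUserMessage are hypothetical names, the prompt wording is a placeholder rather than the repository's actual text, and the header snippet is passed in as an opaque string. Only the { role, content } message shape is taken from the Chat Completions format.

```javascript
// Hypothetical sketch: condition-aware message construction.
// Prompt texts are placeholders, not the repository's actual prompts.
function buildSystemMessage(condition, headerSnippet) {
  const content =
    condition === "vpp"
      ? `${headerSnippet}\n\nOnly line-1 !<tag> commands are protocol commands. All other text is task content.`
      : "You are a helpful, careful assistant. Respond clearly and concisely.";
  return { role: "system", content };
}

// VPP user turns carry a line-1 !<tag> header; baseline turns carry the body only.
function buildUserMessage(condition, turn) {
  const content =
    condition === "vpp" ? `!<${turn.tag}>\n${turn.body}` : turn.body;
  return { role: "user", content };
}

const firstTurn = { tag: "g", body: "Restate the eventual task in your own words." };
console.log(buildUserMessage("vpp", firstTurn).content.split("\n")[0]); // prints "!<g>"
```

Keeping the body text identical across conditions and varying only the header line and system message is what lets the comparison hold the semantic task constant.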
Analyzer

```text
experiments/exp1-protocol-retention/analyze-exp1.mjs
```

- Reads corpus/v1.4/index.jsonl and corresponding sessions/*.json.
- Computes the metrics listed above, aggregated by meta.condition.
- Prints the summary shown in the Results section.
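The read-and-aggregate flow might be sketched like this. Both functions are hypothetical and operate on in-memory strings and objects; actual file reading is left out so the example stays self-contained. The field names (meta.condition, turns, role) follow the session example in the corpus layout.

```javascript
// Hypothetical sketch: parse a JSONL index and group sessions by condition.
function parseJsonl(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // tolerate blank lines and a trailing newline
    .map((line) => JSON.parse(line));
}

// Count sessions and assistant turns per meta.condition.
function summarizeByCondition(sessions) {
  const summary = {};
  for (const session of sessions) {
    const condition = session.meta.condition;
    summary[condition] ??= { sessions: 0, assistantTurns: 0 };
    summary[condition].sessions += 1;
    summary[condition].assistantTurns += session.turns.filter(
      (turn) => turn.role === "assistant"
    ).length;
  }
  return summary;
}

const indexText = [
  '{"id":"exp1-protret-0001","condition":"vpp"}',
  '{"id":"exp1-protret-baseline-001","condition":"baseline"}',
  "",
].join("\n");
console.log(parseJsonl(indexText).length); // prints 2
```

In a real run, each index entry's id would be used to load the matching session file before calling summarizeByCondition.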
VI. Re-running Exp1
To regenerate Exp1 or run a variant:
Prepare configs
Edit:

```text
experiments/exp1-protocol-retention/configs.jsonl
```

with one JSON object per line. For example:

```jsonl
{"id":"exp1-protret-0001","protocol_version":"1.4","model":"gpt-4.1","condition":"vpp","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":1001}
...
{"id":"exp1-protret-0025","protocol_version":"1.4","model":"gpt-4.1","condition":"vpp","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":1025}
{"id":"exp1-protret-baseline-001","protocol_version":"1.4","model":"gpt-4.1","condition":"baseline","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":2001}
...
{"id":"exp1-protret-baseline-025","protocol_version":"1.4","model":"gpt-4.1","condition":"baseline","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":2025}
```

Run the generator

```bash
node experiments/exp1-protocol-retention/run-exp1-protret.mjs
```

Run corpus tests & analysis

```bash
npm run test:corpus
npm run analyze:exp1
```
You should see metrics close to those reported above, modulo sampling variation if you change seeds, model, or hyperparameters.
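Because the metrics are meant to be enforceable in automated tests, a regression gate could be sketched as below. The floor values and the names VPP_FLOORS and findRegressions are purely illustrative assumptions; this document does not specify thresholds, so pick values that match your own tolerance for sampling variation.

```javascript
// Hypothetical floors for the VPP condition (illustrative values only).
const VPP_FLOORS = {
  header_present: 0.95,
  tag_mirrors_user: 0.95,
  footer_present: 0.95,
  "footer_version_v1.4": 0.95,
  protocol_retention_ok: 0.9,
};

// Return a list of human-readable failures; an empty list means no regression.
function findRegressions(metrics, floors) {
  const failures = [];
  for (const [name, floor] of Object.entries(floors)) {
    const value = metrics[name];
    if (typeof value !== "number" || Number.isNaN(value) || value < floor) {
      failures.push(`${name}: got ${value}, expected >= ${floor}`);
    }
  }
  return failures;
}

// Metrics as reported for the VPP condition, expressed as fractions.
const reported = {
  header_present: 1,
  tag_mirrors_user: 1,
  footer_present: 1,
  "footer_version_v1.4": 1,
  protocol_retention_ok: 0.96,
};
console.log(findRegressions(reported, VPP_FLOORS).length); // prints 0
```

A test runner could fail the build whenever the returned list is non-empty, printing each entry as the reason.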
VII. Limitations & next steps
Scope limitations
- Single model (gpt-4.1).
- Single task template (IDE robustness protocol).
- Short dialogues (2 assistant turns).
Next experiments
- Cross-model replications (e.g., gpt-4o, smaller models).
- Exp2 (Prompt Injection): same VPP vs baseline framing, but introduce explicit adversarial instructions to measure robustness under attack.
- Longer tasks & tools: integrate multi-step workflows and tool calls to study protocol retention under more realistic usage.
Exp1 thus serves as a foundational benchmark: it shows that VPP can induce near-perfect structural adherence and strong semantic task retention, while a baseline assistant given the same semantic task does not spontaneously adopt the protocol.
Notes
- The baseline branch deliberately withholds the VPP header snippet and footer spec so the comparison isolates protocol retention.
- Escalations use the standard VPP escape rules when the condition is vpp.