Overview

Exp1 measures how reliably a model can adopt and retain the Viable Prompt Protocol (VPP) when solving a concrete technical task, compared against a baseline condition with no protocol instructions.

Experiment 01 lives in experiments/exp1-protocol-retention/. It compares a VPP-guided condition against a baseline that omits the header snippet. Both conditions receive the same task prompt about designing a prompt-injection study for an IDE assistant.

Directory contents

  • run-exp1-protret.mjs — orchestrates chat sessions, loading the header snippet for the VPP condition and sending unstructured prompts for the baseline condition.
  • configs.jsonl — enumerates run identifiers with condition="vpp" and condition="baseline", plus model and seed metadata.
  • analyze-exp1.mjs — normalizes saved transcripts and checks for required headings in the drafted protocol.

At a glance

(25 sessions per condition, 2 assistant turns per session = 50 assistant turns per condition; gpt-4.1, temperature=0.2):

  • VPP condition

    • header_present: 100.0%
    • tag_mirrors_user: 100.0%
    • footer_present: 100.0%
    • footer_version_v1.4: 100.0%
    • protocol_retention_ok: 96.0%
  • Baseline condition

    • header_present: 0.0%
    • tag_mirrors_user: 0.0%
    • footer_present: 0.0%
    • footer_version_v1.4: 0.0%
    • protocol_retention_ok: 0.0%

Experiment 01 is designed to be:

  • Simple and clean — short dialogs, fixed task template.
  • Re-runnable — driven by configs.jsonl and scripts under experiments/exp1-protocol-retention.
  • Regressable — metrics can be enforced in automated tests to catch protocol regressions.

I. Formal experiment description

I - A: Design

We study a single model under two conditions:

  • Factor:

    • condition ∈ {vpp, baseline} (VPP header snippet present vs. absent)
  • Model:

    • gpt-4.1 (Chat Completions API)
  • Dialog length:

    • 2 user turns, 2 assistant turns (up to max_turns = 4)
  • Per condition:

    • 25 independent sessions × 2 assistant turns = 50 assistant turns

Let each session $s$ produce a sequence of turns $(u_0, a_0, u_1, a_1)$. From these we derive per-turn and per-session indicators:

  • $H_t \in \{0,1\}$ — header adherence
  • $M_t \in \{0,1\}$ — tag mirroring
  • $F_t \in \{0,1\}$ — footer adherence
  • $V_t \in \{0,1\}$ — correct footer version
  • $R_s \in \{0,1\}$ — protocol retention success

Summary metrics are empirical means over the relevant turns or sessions.
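
Concretely, for a condition with assistant-turn set $T$ and session set $S$, the summary metrics are

$$\bar H = \frac{1}{|T|} \sum_{t \in T} H_t, \qquad \bar R = \frac{1}{|S|} \sum_{s \in S} R_s,$$

and analogously $\bar M$, $\bar F$, $\bar V$; these correspond to header_present, tag_mirrors_user, footer_present, footer_version_v1.4, and protocol_retention_ok in the Metrics section.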

I - B: Hypotheses

  • H1 (structural adherence under VPP). Under the VPP condition, header/footer adherence and tag mirroring are high, i.e. $\bar H, \bar M, \bar F \approx 1$.
  • H2 (no spontaneous protocol under baseline). Under the baseline condition, with no VPP instructions, the same metrics are near zero: $\bar H, \bar M, \bar F \approx 0$.
  • H3 (semantic protocol retention). VPP significantly improves the chance that the model follows the two-stage experimental task and produces a correctly structured final output, i.e. $\bar R_{\text{vpp}} \gg \bar R_{\text{baseline}}$.

II. Task & prompts

The semantic task is held constant: design a concise, structured experimental protocol for evaluating the prompt-injection robustness of a code-assistant LLM integrated into a developer IDE.

II - A: VPP condition

System message:

  • Includes the current VPP header snippet (v1.4).

  • Clarifies that:

    • Only line-1 !<tag> commands are protocol commands.
    • All other text is task content.

Initial user turn:

  • Header: !<g>

  • Body (paraphrased):

    • Explain that this is the protocol retention condition.

    • Describe the eventual task: write a 4-section experimental protocol for evaluating prompt-injection robustness (Goals, Threat model & attack surfaces, Task suite design, Metrics & reporting).

    • In this turn, ask the model to:

      1. Restate the eventual task in its own words.
      2. Confirm it understands the tags <g> <q> <o> <c> <o_f> and will mirror the user’s tag in the first line of each reply.
      3. Confirm it will append exactly one footer line in the [Version=… | Tag=… | …] format.
      4. State what additional information it would normally want before designing such an experiment.
      5. Explicitly state that it will not yet write the full protocol until a later !<o> turn.

Second user turn:

  • Header: !<o>

  • Body (paraphrased):

    • “Now write the actual experimental protocol you outlined.”

    • Constraints:

      • Audience: technically literate researchers / senior engineers.

      • Exactly four titled sections:

        1. Goals
        2. Threat model & attack surfaces
        3. Task suite design
        4. Metrics & reporting
      • Concise paragraphs + bullets where helpful.

      • No prose outside those four sections.

VPP replies are expected to:

  • Start with a <tag> line mirroring the user tag (e.g. <g>, <o>).

  • End with a single footer line like:

    text
    [Version=v1.4 | Tag=o_2 | Sources=none | Assumptions=2 | Cycle=2/3 | Locus=protocol retention]
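
As a concrete illustration, a footer line in this shape can be split into key/value pairs with a few lines of JavaScript. This is a minimal sketch, not the actual parseAssistantMessage implementation; it only assumes the [k=v | k=v | ...] form shown above:

js
// Parse a VPP-style footer line, e.g.
//   [Version=v1.4 | Tag=o_2 | Sources=none | Assumptions=2 | Cycle=2/3 | Locus=protocol retention]
// into an object like { Version: "v1.4", Tag: "o_2", ... }; returns null if malformed.
function parseFooter(line) {
  const m = line.trim().match(/^\[(.+)\]$/);
  if (!m) return null;                  // not a single bracketed line
  const fields = {};
  for (const part of m[1].split("|")) {
    const [key, ...rest] = part.split("=");
    if (rest.length === 0) return null; // field without "=" -> malformed
    fields[key.trim()] = rest.join("=").trim();
  }
  return fields;
}

// parseFooter("[Version=v1.4 | Tag=o_2 | Sources=none]")
// -> { Version: "v1.4", Tag: "o_2", Sources: "none" }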

II - B: Baseline condition

System message:

  • Simple “helpful, careful assistant” prompt:

    • Respond clearly and concisely.
    • Assume the user may be designing experiments to evaluate LLMs.

User turns:

  • Same semantic content as in VPP condition, but:

    • No !<tag> header.
    • No mention of special headers or footers.
    • No protocol-specific instructions.

Baseline replies are expected to be plain text: no <tag> line and no VPP-style footer.


III. Metrics

Metrics are computed by experiments/exp1-protocol-retention/analyze-exp1.mjs over the JSON corpus.

Let “assistant turn” mean any turn with role: "assistant".

III - A: Structural metrics

  • header_present

    Fraction of assistant turns where the first non-empty line is interpreted as a VPP header:

    • VPP condition:

      • Accepts assistant-style headers like <g>, <o>, <o_f>.
      • Normalizes them to a tag (g, o, etc).
    • Baseline condition:

      • We do not expect structured headers; these turns should be counted as missing headers.
  • tag_mirrors_user

    Fraction of assistant turns where the assistant’s tag for that turn matches the most recent user tag, after normalization.

    • E.g., user sends !<g>, assistant responds with <g> → counted as a mirror.
  • footer_present

    Fraction of assistant turns where the last non-empty line is a VPP-style footer, i.e. a bracketed line:

    text
    [Version=... | Tag=... | ...]
  • footer_version_v1.4

    Fraction of assistant turns where:

    • A footer is present, and
    • Version in the footer equals v1.4.
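
Taken together, the four structural metrics are line-level predicates over each assistant turn. A minimal sketch of how they could be computed, assuming the most recent user tag has already been normalized (the real logic lives in analyze-exp1.mjs and may differ in detail):

js
const HEADER_RE = /^<([a-z_]+)>$/;                  // assistant header, e.g. <g>, <o_f>
const FOOTER_RE = /^\[Version=([^|\]]+)(\|.*)?\]$/; // bracketed VPP footer line

// Compute the four structural indicators for one assistant turn.
function structuralChecks(text, lastUserTag) {
  const lines = text.split("\n").map((l) => l.trim()).filter(Boolean);
  const header = lines[0] ? lines[0].match(HEADER_RE) : null;
  const footer = lines.length ? lines[lines.length - 1].match(FOOTER_RE) : null;
  return {
    header_present: header !== null,
    tag_mirrors_user: header !== null && header[1] === lastUserTag,
    footer_present: footer !== null,
    footer_version_v14: footer !== null && footer[1].trim() === "v1.4",
  };
}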

III - B: Protocol retention metric

  • protocol_retention_ok (per session)

    A session is counted as protocol_retention_ok = 1 if:

    1. First assistant turn (a_0):

      • Correctly restates the task.
      • Confirms understanding of tags and footer format.
      • Explicitly commits to waiting for a later !<o> before writing the full protocol.
    2. Second assistant turn (a_1):

      • Produces a 4-section protocol with exactly these titled sections (or equivalent normalized titles):

        1. Goals
        2. Threat model & attack surfaces
        3. Task suite design
        4. Metrics & reporting
      • Does not add extra sections before/after.

      • In VPP condition: keeps correct header/footer.

    Any deviation (missing section, extra preamble/epilogue, wrong structure, broken footer) marks the session as protocol_retention_ok = 0.
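
Operationally, the second-turn check reduces to extracting the section titles from a_1 and comparing them, in order, against the expected four. A sketch under assumed conventions (the heading regex and normalization are illustrative; analyze-exp1.mjs may be stricter, and the full metric also gates on the first-turn checks above):

js
// Expected section titles, in order, after normalization.
const EXPECTED = [
  "goals",
  "threat model & attack surfaces",
  "task suite design",
  "metrics & reporting",
];

// Strip markdown markers and leading numbering, lowercase.
function normalizeTitle(line) {
  return line.replace(/^#+\s*/, "").replace(/^\d+[.)]\s*/, "").trim().toLowerCase();
}

// True iff the body's heading lines are exactly the four expected titles, in order.
// (Checks the titled-sections criterion only, not "no prose outside sections".)
function hasExactSections(body) {
  const headingRe = /^(#+\s+\S|\d+[.)]\s+\S)/; // "# Title" or "1. Title" style lines
  const titles = body
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => headingRe.test(l))
    .map(normalizeTitle);
  return titles.length === 4 && titles.every((t, i) => t === EXPECTED[i]);
}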


IV. Results

Current Exp1 results (25 sessions per condition, gpt-4.1, temperature=0.2, top_p=1):

bash
npm run analyze:exp1

Output:

text
Exp1 — Protocol Retention Metrics

Condition: vpp
  Sessions: 25
  Assistant turns: 50
  header_present:           100.0%
  tag_mirrors_user:         100.0%
  footer_present:           100.0%
  footer_version_v1.4:      100.0%
  protocol_retention_ok:    96.0%

Condition: baseline
  Sessions: 25
  Assistant turns: 50
  header_present:           0.0%
  tag_mirrors_user:         0.0%
  footer_present:           0.0%
  footer_version_v1.4:      0.0%
  protocol_retention_ok:    0.0%

IV - A: Interpretation

  • H1 (structural adherence) is strongly supported:

    • Under VPP, header, footer, and tag mirroring are all at 100%.
  • H2 (no spontaneous protocol) is supported:

    • Baseline never spontaneously adopts VPP-like headers or footers.
  • H3 (semantic protocol retention) is supported:

    • 96% of VPP sessions fully respect the two-stage design and produce a correctly structured protocol.
    • Baseline never satisfies the same strict criteria, despite having the same semantic task.

The 4% of VPP sessions that fail protocol_retention_ok (1 session of 25 in this run) do so by:

  • Adding extra framing text outside the four required sections, or
  • Slightly mangling section headings/structure.

These failure modes are logged and can be inspected in the corpus.


V. Corpus layout & scripts

V - A: Corpus layout

  • Index

    text
    corpus/v1.4/index.jsonl

    Each line is a JSON object with a minimal index entry:

    json
    {
      "id": "exp1-protret-0001",
      "model": "gpt-4.1",
      "provider": "openai",
      "condition": "vpp",
      "challenge_type": "protocol_retention",
      "created_at": "2025-11-14T06:32:26.165Z"
    }
  • Sessions

    text
    corpus/v1.4/sessions/*.json

    Each file is a full session, for example:

    json
    {
      "id": "exp1-protret-0001",
      "protocol_version": "1.4",
      "meta": {
        "model": "gpt-4.1",
        "provider": "openai",
        "condition": "vpp",
        "challenge_type": "protocol_retention",
        "created_at": "2025-11-14T06:32:26.165Z",
        "task_template_id": "exp1-protret",
        "injection_template_id": null,
        "seed": 12345
      },
      "label": "good",
      "failure_modes": [],
      "turns": [
        {
          "turn_index": 0,
          "role": "user",
          "raw_header": "!<g>",
          "tag": "g",
          "modifiers": [],
          "body": "...",
          "footer": null,
          "parsed_footer": null
        },
        {
          "turn_index": 1,
          "role": "assistant",
          "raw_header": "<g>",
          "tag": "g",
          "modifiers": [],
          "body": "...",
          "footer": "[Version=v1.4 | Tag=g_1 | ...]",
          "parsed_footer": { "...": "..." }
        },
        ...
      ]
    }
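
Reading this layout back is straightforward: stream the index line by line and load the referenced session files. A short sketch (paths as above; no error handling):

js
import { readFileSync } from "node:fs";
import path from "node:path";

const ROOT = "corpus/v1.4";

// Parse the JSONL index: one JSON object per non-empty line.
const index = readFileSync(path.join(ROOT, "index.jsonl"), "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line));

// Load each full session from sessions/<id>.json.
const sessions = index.map((entry) =>
  JSON.parse(readFileSync(path.join(ROOT, "sessions", `${entry.id}.json`), "utf8"))
);

// Group sessions by condition for per-condition metrics.
const byCondition = {};
for (const s of sessions) {
  (byCondition[s.meta.condition] ??= []).push(s);
}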

V - B: Experiment scripts

  • Generator

    text
    experiments/exp1-protocol-retention/run-exp1-protret.mjs
    • Reads JSONL configs from:

      text
      experiments/exp1-protocol-retention/configs.jsonl
    • For each line (the loop is sketched after this list):

      • Builds system + user messages based on condition.

      • Calls the model via Chat Completions.

      • Parses assistant messages in a condition-aware way:

        • VPP: structured header/body/footer via parseAssistantMessage.
        • Baseline: flat body only.
      • Writes the session to corpus/v1.4/sessions/<id>.json.

      • Appends an index entry to corpus/v1.4/index.jsonl.

  • Analyzer

    text
    experiments/exp1-protocol-retention/analyze-exp1.mjs
    • Reads corpus/v1.4/index.jsonl and corresponding sessions/*.json.
    • Computes the metrics listed above, aggregated by meta.condition.
    • Prints the summary shown in the Results section.
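
Condensed, the generator's per-config flow looks roughly like the sketch below; the message construction and model call are elided to comments, since only their inputs and outputs are described above:

js
import { readFileSync, writeFileSync, appendFileSync } from "node:fs";

const DIR = "experiments/exp1-protocol-retention";

// One config per JSONL line.
const configs = readFileSync(`${DIR}/configs.jsonl`, "utf8")
  .split("\n")
  .filter((l) => l.trim())
  .map((l) => JSON.parse(l));

for (const cfg of configs) {
  // 1. Build system + user messages for cfg.condition ("vpp" | "baseline").
  // 2. Call Chat Completions with cfg.model, cfg.temperature, cfg.top_p.
  // 3. Parse assistant turns: header/body/footer via parseAssistantMessage
  //    for vpp, flat body for baseline.
  const turns = []; // filled by steps 1-3

  const session = { id: cfg.id, protocol_version: cfg.protocol_version, meta: cfg, turns };
  writeFileSync(`corpus/v1.4/sessions/${cfg.id}.json`, JSON.stringify(session, null, 2));

  const indexEntry = { id: cfg.id, model: cfg.model, condition: cfg.condition };
  appendFileSync("corpus/v1.4/index.jsonl", JSON.stringify(indexEntry) + "\n");
}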

VI. Re-running Exp1

To regenerate Exp1 or run a variant:

  1. Prepare configs

    Edit:

    text
    experiments/exp1-protocol-retention/configs.jsonl

    with one JSON object per line (a script that generates these lines is sketched after these steps). For example:

    jsonl
    {"id":"exp1-protret-0001","protocol_version":"1.4","model":"gpt-4.1","condition":"vpp","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":1001}
    ...
    {"id":"exp1-protret-0025","protocol_version":"1.4","model":"gpt-4.1","condition":"vpp","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":1025}
    {"id":"exp1-protret-baseline-001","protocol_version":"1.4","model":"gpt-4.1","condition":"baseline","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":2001}
    ...
    {"id":"exp1-protret-baseline-025","protocol_version":"1.4","model":"gpt-4.1","condition":"baseline","challenge_type":"protocol_retention","task_template_id":"exp1-protret","temperature":0.2,"top_p":1,"max_turns":4,"seed":2025}
  2. Run the generator

    bash
    node experiments/exp1-protocol-retention/run-exp1-protret.mjs
  3. Run corpus tests & analysis

    bash
    npm run test:corpus
    npm run analyze:exp1
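
For completeness, the 50 config lines above can be generated rather than hand-edited. A sketch that reproduces the ids and seeds shown in the example (output field order differs, which JSONL does not care about):

js
import { writeFileSync } from "node:fs";

// Fields shared by every run, mirroring the example lines above.
const base = {
  protocol_version: "1.4",
  model: "gpt-4.1",
  challenge_type: "protocol_retention",
  task_template_id: "exp1-protret",
  temperature: 0.2,
  top_p: 1,
  max_turns: 4,
};

const lines = [];
for (const [condition, prefix, pad, seedBase] of [
  ["vpp", "exp1-protret-", 4, 1000],
  ["baseline", "exp1-protret-baseline-", 3, 2000],
]) {
  for (let i = 1; i <= 25; i++) {
    lines.push({
      id: `${prefix}${String(i).padStart(pad, "0")}`,
      condition,
      seed: seedBase + i,
      ...base,
    });
  }
}

writeFileSync(
  "experiments/exp1-protocol-retention/configs.jsonl",
  lines.map((l) => JSON.stringify(l)).join("\n") + "\n"
);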

You should see metrics close to those reported above; expect some sampling variation if you change seeds, the model, or other hyperparameters.


VII. Limitations & next steps

  • Scope limitations

    • Single model (gpt-4.1).
    • Single task template (IDE robustness protocol).
    • Short dialogues (2 assistant turns).
  • Next experiments

    • Cross-model replications (e.g., gpt-4o, smaller models).
    • Exp2 — Prompt Injection: same VPP vs baseline framing, but introduce explicit adversarial instructions to measure robustness under attack.
    • Longer tasks & tools: integrate multi-step workflows and tool calls to study protocol retention under more realistic usage.

Exp1 thus serves as a foundational benchmark: it shows that VPP can induce near-perfect structural adherence and strong semantic task retention, while a baseline assistant given the same semantic task does not spontaneously adopt the protocol.

Notes

  • The baseline branch deliberately withholds the VPP header snippet and footer spec so the comparison isolates protocol retention.
  • Escalations use the standard VPP escape rules when the condition is vpp.