AXON v1.1

AXON Playground

Translate natural language into compact, token-efficient notation. Runs entirely in your browser via WebAssembly.


700+ test cases
~71% avg token savings
0 network requests

vNext

What's coming next for AXON — from prompt compression to native model understanding.

Fine-Tuned AXON Architecture

Today, AXON compresses prompts and relies on the base model to interpret notation via a system prompt. A fine-tuned model eliminates the system prompt overhead entirely and gains efficiency at every stage of inference.

Today (system prompt): user prompt (natural language, 105 tok) → AXON (-35%, 68 tok) + system prompt (~500 tok) → concatenated input of ~568 tok → base LLM interprets AXON via the system prompt → response in natural language. Overhead: the ~500-token system prompt is parsed on every request; attention cost is O(n²); context is 568 tok plus the response.

vNext (fine-tuned on AXON): user prompt (natural language, 105 tok) → AXON (-35%, 68 tok), system prompt 0 tok → qwen2.5:7b-axon (AXON is native, no prompt overhead) → AXON response in compact notation (fewer output tokens) → AXON decoder (client-side, 0 API cost) → response in natural language.

Efficiency gains with fine-tuning:
1. Zero system prompt overhead. Eliminates ~500 tokens per request; the model natively understands AXON.
2. Smaller attention window. 68 input tokens vs 568. Quadratic attention savings: O(68²) vs O(568²) ≈ 70x less compute.
3. AXON-native output. The model responds in AXON notation; fewer output tokens are generated, decoded client-side at zero cost.
4. Faster time-to-first-token. The prefill phase processes 68 tokens instead of 568, roughly 8x faster TTFT on input-bound workloads.

Fine-tuned, in total: ~88% fewer input tokens, ~50% fewer output tokens, ~70x less attention compute.
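The arithmetic behind those compute and TTFT claims, as a quick sanity check (figures taken from the pipeline above):

fn main() {
    let (today, vnext) = (568.0_f64, 68.0_f64);
    // Attention compute scales with the square of the input length.
    println!("attention ratio: {:.0}x", (today / vnext).powi(2)); // ~70x
    // The TTFT claim treats prefill as roughly linear in input tokens.
    println!("prefill ratio:   {:.1}x", today / vnext);           // ~8.4x
}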

Full Stack: AXON + Qwen2.5-7B-AXON + Caveman

Three complementary approaches — each targeting a different stage of the inference pipeline. AXON compresses input, a fine-tuned 7B model understands it natively, and Caveman compresses output. Combined, they minimize tokens at every stage.

Combined Inference Pipeline — Maximum Efficiency AXON input compression QWEN2.5-7B-AXON fine-tuned instruct model CAVEMAN output compression User Prompt natural language text 105 tok AXON v1.1 Translator sigils + L0 abbrev + stop-word strip runs client-side via WASM -35% input tok 68 tok sys prompt 0 tok 68 tok in qwen2.5:7b-instruct fine-tuned on AXON notation Embeddings Transformer AXON 7B Caveman Output terse AXON response baked in + prompted -70% output tokens ~610 tok AXON Decoder client-side WASM, 0 cost Final Response natural language Combined Efficiency Gains AXON — Input Side Prompt compression: 105 → 68 tokens (-35%). Runs client-side, zero latency added. No system prompt needed with fine-tuned model: saves ~500 tok per request. qwen2.5:7b-instruct — Model Side 7B params: fast inference, deployable on a single GPU or CPU via Ollama/vLLM. Native AXON I/O: no notation interpretation overhead. Prefill: O(68²) vs O(605²) = ~79x less compute. Generates AXON notation: denser information per token, fewer tokens to generate. Caveman — Output Side Terse response style baked into fine-tune + reinforced by prompt. Stacks with AXON notation. AXON density + Caveman brevity = ~70% fewer output tokens. Client-side decode at zero cost. Full stack: 68 tok in, 0 sys prompt, ~610 tok out, 7B model, single GPU vs baseline: -35% input, -70% output, ~79x less prefill compute

Three-Way Comparison

How each approach stacks across the inference pipeline. Columns are cumulative — each adds to the previous.

Stage                  Baseline      + AXON        + Qwen-7B-AXON   + Caveman
Prompt tokens          105 tok       68 tok        68 tok           68 tok
System prompt          0             ~500 tok      0                ~20 tok
Total input to model   105           ~568          68               88
Prefill attention      O(105²)       O(568²)       O(68²)           O(88²)
Model size             API (large)   API (large)   7B (local)       7B (local)
Output format          Natural lang  Natural lang  AXON notation    Terse AXON
Output tokens (est.)   ~2,038        ~2,038        ~1,000           ~610
Client decode          None          None          WASM decoder     WASM decoder
Total tokens           ~2,143        ~2,606        ~1,068           ~698
vs Baseline                          +21.6%        -50.2%           -67.4%

+ columns are cumulative. "+ Qwen-7B-AXON" replaces the API model with a local 7B fine-tune that natively speaks AXON. "+ Caveman" adds terse output shaping. Text-only pipeline — no image tokens.


Why qwen2.5:7b-instruct? It's an open-weight 7B instruction-tuned model — small enough to run locally on a single GPU (or even CPU via Ollama/llama.cpp), yet capable enough for code assistance, technical Q&A, and structured output. Fine-tuning it on AXON notation (qwen2.5-7b-axon) would produce a model that natively understands and generates AXON without a system prompt.

The key advantage over API models: zero per-token cost, full control over the model, and the system prompt overhead disappears entirely. Combined with Caveman-style terse output baked into the fine-tune, every token in and out carries maximum information density.

Today vs Fine-Tuned

Metric                 Baseline           Today (AXON)       vNext (Fine-Tuned)
System prompt          0 tok              ~500 tok           0 tok
Prompt tokens          105                68                 68
Total input to model   105                ~568               68
Attention compute      O(105²)            O(568²)            O(68²)
Response format        Natural language   Natural language   AXON notation
Output tokens (est.)   2,038              2,038              ~1,000
Prefill speedup        1x                 0.2x (slower)      1.5x faster

Estimates based on current AXON v1.1 L0 compression ratios. Actual fine-tuned model performance will depend on training data quality and model architecture.

The system prompt paradox. Today, AXON saves 35% on prompt tokens but adds ~500 tokens of system prompt to teach the model the notation. At that rate, AXON only breaks even on prompts longer than ~1,430 tokens (or ~700 at the 71% savings rate seen on code commands). A fine-tuned model eliminates this overhead entirely: every prompt is net positive, no matter how short.
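The break-even arithmetic, using this page's own numbers:

// Break-even prompt length: the compression must save more tokens than the
// static system prompt costs. tokens_saved = rate * prompt_len >= overhead.
fn break_even(rate: f64, overhead: f64) -> f64 {
    overhead / rate
}

fn main() {
    println!("{:.0} tok", break_even(0.35, 500.0)); // ~1429: general prompts
    println!("{:.0} tok", break_even(0.71, 500.0)); // ~704: code commands
}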

Roadmap

Shipped

v1.1 — Tiered Abbreviations

Four abbreviation levels (L0-L3), stop-word stripping in code queries, response compression directive. 35% prompt savings on general technical questions, 71% on code commands.

Next

v1.2 — Bidirectional WASM Codec

Full encode/decode round-trip in the WASM engine. Client-side decoding of AXON responses at zero API cost. Enables the fine-tuned model to respond in AXON notation with lossless reconstruction.

Planned

v2.0 — qwen2.5-7b-axon

qwen2.5:7b-instruct fine-tuned to natively understand and produce AXON notation. No system prompt needed. Runs locally on a single GPU. Both input and output compressed. Target: 88% fewer input tokens, 50% fewer output tokens, 79x less attention compute per request.

Planned

Custom AXON Tokenizer

A BPE tokenizer trained on AXON notation, where sigils, operators, and abbreviated terms are single tokens instead of multi-token sequences. Further reduces token count at the tokenizer level.
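A toy illustration of the idea (not a real BPE trainer): reserve dedicated vocabulary entries so each sigil, operator, and abbreviation costs exactly one token. The entries and IDs below are invented for illustration.

use std::collections::HashMap;

fn main() {
    // Hypothetical single-token vocabulary entries.
    let vocab: HashMap<&str, u32> =
        [("?best", 1), ("->", 2), ("db", 3), ("conn", 4), ("pooling", 5)]
            .into_iter()
            .collect();
    let ids: Vec<u32> = "db -> conn"
        .split_whitespace()
        .filter_map(|t| vocab.get(t).copied())
        .collect();
    // Three tokens for a span that a general-purpose BPE vocabulary
    // would split into more pieces.
    println!("{ids:?}"); // [3, 2, 4]
}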

The Full Stack

Every layer of the inference stack has compression opportunities. Today AXON addresses one layer. The vision is all of them.

Layer      Approach    Description        Status
PROMPT     AXON        Rule-based NLP     shipped
TOKENIZER  AXON BPE    Custom vocabulary  planned
MODEL      Fine-Tune   Native AXON I/O    planned
RESPONSE   Decoder     Client-side WASM   next

Specification

AXON v1.1 — ASCII-only operators, conditional sigils, tiered abbreviation levels, compound merging, response compression.

Type Sigils

Sigils are applied conditionally — only to tokens in the known entity/concept/verb databases. Unknown tokens are emitted bare.

Sigil  Name                Example
@      Entity / Agent      @sun @openai @user
#      Concept / Abstract  #gravity #justice
~      Process / Action    ~emit ~learn
$      Scalar / Value      $high $3.14
^      Temporal            ^now ^T+3d
!      Negation            !evidence !data
?      Query / Unknown     ?cause ?result
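A minimal sketch of that conditional application, assuming small illustrative word lists (the real engine uses its full entity/concept/verb databases):

use std::collections::HashSet;

fn sigil(token: &str, entities: &HashSet<&str>, concepts: &HashSet<&str>, verbs: &HashSet<&str>) -> String {
    if entities.contains(token) {
        format!("@{token}") // known entity/agent
    } else if concepts.contains(token) {
        format!("#{token}") // known concept
    } else if verbs.contains(token) {
        format!("~{token}") // known process/action
    } else {
        token.to_string() // unknown tokens are emitted bare
    }
}

fn main() {
    let entities: HashSet<&str> = HashSet::from(["sun"]);
    let concepts: HashSet<&str> = HashSet::from(["radiation"]);
    let verbs: HashSet<&str> = HashSet::from(["emit"]);
    for t in ["sun", "emit", "radiation", "frobnicate"] {
        print!("{} ", sigil(t, &entities, &concepts, &verbs));
    }
    // prints: @sun ~emit #radiation frobnicate
}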

Operators (ASCII only)

Operator  Name       Reads as
->        Causes     "leads to"
<-        Result of  "caused by"
:.        Therefore  "conclusion"
bc        Because    "premise / reason"
&&        And        "conjunction"
||        Or         "disjunction"
A.        For all    "universal"
E.        Exists     "existential"

Confidence Markers

!! Certain
! High
~ Moderate
* Low
** Speculative
? Unknown

Temporal Markers

^now current moment
^T-7d one week ago
^T+30d next month
^A.t always / all time

Abbreviation Tiers (v1.1)

AXON supports four abbreviation levels, each inclusive of the previous. Select a level to match your compression needs.
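A sketch of the tier-inclusive lookup, using a few pairs from the tables on this page (level is 0..=3; the full maps live in the translator):

fn abbreviate<'a>(word: &'a str, level: usize) -> &'a str {
    const TIERS: [&[(&str, &str)]; 4] = [
        &[("function", "fn"), ("database", "db"), ("memory", "mem")], // L0
        &[("protocol", "proto"), ("library", "lib")],                 // L1
        &[("between", "btw"), ("difference", "diff")],                // L2
        &[("variable", "v"), ("string", "s")],                        // L3
    ];
    // Each level includes all previous tiers: selecting L2 searches L0..=L2.
    for tier in &TIERS[..=level] {
        if let Some(&(_, short)) = tier.iter().find(|&&(long, _)| long == word) {
            return short;
        }
    }
    word
}

fn main() {
    assert_eq!(abbreviate("protocol", 0), "protocol"); // L1 entry, inactive at L0
    assert_eq!(abbreviate("protocol", 1), "proto");
}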

Command Verbs >

>doc >impl >fix >test >rev >ref >opt >plan >dep >add >rm >up >mv >cfg >mig >db >api >ci >sec >err >log >bench >lint >merge >explain

Query Types ?

?how ?why ?best ?what ?diff ?when ?where ?can ?cmp ?alt ?err ?perf
Structural Operations

Syntax               Meaning
@Type+.field         Add field
@Type-.field         Remove field
@Type.x=$v           Set field value
@Type.x:T            Set field type
@Type:impl(@Trait)   Implement trait
@Child<@Parent       Inherit / extend
+use(module)         Add import
-use(module)         Remove import
Examples

Curated translations showing AXON v1.1 in action. Click any example to try it in the playground.

LLM Prompt

Paste this system prompt into any LLM to enable AXON v1.1 notation understanding.

System Prompt
You are an expert in AXON v1.1 — AI eXchange Optimized Notation, a compact symbolic language that compresses natural language into token-efficient expressions.

## Sigils (applied only to known entities/concepts/verbs; unknown tokens are bare)
@ = Entity/Agent  # = Concept  ~ = Process/Action  $ = Scalar  ^ = Temporal  ! = Negation  ? = Query

## Operators (ASCII only)
-> = causes  <- = caused by  :. = therefore  bc = because
&& = and  || = or  A. = for all  E. = exists

## Abbreviation Tiers (each level includes all previous)
L0 (default): object->obj, function->fn, component->comp, database->db, connection->conn,
  implementation->impl, authentication->auth, application->app, configuration->cfg,
  environment->env, performance->perf, memory->mem, render->rnd, parameter->param
L1 (extended): protocol->proto, process->proc, algorithm->algo, container->ctr,
  endpoint->ep, interface->iface, architecture->arch, library->lib, package->pkg
L2 (aggressive): between->btw, difference->diff, example->ex, strategy->strat,
  alternative->alt, mechanism->mech, automatic->auto, distributed->distrib
L3 (maximum): function->f, variable->v, string->s, number->n, boolean->b, message->m

## Rules
1. Strip filler words, articles, copulas, pleasantries
2. Sigil only known entities (@), concepts (#), verbs (~), scalars ($)
3. Unknown tokens get NO sigil — consecutive bare tokens hyphenate into compounds
4. Abbreviate using the active tier (default L0)
5. Confidence markers: !! (certain) ! (high) ~ (moderate) * (low) ** (speculative)
6. Temporal: ^now ^T-Nd ^T+Nd ^A.t

## Programming Commands
>doc >impl >fix >test >rev >ref >opt >plan >dep >add >rm >up >mv >cfg >mig
>db >api >ci >sec >err >log >bench >lint >merge >explain

## Programming Queries
?how ?why ?best ?what ?diff ?when ?where ?can ?cmp ?alt ?err ?perf

## Structural Operations
@Type+.field  @Type-.field  @Type.x=$v  @Type.x:T  @Type:impl(@Trait)  @A<@B  +use(mod)  -use(mod)

## Response Compression (optional — add to system prompt for output savings)
Respond using AXON notation where applicable: sigils, operators, abbreviations. Strip filler. Be terse.

## Examples
"fix the bug in the auth service" -> >fix bug:auth-service
"explain database connection pooling" -> >explain db-conn-pooling
"what is the best way to cache" -> ?best cache
"The sun emits ultraviolet radiation" -> @sun ~emit #ultraviolet #radiation
"add a field email to User" -> @user+.email

VS Code Extension

Translate prompts to AXON notation directly inside VS Code — saving tokens on every AI interaction.

Install

Install from the VS Code Marketplace or the command line:

code --install-extension colwill.axon-notation

Or search for AXON Notation in the VS Code Extensions panel (Ctrl+Shift+X).

View on Marketplace

Features

Chat Participant @axon

Type @axon in VS Code's chat panel to route prompts through AXON. Your input is translated to compact notation, sent to the language model, and the response is returned as normal.

Sidebar Chat

A dedicated AXON chat panel in the activity bar. Open it with Ctrl+Shift+A (Cmd+Shift+A on macOS) or via the command palette.

Multiple Targets

Send translated output to Claude Code, GitHub Copilot, clipboard, or the built-in chat — choose your preferred workflow.

Commands

Command                                    Description
AXON: Open Chat                            Focus the AXON sidebar panel
AXON: Translate to AXON                    Translate text and choose where to send it
AXON: Translate and Send to Claude Code    Translate and send directly to Claude Code
AXON: Translate and Send to Copilot Chat   Translate and send to GitHub Copilot
AXON: Translate and Copy to Clipboard      Translate and copy the result

How It Works

1. Type a natural language prompt
2. WASM engine translates to AXON
3. Compressed prompt sent to AI
4. Same response, fewer tokens
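For step 2, a minimal wasm-bindgen sketch of how a Rust translate entry point can be exposed to the extension's JavaScript host. The function name, signature, and body here are assumptions for illustration, not the extension's documented API:

use wasm_bindgen::prelude::*;

// Hypothetical export: the real engine's interface may differ.
#[wasm_bindgen]
pub fn translate(input: &str) -> String {
    // Placeholder body; the actual rule-based pipeline (stop-word stripping,
    // sigils, abbreviations, compound merging) runs here.
    input.trim().to_string()
}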

Translation Examples

Natural Language                                                AXON
what is the best way to implement caching                      ?best impl-caching
fix the bug in the auth service                                >fix bug:auth-service
add a field email to User                                      @user+.email
CO2 emissions cause climate change which increases temperature  @CO2-emission → #climate-change!! → Δ$temp↑
VS Code 1.93+ minimum version
0 API keys: runs locally via WASM

Claude Code Skill

Use AXON notation directly inside Claude Code with the /axon slash command. Translate between natural language and AXON without leaving your terminal.

Install from Skill Marketplace

Install the AXON skill directly from the Claude Code skill marketplace:

claude install-skill https://github.com/colwill/axon

Or install interactively from within Claude Code:

/install-skill https://github.com/colwill/axon

Once installed, the /axon slash command becomes available in all your Claude Code sessions.

Usage

Encode natural language

Pass any natural language text to get the AXON translation:

$ /axon fix the bug in the authentication service
→ >fix bug:auth-service

Decode AXON notation

Pass AXON notation to get a natural language explanation:

$ /axon @sun ~emit* #uv-radiation
→ The sun probably emits ultraviolet radiation.

Interactive mode

Use /axon without arguments to start an interactive session where Claude will translate and explain AXON expressions.

What the Skill Does

Auto-detect direction

Automatically detects whether you're providing natural language or AXON notation and translates in the appropriate direction.
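One plausible way to implement such detection (a sketch; the skill's actual heuristic is not documented here) is to look for sigiled tokens and AXON operators:

fn looks_like_axon(input: &str) -> bool {
    let words: Vec<&str> = input.split_whitespace().collect();
    // Count tokens led by a sigil or command/query marker.
    let sigiled = words
        .iter()
        .filter(|w| w.starts_with(|c: char| "@#~$^?>".contains(c)))
        .count();
    let has_op = ["->", "<-", ":.", "&&", "||"].iter().any(|op| input.contains(*op));
    // Treat input as AXON if operators appear or most tokens carry sigils.
    has_op || sigiled * 2 > words.len()
}

fn main() {
    assert!(looks_like_axon(">fix bug:auth-service"));
    assert!(!looks_like_axon("fix the bug in the auth service"));
}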

Token savings

Shows the token count comparison between the original text and the AXON translation, so you can see the savings.

Annotated output

Each sigil and operator is annotated so you can learn the notation as you use it.

Claude Code: CLI or IDE extension
0 config: works immediately after install

AXON + Caveman

AXON compresses the prompt. Caveman compresses the response. Together they achieve the best total token savings of any approach tested.


AXON and Caveman operate on different sides of the pipe. AXON rewrites your prompt into compact notation (input tokens), while Caveman instructs the model to respond concisely (output tokens). They don't overlap — they stack.

Inference Architecture

Where each approach operates in the LLM inference pipeline.

User prompt (natural language, 105 tok) → AXON translator (sigils + abbreviations; -35% input, 68 tok) → tokenizer (BPE encode) → LLM inference (attention + generation), with the Caveman system prompt ("be terse, no fluff") shaping the output → detokenizer (BPE decode) → concise response (-47% output, 2,038 → 1,071 tok). AXON's domain is prompt compression; Caveman's domain is response compression. Combined: -46.9% total cost.

Total Token Cost

Prompt + response tokens across 10 technical questions, measured with BPE-aware estimation. Model: Claude Opus 4.6.

Configuration                 Prompt  Response  Total   vs Baseline
Baseline                      105     2,038     2,143
Terse ("Answer concisely.")   105     2,168     2,273   +6.1%
Compress                      105     1,690     1,795   -16.2%
Caveman                       105     1,071     1,176   -45.1%
AXON only                     68      2,038     2,106   -1.7%
AXON + Caveman                68      1,071     1,139   -46.9%

Where Each Approach Saves

AXON (prompt side): 35.2% avg input token savings.

Example: "why does my React component re-render..." → ?why react-comp-re-rnd-parent-updates (13 → 8 tokens, 38.5% saved).

Caveman (response side): 47.4% avg output token savings.

Baseline: 2,038 response tokens. Caveman: 1,071 response tokens. Same quality, half the tokens.

Per-Prompt Breakdown

AXON prompt savings vs Caveman response savings for each of the 10 benchmark questions.

Prompt    AXON    Caveman  Compress
Average   35.2%   47.4%    17.1%

Response Text Compression

Running response text through the AXON translator shows additional compression potential on the output side.

Baseline responses: 41.6% (2,038 → 1,191 tokens)
Caveman responses: 34.5% (1,071 → 702 tokens)
Compress responses: 40.2% (1,690 → 1,011 tokens)

How to Combine Them

1. Compress the prompt with AXON

Use the AXON translator (VS Code extension, REPL, or WASM) to convert your natural language prompt to compact notation before sending it to the model.

Input:
explain database connection pooling
→ >explain db-conn-pooling

2. Compress the response with Caveman

Add the Caveman system prompt to instruct the model to respond concisely. The model understands AXON input and responds in terse, information-dense style.

System prompt:
respond like caveman. short sentence. no fluff.

3. Result: 46.9% total savings

AXON handles the input, Caveman handles the output. No overlap, no quality loss, nearly half the token cost.
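As a sketch of steps 1 and 2 wired together (illustrative only; the message format and endpoint details depend on your provider), assuming the prompt has already been run through the AXON translator:

use serde_json::json;

// Hypothetical request body combining an AXON-compressed user prompt with
// the Caveman system prompt from step 2.
fn build_request(axon_prompt: &str) -> serde_json::Value {
    json!({
        "messages": [
            { "role": "system", "content": "respond like caveman. short sentence. no fluff." },
            { "role": "user", "content": axon_prompt } // e.g. ">explain db-conn-pooling"
        ]
    })
}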

Methodology

Data sourced from the Caveman evaluation suite. 10 technical programming questions tested across 6 arms (baseline, terse, caveman, caveman-cn, caveman-es, compress) using Claude Opus 4.6. Token counts estimated using BPE-aware heuristics (whitespace-split + non-alphanumeric character counting). AXON translations performed by the Rust rule-based engine at L0 abbreviation level. Full test suite available in the AXON repository at lib/tests/caveman_comparison.rs.
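A minimal reconstruction of that estimation heuristic (illustrative; the authoritative version is in lib/tests/caveman_comparison.rs):

// Whitespace-split words plus non-alphanumeric characters, as described above.
fn estimate_tokens(text: &str) -> usize {
    let words = text.split_whitespace().count();
    let punct = text
        .chars()
        .filter(|c| !c.is_alphanumeric() && !c.is_whitespace())
        .count();
    words + punct
}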

AXON REPL

An interactive command-line translator for AXON notation. Convert between natural language and AXON in your terminal.

Install

Clone the repository and build with Cargo:

git clone https://github.com/colwill/axon.git
cd axon && cargo build --release -p axon-repl

The binary will be at target/release/axon-repl. You can copy it to a directory on your PATH:

cp target/release/axon-repl ~/.local/bin/

Usage

Interactive Mode

Launch the REPL and type expressions interactively:

$ axon-repl
axon> fix the bug in the auth service
→ >fix bug:auth-service
axon> what is the best way to cache data
→ ?best cache-data

Pipe Mode

Pipe text into the REPL for batch translation or scripting:

echo "add a field email to User" | axon-repl

JSON Output

Get structured JSON output for integration with other tools:

echo "fix the login bug" | axon-repl --json

Features

Bidirectional
Translate text to AXON and AXON back to text
Offline
Uses a local lookup table — no network needed
Pipeable
Works with stdin/stdout for scripting and agents

Requirements

Dependency  Version
Rust        2021 edition or later
Cargo       Included with Rust