CEF — Compact Embedding Format v0.1.0

Status: Draft
Last Updated: 2026-02-17

1. Introduction

CEF (Compact Embedding Format) defines a standard method for compressing semantic embedding vectors so they can be transported in HTTP headers. This enables AI agents to evaluate content relevance before downloading the full page content.

2. Problem Statement

A typical embedding vector (512 dimensions, float32) occupies 2,048 bytes, which is too large for practical HTTP header transport. CEF reduces this to approximately 470 bytes while keeping the cosine similarity correlation with the original float32 vectors above 0.98.

3. Encoding Pipeline

float32 vector (2,048 bytes) → Quantize → int8 (512 bytes) → Compress → gzip (~350 B) → Encode → base64url (~470 B)

3.1 Step 1: Quantization (float32 → int8)

Convert each float32 value to an 8-bit signed integer using min-max scaling:

For each dimension i:
  int8_value[i] = round((float32_value[i] - min) / (max - min) * 255) - 128

Where:
  min = minimum value across all dimensions
  max = maximum value across all dimensions

The min and max values MUST be included in the compressed payload as an 8-byte prefix (two float32 values) to enable decoding.

Size reduction: approximately 75% (2,048 bytes → 512 bytes of int8 data + 8 bytes of metadata = 520 bytes)

Precision loss: Cosine similarity correlation >0.98 with original float32 vectors.
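The min-max scaling above can be sketched in plain Python as follows. This is a minimal illustration, not a normative implementation: the function name `quantize` is hypothetical, and the handling of a constant vector (where max equals min) is an assumption, since this section does not define that case.

```python
def quantize(values):
    """Min-max quantize floats into the int8 range [-128, 127].

    Returns (min, max, quantized_values).
    """
    lo, hi = min(values), max(values)
    # Assumption: a constant vector has zero range, so fall back to 1.0
    # to avoid dividing by zero; every value then maps to -128.
    span = (hi - lo) or 1.0
    quantized = [round((v - lo) / span * 255) - 128 for v in values]
    return lo, hi, quantized


lo, hi, q = quantize([-0.5, 0.0, 0.25, 0.5])
print(q)  # [-128, 0, 63, 127]
```

The smallest input maps to -128 and the largest to 127, so the full int8 range is used regardless of the vector's original scale.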

3.2 Step 2: Compression (gzip)

Apply gzip compression (RFC 1952) to the quantized byte array (including the min/max prefix):

Input:  520 bytes (8 bytes min/max + 512 bytes int8 vector)
Output: ~350 bytes (variable, depends on vector content)

Compression level SHOULD be 6 (default gzip level) for optimal balance of speed and size.
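As a rough illustration of this step, the sketch below compresses a 520-byte payload with Python's standard `gzip` module at level 6. The payload contents are synthetic placeholders, the helper name `compress_payload` is hypothetical, and the resulting size depends entirely on the input, so no particular output size is implied. Passing `mtime=0` (Python 3.8+) is an optional choice made here so that identical payloads produce identical bytes.

```python
import gzip
import struct


def compress_payload(payload: bytes) -> bytes:
    """Gzip a quantized payload at compression level 6."""
    # mtime=0 keeps the gzip header free of a timestamp, so the output
    # is deterministic for a given payload.
    return gzip.compress(payload, compresslevel=6, mtime=0)


# Synthetic payload: 8-byte min/max prefix + 512 placeholder int8 bytes.
payload = struct.pack("<ff", -1.0, 1.0) + bytes((i * 7) % 256 for i in range(512))
compressed = compress_payload(payload)

print(len(payload), len(compressed))
assert gzip.decompress(compressed) == payload  # lossless round trip
```

The gzip container itself adds a 10-byte header and an 8-byte trailer, which is part of why the achievable ratio on short payloads varies with vector content.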

3.3 Step 3: Encoding (base64url)

Encode the compressed bytes using base64url (RFC 4648 Section 5) — URL-safe base64 without padding:

Input:  ~350 bytes (gzip output)
Output: ~470 characters (base64url string)

base64url is used instead of standard base64 because:

No + or / characters, so the value is safe for use in HTTP headers without escaping
No = padding required
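The following sketch shows unpadded base64url encoding and the matching decode using Python's standard `base64` module. The helper names are hypothetical; the sample bytes were chosen so that standard base64 would have produced `+` and `/` characters.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Encode bytes as base64url and strip the trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode base64url text."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


sample = bytes([0xFB, 0xFF, 0xFE, 0x01])
token = b64url_encode(sample)
print(token)  # -__-AQ
assert b64url_decode(token) == sample
```

Standard base64 would encode the same bytes as `+//+AQ==`; the URL-safe alphabet substitutes `-` and `_`, and the padding is dropped because the decoder can recompute it from the length.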

4. Header Format

The CEF-encoded embedding is transported in the X-Mako-Embedding HTTP header:

X-Mako-Embedding: H4sIAAAAAAAAA2NgGAWjYBSMglEwCkYBNQEAN8zuSAAQAAA
X-Mako-Embedding-Model: mako-cef-v1
X-Mako-Embedding-Dim: 512

4.1 Required Companion Headers

When X-Mako-Embedding is present, the following headers MUST also be present:

Header	Type	Description
`X-Mako-Embedding-Model`	string	Identifier of the embedding model used
`X-Mako-Embedding-Dim`	integer	Number of dimensions in the original vector
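A consumer might check the companion-header requirement along these lines. This is a sketch under stated assumptions: the function name `read_cef_headers` and the returned dictionary shape are hypothetical, and headers are assumed to arrive as a plain mapping of names to string values.

```python
def read_cef_headers(headers):
    """Return the CEF header fields, or None if no embedding is present.

    Raises ValueError when X-Mako-Embedding is present but a required
    companion header is missing or malformed.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    encoded = lowered.get("x-mako-embedding")
    if encoded is None:
        return None

    required = ("x-mako-embedding-model", "x-mako-embedding-dim")
    missing = [name for name in required if name not in lowered]
    if missing:
        raise ValueError(f"missing companion headers: {', '.join(missing)}")

    dim = int(lowered["x-mako-embedding-dim"])  # ValueError if not an integer
    if dim <= 0:
        raise ValueError("X-Mako-Embedding-Dim must be a positive integer")

    return {
        "embedding": encoded,
        "model": lowered["x-mako-embedding-model"],
        "dim": dim,
    }
```

For example, `read_cef_headers({"X-Mako-Embedding": "abc", "X-Mako-Embedding-Model": "mako-cef-v1", "X-Mako-Embedding-Dim": "512"})` returns a dictionary with `dim` equal to `512`.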

4.2 Size Constraints

Dimensions	Quantized	Compressed	Encoded	Header safe?
256	264 B	~180 B	~240 chars	Yes
384	392 B	~270 B	~360 chars	Yes
512	520 B	~350 B	~470 chars	Yes
768	776 B	~520 B	~695 chars	Yes
1024	1,032 B	~690 B	~920 chars	Yes
1536	1,544 B	~1,030 B	~1,375 chars	Marginal

Recommendation: Use 512 dimensions for optimal balance of semantic quality and header size.
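Two columns of the table follow directly from arithmetic: the quantized size is the dimension count plus the 8-byte prefix, and the encoded length is the unpadded base64url length of the compressed bytes. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the compressed sizes passed in are the table's own estimates rather than values this code computes.

```python
def quantized_bytes(dim: int) -> int:
    """Size of the quantized payload: one byte per dimension + 8-byte min/max prefix."""
    return dim + 8


def base64url_chars(n_bytes: int) -> int:
    """Length of unpadded base64url text for n_bytes of input: ceil(4n / 3)."""
    return -(-n_bytes * 4 // 3)


print(quantized_bytes(512))   # 520
print(base64url_chars(350))   # 467
```

The 467-character result for a 350-byte compressed payload matches the table's rounded ~470 figure.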

5. Decoding Pipeline

base64url string (~470 chars) → Decode → gzip bytes (~350 B) → Decompress → int8 payload (520 B) → Dequantize → float32 vector (2,048 B)

5.1 Dequantization

Read min, max from first 8 bytes (two float32 values)
For each dimension i:
  float32_value[i] = (int8_value[i] + 128) / 255 * (max - min) + min
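The dequantization formula can be sketched in plain Python as below. The function name `dequantize` is hypothetical, and the sample values are illustrative. Because quantization rounds to the nearest of 256 levels, each restored value should differ from the original by at most half a quantization step, (max - min) / 510.

```python
def dequantize(lo, hi, quantized):
    """Map int8 values in [-128, 127] back to floats in [lo, hi]."""
    return [(q + 128) / 255 * (hi - lo) + lo for q in quantized]


original = [-0.5, 0.0, 0.25, 0.5]
# Quantized form of `original` under min-max scaling with min=-0.5, max=0.5.
restored = dequantize(-0.5, 0.5, [-128, 0, 63, 127])

max_error = max(abs(r - o) for r, o in zip(restored, original))
print(restored[0], restored[-1])  # -0.5 0.5
print(max_error <= (0.5 - (-0.5)) / 510 + 1e-9)  # True
```

The endpoints are restored exactly, while interior values carry the small rounding error introduced by the 8-bit representation.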

6. Embedding Model Requirements

6.1 Standard Models

MAKO does not mandate a specific embedding model. However, for interoperability, the following model identifiers are recognized:

Model ID	Base Model	Dimensions	Description
`mako-cef-v1`	TBD (open source)	512	Default MAKO embedding model

6.2 Custom Models

Servers MAY use custom embedding models. The model identifier MUST be included in the X-Mako-Embedding-Model header. Agents that do not recognize the model identifier MAY:

Skip the embedding and download the full content
Attempt cosine similarity anyway (embeddings from different models may still correlate)
Use a model mapping service to translate between embedding spaces

6.3 Model Selection Criteria

An embedding model suitable for CEF SHOULD:

Support 512 dimensions or fewer
Be publicly available (open source or open API)
Produce normalized vectors (unit length)
Perform well on semantic similarity benchmarks (STS, MTEB)

7. Similarity Computation

Agents compute relevance using cosine similarity between their query embedding and the page's CEF embedding:

similarity = dot(query_vector, page_vector) / (norm(query_vector) * norm(page_vector))
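A dependency-free sketch of this computation follows. The function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid dividing by zero; the formula above leaves that case undefined.

```python
import math


def cosine_similarity(query_vector, page_vector):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(query_vector) != len(page_vector):
        raise ValueError("vectors must have the same number of dimensions")

    dot = sum(q * p for q, p in zip(query_vector, page_vector))
    norm_q = math.sqrt(sum(q * q for q in query_vector))
    norm_p = math.sqrt(sum(p * p for p in page_vector))

    # Assumption: treat an all-zero vector as having no similarity.
    if norm_q == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_q * norm_p)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The dimension check matters in practice: a query embedding and a page embedding with different values of X-Mako-Embedding-Dim cannot be compared.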

7.1 Recommended Thresholds

Similarity	Interpretation	Agent Action
> 0.85	Highly relevant	Download MAKO content immediately
0.70 - 0.85	Potentially relevant	Download if within token budget
0.50 - 0.70	Marginally relevant	Skip unless no better results
< 0.50	Not relevant	Skip entirely

These thresholds are advisory. Agents MAY adjust based on their specific requirements.
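An agent could map a similarity score onto the table's bands roughly as follows. The function name and the returned labels are hypothetical. The table does not say which band the boundary values belong to, so this sketch assumes that exactly 0.85 and 0.70 count as potentially relevant and exactly 0.50 counts as marginally relevant.

```python
def classify_similarity(similarity: float) -> str:
    """Map a cosine similarity score to one of the advisory relevance bands."""
    if similarity > 0.85:
        return "highly_relevant"       # download MAKO content immediately
    if similarity >= 0.70:
        return "potentially_relevant"  # download if within token budget
    if similarity >= 0.50:
        return "marginally_relevant"   # skip unless no better results
    return "not_relevant"              # skip entirely


print(classify_similarity(0.91))  # highly_relevant
print(classify_similarity(0.42))  # not_relevant
```

Because the thresholds are advisory, an agent with a tight token budget might raise the cut-offs rather than reuse these values unchanged.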

8. Security Considerations

8.1 Privacy

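CEF embeddings represent semantic content, not source text. The original text cannot be read directly from an embedding, but embedding inversion techniques can recover approximations of it, so an embedding is not a privacy guarantee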
Embedding vectors SHOULD be generated from the MAKO content only, not from private data
Servers MUST NOT embed user-specific information in CEF vectors

8.2 Trust Model

CEF embeddings are publisher-declared and MUST be treated as untrusted by consumers. A publisher can serve an embedding optimized for popular queries while the actual MAKO content is unrelated or low-quality.

Consumers SHOULD:

Use CEF embeddings only for approximate pre-filtering (e.g., using a HEAD request to read the embedding and decide whether to download the full MAKO response)
Generate their own embeddings from the actual MAKO body for final ranking and similarity decisions
Never rely solely on publisher-provided embeddings for content quality assessment
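The consumer guidance above amounts to a two-stage check, which might be sketched as follows. Everything here is illustrative: `two_stage_score`, `fetch_body`, and `embed` are hypothetical names, the caller is assumed to supply its own embedding function, and the 0.70 pre-filter value is borrowed from the advisory thresholds in Section 7.1.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_score(query_vec, publisher_vec, fetch_body, embed, prefilter=0.70):
    """Pre-filter on the publisher's embedding, then score with our own.

    Returns None when the page is filtered out before download; otherwise
    returns the similarity computed from the consumer's own embedding of
    the fetched body.
    """
    # Stage 1: the untrusted publisher embedding only decides whether to fetch.
    if cosine(query_vec, publisher_vec) < prefilter:
        return None

    # Stage 2: the final score comes from an embedding the consumer generated.
    body = fetch_body()
    return cosine(query_vec, embed(body))
```

The publisher-declared vector never contributes to the returned score, so a misleading embedding can at worst cause an unnecessary download.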

See spec.md Section 10.2 for the full security model.

9. Pseudocode Reference

Python

import base64
import gzip
import struct

import numpy as np


def cef_encode(vector: np.ndarray) -> str:
    """Encode a float32 embedding vector to CEF format."""
    vector = np.asarray(vector, dtype=np.float32)

    # Step 1: Quantize with min-max scaling
    v_min = float(vector.min())
    v_max = float(vector.max())
    # A constant vector has zero range; this sketch maps it to all -128
    # to avoid dividing by zero (the spec does not define this case).
    v_range = (v_max - v_min) or 1.0
    normalized = (vector - v_min) / v_range
    # Subtract 128 before casting: values up to 255 do not fit in int8.
    quantized = (np.round(normalized * 255) - 128).astype(np.int8)

    # Prepend min/max as metadata (two float32 values).
    # Little-endian is assumed here; the spec does not state a byte order.
    metadata = struct.pack('<ff', v_min, v_max)
    payload = metadata + quantized.tobytes()

    # Step 2: Compress
    compressed = gzip.compress(payload, compresslevel=6)

    # Step 3: Encode (base64url with the padding stripped)
    return base64.urlsafe_b64encode(compressed).rstrip(b'=').decode('ascii')


def cef_decode(encoded: str, dim: int) -> np.ndarray:
    """Decode a CEF string back to a float32 embedding vector."""
    # Restore the padding that was stripped during encoding
    encoded += '=' * (-len(encoded) % 4)

    # Step 1: Decode
    compressed = base64.urlsafe_b64decode(encoded)

    # Step 2: Decompress and check the length against the declared dimension
    payload = gzip.decompress(compressed)
    if len(payload) != 8 + dim:
        raise ValueError(f'expected {8 + dim} payload bytes, got {len(payload)}')

    # Step 3: Dequantize
    v_min, v_max = struct.unpack('<ff', payload[:8])
    quantized = np.frombuffer(payload[8:], dtype=np.int8)
    vector = (quantized.astype(np.float32) + 128) / 255 * (v_max - v_min) + v_min

    return vector.astype(np.float32)

JavaScript

async function cefEncode(vector) {
  // Step 1: Quantize with min-max scaling
  const min = Math.min(...vector);
  const max = Math.max(...vector);
  // A constant vector has zero range; fall back to 1 to avoid dividing by zero
  // (the spec does not define this case).
  const range = max - min || 1;

  const quantized = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    quantized[i] = Math.round(((vector[i] - min) / range) * 255) - 128;
  }

  // Combine metadata (min and max as float32) + quantized values.
  // Little-endian is assumed here; the spec does not state a byte order.
  const payload = new Uint8Array(8 + quantized.length);
  const view = new DataView(payload.buffer);
  view.setFloat32(0, min, true);
  view.setFloat32(4, max, true);
  payload.set(new Uint8Array(quantized.buffer), 8);

  // Step 2: Compress (using CompressionStream API)
  const stream = new Blob([payload]).stream().pipeThrough(new CompressionStream('gzip'));
  const compressed = await new Response(stream).arrayBuffer();

  // Step 3: Encode (base64url with the padding stripped)
  const base64 = btoa(String.fromCharCode(...new Uint8Array(compressed)));
  return base64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}