Synthra’s context compression engine manages the conversation context window. Compression is triggered automatically when a conversation approaches its token limit.

Compression Strategies

FIFO (First In, First Out)

Removes the oldest messages first. Simple and fast (under 10ms).
agent.setContextConfig({
  retentionPolicy: 'fifo',
  maxTokens: 4096
});
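
To make the FIFO behavior concrete, here is a minimal sketch of oldest-first eviction. It is not Synthra's implementation: the `compressFifo` function and the `tokens` field on each message are assumptions made for illustration.

```javascript
// Illustrative sketch only; compressFifo and the message shape are hypothetical.
function compressFifo(messages, maxTokens) {
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  let start = 0;
  // Drop messages from the front (oldest first) until the remainder fits.
  while (total > maxTokens && start < messages.length) {
    total -= messages[start].tokens;
    start += 1;
  }
  return messages.slice(start);
}

const fifoHistory = [
  { id: 1, tokens: 2000 },
  { id: 2, tokens: 1500 },
  { id: 3, tokens: 1000 },
  { id: 4, tokens: 500 },
];

// 5,000 tokens exceed the 4,096 budget, so only the oldest message is dropped.
console.log(compressFifo(fifoHistory, 4096).map((m) => m.id)); // [ 2, 3, 4 ]
```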

Priority-Based

Removes the lowest-priority messages first. Ideal for complex conversations (under 30ms).
agent.setContextConfig({
  retentionPolicy: 'priority',
  maxTokens: 8192
});
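
The sketch below shows one way priority-based eviction could work. It assumes each message carries a numeric `priority` (higher means more important) and a `tokens` count; both fields and the `compressByPriority` function are hypothetical and not part of the documented API.

```javascript
// Illustrative sketch only; the priority and tokens fields are assumptions.
function compressByPriority(messages, maxTokens) {
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  // Removal order: lowest priority first, oldest first on ties.
  const order = messages
    .map((m, index) => ({ m, index }))
    .sort((a, b) => a.m.priority - b.m.priority || a.index - b.index);
  const removed = new Set();
  for (const { m, index } of order) {
    if (total <= maxTokens) break;
    removed.add(index);
    total -= m.tokens;
  }
  // Survivors keep their original conversation order.
  return messages.filter((_, index) => !removed.has(index));
}

const priorityHistory = [
  { id: 'system', priority: 10, tokens: 3000 },
  { id: 'greeting', priority: 1, tokens: 2500 },
  { id: 'order-details', priority: 5, tokens: 3000 },
  { id: 'small-talk', priority: 1, tokens: 2000 },
];

// 10,500 tokens exceed 8,192; removing the oldest low-priority message is enough.
console.log(compressByPriority(priorityHistory, 8192).map((m) => m.id));
// [ 'system', 'order-details', 'small-talk' ]
```

Unlike FIFO, this keeps an old but important message (here the system prompt) while dropping less important ones.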

Semantic Retention

Removes redundant content using embeddings. Best for long-form content (under 50ms).
agent.setContextConfig({
  retentionPolicy: 'semantic',
  maxTokens: 16384
});
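
As a rough illustration of embedding-based redundancy removal, the sketch below drops any message whose embedding is nearly identical to a more recent one. The `dropRedundant` function, the `embedding` field, and the 0.95 similarity cutoff are assumptions for this example. How Synthra computes embeddings or picks a cutoff is not described here, and the sketch does not enforce `maxTokens`.

```javascript
// Illustrative sketch only; not Synthra's algorithm.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function dropRedundant(messages, threshold = 0.95) {
  const kept = [];
  // Walk newest to oldest so the most recent copy of repeated content survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = messages[i];
    const redundant = kept.some(
      (k) => cosine(k.embedding, candidate.embedding) >= threshold
    );
    if (!redundant) kept.push(candidate);
  }
  return kept.reverse(); // restore chronological order
}

// Toy 3-dimensional embeddings; real embeddings have far more dimensions.
const semanticHistory = [
  { id: 'a', embedding: [1, 0, 0] },
  { id: 'b', embedding: [0, 1, 0] },
  { id: 'c', embedding: [0.99, 0.05, 0] }, // nearly the same direction as 'a'
];

console.log(dropRedundant(semanticHistory).map((m) => m.id)); // [ 'b', 'c' ]
```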

Token Limits

Tier        Context Window   Compression Threshold
Free        4,096 tokens     80% (3,277 tokens)
Pro         8,192 tokens     80% (6,554 tokens)
Enterprise  32,768 tokens    85% (27,853 tokens)
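
The threshold figures are the context window multiplied by the tier's percentage, rounded to the nearest token. The sketch below reproduces that arithmetic. The `TIERS` object and both function names are hypothetical, and whether compression fires at exactly the threshold or only once it is exceeded is an assumption.

```javascript
// Illustrative sketch only; values are copied from the tier table.
const TIERS = {
  free: { contextWindow: 4096, thresholdRatio: 0.8 },
  pro: { contextWindow: 8192, thresholdRatio: 0.8 },
  enterprise: { contextWindow: 32768, thresholdRatio: 0.85 },
};

function compressionThreshold(tier) {
  const { contextWindow, thresholdRatio } = TIERS[tier];
  return Math.round(contextWindow * thresholdRatio);
}

// Assumption: compression triggers once usage reaches the threshold.
function shouldCompress(tier, usedTokens) {
  return usedTokens >= compressionThreshold(tier);
}

console.log(compressionThreshold('free'));       // 3277
console.log(compressionThreshold('pro'));        // 6554
console.log(compressionThreshold('enterprise')); // 27853
console.log(shouldCompress('pro', 6000));        // false
```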

Manual Compression

await agent.compressContext(session.id);

const stats = await agent.getCompressionStats(session.id);
console.log(`Removed: ${stats.messagesRemoved}`);
console.log(`Saved: ${stats.tokensSaved} tokens`);
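
The two calls above can be wrapped in a small helper. The following is a sketch under assumptions: `compressAndReport` is a hypothetical name, failures are assumed to surface as rejected promises, and `stubAgent` is a stand-in with made-up values so the example runs without Synthra.

```javascript
// Hypothetical helper: runs a manual compression and returns a summary string.
async function compressAndReport(agent, sessionId) {
  try {
    await agent.compressContext(sessionId);
    const stats = await agent.getCompressionStats(sessionId);
    return `Removed ${stats.messagesRemoved} messages, saved ${stats.tokensSaved} tokens`;
  } catch (err) {
    // Assumption: errors arrive as rejected promises with a message.
    return `Compression failed: ${err.message}`;
  }
}

// Stand-in for a real agent; the returned numbers are invented for the demo.
const stubAgent = {
  async compressContext(sessionId) {},
  async getCompressionStats(sessionId) {
    return { messagesRemoved: 12, tokensSaved: 1840 };
  },
};

compressAndReport(stubAgent, 'session-123').then(console.log);
// prints "Removed 12 messages, saved 1840 tokens"
```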

Best Practices

Scenario                  Strategy
Customer support          FIFO
Multi-turn conversations  Priority
Content generation        Semantic

Compression is non-destructive when archiving is enabled. Removed messages can be retrieved from the archive.

Semantic compression adds 20-30ms latency. Use FIFO or priority for latency-sensitive applications.