Synthra’s context compression engine manages conversation context windows. Compression is triggered automatically when a session approaches its token limit.
Compression Strategies
FIFO (First In, First Out)
Removes oldest messages. Simple and fast (under 10ms).
agent.setContextConfig({
  retentionPolicy: 'fifo',
  maxTokens: 4096
});
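To make the FIFO behavior concrete, here is a minimal sketch of oldest-first eviction. It is not Synthra's implementation: the `fifoTrim` helper and the `{ role, content, tokens }` message shape are assumptions made for illustration.

```javascript
// Hypothetical sketch of FIFO eviction; not Synthra's internal code.
// Each message is assumed to carry a precomputed `tokens` count.
function fifoTrim(messages, maxTokens) {
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + m.tokens, 0);
  // Drop from the front (oldest first) until the total fits the budget.
  while (kept.length > 0 && total > maxTokens) {
    total -= kept.shift().tokens;
  }
  return kept;
}

const history = [
  { role: 'user', content: 'first', tokens: 2000 },
  { role: 'assistant', content: 'second', tokens: 1500 },
  { role: 'user', content: 'third', tokens: 1200 },
];
const trimmed = fifoTrim(history, 4096);
console.log(trimmed.map((m) => m.content)); // [ 'second', 'third' ]
```

Because the loop only inspects token counts and never the message content, the cost stays low, which fits the speed claim above.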
Priority-Based
Removes lowest priority messages. Ideal for complex conversations (under 30ms).
agent.setContextConfig({
  retentionPolicy: 'priority',
  maxTokens: 8192
});
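One way to picture priority-based removal is sketched below. This page does not say how Synthra assigns priorities, so the numeric `priority` field, the oldest-first tie-break, and the `priorityTrim` helper are all assumptions.

```javascript
// Hypothetical sketch of priority-based eviction; not Synthra's internal code.
// Assumes each message has numeric `priority` (higher = more important) and `tokens`.
function priorityTrim(messages, maxTokens) {
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  // Removal order: lowest priority first, oldest first on ties.
  const order = messages
    .map((m, index) => ({ index, priority: m.priority }))
    .sort((a, b) => a.priority - b.priority || a.index - b.index);
  const removed = new Set();
  for (const { index } of order) {
    if (total <= maxTokens) break;
    removed.add(index);
    total -= messages[index].tokens;
  }
  // Survivors keep their original conversation order.
  return messages.filter((_, index) => !removed.has(index));
}

const conversation = [
  { content: 'system prompt', priority: 10, tokens: 500 },
  { content: 'small talk', priority: 1, tokens: 4000 },
  { content: 'key answer', priority: 5, tokens: 3000 },
  { content: 'follow-up', priority: 5, tokens: 2000 },
];
console.log(priorityTrim(conversation, 8192).map((m) => m.content));
// [ 'system prompt', 'key answer', 'follow-up' ]
```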
Semantic Retention
Removes redundant content using embeddings. Best for long-form content (under 50ms).
agent.setContextConfig({
  retentionPolicy: 'semantic',
  maxTokens: 16384
});
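Embedding-based redundancy removal can be approximated with cosine similarity, as in the sketch below. The embedding model, the similarity cutoff, and the choice of which duplicate survives are not documented here, so the precomputed `embedding` arrays, the 0.95 threshold, and the `dropRedundant` helper are assumptions. The sketch only removes near-duplicates and does not enforce `maxTokens`.

```javascript
// Hypothetical sketch of embedding-based deduplication; not Synthra's internal code.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Assumes each message carries a precomputed `embedding` vector.
function dropRedundant(messages, threshold = 0.95) {
  const kept = [];
  // Walk newest to oldest so the most recent phrasing survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = messages[i];
    const redundant = kept.some(
      (m) => cosine(m.embedding, candidate.embedding) >= threshold
    );
    if (!redundant) kept.unshift(candidate);
  }
  return kept;
}

const drafts = [
  { content: 'intro v1', embedding: [1, 0] },
  { content: 'intro v2', embedding: [0.99, 0.05] },
  { content: 'conclusion', embedding: [0, 1] },
];
console.log(dropRedundant(drafts).map((m) => m.content));
// [ 'intro v2', 'conclusion' ]
```

Every candidate is compared against the messages already kept, which is more work per message than FIFO or priority eviction and is one plausible reason for the higher latency figure.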
Token Limits
| Tier | Context Window | Compression Threshold |
|---|---|---|
| Free | 4,096 tokens | 80% (3,277 tokens) |
| Pro | 8,192 tokens | 80% (6,554 tokens) |
| Enterprise | 32,768 tokens | 85% (27,853 tokens) |
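The threshold figures in the table are the context window multiplied by the tier's percentage, rounded to the nearest token. The sketch below reproduces that arithmetic. The `TIERS` object and both helper names are illustrative, and whether compression fires at or just above the threshold is an assumption.

```javascript
// Values copied from the table above; the object layout is illustrative.
const TIERS = {
  free: { contextWindow: 4096, threshold: 0.8 },
  pro: { contextWindow: 8192, threshold: 0.8 },
  enterprise: { contextWindow: 32768, threshold: 0.85 },
};

// Token count at which automatic compression would start for a tier.
function compressionTriggerTokens(tier) {
  const { contextWindow, threshold } = TIERS[tier];
  return Math.round(contextWindow * threshold);
}

// Assumption: compression fires once usage reaches the threshold.
function shouldCompress(tier, currentTokens) {
  return currentTokens >= compressionTriggerTokens(tier);
}

console.log(compressionTriggerTokens('free')); // 3277
console.log(compressionTriggerTokens('pro')); // 6554
console.log(compressionTriggerTokens('enterprise')); // 27853
```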
Manual Compression
await agent.compressContext(session.id);
const stats = await agent.getCompressionStats(session.id);
console.log(`Removed: ${stats.messagesRemoved}`);
console.log(`Saved: ${stats.tokensSaved} tokens`);
Best Practices
| Scenario | Strategy |
|---|---|
| Customer support | FIFO |
| Multi-turn conversations | Priority |
| Content generation | Semantic |
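The recommendations in the table can be encoded as a small lookup that produces a config object for `agent.setContextConfig`. The scenario keys and the `contextConfigFor` helper are hypothetical names chosen for this sketch.

```javascript
// Scenario keys are hypothetical; the policies come from the table above.
const POLICY_BY_SCENARIO = new Map([
  ['customer-support', 'fifo'],
  ['multi-turn', 'priority'],
  ['content-generation', 'semantic'],
]);

// Builds the object shape passed to agent.setContextConfig in the earlier examples.
function contextConfigFor(scenario, maxTokens) {
  const retentionPolicy = POLICY_BY_SCENARIO.get(scenario);
  if (!retentionPolicy) {
    throw new Error(`No recommendation for scenario: ${scenario}`);
  }
  return { retentionPolicy, maxTokens };
}

// Usage with the documented call:
// agent.setContextConfig(contextConfigFor('customer-support', 4096));
```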
Compression is non-destructive when archiving is enabled. Removed messages can be retrieved from the archive.
Semantic compression adds 20-30 ms of latency. For latency-sensitive applications, use FIFO or priority instead.
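For latency-sensitive applications, one possible approach is to fall back to a cheaper policy when the preferred one exceeds a latency budget. The sketch below uses the upper bounds quoted in the strategy descriptions (10 ms, 30 ms, 50 ms). The `pickPolicy` helper and the fallback order are assumptions, not part of the Synthra API.

```javascript
// Upper bounds quoted in the strategy descriptions, in milliseconds.
const LATENCY_CEILING_MS = { fifo: 10, priority: 30, semantic: 50 };

// Hypothetical helper: returns the preferred policy if it fits the budget,
// otherwise the next cheaper one.
function pickPolicy(preferred, budgetMs) {
  const order = ['semantic', 'priority', 'fifo']; // most to least expensive
  const start = order.indexOf(preferred);
  if (start === -1) throw new Error(`Unknown policy: ${preferred}`);
  for (const policy of order.slice(start)) {
    if (LATENCY_CEILING_MS[policy] <= budgetMs) return policy;
  }
  return 'fifo'; // cheapest option, even when the budget is under 10 ms
}

console.log(pickPolicy('semantic', 35)); // priority
console.log(pickPolicy('semantic', 60)); // semantic
console.log(pickPolicy('priority', 5)); // fifo
```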