Welcome to the Honcho changelog! This section documents all notable changes to the Honcho API and SDKs.
Each release is documented with:
  • Added: New features and capabilities
  • Changed: Modifications to existing functionality
  • Deprecated: Features that will be removed in future versions
  • Removed: Features that have been removed
  • Fixed: Bug fixes and corrections
  • Security: Security-related improvements

Version Format

Honcho follows Semantic Versioning:
  • MAJOR version for incompatible API changes
  • MINOR version for backwards-compatible functionality additions
  • PATCH version for backwards-compatible bug fixes
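
As a rough illustration of how these three components order releases, the sketch below parses the version tags used in this changelog and compares them. `parse_version` and `is_breaking_upgrade` are hypothetical helpers written for this example; they are not part of Honcho or its SDKs.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as 'v3.0.10' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Under Semantic Versioning, only a MAJOR bump signals incompatible changes."""
    return parse_version(new)[0] > parse_version(old)[0]
```

Comparing the parsed tuples avoids the string-comparison trap where "v3.0.10" would sort before "v3.0.9".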

Honcho API and SDK Changelogs

v3.0.10 (Current)

Added

  • Messages are now embedded via a background task rather than blocking the API request
  • Read-only DB session mode (get_read_db / tracked_db(..., read_only=True)) so reads don’t hold a transaction open across the work
  • CORS_ORIGINS env var to configure CORS allowed origins without editing source; defaults match the prior hardcoded list, so self-hosted deployments behind custom domains can whitelist their frontend (#697)
  • scripts/generate_jwt.py — utility for minting scoped or admin Honcho JWTs (--admin, --workspace/--peer/--session, --expires with human-friendly durations, --print-only) without calling the keys API (#757)
  • STALE_WORK_UNIT_CLEANUP_INTERVAL_SECONDS (default 60s) — minimum jittered spacing between deriver stale-work-unit cleanup runs, so cleanup no longer runs on every seconds-scale poll (0.0 keeps the legacy every-poll behavior) (#773)
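
The cleanup spacing described for STALE_WORK_UNIT_CLEANUP_INTERVAL_SECONDS can be pictured with a small gate object. This is a minimal sketch, not Honcho's implementation: the `CleanupGate` class is hypothetical, and the jitter range (1.0x to 1.5x of the interval) is an assumption, since the changelog does not state how the jitter is drawn.

```python
import random
from typing import Optional


class CleanupGate:
    """Decides whether a poll cycle should also run stale-work-unit cleanup."""

    def __init__(self, interval_seconds: float = 60.0,
                 rng: Optional[random.Random] = None) -> None:
        self.interval = interval_seconds
        self.rng = rng or random.Random()
        self.next_allowed = 0.0  # timestamp before which cleanup is skipped

    def should_run(self, now: float) -> bool:
        if self.interval <= 0.0:
            return True  # 0.0 keeps the legacy every-poll behavior
        if now < self.next_allowed:
            return False
        # Assumed jitter: wait at least the interval, at most 1.5x the interval.
        self.next_allowed = now + self.interval * self.rng.uniform(1.0, 1.5)
        return True
```

A poll loop would call `should_run(time.monotonic())` on each cycle and skip cleanup whenever it returns False.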

Changed

  • Optimized the deriver and dreamer prompt cache prefixes to improve prompt-cache hit rates (#806)

Fixed

  • times_derived is now properly reinforced when a duplicate conclusion is detected. It had been pinned at 1 for nearly every conclusion (the reject-new branch dropped the increment and the new-wins branch reset the count to 1), so ORDER BY times_derived DESC fell back to arbitrary heap order and froze stale conclusions to the front of injected context. Reinforcement is now an atomic increment and both most-derived queries gained a created_at DESC recency tiebreaker (#768)
  • Webhook creation now correctly rejects private/internal IP addresses (#793)
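
To illustrate the kind of check behind the webhook fix, the sketch below flags webhook URLs whose host is a private or internal IP literal, using only the Python standard library. The function name is hypothetical and the actual Honcho validation may differ; in particular, a production check would also need to resolve hostnames before deciding, which this sketch does not do.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_ip_literal(url: str) -> bool:
    """Return True when the URL's host is a private/internal IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no usable host: treat as not allowed
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name, not an IP literal. A real check would resolve
        # it and test every resolved address as well.
        return False
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
    )
```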
v3.0.9

Changed

  • Connection acquisition is now a single attempt with no server-side retry, on a vanilla AsyncSession. A new DB_CONNECT_TIMEOUT_SECONDS (default 2s) bounds the attempt so a saturated or unreachable pooler fails fast instead of holding a client connection open to re-knock. A saturated DB now surfaces to the caller — the API returns an error and the deriver backs off and retries on a later poll — which lets the pooler drain rather than amplifying saturation.

Added

  • Deriver poll jitter so instances that start together don’t poll in lockstep: DERIVER_POLLING_STARTUP_JITTER_SECONDS (random delay before the first poll, default 30s) and DERIVER_POLLING_JITTER_RATIO (±fraction applied to every poll sleep, default 0.5). Both disable at 0.0; the underlying backoff schedule is unchanged.
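
The two jitter settings can be sketched as follows. This is an illustrative approximation under stated assumptions: the function names are hypothetical, and a uniform distribution is assumed for both the startup delay and the per-sleep factor, which the changelog does not specify.

```python
import random
from typing import Optional


def startup_delay(max_jitter_seconds: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Random delay before the first poll; 0.0 disables it."""
    if max_jitter_seconds <= 0.0:
        return 0.0
    rng = rng or random.Random()
    return rng.uniform(0.0, max_jitter_seconds)


def jittered_sleep(base_seconds: float, jitter_ratio: float = 0.5,
                   rng: Optional[random.Random] = None) -> float:
    """Scale a poll sleep by a random factor in [1 - ratio, 1 + ratio]."""
    if jitter_ratio <= 0.0:
        return base_seconds  # jitter disabled
    rng = rng or random.Random()
    factor = 1.0 + rng.uniform(-jitter_ratio, jitter_ratio)
    return max(0.0, base_seconds * factor)
```

With the default ratio of 0.5, a 2-second poll sleep lands somewhere between 1 and 3 seconds, so instances that started together drift apart over a few cycles.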

Removed

  • Reverted the connection-checkout retry and HonchoAsyncSession custom session introduced in 3.0.8. Removed the DB_CONNECTION_RETRY_ENABLED / DB_CONNECTION_RETRY_MAX_DELAY_SECONDS / DB_CONNECTION_RETRY_BACKOFF_INITIAL_SECONDS / DB_CONNECTION_RETRY_BACKOFF_MAX_SECONDS settings, the db_connection_acquisitions{outcome=...} Prometheus counter, and the db.pool.acquire Sentry span. Alerting built on db_connection_acquisitions should migrate to db_pool_connections / db_queries_in_flight.
v3.0.8

Added

  • Connection-checkout retry with bounded exponential backoff (tenacity) on get_db/tracked_db: transient transaction-pooler (Supavisor) rejections — SQLAlchemy TimeoutError and OperationalError — now retry with backoff instead of surfacing as 500s under client-connection saturation. Gated by DB_CONNECTION_RETRY_ENABLED with configurable delay/backoff knobs; ~10s default budget (#758)
  • HonchoAsyncSession — a lazy AsyncSession that checks out its pooled connection (with retry) on the first DB-touching call rather than at construction. Request handlers doing non-DB work (embedding, file, LLM) before their first query no longer pin a pooler connection across it. Only the checkout is retried; the statement still runs exactly once, so writes are never duplicated (#758)
  • Adaptive deriver queue polling: the poll interval backs off when the queue is idle or erroring (base → max, doubling each cycle) and snaps back to base the moment work is claimed, cutting steady-state query load against the DB. Gated by DERIVER_POLLING_BACKOFF_ENABLED with configurable max/multiplier (#758)
  • New Prometheus db_pool_connections gauge (checked_out / checked_in / size / overflow), labeled api|deriver, registered in both the API lifespan and the deriver metrics server (#758)
  • New Prometheus db_connection_acquisitions{outcome=ok|retried|exhausted} counter — the alertable early-warning signal that connection checkouts are retrying through pooler rejection, before requests start failing (#758)
  • New Prometheus db_queries_in_flight gauge — statements actually executing on the wire (via SQLAlchemy cursor-execute events). Paired with checked_out, the gap reveals connections held but parked (the “idle in transaction during an external call” antipattern). Gated on METRICS.ENABLED for zero overhead when off (#758)
  • Explicit SqlalchemyIntegration in both the API and deriver Sentry inits; connection acquisition wrapped in a db.pool.acquire span with live pool stats captured on retry exhaustion (#758)
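
The adaptive polling behavior described in this release (back off while idle or erroring, snap back to base when work is claimed) can be sketched as a small state holder. The class name and the base, maximum, and multiplier values below are assumptions chosen for illustration, not Honcho's defaults.

```python
class PollBackoff:
    """Tracks the deriver poll interval between a base and a maximum."""

    def __init__(self, base: float = 1.0, maximum: float = 30.0,
                 multiplier: float = 2.0) -> None:
        self.base = base
        self.maximum = maximum
        self.multiplier = multiplier
        self.current = base

    def on_idle_or_error(self) -> float:
        """Grow the interval geometrically, never exceeding the maximum."""
        self.current = min(self.current * self.multiplier, self.maximum)
        return self.current

    def on_work_claimed(self) -> float:
        """Reset to the base interval as soon as work is found."""
        self.current = self.base
        return self.current
```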

Changed

  • Default POOL_TIMEOUT lowered to 5s, with validation that it stays under the connection-retry budget when a pooled (non-null) POOL_CLASS is configured; config.toml.example and the v2/v3 configuration docs updated to match (#758)
  • HonchoAsyncSession wraps every DB-touching session method (execute / scalar / scalars / flush / merge / refresh / commit / get / get_one / stream / stream_scalars / delete) so the lazy-checkout-with-retry guarantee has no holes; the acquired flag resets on close()/reset() so a reused session re-acquires on next use (#758)

Fixed

  • Roll the session back on a retryable checkout failure before retrying — a failed autobegin could otherwise leave it pending-rollback, making the next connection attempt raise instead of cleanly re-checking-out (#758)
  • Guard DBPoolCollector.collect() so a pool-read/import hiccup can’t raise and abort the entire /metrics scrape (Prometheus drops all metrics if any collector raises) (#758)
  • Clamp the pool overflow gauge to ≥ 0 (it could report negative before the pool fills) (#758)
  • Removed a double-sleep in the deriver idle poll so the backoff cap is a true cap rather than 2× (#758)
v3.0.7

Added

  • New src/llm/ package as the single owner of provider runtime: clients, backends, history adapters, tool loop, request builder, credentials, and caching policy (#459)
  • New cloudevent LLMCallCompletedEvent (llm.call.completed) fires once per provider hit with full cost-attribution context: transport/provider_label, model, token counts with cache breakdown, finish_reason, outcome, retry/fallback state, duration, tool-call shape, streaming flag, and agent correlation (run_id + iteration) (#637)
  • RepresentationCompletedEvent now carries total_input_tokens for full-trace cost attribution; per-emitter honcho_version injection; deterministic per-run_id high-volume sampler via TelemetrySettings.HIGH_VOLUME_SAMPLE_RATE (#637)
  • Deriver custom instructions: per-workspace/peer guidance threaded into the deriver prompt with a MAX_CUSTOM_INSTRUCTIONS_TOKENS budget (default 2000); deriver MAX_INPUT_TOKENS raised 23000 → 25000 (#609)
  • Configurable embedding dimensions: EMBEDDING_MODEL_CONFIG__DIMENSIONS_MODE (auto/always/never) controls whether OpenAI dimensions= is forwarded (#678)
  • New honcho-cli package — Python CLI for inspecting and managing peers, sessions, and configuration against a Honcho deployment (#424)
  • HONCHO_API_URL env var support in the MCP Worker for self-hosted deployments (#575)
  • API ID max_length increased from 100 to 512 across WorkspaceCreate, PeerCreate, and SessionCreate to align with the DB schema (#684)
  • AttemptPlan dataclass pins per-retry provider selection across stream-final retries so streaming doesn’t bounce back to primary after the tool loop has settled on fallback (#459)
  • Gemini JSON-schema sanitizer for function_declarations — strips keywords Gemini’s validator rejects while preserving semantics for other backends (#459)

Changed

  • All LLM orchestration moved out of src/utils/clients.py into src/llm/ with modules split by responsibility (#459)
  • Default ModelConfig factories (deriver, summary, dreamer specialists, dialectic levels) normalized with no extra parameters set by default; operators add transport/thinking overrides explicitly (#459)
  • OpenAI reasoning-model routing widened to cover gpt-5.x and o1/o3/o4 — these models receive max_completion_tokens instead of max_tokens (#459)
  • Peer card prompts reframed as stable identity markers; induction specialist now opts out of peer card writes so only deduction touches the card (#686)
  • Vector store queries no longer fetch embedding vectors — only document metadata is returned, reducing payload size and DB load (pgvector, lancedb, turbopuffer) (#682)
  • Langfuse trace metadata now includes namespace, model, and provider so traces can be filtered by deployment slice (#565)
  • Deriver: model-aware tokenizer (replaces the previously hardcoded encoding) and explicit guard on empty message content (#647)
  • Dialectic level defaults now merge correctly with per-level overrides (#656)
  • Default dialectic tool choice switched to auto (#630)
  • Vector sync given a substantial retry budget to tolerate transient embedding provider outages (#604)
  • AgentToolConclusionsDeletedEvent payload now carries levels (#612)
  • Turbopuffer: InternalServerError caught and surfaced as a warning rather than a hard failure; vector store sync errors downgraded to warnings (#561)
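
The reasoning-model routing change can be illustrated with a helper that picks the token-limit parameter by model name. This is a hedged sketch: the helper name and the prefix-matching rule are assumptions made for the example, and the actual routing logic in src/llm/ may match models differently.

```python
# Assumed prefixes for the model families named in the entry above.
REASONING_PREFIXES = ("gpt-5", "o1", "o3", "o4")


def token_limit_kwargs(model: str, limit: int) -> dict[str, int]:
    """Return the request kwarg that caps output tokens for this model."""
    if model.startswith(REASONING_PREFIXES):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}
```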

Fixed

  • reverse query parameter is now honored on the v3 workspace list, peer list, workspace-scoped session list, and peer-scoped session list. Honcho SDKs at 2.1.0+ were already sending reverse=true for these routes but the server silently ignored it. Ties on created_at now fall back to the internal nanoid id for stable ordering across pages (#685)
  • LLM client factories now receive base_url from LLMSettings for default providers — operators pointing at OpenAI-compatible proxies via LLM__OPENAI_BASE_URL were previously ignored on the default path (#643, fixes #641)
  • Internal N+1 query in dialectic agent tool execution — collapsed per-iteration DB lookups into a single fetch (#652)
  • Dreamer threshold and time-guard semantics: count filter now includes only documents.level == 'explicit' (was inflating threshold via dreamer-created levels and creating a feedback loop); last_dream_at write relocated from enqueue to process so duplicate enqueues or failed runs no longer reset the 8-hour time guard (#573)
  • Deriver: blank observations are filtered out before embedding (previously triggered noisy embedding calls and persisted empty rows) (#615)
  • Surprisal module: filter format corrected from {"level": levels} to {"level": {"in": levels}} — the prior call silently returned 0 results and made the entire Surprisal phase of the Dream cycle a no-op (#581, fixes #559)
  • Removed hardcoded stop_sequences override from Deriver ModelConfig (was clobbering operator-configured stop sequences) (#587)
  • Embedding client: embed() now wraps single-string input in an array, restoring compatibility with OpenAI-compatible third-party providers that reject scalar input (#586)
  • Docker Compose: deriver service startup gated on the API service healthcheck — prevents races where the deriver starts before the API has run migrations (#689)
  • Docker image: HEALTHCHECK directive removed from the shared base image; service-level health checks now belong in each service’s own configuration (#530)
  • Removed strict parameter validation for thinking params on Anthropic and OpenAI transports — was rejecting valid per-transport configs (#686)
  • Stream-final retries pin to the AttemptPlan that succeeded rather than re-running provider selection through the outer current_attempt ContextVar (#459)
  • Gemini cached_content reuse keys now include system_instruction and tool_config so cache hits don’t cross configurations (#459)
  • CrewAI example updated for the latest CrewAI protocol (#631)
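
The stable-ordering part of the reverse fix can be pictured as sorting on a compound key, so that rows sharing a created_at value always come back in the same order regardless of page boundaries. The `Row` type and `order_rows` helper are hypothetical stand-ins; the server performs the equivalent ordering in SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: str          # stand-in for the internal nanoid
    created_at: int  # stand-in for a timestamp


def order_rows(rows: list[Row], reverse: bool = False) -> list[Row]:
    """Order by created_at, breaking ties on id so the result is deterministic."""
    return sorted(rows, key=lambda r: (r.created_at, r.id), reverse=reverse)
```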

Removed

  • src/utils/clients.py deleted; its responsibilities are split across src/llm/registry.py, src/llm/credentials.py, and the backend-specific modules (#459)
  • HEALTHCHECK directive from the shared Docker image (#530)
v3.0.6

Changed

  • Tightened transaction scopes across search, agent tools, queue manager, and webhook delivery to minimize DB connection hold time during external operations (#525)
  • Search operations refactored to two-phase pattern — external work (embeddings, LLM calls) completes before opening a transaction (#525)
  • Agent tool executor performs external operations before acquiring DB sessions (#525)
  • Queue manager transaction scope reduced to only the critical section (#525)
  • Webhook delivery no longer holds a DB session parameter (#525)

Fixed

  • Session leakage in non-session-scoped dialectic chat calls (#526)

Added

  • Health check endpoint (/health) for container orchestration and load balancer probes (#510)
v3.0.5

Fixed

  • Explicit rollback on all transactions to force connections closed
v3.0.4

Added

  • JSONB metadata validation enforces 100 key limit and max depth of 5 (#419)
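
A validation of this shape could look like the sketch below. It rests on two assumptions the changelog does not confirm: that the 100-key limit counts keys across all nesting levels, and that the top-level object counts as depth 1. The function names are hypothetical.

```python
from typing import Any

MAX_KEYS = 100
MAX_DEPTH = 5


def count_keys(value: Any, depth: int = 1) -> int:
    """Count dict keys at every level, raising if nesting is too deep."""
    if not isinstance(value, (dict, list)):
        return 0  # scalars add no keys and no depth
    if depth > MAX_DEPTH:
        raise ValueError(f"metadata nesting exceeds depth {MAX_DEPTH}")
    if isinstance(value, dict):
        own_keys, children = len(value), list(value.values())
    else:
        own_keys, children = 0, value
    return own_keys + sum(count_keys(child, depth + 1) for child in children)


def validate_metadata(metadata: dict) -> None:
    """Raise ValueError when metadata breaks the key or depth limit."""
    if count_keys(metadata) > MAX_KEYS:
        raise ValueError(f"metadata has more than {MAX_KEYS} keys")
```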

Changed

  • Schemas refactored from single schemas.py into schemas/api.py, schemas/configuration.py, and schemas/internal.py with backwards-compatible re-exports (#419)

Fixed

  • Missing deleted_at filter on RepresentationManager._query_documents_recent() and ._query_documents_most_derived() allowed soft-deleted documents to leak into the deriver’s working representation (#456)
  • CleanupStaleItemsCompletedEvent emitted spuriously when no queue item was actually deleted (#454)
  • Empty JSON file uploads caused unhandled errors; now returns normalized error responses (#434)
  • Memory leak: _observation_locks switched to WeakValueDictionary to prevent unbounded growth (#419)
  • SQL injection in dependencies.py: parameterized set_config calls to prevent injection via request context (#419)
  • NUL byte crashes: string inputs (message content, queries, peer cards) now stripped at schema level (#419)
  • Filter recursion depth capped at 5 to prevent stack overflow (#419)
  • Dedup-skipped observations now correctly reflected in created counts (#477)
  • External vector store support for message search — routes queries through configured external vector store with oversampling and deduplication to handle chunked embeddings (#479)
  • Dialectic agent no longer holds a DB connection during LLM calls — embeddings are pre-computed before tool execution, DB sessions isolated in extract_preferences, query_documents no longer accepts a DB session parameter (#477)
v3.0.3

Added

  • Consolidated session context into a single DB session with 40/60 token budget allocation between summary and messages
  • Observation validation via ObservationInput Pydantic schema with partial-success support and batch embedding with per-observation fallback
  • Peer card hard cap of 40 facts with case-insensitive deduplication and whitespace normalization
  • Safe integer coercion (_safe_int) for all LLM tool inputs to handle non-integer values like "Infinity"
  • Embedding pre-computation and reuse across multiple search calls in dialectic and representation flows
  • Peer existence validation in dialectic chat endpoints — raises ResourceNotFoundException instead of silently failing
  • Logging filter to suppress noisy GET /metrics access logs
  • Oolong long-context aggregation benchmark (synth and real variants, 1K–4M token context windows)
  • MolecularBench fact quality evaluation (ambiguity, decontextuality, minimality scoring)
  • CoverageBench information recall evaluation (gold fact extraction, coverage matching, QA verification)
  • LoCoMo summary-as-context baseline evaluation
  • Webhook delivery tests, dependency lifecycle tests, queue cleanup tests, summarizer fallback tests
  • Parallel test execution via pytest-xdist with worker-specific databases
  • test_reasoning_levels.py script for LoCoMo dataset testing across reasoning levels
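
The safe integer coercion mentioned above guards against tool inputs that look numeric but cannot become an int. The sketch below shows one way such a helper might behave; `safe_int` here is an illustrative stand-in written for this example, not the source of Honcho's `_safe_int`, and the fallback-to-default policy is an assumption.

```python
import math
from typing import Any


def safe_int(value: Any, default: int) -> int:
    """Coerce an LLM-supplied value to int, falling back to a default."""
    if isinstance(value, bool):
        return default  # avoid treating True/False as 1/0
    if isinstance(value, int):
        return value
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default  # None, lists, or non-numeric strings
    if not math.isfinite(number):
        return default  # "Infinity", "-inf", "nan"
    return int(number)
```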

Changed

  • Workspace deletion is now async — returns 202 Accepted, validates no active sessions (409 Conflict), cascade-deletes in background
  • Redis caching layer now stores plain dicts instead of ORM objects, with v2-prefixed keys, storage, resilient safe_cache_set/safe_cache_delete helpers, and deferred post-commit cache invalidation
  • All get_or_create_* CRUD operations now use savepoints (db.begin_nested()) instead of commit/rollback for race condition prevention
  • Reconciler vector sync uses direct ORM mutation instead of batch parameterized UPDATE statements
  • Summarizer enforces hard word limit in prompt and creates fallback text for empty summaries with summary_tokens = 0
  • Blocked Gemini responses (SAFETY, RECITATION, PROHIBITED_CONTENT, BLOCKLIST) now raise LLMError to trigger retry/backup-provider logic
  • Gemini client explicitly sets max_output_tokens from max_tokens parameter
  • All deriver and metrics collector logging replaced with structured logging.getLogger(__name__) calls
  • Dreamer specialist prompts updated to enforce durable-facts-only peer cards with max 40 entries and deduplication
  • GetOrCreateResult changed from NamedTuple to dataclass with async post_commit() method
  • FastAPI upgraded from 0.111.0 to 0.131.0; added pyarrow dependency
  • Queue status filtering to only show user-facing tasks (representation, summary, dream); excludes internal infrastructure tasks

Fixed

  • JWT timestamp bug — JWTParams.t was evaluated once at class definition time instead of per-instance
  • Session cache invalidation on deletion was missing
  • get_peer_card() now properly propagates ResourceNotFoundException instead of swallowing it
  • set_peer_card() ensures peer exists via get_or_create_peers() before updating
  • Backup provider failover with proper tool input type safety
  • Removed setup_admin_jwt() from server startup
  • Sentry coroutine detection switched from asyncio.iscoroutinefunction to inspect.iscoroutinefunction
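
The JWT timestamp bug is an instance of a common Python pitfall: a default value written as a call is evaluated once, when the class body runs. The sketch below reproduces the pattern with standard-library dataclasses. The class names are hypothetical, and JWTParams itself may be defined differently (for example as a Pydantic model, where a default factory plays the same role).

```python
import time
from dataclasses import dataclass, field


@dataclass
class BuggyParams:
    # time.time() runs once, at class definition; every instance shares it.
    t: float = time.time()


@dataclass
class FixedParams:
    # default_factory is called each time an instance is created.
    t: float = field(default_factory=time.time)
```

With the buggy form, every token minted by a long-running process carries the process start time rather than the time of issue.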

Removed

  • explicit.py and obex.py benchmarks replaced by coverage.py and molecular.py
  • Claude Code review automation workflow (.github/workflows/claude.yml)
  • Coverage reporting from default pytest configuration
v3.0.2

Added

  • Documentation for reasoning_level and Claude Code plugin

Changed

  • Gave dreaming sub-agents better prompting around peer card creation, tweaked overall prompts

Fixed

  • Added message-search fallback for memory search tool, necessary in fresh sessions
  • Made FLUSH_ENABLED a config value
  • Removed N+1 query in search_messages
v3.0.1

Fixed

  • Token counting in Explicit Agent Loop
  • Backwards compatibility of queue items
v3.0.0

Added

  • Agentic Dreamer for intelligent memory consolidation using LLM agents
  • Agentic Dialectic for query answering using LLM agents with tool use
  • Reasoning levels configuration for dialectic (minimal, low, medium, high, max)
  • Prometheus token tracking for deriver and dialectic operations
  • n8n integration
  • Cloud Events for auditable telemetry
  • External Vector Store support for turbopuffer and lancedb with reconciliation flow

Changed

  • API route renaming for consistency
  • Dreamer and dialectic now respect peer card configuration settings
  • Observations renamed to Conclusions across API and SDKs
  • Deriver to buffer representation tasks to normalize workloads
  • Local Representation tasks to create singular QueueItems
  • getContext endpoint to use search_query rather than force last_user_message

Fixed

  • Dream scheduling bugs
  • Summary creation when start_message_id > end_message_id
  • Cashews upgrade to prevent NoScriptError
  • Memory leak in accumulate_metric call

Removed

  • Peer card configuration from message configuration; peer cards no longer created/updated in deriver process
v2.5.1

Fixed

  • Backwards compatibility for message_ids field in documents to handle legacy tuple format
v2.5.0

Added

  • Message level configurations
  • CRUD operations for observations
  • Comprehensive test cases for harness
  • Peer level get_context
  • Set Peer Card Method
  • Manual dreaming trigger endpoint

Changed

  • Configurations to support more flags for fine-grained control of the deriver, peer cards, summaries, etc.
  • Working Representations to support more fine-grained parameters

Fixed

  • File uploads to match MessageCreate structure
  • Cache invalidation strategy
v2.4.3

Added

  • Redis caching to improve DB IO
  • Backup LLM provider to avoid failures when a provider is down

Changed

  • QueueItems to use standardized columns
  • Improved Deduplication logic for Representation Tasks
  • More fine-grained metrics for representation, summary, and peer card tasks
  • DB constraint to follow standard naming conventions
v2.4.2

Fixed

  • Langfuse tracing to have readable waterfalls
  • Alembic Migrations to match models.py
  • message_in_seq correctly included in webhook payload

Changed

  • Alembic to always use a session pooler
  • Statement timeout during alembic operations to 5 min
v2.4.1

Added

  • Alembic migration validation test suite

Fixed

  • Alembic migrations to batch changes
  • Batch message creation sequence number

Changed

  • Logging infrastructure to remove noisy messages
  • Sentry integration is centralized
v2.4.0

Added

  • Unified Representation class
  • vllm client support
  • Periodic queue cleanup logic
  • WIP Dreaming Feature
  • LongMemEval to Test Bench
  • Prometheus Client for better Metrics
  • Performance metrics instrumentation
  • Error reporting to deriver
  • Workspace Delete Method
  • Multi-db option in test harness

Changed

  • Working Representations are Queried on the fly rather than cached in metadata
  • EmbeddingStore to RepresentationFactory
  • Summary Response Model to use public_id of message for cutoff
  • Semantics across codebase to reference resources based on observer and observed
  • Prompts for Deriver & Dialectic to reference peer_id and add examples
  • Get Context route returns peer card and representation in addition to messages and summaries
  • Refactoring logger.info calls to logger.debug where applicable

Fixed

  • Gemini client to use async methods
v2.3.3

Changed

  • Deriver Rollup Queue processes interleaved messages for more context

Fixed

  • Dialectic Streaming to follow SSE conventions
  • Sentry tracing in the deriver
v2.3.2

Added

  • Get peer cards endpoint (GET /v2/peers/{peer_id}/card) for retrieving targeted peer context information

Changed

  • Replaced Mirascope dependency with small client implementation for better control
  • Optimized deriver performance by using joins on messages table instead of storing token count in queue payload
  • Database scope optimization for various operations
  • Batch representation task processing for ~10x speed improvement in practice

Fixed

  • Separated clean and claim work units in queue manager to prevent race conditions
  • Skip locked ActiveQueueSession rows on delete operations
  • Langfuse SDK integration updates for compatibility
  • Added configurable maximum message size to prevent token overflow in deriver
  • Various minor bugfixes
v2.3.1

Fixed

  • Added max message count to deriver in order to not overflow token limits
v2.3.0

Added

  • getSummaries endpoint to get all available summaries for a session directly
  • Peer Card feature to improve context for deriver and dialectic

Changed

  • Session Peer limit to be based on observers instead, renamed config value to SESSION_OBSERVERS_LIMIT
  • Messages can take a custom timestamp for the created_at field, defaulting to the current time
  • get_context endpoint returns detailed Summary object rather than just summary content
  • Working representations use a FIFO queue structure to maintain facts rather than a full rewrite
  • Optimized deriver enqueue by prefetching message sequence numbers (eliminates N+1 queries)
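
The FIFO approach to working representations can be pictured with a bounded queue: new facts are appended and the oldest fall off, instead of the whole representation being rewritten on each update. This is only a conceptual sketch; the class name and the capacity are assumptions, and the changelog does not describe the real data structure.

```python
from collections import deque


class FactQueue:
    """Keeps the most recent facts, evicting the oldest when full."""

    def __init__(self, capacity: int = 3) -> None:
        self._facts: deque[str] = deque(maxlen=capacity)

    def add(self, fact: str) -> None:
        self._facts.append(fact)  # deque drops the oldest entry when full

    def facts(self) -> list[str]:
        return list(self._facts)
```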

Fixed

  • Deriver uses get_context internally to prevent context window limit errors
  • Embedding store will truncate context when querying documents to prevent embedding token limit errors
  • Queue manager to schedule work based on available workers rather than total number of workers
  • Queue manager to use atomic db transactions rather than long lived transaction for the worker lifecycle
  • Timestamp formats unified to ISO 8601 across the codebase
  • Internal get_context method’s cutoff value is exclusive now
v2.2.0

Added

  • Arbitrary filters now available on all search endpoints
  • Search combines full-text and semantic using reciprocal rank fusion
  • Webhook support (currently only supports queue_empty and test events, more to come)
  • Small test harness and custom test format for evaluating Honcho output quality
  • Added MCP server and documentation for it
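
Reciprocal rank fusion, mentioned above for combining full-text and semantic search, scores each result by summing 1 / (k + rank) over the ranked lists it appears in. The sketch below shows the general technique; the function name is hypothetical, and k = 60 is the constant conventionally used for this method, not a value confirmed for Honcho.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one list ordered by fused score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a deterministic result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A result that ranks reasonably well in both lists tends to beat one that ranks first in only a single list, which is why the method suits hybrid search.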

Changed

  • Search has 10 results by default, max 100 results
  • Queue structure generalized to handle more event types
  • Summarizer now exhaustive by default and tuned for performance

Fixed

  • Resolve race condition for peers that leave a session while sending messages
  • Added explicit rollback to solve integrity error in queue
  • Re-introduced Sentry tracing to deriver
  • Better integrity logic in get_or_create API methods
v2.1.2

Fixed

  • Summarizer module to ignore empty summaries and pass appropriate one to get_context
  • Structured Outputs calls with OpenAI provider to pass strict=True to Pydantic Schema
v2.1.1

Added

  • Test harness for custom Honcho evaluations
  • Better support for session and peer aware dialectic queries
  • Langfuse settings
  • Added recent history to dialectic prompt, dynamic based on new context window size setting

Fixed

  • Summary queue logic
  • Formatting of logs
  • Filtering by session
  • Peer targeting in queries

Changed

  • Made query expansion in dialectic off by default
  • Overhauled logging
  • Refactor summarization for performance and code clarity
  • Refactor queue payloads for clarity
v2.1.0

Added

  • File uploads
  • Brand new “ROTE” deriver system
  • Updated dialectic system
  • Local working representations
  • Better logging for deriver/dialectic
  • Deriver Queue Status no longer has redundant data

Fixed

  • Document insertion
  • Session-scoped and peer-targeted dialectic queries work now
  • Minor bugs

Removed

  • Peer-level messages

Changed

  • Dialectic chat endpoint takes a single query
  • Rearranged configuration values (LLM, Deriver, Dialectic, History->Summary)
v2.0.5

Fixed

  • Groq API client to use the Async library
v2.0.4

Fixed

  • Migration/provision scripts did not have correct database connection arguments, causing timeouts
v2.0.3

Fixed

  • Bug that causes runtime error when Sentry flags are enabled
v2.0.2

Fixed

  • Database initialization was misconfigured and led to provision_db script failing: switch to consistent working configuration with transaction pooler
v2.0.1

Added

  • Ergonomic SDKs for Python and TypeScript (uses Stainless underneath)
  • Deriver Queue Status endpoint
  • Complex arbitrary filters on workspace/session/peer/message
  • Message embedding table for full semantic search

Changed

  • Overhauled documentation
  • BasedPyright typing for entire project
  • Resource filtering expanded to include logical operators

Fixed

  • Various bugs
  • Use new config arrangement everywhere
  • Remove hardcoded responses
v2.0.0

Added

  • Ability to get a peer’s working representation
  • Metadata to all data primitives (Workspaces, Peers, Sessions, Messages)
  • Internal metadata to store Honcho’s state no longer exposed in API
  • Batch message operations and enhanced message querying with token and message count limits
  • Search and summary functionalities scoped by workspace, peer, and session
  • Session context retrieval with summaries and token allocation
  • HNSW Index for Documents Table
  • Centralized Configuration via Environment Variables or config.toml file

Changed

  • New architecture centered around the concept of a “peer” replaces the former “app”/“user”/“session” paradigm
  • Workspaces replace “apps” as top-level namespace
  • Peers replace “users”
  • Sessions no longer nested beneath peers and no longer limited to a single user-assistant model. A session exists independently of any one peer and peers can be added to and removed from sessions.
  • Dialectic API is now part of the Peer, not the Session
  • Dialectic API now allows queries to be scoped to a session or “targeted” to a fellow peer
  • Database schema migrated to adopt workspace/peer/session naming and structure
  • Authentication and JWT scopes updated to workspace/peer/session hierarchy
  • Queue processing now works on ‘work units’ instead of sessions
  • Message token counting updated with tiktoken integration and fallback heuristic
  • Queue and message processing updated to handle sender/target and task types for multi-peer scenarios

Fixed

  • Improved error handling and validation for batch message operations and metadata
  • Database Sessions to be more atomic to reduce idle in transaction time

Removed

  • Metamessages removed in favor of metadata
  • Collections and Documents no longer exposed in the API, solely internal
  • Obsolete tests for apps, users, collections, documents, and metamessages

v1.1.0

Added

  • Normalize resources to remove joins and increase query performance
  • Query tracing for debugging

Changed

  • /list endpoints to not require a request body
  • metamessage_type to label with backwards compatibility
  • Database Provisioning to rely on alembic
  • Database Session Manager to explicitly rollback transactions before closing the connection

Fixed

  • Alembic Migrations to include initial database migrations
  • Sentry Middleware to not report Honcho Exceptions
v1.0.0

Added

  • JWT based API authentication
  • Configurable logging
  • Consolidated LLM Inference via ModelClient class
  • Dynamic logging configurable via environment variables

Changed

  • Deriver & Dialectic API to use Hybrid Memory Architecture
  • Metamessages are not strictly tied to a message
  • Database provisioning is a separate script instead of happening on startup
  • Consolidated session/chat and session/chat/stream endpoints

Previous Releases

For a complete history of all releases, see our GitHub Releases page.

Getting Help

If you encounter issues using the Honcho API or its SDKs:
  1. Open an issue on GitHub
  2. Join our Discord community for support