The Event-Driven Spine

The event bus and job queue are the permanent, non-negotiable backbone of the platform. All services hang off this spine. It is the first thing built and the last thing changed.

Event Bus: NATS JetStream

NATS JetStream is the central nervous system. Every significant state change in the platform is published as an immutable event.

Why NATS JetStream:

  • Single binary, trivial to deploy and operate
  • Built-in persistence with configurable retention and replay
  • Supports both pub/sub (fan-out) and work queues (competing consumers) natively
  • Clusters horizontally when higher throughput is needed
  • Can be swapped for Kafka without changing event contracts

Event Structure

Every event shares this envelope:

{
  "eventId": "uuid-v4",
  "eventType": "domain.NounVerb",
  "eventVersion": "1.0",
  "timestamp": "2026-01-01T12:00:00.000Z",
  "source": "service-or-plugin-id",
  "correlationId": "uuid-v4",
  "data": {}
}

eventType follows the domain.NounVerb convention. correlationId traces a chain of related events (e.g., one inbound call fans out into multiple downstream events that all share the same correlationId).
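A minimal sketch of building this envelope, assuming Node's crypto.randomUUID; the makeEvent helper and its type are illustrative, not actual platform code:

```typescript
import { randomUUID } from "node:crypto";

// Shape of the shared envelope described above.
interface EventEnvelope<T> {
  eventId: string;
  eventType: string;     // domain.NounVerb
  eventVersion: string;
  timestamp: string;     // ISO 8601
  source: string;
  correlationId: string;
  data: T;
}

// Hypothetical factory: pass an existing correlationId to continue a chain,
// or omit it to start a new one.
function makeEvent<T>(
  eventType: string,
  source: string,
  data: T,
  correlationId?: string,
): EventEnvelope<T> {
  return {
    eventId: randomUUID(),
    eventType,
    eventVersion: "1.0",
    timestamp: new Date().toISOString(),
    source,
    correlationId: correlationId ?? randomUUID(),
    data,
  };
}
```

Passing the parent's correlationId into each downstream event is what makes the trace reconstruction described later possible.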

Event Domains

Domain            Emitted by                               Examples
connector.*       Connector plugins, Connector Service     connector.DataReceived, connector.SyncCompleted
communication.*   Communication Service, Channel plugins   communication.MessageReceived, communication.CallStarted
data.*            Data Service, ETL Service                data.EntityCreated, data.EntityUpdated
rule.*            Rules Service                            rule.ConditionMatched, rule.ActionTriggered
job.*             Job Queue                                job.Enqueued, job.Completed, job.Failed
ai.*              AI Service                               ai.QueryReceived, ai.ResponseGenerated
system.*          All services                             system.ServiceStarted, system.PluginLoaded, system.HealthCheckFailed
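The naming convention can be checked mechanically, e.g. when a plugin registers an event type. A hedged sketch: the domain list comes from the table above, but the exact regex is an assumption, not platform code:

```typescript
// Accepts only documented domains followed by a NounVerb (two or more
// capitalized words), e.g. connector.DataReceived, system.HealthCheckFailed.
const EVENT_TYPE = /^(connector|communication|data|rule|job|ai|system)\.([A-Z][a-z0-9]*){2,}$/;

function isValidEventType(eventType: string): boolean {
  return EVENT_TYPE.test(eventType);
}
```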

NATS Streams

NATS JetStream organizes events into named streams with configurable retention:

Stream            Subjects                  Retention
platform-events   All *.* events            30 days (configurable)
connector-data    connector.*               7 days
audit-log         All events (audit copy)   1 year
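As a configuration sketch, the streams above could be provisioned with the nats CLI; the flags and duration syntax are assumptions based on natscli, not a script that exists in the repo:

```shell
# Sketch: provision the three streams (retention windows per the table above).
nats stream add platform-events --subjects "*.*" --storage file --retention limits --max-age 30d --defaults
nats stream add connector-data  --subjects "connector.*" --storage file --retention limits --max-age 7d --defaults
nats stream add audit-log       --subjects "*.*" --storage file --retention limits --max-age 365d --defaults
```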

Job Queue: pg-boss (BullMQ aspirational)

A dedicated Job Service translates events into durable, queued jobs. This separates event acknowledgment from work execution. Services can acknowledge an event immediately and process it asynchronously via the job queue.

pg-boss

pg-boss (^10.0.0, confirmed in apps/hub/package.json) is a PostgreSQL-backed job queue. It runs entirely on the Supabase PostgreSQL instance — no additional infrastructure required.

Why pg-boss:

  • Zero additional infrastructure (uses Supabase PG already required)
  • ACID guarantees — jobs are transactional
  • Supports priority-based scheduling per job via boss.send(..., { priority })
  • Good enough for thousands of jobs/day

BullMQ on Redis (aspirational)

Status (2026-04-12 audit): BullMQ is not a direct dependency of apps/hub. Only ioredis ^5.9.3 is present, and the redis:7-alpine service in docker-compose.yml is used for caching/rate-limiting, not a BullMQ job queue. The section below describes the intended graduation path when pg-boss throughput becomes insufficient.

When throughput demands it, the plan is to graduate to BullMQ backed by Redis. The job API is intended to stay the same; only the driver changes.
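The "only the driver changes" claim implies a thin driver seam between the job API and the backing queue. The sketch below is illustrative (the interface and class names are assumptions, not platform code); an in-memory driver stands in for the real ones, which would wrap pg-boss (boss.send / boss.work) or BullMQ (queue.add / Worker):

```typescript
// Illustrative options bag; real drivers would carry more (retries, delay, ...).
interface SendOptions {
  priority?: number; // higher runs first
}

// The stable surface services code against; swapping pg-boss for BullMQ
// means swapping the class behind this interface, not the callers.
interface JobDriver {
  send(queue: string, data: unknown, opts?: SendOptions): Promise<string>; // job id
  work(queue: string, handler: (data: unknown) => Promise<void>): void;
}

// In-memory stand-in used here only to exercise the interface.
class InMemoryDriver implements JobDriver {
  private handlers = new Map<string, (data: unknown) => Promise<void>>();
  private nextId = 0;

  async send(queue: string, data: unknown, _opts?: SendOptions): Promise<string> {
    const handler = this.handlers.get(queue);
    if (handler) await handler(data); // real drivers persist and dispatch async
    return String(this.nextId++);
  }

  work(queue: string, handler: (data: unknown) => Promise<void>): void {
    this.handlers.set(queue, handler);
  }
}
```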

Queue Naming (current reality)

Status (2026-04-12 audit): The platform does not currently expose three global priority queues. Today each service registers its own domain-specific pg-boss queue name — for example ConnectorService uses connector.sync (see apps/hub/src/services/connectors/ConnectorService.ts) and passes a numeric priority option on boss.send. The realtime / interactive / background tier names below are the target taxonomy, not the names used in code today.

Tier (target)   Target Latency   Use Cases
realtime        < 100ms          Inbound call routing, instant message delivery, live UI updates
interactive     < 5s             Outbound messages, AI queries, dashboard requests
background      Minutes–hours    Data sync, search indexing, ETL, cleanup, report generation
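If the target tiers are eventually mapped onto the numeric priority services already pass to boss.send, one hypothetical mapping could look like this; the specific numbers are invented for illustration:

```typescript
// Target taxonomy from the table above; not names used in code today.
type Tier = "realtime" | "interactive" | "background";

// Illustrative priorities (pg-boss runs higher-priority jobs first).
const TIER_PRIORITY: Record<Tier, number> = {
  realtime: 100,
  interactive: 50,
  background: 0,
};

function priorityFor(tier: Tier): number {
  return TIER_PRIORITY[tier];
}
```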

Job Guarantees

  • At-least-once delivery — every job runs at least once
  • Idempotent handlers — every handler checks an idempotency key (typically the eventId that triggered the job) before executing
  • Exponential backoff retries — failed jobs retry with increasing delays
  • Dead-letter queue — after max retries, jobs land in DLQ for manual inspection
  • Scheduled jobs — cron-style scheduling for periodic sync, SLA timeouts, reminders
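The idempotency and backoff guarantees can be sketched as plain functions. The names are illustrative, and the in-memory set stands in for what would really be a database table with a unique index on the idempotency key:

```typescript
// Idempotent handler wrapper: skip work if this eventId was already processed.
const processed = new Set<string>(); // stand-in for a persistent idempotency table

async function handleOnce(eventId: string, run: () => Promise<void>): Promise<boolean> {
  if (processed.has(eventId)) return false; // duplicate delivery, do nothing
  processed.add(eventId);
  await run();
  return true;
}

// Exponential backoff with a cap: delay = min(base * 2^attempt, max).
function backoffMs(attempt: number, baseMs = 1_000, maxMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Together these two pieces make at-least-once delivery safe: a redelivered job is detected by its eventId and skipped, and a genuinely failing job waits progressively longer before each retry.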

Common Job Types

Job                      Queue         Triggered by
ProcessIncomingDataJob   background    connector.DataReceived
SyncConnectorDataJob     background    Cron / manual
IndexDocumentJob         background    data.EntityCreated, data.EntityUpdated
RunRuleEngineJob         realtime      communication.*, connector.LocationUpdated
SendOutgoingMessageJob   interactive   Rule action, user action
PlaceCallJob             realtime      Rule action, user action
SendFaxJob               interactive   Rule action
ExecuteLLMQueryJob       interactive   ai.QueryReceived
WriteEntityJob           background    ETL output

Observability

Every event published includes a correlationId. Job payloads include the triggering eventId. This enables full trace reconstruction:

incoming_call (communication.CallStarted, correlationId=X)
  └─► RunRuleEngineJob (correlationId=X)
        └─► rule.ActionTriggered (correlationId=X)
              └─► SendOutgoingMessageJob (correlationId=X)
                    └─► communication.MessageSent (correlationId=X)
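Given envelopes shaped like the one defined earlier, reconstructing such a chain reduces to filtering on correlationId and ordering by timestamp. A minimal sketch (the function name is illustrative):

```typescript
// Only the envelope fields needed for trace reconstruction.
interface TraceEvent {
  eventId: string;
  eventType: string;
  timestamp: string;     // ISO 8601, so lexicographic order is chronological
  correlationId: string;
}

// Returns the event types in a correlation chain, oldest first.
function reconstructTrace(events: TraceEvent[], correlationId: string): string[] {
  return events
    .filter((e) => e.correlationId === correlationId)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp))
    .map((e) => e.eventType);
}
```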

OpenTelemetry trace context is carried in the event envelope and propagated through job payloads.