AI Layer
The AI layer is the platform's sovereign intelligence — a personal AI agent with privileged access to all user data, constrained entirely by user-configured consent rules. It never operates outside the user's explicit permissions, every LLM call is audited, and write actions require human confirmation.
The data model, event system, embeddings pipeline, and pgvector are built from the start — so when the AI agent ships, it has a full indexed history to work with.
Four Components
┌─────────────────────────────────────────────────────────────┐
│ AI LAYER │
│ │
│ ┌─────────────────┐ ┌──────────────────────────────┐ │
│ │ Agent Service │ │ LLM Router │ │
│ │ │ │ OpenRouter → Cloud APIs │ │
│ │ Conversation │───►│ Ollama → Local models │ │
│ │ RAG Engine │ │ User routing rules │ │
│ │ Context Mgmt │ │ Budget enforcement │ │
│ └─────────────────┘ └──────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌──────────────────────────────┐ │
│ │ Workflow Engine │ │ Embedding Service │ │
│ │ │ │ │ │
│ │ YAML workflows │ │ Write-time: embed entities │ │
│ │ Step execution │ │ pgvector storage │ │
│ │ Human-in-loop │ │ Semantic search index │ │
│ └─────────────────┘ └──────────────────────────────┘ │
│ ▲ │
│ │ decomposes │
│ ┌─────────────────┐ │
│ │ Agentic Planner │ │
│ │ NL → workflow │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Agent Service
The Agent Service manages conversations and orchestrates the RAG pipeline.
Conversation Model
interface AgentConversation {
conversationId: string;
userId: string;
messages: AgentMessage[];
context: {
currentLocation?: string; // from Present plane
activeCalendarEvents?: Event[];
recentActivity?: Activity[];
};
}
interface AgentMessage {
role: 'user' | 'assistant' | 'system';
content: string;
citations?: Citation[]; // RAG source references
requiresConfirmation?: boolean; // for write actions
}
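The context object above is typically flattened into text before it reaches the model. A minimal sketch of that step, assuming simplified field shapes; `formatContext` is an illustrative helper, not a platform API:

```typescript
// Flatten the per-conversation context into a block prepended to the
// system prompt. Field names mirror AgentConversation.context above;
// the event/activity shapes are simplified for illustration.
interface ContextSnapshot {
  currentLocation?: string;
  activeCalendarEvents?: { title: string; startsAt: string }[];
  recentActivity?: { summary: string }[];
}

function formatContext(ctx: ContextSnapshot): string {
  const lines: string[] = [];
  if (ctx.currentLocation) lines.push(`Current location: ${ctx.currentLocation}`);
  for (const ev of ctx.activeCalendarEvents ?? []) {
    lines.push(`Active event: ${ev.title} (starts ${ev.startsAt})`);
  }
  for (const act of ctx.recentActivity ?? []) {
    lines.push(`Recent: ${act.summary}`);
  }
  return lines.join('\n');
}
```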
RAG Engine Pipeline
User query
│
▼
1. Query Expansion
└─ Generate structured sub-queries from natural language
e.g., "what did Sarah email me?" →
query: messages WHERE sender LIKE Sarah AND direction=inbound
query: persons WHERE name LIKE Sarah
▼
2. Parallel Retrieval
├── pgvector: cosine similarity search (semantic)
├── Meilisearch: full-text search
└── PostgreSQL: structured queries (date ranges, person IDs, channel types)
▼
3. Re-ranking
└─ Score all results by relevance to original query
Boost recent results, boost exact name matches
▼
4. Context Assembly
└─ Select top-k results within token budget
Format as structured context block for LLM
▼
5. LLM Call (via LLM Router)
└─ System prompt + user context + user query → response
▼
6. Response with Citations
└─ Stream response to user
Attach source references (message IDs, document IDs)
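The retrieval, re-ranking, and assembly stages above can be sketched as one async orchestrator. The retriever and LLM signatures, the recency-boost weights, and the 4-chars-per-token heuristic are illustrative assumptions, not the actual service contracts:

```typescript
// Stages 2-6 of the RAG pipeline: fan out to retrievers, re-rank with a
// recency boost, assemble context within a token budget, call the LLM,
// and return citations (the source entity IDs).
interface Retrieved { entityId: string; text: string; score: number; createdAt: number }

function recencyBoost(createdAt: number, now: number): number {
  const days = (now - createdAt) / 86_400_000;
  return days < 30 ? 0.1 : 0; // small boost for results from the last 30 days
}

async function answerQuery(
  query: string,
  retrievers: ((q: string) => Promise<Retrieved[]>)[], // pgvector, Meilisearch, SQL
  callLlm: (prompt: string) => Promise<string>,
  tokenBudget = 4000,
): Promise<{ answer: string; citations: string[] }> {
  // 2. Parallel retrieval
  const results = (await Promise.all(retrievers.map(r => r(query)))).flat();
  // 3. Re-rank by relevance score plus recency boost
  const now = Date.now();
  results.sort((a, b) =>
    (b.score + recencyBoost(b.createdAt, now)) - (a.score + recencyBoost(a.createdAt, now)));
  // 4. Context assembly within the token budget (~4 chars per token heuristic)
  const context: Retrieved[] = [];
  let used = 0;
  for (const r of results) {
    const tokens = Math.ceil(r.text.length / 4);
    if (used + tokens > tokenBudget) break;
    context.push(r);
    used += tokens;
  }
  // 5-6. LLM call with assembled context; citations point back to sources
  const prompt = `Context:\n${context.map(c => c.text).join('\n')}\n\nQuestion: ${query}`;
  return { answer: await callLlm(prompt), citations: context.map(c => c.entityId) };
}
```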
Example RAG Queries
| User Query | Data Sources Queried |
|---|---|
| "Where was I when I emailed Sarah last month?" | messages + location_history (joined by timestamp) |
| "Who called me from 727 area code last week?" | messages (voice) + persons |
| "Summarize my meetings this week" | events + attendees (persons) |
| "What documents did I share with Acme Corp?" | documents + persons (org) |
LLM Router
The LLM Router directs each query to the right model based on user-defined rules, privacy constraints, and cost controls.
Routing Architecture
AI query
│
▼
Router: evaluate routing rules
│
├─► Local model (Ollama) — sensitive data, budget exhausted, user preference
│ └── Llama 3, Mistral, Phi
│
└─► Cloud API (OpenRouter) — complex reasoning, creative generation
├── Claude (Anthropic) — default for reasoning
├── GPT-4 (OpenAI) — complex tasks
└── Gemini (Google) — multimodal, long context
OpenRouter Integration
OpenRouter provides a unified API for all cloud LLM providers. A single OpenRouter API key routes to any supported provider — no per-provider key management required.
// LLM Router call via OpenRouter (OpenAI-compatible API)
import OpenAI from 'openai';

const openrouter = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});
const response = await openrouter.chat.completions.create({
  model: 'anthropic/claude-3-5-sonnet', // or 'openai/gpt-4o', etc.
  messages: [...],
  stream: true,
});
User Routing Rules
Users define routing rules in the Consent UI. Rules are evaluated in order:
# Example user routing configuration
routing_rules:
- condition: "data_domain contains 'health'"
model: local/llama3
reason: "Health data never leaves device"
- condition: "query_type == 'reasoning'"
model: openrouter/claude-3-5-sonnet
- condition: "monthly_spend >= budget_limit"
model: local/llama3
reason: "Budget exhausted, fall back to local"
- default: openrouter/claude-3-haiku # fast, cheap default
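In-order, first-match evaluation of the rules above can be sketched as follows. The `RouteInput` fields and condition predicates are assumptions modeled on the YAML example, not the router's actual schema:

```typescript
// First-match routing: walk the rules in order, fall back to the default.
interface RouteInput { dataDomains: string[]; queryType: string; monthlySpend: number; budgetLimit: number }
interface RoutingRule { matches: (q: RouteInput) => boolean; model: string }

const DEFAULT_MODEL = 'openrouter/claude-3-haiku'; // fast, cheap default

const rules: RoutingRule[] = [
  // "Health data never leaves device"
  { matches: q => q.dataDomains.includes('health'), model: 'local/llama3' },
  { matches: q => q.queryType === 'reasoning', model: 'openrouter/claude-3-5-sonnet' },
  // "Budget exhausted, fall back to local"
  { matches: q => q.monthlySpend >= q.budgetLimit, model: 'local/llama3' },
];

function route(q: RouteInput): string {
  return rules.find(r => r.matches(q))?.model ?? DEFAULT_MODEL;
}
```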
Budget Control
interface LLMBudgetConfig {
monthlyLimitUSD: number; // e.g., 20.00
warningThresholdPct: number; // e.g., 0.80 (warn at 80%)
hardCutoff: boolean; // if true, switch to local at limit; if false, warn only
}
The router tracks token usage per provider and switches to local models when the monthly budget is reached.
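A sketch of how the router might apply `LLMBudgetConfig`; the in-memory ledger and per-million-token pricing are illustrative, and real tracking would persist per-provider counts:

```typescript
// Accumulate spend per call and decide whether cloud calls are still allowed.
interface BudgetState { spentUSD: number }

function recordUsage(state: BudgetState, tokens: number, usdPerMillionTokens: number): BudgetState {
  return { spentUSD: state.spentUSD + (tokens / 1_000_000) * usdPerMillionTokens };
}

function budgetDecision(
  state: BudgetState,
  cfg: { monthlyLimitUSD: number; warningThresholdPct: number; hardCutoff: boolean },
): 'ok' | 'warn' | 'local-only' {
  const pct = state.spentUSD / cfg.monthlyLimitUSD;
  if (pct >= 1) return cfg.hardCutoff ? 'local-only' : 'warn'; // at limit
  if (pct >= cfg.warningThresholdPct) return 'warn';           // approaching limit
  return 'ok';
}
```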
Embedding Service
The Embedding Service runs continuously in the background, keeping the vector index current.
Write-Time Embedding
When any text entity is created or updated in the Data Service, the Embedding Service generates a vector embedding and stores it in pgvector:
data.EntityCreated (NATS)
│
▼
IndexDocumentJob (background queue)
│
▼
Embedding Service
│ calls local embedding model (Sentence Transformers or Ollama)
▼
pgvector embeddings table
INSERT (entity_type, entity_id, model, embedding)
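The job handler in the flow above can be sketched as a pure function from event to row. The event shape, the `embed` signature, and the `nomic-embed-text` default are assumptions for illustration:

```typescript
// IndexDocumentJob: consume a data.EntityCreated event, call a local
// embedding model, and produce the row written to the pgvector table.
interface EntityCreatedEvent {
  entityType: 'message' | 'document' | 'person' | 'event';
  entityId: string;
  text: string;
}
interface EmbeddingRow { entity_type: string; entity_id: string; model: string; embedding: number[] }

async function indexDocumentJob(
  event: EntityCreatedEvent,
  embed: (text: string) => Promise<number[]>, // e.g. Sentence Transformers or Ollama
  model = 'nomic-embed-text',                 // hypothetical default model id
): Promise<EmbeddingRow> {
  const embedding = await embed(event.text);
  return { entity_type: event.entityType, entity_id: event.entityId, model, embedding };
}
```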
Vector Schema
embeddings (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
entity_type text NOT NULL, -- 'message' | 'document' | 'person' | 'event'
entity_id uuid NOT NULL,
model text NOT NULL, -- embedding model identifier
embedding vector(1536), -- OpenAI ada-002 dim; or 768 for local models
created_at timestamptz DEFAULT now()
)
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops);
Semantic Search Query
-- Find top-10 messages semantically similar to a query vector
SELECT em.entity_id, 1 - (em.embedding <=> $query_vector) AS similarity
FROM embeddings em
JOIN messages m ON m.id = em.entity_id
WHERE em.entity_type = 'message'
ORDER BY em.embedding <=> $query_vector
LIMIT 10;
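From application code, pgvector accepts the query vector as a bracketed text literal (e.g. `'[0.1,0.2,...]'`). A small sketch of the parameter formatting plus a simplified version of the query above (join omitted); the `pg` pool call is shown commented out to keep the sketch self-contained:

```typescript
// pgvector query parameters are passed as text literals and cast server-side.
function toPgvectorLiteral(v: number[]): string {
  return `[${v.join(',')}]`;
}

const SEMANTIC_SEARCH_SQL = `
  SELECT em.entity_id, 1 - (em.embedding <=> $1) AS similarity
  FROM embeddings em
  WHERE em.entity_type = $2
  ORDER BY em.embedding <=> $1
  LIMIT 10`;

// With a node-postgres pool (assumed setup):
// const { rows } = await pool.query(SEMANTIC_SEARCH_SQL, [toPgvectorLiteral(queryVector), 'message']);
```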
Workflow Engine
The Workflow Engine executes multi-step automated tasks defined in YAML. It handles state persistence, error recovery, and human-in-the-loop confirmation.
Workflow Definition
id: summarize-and-file
name: "Summarize and File Important Articles"
version: "1.0"
trigger:
event: data.EntityCreated
filter:
entity_type: document
tags: [article]
steps:
- id: summarize
type: ai.generate
model: openrouter/claude-3-haiku
prompt: |
Summarize this article in 3 bullet points:
{{ steps.input.content }}
output: summary
- id: classify
type: ai.classify
model: local/llama3
categories: [tech, science, business, culture, health]
input: "{{ steps.input.content }}"
output: topic
- id: tag
type: data.update
entity_type: document
entity_id: "{{ steps.input.entity_id }}"
fields:
tags: ["{{ steps.classify.output }}", "auto-summarized"]
notes: "{{ steps.summarize.output }}"
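The `{{ steps.<id>.<field> }}` references in the workflow above imply a template-resolution pass before each step runs. A minimal sketch of that interpolation, assuming a resolution grammar modeled on the examples (not the engine's actual one):

```typescript
// Resolve {{ steps.<id>.<field> }} placeholders against completed step results.
function interpolate(template: string, steps: Record<string, Record<string, unknown>>): string {
  return template.replace(/\{\{\s*steps\.(\w+)\.(\w+)\s*\}\}/g, (_, stepId: string, field: string) => {
    const value = steps[stepId]?.[field];
    if (value === undefined) throw new Error(`unresolved reference: steps.${stepId}.${field}`);
    return String(value);
  });
}
```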
Human-in-the-Loop Step
- id: confirm-send
type: human.confirm
message: |
About to send this email to {{ recipient_name }}:
{{ steps.compose.output }}
timeout: 300 # seconds to wait for confirmation
on_timeout: cancel
on_confirm:
- type: communication.send_email
...
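The confirm-or-timeout semantics of `human.confirm` can be sketched as a race between the user's response and the timeout. The step contract here is an assumption based on the YAML above:

```typescript
// Wait for the user's confirmation; cancel if the timeout elapses first
// (on_timeout: cancel). A denial also cancels.
type ConfirmOutcome = 'confirmed' | 'cancelled';

function humanConfirm(confirmation: Promise<boolean>, timeoutSeconds: number): Promise<ConfirmOutcome> {
  return new Promise<ConfirmOutcome>(resolve => {
    const timer = setTimeout(() => resolve('cancelled'), timeoutSeconds * 1000);
    confirmation.then(ok => {
      clearTimeout(timer);
      resolve(ok ? 'confirmed' : 'cancelled');
    });
  });
}
```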
AI-Powered Rules Integration
The Rules Service can use AI as a condition or action:
# AI condition: classify an email before routing
trigger: communication.EmailReceived
condition:
operator: ai.classify_text
params:
text: "{{ event.body }}"
category: urgent
# Returns true if classified as urgent
actions:
  - operator: ai.generate_text
    params:
      prompt: "Summarize in one sentence: {{ event.body }}"
    output: summary
  - operator: communication.send_notification
    params:
      body: "Urgent email: {{ summary }}"
Agentic Planner
The Agentic Planner accepts natural language goals and decomposes them into executable workflow definitions.
Planning Process
User: "Plan a weekend trip to San Francisco next month for me and my partner.
Find flights, a pet-friendly hotel, and Italian restaurant for Saturday night."
▼
1. Agentic Planner sends goal to powerful LLM (Claude Sonnet or GPT-4)
System prompt: available plugins, user preferences, current context
▼
2. LLM decomposes into steps:
- search_flights(from=user_home, to=SFO, dates=next_month_weekend, passengers=2)
- search_hotels(location=SFO, pet_friendly=true, dates=same)
- search_restaurants(location=SFO, cuisine=italian, night=saturday)
▼
3. Agentic Planner generates a dynamic workflow YAML
▼
4. Workflow Engine executes
- Each step calls the appropriate plugin/service
- Results collected
▼
5. Results presented to user for selection
▼
6. Human-in-the-loop confirmation before any booking action
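Step 3 above (plan to workflow) can be sketched as a simple transform from the LLM's decomposed step list to a workflow definition, always terminated by a human-in-the-loop confirmation. The `PlanStep` shape and step-type names are illustrative assumptions:

```typescript
// Turn an LLM-produced plan into a dynamic workflow the Workflow Engine can run.
interface PlanStep { tool: string; args: Record<string, string> }

function planToWorkflow(goal: string, plan: PlanStep[]) {
  return {
    id: `dynamic-${Date.now()}`,
    name: goal,
    version: '1.0',
    steps: [
      ...plan.map((s, i) => ({ id: `step-${i}`, type: s.tool, params: s.args })),
      // Booking actions never execute without explicit user confirmation.
      { id: 'confirm', type: 'human.confirm', message: 'Review results before any booking.' },
    ],
  };
}
```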
Privacy Guarantees
The AI layer enforces the platform's zero-trust principles:
| Guarantee | Mechanism |
|---|---|
| No data access without consent | Policy Service checked before every RAG retrieval |
| Every LLM call logged | Audit log: provider, model, token count, timestamp |
| PII redaction for cloud prompts | Policy Service strips names/phones/emails from prompt before routing |
| Write actions require confirmation | Human-in-the-loop step mandatory for email/call/purchase |
| Budget hard limit | LLM Router blocks cloud calls when limit reached |
| Local model fallback | Health and sensitive data always routed to local Ollama |
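The PII-redaction guarantee above implies a scrubbing pass on every cloud-bound prompt. A minimal sketch, assuming regex patterns for emails and phone numbers; real redaction would rely on NER for names, and these patterns are illustrative only:

```typescript
// Strip obvious PII from a prompt before it is routed to a cloud provider.
const REDACTIONS: [RegExp, string][] = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL]'],                                  // email addresses
  [/(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g, '[PHONE]'], // US-style phone numbers
];

function redactForCloud(prompt: string): string {
  return REDACTIONS.reduce((text, [pattern, label]) => text.replace(pattern, label), prompt);
}
```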
Capability Rollout
| Stage | Capability |
|---|---|
| Embeddings foundation | Embedding Service active (all entities embedded at write time); pgvector schema in place |
| Semantic search | Basic semantic search in the web app (not the full AI agent) |
| Full AI agent | Full Agent Service; RAG Engine; LLM Router with OpenRouter + local Ollama; Workflow Engine; Agentic Planner; AI chat UI in web + Android |
| Advanced AI | Multi-agent workflows; federated AI (queries span multiple user hubs); on-device LLM on Android |