Skip to content

Architecture

CIUF implements an incremental view maintenance model inspired by Noria. Instead of caching opaque query results, it maintains a Directed Acyclic Graph (DAG) where each node caches its portion of the computation incrementally.

The DAG Model

Every query registered with CIUF maps to a chain of nodes:

TableNode → JoinNode → FilterNode → GroupNode → SelectNode

Each node holds an in-memory DataFrame (via pandas). When a write event arrives, only the nodes downstream from the affected table re-compute — and only for the affected rows.

Node types

Node Responsibility
TableNode Mirrors a database table in memory. Receives raw write events.
JoinNode Maintains the join result between two tables. Updates incrementally on row add/update/delete.
FilterNode Applies WHERE conditions. Re-evaluates only affected rows.
GroupNode Maintains GROUP BY aggregations (COUNT, SUM, AVG, MIN, MAX).
SelectNode Final projection. The result returned to callers.

Data flow

Cold read (first query)

  1. CIUF parses the SQL query into a DAG structure
  2. Each TableNode loads its table from PostgreSQL via SQLAlchemy
  3. The DAG processes the full dataset from table → select
  4. The SelectNode result is cached in memory
  5. The DataFrame is returned to the caller

Hot read (repeated query)

  1. CIUF finds the registered SelectNode for the query pattern
  2. Returns the cached DataFrame directly — no database roundtrip
  3. Latency: microseconds to low milliseconds depending on result size

Write event (on_insert / on_update / on_delete)

  1. CIUF receives the changed row(s) at the TableNode
  2. The delta propagates downstream through the DAG:
  3. JoinNode adds/removes/updates the joined rows
  4. FilterNode re-evaluates the filter predicate for affected rows
  5. GroupNode updates aggregation accumulators
  6. SelectNode updates the cached result
  7. The next read returns the updated result — no full recompute

Memory model

Each node stores a pandas DataFrame. Memory usage scales with: - Number of rows in each table - Number of registered query patterns - Selectivity of filters (fewer rows pass → smaller downstream caches)

LRU eviction (controlled by max_memory_mb) evicts the least-recently-used SelectNode cache when memory exceeds the limit. TTL eviction (controlled by ttl_seconds) evicts stale results regardless of memory pressure.

Concurrency

All shared DAG state is protected by threading.RLock. CIUF is safe for concurrent reads and writes from multiple threads. For multi-process deployments, each process maintains an independent cache — use Redis if you need a shared cache across processes.

SQL parsing (M1)

CIUF uses sqlglot to parse SQL queries into DAG structures automatically. Supported constructs:

  • SELECT with column projection
  • WHERE with comparison, logical, and BETWEEN/IN/LIKE operators
  • JOIN (INNER, LEFT) on equality predicates
  • GROUP BY with COUNT, SUM, AVG, MIN, MAX
  • ORDER BY and LIMIT

Unsupported (out of scope for v1): subqueries, CTEs, window functions, UNION.

Limitations

  • Single-process only: each CIUF instance is in-process; no shared cache between processes
  • Memory-bound: result sets must fit in RAM; not suitable for datasets exceeding available memory
  • Write path required: CIUF must receive write events; it does not poll the database for changes
  • Python 3.11+: no backport planned