Architecture¶

CIUF implements an incremental view maintenance model inspired by Noria. Instead of caching opaque query results, it maintains a Directed Acyclic Graph (DAG) where each node caches its portion of the computation incrementally.

The DAG Model¶

Every query registered with CIUF maps to a chain of nodes:

TableNode → JoinNode → FilterNode → GroupNode → SelectNode

Each node holds an in-memory DataFrame (via pandas). When a write event arrives, only the nodes downstream from the affected table re-compute — and only for the affected rows.

Node types¶

Node	Responsibility
`TableNode`	Mirrors a database table in memory. Receives raw write events.
`JoinNode`	Maintains the join result between two tables. Updates incrementally on row add/update/delete.
`FilterNode`	Applies WHERE conditions. Re-evaluates only affected rows.
`GroupNode`	Maintains GROUP BY aggregations (COUNT, SUM, AVG, MIN, MAX).
`SelectNode`	Final projection. The result returned to callers.

Data flow¶

Cold read (first query)¶

CIUF parses the SQL query into a DAG structure
Each TableNode loads its table from PostgreSQL via SQLAlchemy
The DAG processes the full dataset from table → select
The SelectNode result is cached in memory
The DataFrame is returned to the caller

Hot read (repeated query)¶

CIUF finds the registered SelectNode for the query pattern
Returns the cached DataFrame directly — no database roundtrip
Latency: microseconds to low milliseconds depending on result size

Write event (on_insert / on_update / on_delete)¶

CIUF receives the changed row(s) at the TableNode
The delta propagates downstream through the DAG:
JoinNode adds/removes/updates the joined rows
FilterNode re-evaluates the filter predicate for affected rows
GroupNode updates aggregation accumulators
SelectNode updates the cached result
The next read returns the updated result — no full recompute

Memory model¶

Each node stores a pandas DataFrame. Memory usage scales with: - Number of rows in each table - Number of registered query patterns - Selectivity of filters (fewer rows pass → smaller downstream caches)

LRU eviction (controlled by max_memory_mb) evicts the least-recently-used SelectNode cache when memory exceeds the limit. TTL eviction (controlled by ttl_seconds) evicts stale results regardless of memory pressure.

Concurrency¶

All shared DAG state is protected by threading.RLock. CIUF is safe for concurrent reads and writes from multiple threads. For multi-process deployments, each process maintains an independent cache — use Redis if you need a shared cache across processes.

SQL parsing (M1)¶

CIUF uses sqlglot to parse SQL queries into DAG structures automatically. Supported constructs:

SELECT with column projection
WHERE with comparison, logical, and BETWEEN/IN/LIKE operators
JOIN (INNER, LEFT) on equality predicates
GROUP BY with COUNT, SUM, AVG, MIN, MAX
ORDER BY and LIMIT

Unsupported (out of scope for v1): subqueries, CTEs, window functions, UNION.

Limitations¶

Single-process only: each CIUF instance is in-process; no shared cache between processes
Memory-bound: result sets must fit in RAM; not suitable for datasets exceeding available memory
Write path required: CIUF must receive write events; it does not poll the database for changes
Python 3.11+: no backport planned