Architecture¶
CIUF implements an incremental view maintenance model inspired by Noria. Instead of caching opaque query results, it maintains a Directed Acyclic Graph (DAG) where each node caches its portion of the computation incrementally.
The DAG Model¶
Every query registered with CIUF maps to a chain of nodes:
Each node holds an in-memory DataFrame (via pandas). When a write event arrives, only the nodes downstream from the affected table re-compute — and only for the affected rows.
Node types¶
| Node | Responsibility |
|---|---|
TableNode |
Mirrors a database table in memory. Receives raw write events. |
JoinNode |
Maintains the join result between two tables. Updates incrementally on row add/update/delete. |
FilterNode |
Applies WHERE conditions. Re-evaluates only affected rows. |
GroupNode |
Maintains GROUP BY aggregations (COUNT, SUM, AVG, MIN, MAX). |
SelectNode |
Final projection. The result returned to callers. |
Data flow¶
Cold read (first query)¶
- CIUF parses the SQL query into a DAG structure
- Each
TableNodeloads its table from PostgreSQL via SQLAlchemy - The DAG processes the full dataset from table → select
- The
SelectNoderesult is cached in memory - The DataFrame is returned to the caller
Hot read (repeated query)¶
- CIUF finds the registered
SelectNodefor the query pattern - Returns the cached DataFrame directly — no database roundtrip
- Latency: microseconds to low milliseconds depending on result size
Write event (on_insert / on_update / on_delete)¶
- CIUF receives the changed row(s) at the
TableNode - The delta propagates downstream through the DAG:
JoinNodeadds/removes/updates the joined rowsFilterNodere-evaluates the filter predicate for affected rowsGroupNodeupdates aggregation accumulatorsSelectNodeupdates the cached result- The next read returns the updated result — no full recompute
Memory model¶
Each node stores a pandas DataFrame. Memory usage scales with: - Number of rows in each table - Number of registered query patterns - Selectivity of filters (fewer rows pass → smaller downstream caches)
LRU eviction (controlled by max_memory_mb) evicts the least-recently-used SelectNode cache when memory exceeds the limit. TTL eviction (controlled by ttl_seconds) evicts stale results regardless of memory pressure.
Concurrency¶
All shared DAG state is protected by threading.RLock. CIUF is safe for concurrent reads and writes from multiple threads. For multi-process deployments, each process maintains an independent cache — use Redis if you need a shared cache across processes.
SQL parsing (M1)¶
CIUF uses sqlglot to parse SQL queries into DAG structures automatically. Supported constructs:
SELECTwith column projectionWHEREwith comparison, logical, andBETWEEN/IN/LIKEoperatorsJOIN(INNER, LEFT) on equality predicatesGROUP BYwithCOUNT,SUM,AVG,MIN,MAXORDER BYandLIMIT
Unsupported (out of scope for v1): subqueries, CTEs, window functions, UNION.
Limitations¶
- Single-process only: each CIUF instance is in-process; no shared cache between processes
- Memory-bound: result sets must fit in RAM; not suitable for datasets exceeding available memory
- Write path required: CIUF must receive write events; it does not poll the database for changes
- Python 3.11+: no backport planned