Abstractor

System Primitives

These constraints are rooted in physics and fundamental CS. They apply at every layer. Abstractions can defer them, transform them, or trade one for another — but never eliminate them.


Derived Data - Systems

Physics creates distance. Distance forces copies. Copies require coherence.

Computation and storage are separated by distance — in the memory hierarchy, across the network, even in time. When crossing that distance repeatedly is too expensive, you store a copy closer. That copy is derived data.

Source → transform → store closer → faster access → two representations → sync obligation

Every cache, replica, index, materialized view, denormalized table, and memoized result is the same pattern: store a transform of source data closer to consumption. Pay for sync.

The Three Choices

  • 1. What transform?
    • Identity → CPU cache, CDN, replica
    • Projection → covering index, column store
    • Aggregation → materialized view, rollup
    • Structure change → B-tree, hash, inverted index
    • Lossy → bloom filter, HyperLogLog, sketch
  • 2. Where to store?
    • Memory → register → L1 → L2 → L3 → RAM → SSD → disk
    • Network → same-process → machine → rack → DC → region → edge
    • Time → precomputed → on-demand → lazy
  • 3. How to sync?
    • Sync on write → strong consistency, write pays RTT
    • Invalidate → strong consistency, invalidation fanout
    • TTL → bounded staleness, no coordination
    • Version on read → strong consistency, read pays check
    • Never → immutable source, no cost
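The three choices compose into one concrete design. A minimal sketch (the `TtlCache` type is illustrative, not a library API): an in-process cache that picks transform = identity, location = local RAM, sync = TTL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A minimal derived-data store: transform = identity, location = in-process
// RAM (a HashMap), sync = TTL (bounded staleness, no coordination).
struct TtlCache<V> {
    entries: HashMap<String, (V, Instant)>,
    ttl: Duration,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    // Return the cached copy only while it is within its staleness bound.
    fn get(&self, key: &str) -> Option<V> {
        self.entries.get(key).and_then(|(v, written)| {
            (written.elapsed() < self.ttl).then(|| v.clone())
        })
    }

    // Storing a copy creates the sync obligation; TTL discharges it by expiry.
    fn put(&mut self, key: &str, value: V) {
        self.entries.insert(key.to_string(), (value, Instant::now()));
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(60));
    cache.put("user:42", "Ada".to_string());
    assert_eq!(cache.get("user:42"), Some("Ada".to_string()));
    assert_eq!(cache.get("user:7"), None); // miss: fall back to the source
}
```

Swapping the TTL check for an explicit `invalidate` call, or the HashMap for a remote store, moves the design to a different row of the table below without changing the pattern.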

Unification

| Name | Transform | Location | Sync |
|---|---|---|---|
| CPU L1 cache | identity | RAM → L1 | coherence protocol |
| CDN | identity | origin → edge | TTL / purge |
| Redis cache | identity / projection | disk → RAM | TTL / invalidate |
| Database replica | identity | primary → secondary | sync / async |
| Materialized view | aggregation | compute → storage | refresh / periodic |
| Index | projection + structure | scan → lookup | sync on write |
| Denormalized table | join | join-time → storage | dual write |
| Memoization | full result | compute → lookup | none (pure) |
| Bloom filter | lossy projection | set → bits | rebuild |
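The "Index" row above can be sketched in a few lines of Rust (the `Table` type is illustrative): the inverted index is derived data with transform = projection + structure change, and its sync strategy is "on write", which is exactly why indexes slow writes.

```rust
use std::collections::{HashMap, HashSet};

// An index is derived data: transform = projection + structure change,
// sync = on write — every mutation of the source must update the copy.
#[derive(Default)]
struct Table {
    rows: HashMap<u32, String>,             // source of truth
    by_word: HashMap<String, HashSet<u32>>, // derived: word -> row ids
}

impl Table {
    fn insert(&mut self, id: u32, text: &str) {
        self.rows.insert(id, text.to_string());
        // The sync obligation: writes pay for index maintenance.
        for word in text.split_whitespace() {
            self.by_word.entry(word.to_string()).or_default().insert(id);
        }
    }

    fn find(&self, word: &str) -> Vec<u32> {
        self.by_word
            .get(word)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut t = Table::default();
    t.insert(1, "hello world");
    t.insert(2, "hello rust");
    let mut hits = t.find("hello");
    hits.sort();
    assert_eq!(hits, vec![1, 2]);
}
```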

What This Explains

  • Cache invalidation is hard — it's distributed coordination
  • Immutability is powerful — no sync needed, copies are forever valid
  • Indexes slow writes — sync obligation on every mutation
  • Eventual consistency exists — coordination is expensive, defer it
  • CDNs use TTL — bounded staleness avoids coordination
  • Denormalization is dangerous — multiple sync points, easy to diverge
  • Memoization is easy — pure functions have immutable inputs
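The last point — memoization needs no sync because pure functions have immutable inputs — is easy to see in code. A sketch in Rust:

```rust
use std::collections::HashMap;

// Memoization as derived data: transform = the full function result,
// location = a lookup table, sync = none, because a pure function's
// inputs are immutable and the source can never diverge from the copy.
fn fib(n: u64, memo: &mut HashMap<u64, u64>) -> u64 {
    if n < 2 {
        return n;
    }
    if let Some(&v) = memo.get(&n) {
        return v; // the cached copy is forever valid
    }
    let v = fib(n - 1, memo) + fib(n - 2, memo);
    memo.insert(n, v);
    v
}

fn main() {
    let mut memo = HashMap::new();
    assert_eq!(fib(40, &mut memo), 102_334_155);
}
```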

Concrete Tradeoff Examples

Real-world technology choices mapped to the primitives above.


Derived Data - Languages

Physics creates distance. Distance forces copies. Copies require coherence.

This pattern applies at every layer of the stack.

SYSTEM:   Source DB → replica → CDN edge → browser cache
LANGUAGE: Value → alias → copy → register

Isomorphism

| Derived Data (System) | Derived Data (Language) |
|---|---|
| Source | Original value / memory location |
| Transform | Copy / reference / move / borrow |
| Store closer | Register, stack, local variable, cache line |
| Two representations | Multiple references to same data |
| Sync obligation | Coherence: who can read/write when? |

Layers

| Layer | Source | Derived Copy | Sync Strategy |
|---|---|---|---|
| CPU cache | RAM | L1/L2/L3 line | MESI protocol |
| Compiler | Memory | Register | Register allocation |
| Language | Original binding | Alias / reference | Borrow checker / locks / GC |
| Thread | Shared heap | Thread-local | Mutex / channels / atomics |
| Process | Shared memory | Process-local | IPC / message passing |
| Database | Primary | Replica | Replication protocol |
| Network | Origin server | CDN edge | TTL / invalidation |
| Geo-distributed | Region A | Region B | Eventual consistency |

Triangle

Three primitives rooted in physics:

                      TIME
                     (when)
                       △
                      /|\
                     / | \
                    /  |  \
                   / STATE \
                  /    |    \
                 /     |     \
                /      |      \
             SPACE ----+---- IDENTITY
            (where)         (which)

| Primitive | Physical Root | Question |
|---|---|---|
| SPACE | Locality, memory hierarchy | Where does data reside? |
| TIME | Causality, sequence | When does it exist/change? |
| IDENTITY | Equivalence, sameness | Are these the same data? |

STATE emerges from the triangle:

STATE = f(SPACE, TIME)

Dimensions of TIME

TIME is overloaded in programming:

| Dimension | Question | Example |
|---|---|---|
| Execution | What order? | Statement A before B |
| Parallel | Simultaneous? | Thread 1 and Thread 2 |
| Existence | How long? | Value lifetime, reference validity |

Coherence problems arise at the intersections:

Parallel TIME + shared SPACE → data races
Existence TIME mismatch → dangling references, use-after-free

Coherence Problem

Three ingredients:

  SHARED IDENTITY     Multiple paths to "the same" data
+ MULTIPLE SPACES     Data exists in more than one location
+ TIME FLOWS          Mutation can occur
─────────────────────────────────────────────────────────
= COHERENCE PROBLEM   Which STATE is true? How to sync?

Remove any one:

| Remove | Strategy |
|---|---|
| Shared IDENTITY | Value semantics, deep copy |
| Multiple SPACES | Single source of truth |
| TIME flows | Immutability |
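All three removal strategies can be shown in a few lines of Rust (a sketch; the bindings are illustrative):

```rust
use std::sync::Arc;
use std::thread;

// Each strategy removes one ingredient of the coherence problem.
fn main() {
    // 1. Remove shared IDENTITY: value semantics — each thread gets a deep copy.
    let data = vec![1, 2, 3];
    let copy = data.clone(); // independent value, no path back to `data`
    thread::spawn(move || copy.len()).join().unwrap();

    // 2. Remove flowing TIME: immutability — shared across threads, but
    //    frozen, so every copy is valid forever.
    let frozen = Arc::new(vec![1, 2, 3]);
    let handle = {
        let frozen = Arc::clone(&frozen);
        thread::spawn(move || frozen.iter().sum::<i32>())
    };
    assert_eq!(handle.join().unwrap(), 6);

    // 3. Remove multiple SPACES: single source of truth — transfer the one
    //    value instead of copying it, so only one location ever holds it.
    let owned = String::from("only copy");
    let handle = thread::spawn(move || owned.len()); // ownership moves in
    assert_eq!(handle.join().unwrap(), 9);
}
```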

Operations

| Operation | Meaning | Parallel Hazard |
|---|---|---|
| Read | Observe value | Stale/torn reads |
| Write | Mutate value | Lost update, race |
| Copy | Create independent duplicate | Torn copy |
| Move | Transfer ownership, invalidate source | Double-move |
| Alias | Second reference to same location | Data race |
| Sync | Reconcile divergent copies | (the solution) |

Sync Strategies

| Strategy | Language Level |
|---|---|
| Forbid the problem | Ownership (move semantics) |
| Freeze time | Immutable bindings |
| Serialize access | Mutex, RwLock |
| Hardware arbitration | Atomics, CAS |
| Compile-time proof | Borrow checker |
| Copy-on-write | CoW data structures |
| Message passing | Channels |
| Optimistic | STM, persistent structures |
| Trust the user | unsafe, raw pointers |
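"Serialize access" is the most familiar strategy. A sketch: a mutex forces all writers into a single TIME line, so the shared SPACE only ever holds one coherent STATE.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Only one thread holds the lock at a time: TIME is
                    // serialized, so no update is lost.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 8000);
}
```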

Language Choices

| Language | IDENTITY | TIME | Coherence |
|---|---|---|---|
| Haskell | Shared freely | Frozen | No mutation → no problem |
| Erlang | Process isolation | Frozen + messages | Copy between processes |
| Clojure | Shared freely | Frozen | Persistent structures |
| Rust | Ownership + borrowing | Controlled | Compile-time proof |
| Go | Shared | Free | Channels or locks |
| Java | Shared | Free | Locks / volatile |
| C/C++ | Unrestricted | Free | Programmer responsibility |
| JavaScript | Shared | Free | Single-threaded |
| Python | Shared | Free | GIL |

Language Constructs

| Construct | TIME | SPACE | IDENTITY | Coherence |
|---|---|---|---|---|
| Register | Runtime | Register | Unique | N/A |
| Stack variable | Runtime | Stack | Scoped | Lexical scope |
| Heap allocation | Runtime | Heap | Reference(s) | Manual / GC / ownership |
| Compile-time const | Compile | Inlined | N/A | None needed |
| Static / global | Program lifetime | Data segment | Global | Atomic / lock / immutable |
| Thread-local | Runtime | Per-thread | Thread-scoped | No sharing |
| Immutable value | Frozen | Any | Shared freely | Frozen → valid |
| Mutable + locked | Serialized | Heap | Shared | Mutex |
| Atomic | Hardware | RAM | Shared | Hardware coherence |
| Channel message | Runtime | Copied | Transferred | No shared state |
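The "Thread-local" row is worth seeing concretely: giving each thread its own SPACE dissolves the coherence problem because there is no shared IDENTITY to coordinate. A sketch in Rust:

```rust
use std::cell::Cell;
use std::thread;

// Thread-local storage: same name, but each thread gets its own SPACE,
// so there is no shared IDENTITY and nothing to synchronize.
thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    bump();
    bump();
    // The spawned thread sees a fresh copy, not the main thread's count.
    let other = thread::spawn(bump).join().unwrap();
    assert_eq!(other, 1);
    assert_eq!(bump(), 3);
}
```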

Examples

  • Immutability enables safe sharing — frozen TIME allows IDENTITY to span SPACEs
  • Rust's fearless concurrency — compile-time proof of IDENTITY rules
  • Locks are slow — serialize TIME globally, threads wait
  • Lock-free is hard — hardware IDENTITY arbitration is subtle
  • GC doesn't prevent data races — GC manages SPACE, not IDENTITY+TIME
  • JavaScript is single-threaded — serialize TIME globally
  • Python's GIL — serialize TIME at interpreter level
  • Value types are easier — copy creates new IDENTITY
  • Pointers are dangerous — unrestricted IDENTITY + free TIME
  • const differs across languages — different TIME/IDENTITY choices

Tradeoff

FLEXIBILITY ◄─────────────────────────► SAFETY

  Unrestricted aliasing         Restricted IDENTITY
  Free mutation                 Controlled TIME
  Manual management             Compiler/runtime enforced
         │                              │
         ▼                              ▼
  Maximum power                 Maximum guarantees
  C, unsafe Rust                Haskell, Rust safe, Erlang

  • 1. Which axis to constrain?
    • TIME → immutability
    • SPACE → single location
    • IDENTITY → ownership, value semantics
  • 2. Who enforces it?
    • Programmer
    • Compiler
    • Runtime
    • Hardware
  • 3. When to check?
    • Compile-time
    • Runtime
    • Never

Primitive Interactions

Programs add an expression layer to the primitives:

| Expression | What it introduces |
|---|---|
| Variables | Names for SPACE |
| Scopes | Bounded regions of TIME |
| Functions | Reusable TIME sequences |
| Types | Constraints on SPACE contents |
| References | IDENTITY relationships |
| Threads | Parallel TIME lines |

All programming concepts emerge from interactions between primitives and expression.

Pairwise Interactions

| Interaction | Question | Concepts |
|---|---|---|
| SPACE × TIME | When does memory exist? | Allocation, deallocation, lifetime, scope, mutation |
| SPACE × IDENTITY | How many paths to this memory? | Variable, pointer, alias, copy, move, null |
| TIME × IDENTITY | When is a name valid? | Declaration, scope, shadowing, rebinding, drop |

Three-way Interaction

When SPACE × TIME × IDENTITY interact simultaneously:

    SPACE ────────── TIME
        \           /
         \         /
          \       /
           \     /
          IDENTITY

    Center = hard problems

| Scenario | Interaction | Result |
|---|---|---|
| Parallel TIME + shared SPACE + multiple IDENTITY | Concurrent mutation | Data race |
| IDENTITY outlives SPACE in TIME | Reference to freed memory | Dangling pointer |
| SPACE freed, IDENTITY used later | Access after deallocation | Use-after-free |
| SPACE freed twice in TIME | Double deallocation | Double free |
| IDENTITY transferred, old used in TIME | Access after move | Use-after-move |
| Multiple IDENTITY + mutation + overlapping TIME | Writes interleave | Race condition |
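Use-after-move is the one of these that Rust turns into a compile error rather than a runtime bug. A sketch:

```rust
// Use-after-move, caught at compile time: the IDENTITY `x` is severed
// when ownership transfers, so any later use in TIME is rejected.
fn main() {
    let x = vec![1, 2, 3];
    let y = x; // IDENTITY transferred; the SPACE is untouched
    // println!("{:?}", x); // error[E0382]: borrow of moved value: `x`
    assert_eq!(y.len(), 3);
}
```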

Bugs as Interaction Failures

| Bug | Failed Interaction | What went wrong |
|---|---|---|
| Memory leak | SPACE × TIME | SPACE exists past needed TIME |
| Use-after-free | SPACE × TIME × IDENTITY | IDENTITY used after SPACE's TIME ends |
| Dangling pointer | TIME × IDENTITY | IDENTITY outlives referent |
| Data race | SPACE × TIME × IDENTITY | Parallel TIME + shared SPACE + mutation |
| Double free | SPACE × TIME | SPACE deallocated twice |
| Null dereference | SPACE × IDENTITY | IDENTITY points to no SPACE |
| Buffer overflow | SPACE × IDENTITY | IDENTITY exceeds SPACE bounds |
| Uninitialized read | SPACE × TIME × IDENTITY | IDENTITY used before SPACE has value |

Features as Interaction Solutions

SPACE × TIME solutions:

| Feature | Mechanism |
|---|---|
| Garbage collection | Runtime tracks SPACE, frees when unreachable |
| RAII | Tie SPACE lifetime to scope (TIME) |
| Reference counting | Track IDENTITY count, free when zero |
| Stack allocation | SPACE lifetime = function TIME |
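RAII is the clearest of these: a value's SPACE is tied to its scope's TIME, so cleanup runs deterministically, not at some later GC pause. A sketch (the `Logged` type is illustrative):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// RAII: Drop runs exactly when the owning binding's scope (TIME) ends.
struct Logged(Rc<RefCell<Vec<&'static str>>>);

impl Drop for Logged {
    fn drop(&mut self) {
        self.0.borrow_mut().push("dropped");
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _guard = Logged(Rc::clone(&log)); // SPACE acquired here
        log.borrow_mut().push("in scope");
    } // scope's TIME ends: Drop runs, SPACE reclaimed
    log.borrow_mut().push("after scope");
    assert_eq!(*log.borrow(), vec!["in scope", "dropped", "after scope"]);
}
```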

SPACE × IDENTITY solutions:

| Feature | Mechanism |
|---|---|
| Ownership | Unique IDENTITY to SPACE |
| Move semantics | Transfer IDENTITY, invalidate source |
| Value types | Copy creates new SPACE, new IDENTITY |
| Nullable types | Explicit "IDENTITY to no SPACE" |

TIME × IDENTITY solutions:

| Feature | Mechanism |
|---|---|
| Lexical scope | IDENTITY valid in TIME region |
| Lifetimes | Explicit IDENTITY validity bounds |
| Closures | Extend IDENTITY across TIME boundaries |
| Drop order | Defined IDENTITY end sequence |
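Closures are the subtle row here: a capture extends an IDENTITY past the TIME boundary of its original scope. A sketch:

```rust
// A closure extends an IDENTITY across a TIME boundary: the captured
// value moves into the closure and stays valid after the scope that
// created it has ended.
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0; // local SPACE...
    move || {          // ...kept alive by the capture
        count += 1;
        count
    }
}

fn main() {
    let mut next = make_counter(); // `count`'s scope is gone, yet it lives on
    assert_eq!(next(), 1);
    assert_eq!(next(), 2);
}
```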

SPACE × TIME × IDENTITY solutions:

| Feature | Mechanism |
|---|---|
| Borrow checker | Prove all three consistent at compile time |
| Locks/Mutex | Serialize TIME access to SPACE |
| Atomics | Hardware-arbitrated SPACE × TIME |
| Channels | Transfer IDENTITY, no shared SPACE |
| Immutability | Freeze TIME dimension, sharing safe |
| Actor model | Isolate SPACE per actor, message only |
| Linear types | IDENTITY used exactly once |
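The channel row deserves a concrete sketch: the sender gives up the value, so no two threads ever hold paths to the same mutable data.

```rust
use std::sync::mpsc;
use std::thread;

// Channels: transfer IDENTITY instead of sharing SPACE.
fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let owned = vec![1, 2, 3];
        tx.send(owned).unwrap(); // IDENTITY moves through the channel;
                                 // `owned` is unusable here afterwards
    });
    let received = rx.recv().unwrap(); // sole IDENTITY now lives here
    assert_eq!(received.iter().sum::<i32>(), 6);
}
```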

Paradigms as Interaction Strategies

| Paradigm | Strategy | Constrains | Tradeoff |
|---|---|---|---|
| Functional | Freeze mutation | TIME (no state change) | Easy concurrency ↔ efficiency |
| OOP | Encapsulate memory | SPACE (hide behind interface) | Modularity ↔ aliasing complexity |
| Rust | Restrict aliasing | IDENTITY (ownership) | Safety + performance ↔ learning curve |
| Actor | Isolate memory | SPACE (no sharing) | Fault isolation ↔ message overhead |
| Linear | Single use | IDENTITY (exactly once) | Resource safety ↔ flexibility |

Bugs are interaction failures. Features are interaction solutions. Paradigms are holistic bets on which axis to constrain.

Representation Constraints

Programs are expressed in a medium: text files, ASTs, bytecode. The medium has constraints independent of SPACE/TIME/IDENTITY.

PRIMITIVES × MEDIUM CONSTRAINTS = FORCED FEATURES
(SPACE/TIME/IDENTITY)     (representation)          (workarounds)

Many features exist not because of computational necessity, but because the representation can’t express what we want directly.

Constraints

| Constraint | What it means |
|---|---|
| Forward-only TIME | Execution proceeds forward; can't undo a statement |
| Names persist | Once declared, a name exists until scope ends |
| Values persist | A value exists until scope ends; can't delete mid-scope |
| Stack is LIFO | Can only deallocate top of stack |
| Text is sequential | One statement after another |

Features as Workarounds

| Want to... | Can't because... | Workaround |
|---|---|---|
| Delete a name | Names persist in scope | Shadowing |
| Delete a value | Values persist until scope end | Move + invalidation |
| Free mid-stack | Stack is LIFO | Heap allocation |
| Go back in TIME | Forward-only | Loops |
| Undo mutation | Forward-only | Immutability |
| Parallel execution | Text is sequential | Explicit threads/async |

Shadowing — Can’t Delete Names

let x = 5;
// Want: delete x, reclaim the name
// Can't: name persists until scope ends
// Workaround: shadow

let x = "hello";  // New IDENTITY, same name
                  // Old x still in memory, just unreachable

Shadowing exists because names can’t be undeclared. The old binding still exists — destructors run at scope end, not at shadow point.

Move — Can’t Delete Values

let x = vec![1, 2, 3];
let y = x;
// Want: delete x entirely after transfer
// Can't: name 'x' persists in scope
// Workaround: invalidate the IDENTITY

// x still exists as a name, but IDENTITY is severed
// Compiler tracks: "name exists, IDENTITY gone"

Move semantics exist because values can’t be deleted mid-scope. We transfer IDENTITY and mark the source as invalid.

Heap — Can’t Free Mid-Stack

Stack (LIFO):

    ┌─────┐
    │  c  │  ← top, can free
    ├─────┤
    │  b  │  ← can't free until c is gone
    ├─────┤
    │  a  │  ← can't free until b, c are gone
    └─────┘

Heap exists because stack forces LIFO TIME on SPACE. Heap allows independent lifetimes, shared IDENTITY, dynamic size.
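In Rust this escape hatch is `Box`: the heap SPACE can outlive the frame that created it, because ownership moves out while the stack frame dies in LIFO order. A sketch:

```rust
// Heap allocation escapes the stack's LIFO discipline: the Box's SPACE
// can outlive the function frame that created it.
fn make_on_heap() -> Box<[i32; 3]> {
    let b = Box::new([1, 2, 3]); // array on the heap, handle on the stack
    b // the handle moves out; the heap SPACE survives this frame
}

fn main() {
    let data = make_on_heap(); // frame of make_on_heap is gone; data lives
    assert_eq!(data[2], 3);
}
```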

SSA — Making Constraints Explicit

Compilers transform to SSA (Static Single Assignment):

// Source
let mut x = 5;
x = x + 1;
x = x * 2;

// SSA
let x1 = 5;
let x2 = x1 + 1;
let x3 = x2 * 2;

SSA reveals the truth: we never “modified” x. We created new values and reused the name.

  • Can’t delete names → each assignment is a new IDENTITY
  • Can’t go back → values flow forward
  • Mutation is illusion → it’s name rebinding

Alternative Representations

| Representation | TIME | IDENTITY | SPACE | Key Workarounds |
|---|---|---|---|---|
| Imperative text | Forward-only | Names persist | Stack LIFO | Shadow, move, heap, loops |
| SSA | Forward-only | Unique names | Explicit | Phi nodes |
| Stack-based | Forward-only | Position, not name | Explicit | Stack shuffling |
| Dataflow graph | By dependency | Nodes | Edges | Control dependencies |
| Logic | Declarative | Unification | Automatic | Cut, clause order |

Features are shaped by representation. The medium constrains what's expressible. Language designers create workarounds. Understanding the constraint explains the workaround.

Abstraction Layers

Every layer in computing — hardware or software — can be understood as an abstractor: it hides complexity below while exposing a simpler interface above.

Layer Details

Each layer can be characterized by what it abstracts (hides), exposes (provides), leaks (breaks through), and escapes (works around).

This post is licensed under CC BY 4.0 by the author.