Author: Marcus Chen, Principal Engineer, Real‑time Infrastructure
Date: May 8, 2026
Reading time: 22 minutes

✍️ Introduction

You’re in a Google Doc, editing the same paragraph as a colleague in Tokyo. You see their cursor move character by character. In Figma, three designers manipulate the same vector shape simultaneously, and somehow – magically – nobody’s changes overwrite anyone else’s.

Behind that “magic” lies one of the hardest problems in distributed systems: real‑time collaborative editing (RTCE). Unlike a food delivery platform (where events are discrete and independent), collaborative editing demands:

Sub‑100ms latency for remote cursors
Conflict‑free merging without locking documents
Offline support that syncs seamlessly when you reconnect
Scalability to millions of concurrent users across hundreds of thousands of documents

In this post, I’ll walk you through the architecture of a modern collaborative editor. We’ll compare Conflict‑free Replicated Data Types (CRDTs) vs. Operational Transformation (OT), explain how we handle presence (who’s viewing what), and show you how we scale WebSocket connections to hundreds of thousands per region.

Let’s open a shared document together.

📋 Requirements (From a Real Product)

Feature	Description
Live cursors & selections	See other users’ cursors and text selections in real time
Simultaneous editing	Multiple users edit the same paragraph without locking
Offline editing	Make changes without internet; sync when back online
Version history	Complete record of all changes, with branching & restore
Commenting & suggestions	Threaded discussions, suggested edits that can be accepted/rejected
Presence list	Who’s viewing the document, with their current page/region
Cross‑platform	Web (desktop/mobile), iOS, Android, desktop apps

Non‑functional Requirements

Latency – Cursor movements < 100ms p95; text edits visible < 200ms
Consistency – No two users ever see different final documents (strong eventual consistency)
Availability – 99.99% uptime for editing (design tools can’t afford downtime)
Scalability – One document can have 50+ active collaborators (e.g., all‑hands doc)
Durability – Every keystroke persisted; zero data loss
Bandwidth efficiency – Sync only deltas, not whole documents

🧱 High‑Level Architecture

We use a hybrid peer‑to‑peer + cloud model: WebSocket connections to a regional collaborator service, which replicates changes via a global log. For offline, we rely on CRDTs (more on that later).

graph TB
    ClientA[User A Browser<br/>CRDT engine]
    ClientB[User B Browser<br/>CRDT engine]

    subgraph Edge Network
        WS[WebSocket Gateway<br/>Nginx + Centrifugo]
        LB[Load Balancer]
    end

    subgraph Region us-east-1
        CollabSvc[Collaboration Service<br/>Document session manager]
        PresenceSvc[Presence Service<br/>Cursor tracking]
        DocStore[(Document Store<br/>PostgreSQL + S3)]
    end

    subgraph Global
        Kafka[Kafka<br/>Document change log]
        CRDTStore[(CRDT State Store<br/>FoundationDB)]
        HistorySvc[History Service<br/>Time‑travel & restore]
    end

    ClientA --> WS
    ClientB --> WS
    WS --> LB --> CollabSvc
    CollabSvc --> PresenceSvc
    CollabSvc --> DocStore
    CollabSvc --> Kafka
    Kafka --> CRDTStore
    CollabSvc --> CRDTStore
    HistorySvc --> CRDTStore
    PresenceSvc --> Redis[Redis<br/>Live presence sets]

Figure 1: Regional collaboration service with global log replication.

🧵 Core Concept 1: CRDTs vs. OT

All collaborative editors face the same fundamental problem: concurrent edits. User A deletes “Hello” while User B types “Hi” – what’s the final result?

Two main families of solutions:

	Operational Transformation (OT)	CRDTs (Conflict‑free Replicated Data Types)
How it works	Transform operations against concurrent ops (central server or p2p)	Data structure that merges automatically using commutative/lattice maths
Server role	Often a central sequencer that orders and transforms ops	Can be fully decentralized, but usually server for persistence
Offline support	Requires server to transform; offline is tricky	Native – peers can generate ops offline and merge later
Complexity	Transformation functions are complex (hundreds of cases for rich text)	CRDT design is mathematically subtle, but once correct, it “just works”
Examples	Google Docs, Etherpad	Figma, Apple Notes, Redis CRDTs

We chose CRDTs for our platform (let’s call it CollabEdit). Specifically, we use a sequence CRDT for text (based on RGA – Replicated Growable Array) and a map CRDT for properties (styles, comments).

Why CRDT over OT?

Offline first – Our mobile app works on planes. With CRDT, each user generates operations locally, timestamped with a logical clock (Lamport + node ID). When reconnecting, we just send all pending ops – no complex transformation against missed ops.
No central sequencer bottleneck – OT usually needs a server to assign order and transform. CRDT allows us to batch‑insert operations directly into a distributed log (Kafka) and let the state store merge them.
Deterministic merge – Given the same set of operations, any replica ends up with identical state. This is provable.

Trade‑off: CRDTs have higher storage overhead because each character (or element) carries a unique ID. For a 10‑page document, we store ~2‑3x more metadata than raw text. Compression helps, and we accept this for offline capability.

🔄 Document State & Operation Flow

Every collaborative document is a CRDT state – an immutable, append‑only log of operations (insert, delete, style). The log is global and ordered by a hybrid logical clock (HLC).

Operation format

{
  "docId": "doc_abc123",
  "opId": {
    "timestamp": 1746638292000,
    "nodeId": "client_us-east-1_9876"
  },
  "type": "insert",
  "pos": {
    "after": ["char_id_101", "char_id_102"]
  },
  "content": "Hello, world!",
  "charId": "char_456",
  "deps": ["op_171232", "op_171233"]  // causal dependencies
}

pos is a CRDT position – not an integer index but an identifier referencing surrounding characters. This makes concurrent inserts deterministic.
deps ensures causal ordering (we use a vector clock per document).

Lifecycle of a keystroke

User types → Browser CRDT engine generates insert operation with local timestamp.
Operation added to local queue and applied immediately to local state (optimistic UI).
WebSocket sends operation to Collaboration Service in the nearest region.
Collaboration Service validates (authenticity, document permissions) and appends to Kafka topic doc_changes.
Kafka replicator streams to all regions. In each region, a CRDT State Store (FoundationDB) consumes the topic and merges the op into the persistent CRDT snapshot.
Other connected clients receive the op via WebSocket subscription and merge it into their local CRDT.
If conflicts appear (rare – CRDTs are commutative), no action needed. The merge produces the same result everywhere.

sequenceDiagram
    participant A as User A Browser
    participant WS as WebSocket Gateway
    participant Collab as Collaboration Service
    participant Kafka
    participant Store as CRDT State Store
    participant B as User B Browser

    A->>A: type "H"
    A->>WS: INSERT {char: "H", pos: [...]}
    WS->>Collab: forward
    Collab->>Collab: validate, assign HLC timestamp
    Collab->>Kafka: append operation
    Collab-->>A: ACK (op applied globally)
    Kafka->>Store: consumer
    Store->>Store: merge op into persistent CRDT
    Kafka-->>WS: broadcast op to other subscribers
    WS-->>B: operation
    B->>B: merge into local CRDT, update UI

Figure 2: Operation propagation – optimistic local update, then global broadcast.

🖱️ Presence & Cursors (High Frequency, Low Value)

Cursors are ephemeral and high‑volume – moving a mouse generates 10‑20 events per second. We do not send cursor events through Kafka or persistent store. Instead:

Each client sends cursor position (line, column, or 2D canvas coordinates) via dedicated WebSocket frames every 50ms.
The Presence Service maintains a Redis sorted set per document: presence:doc_abc123 with members {userId, cursorPos, lastSeen}.
When a user updates their cursor, Presence Service updates their entry and sets TTL to 2 seconds.
Other clients subscribe to presence updates via a different WebSocket channel (/presence/doc_abc123).
The Presence Service streams deltas (only changes) to listeners.

For large documents with 100+ viewers (read‑only, e.g., engineering design doc), we disable cursor broadcasting for non‑editors. They see a static list of who’s viewing, but not live cursor positions – saves tremendous bandwidth.

Cursor consistency

No strong consistency required – if a cursor lags or disappears after 2 seconds, it’s fine.
We use UDP style over WebSocket: latest cursor update overrides previous, no ordering guarantees.

🗄️ Database & Storage Layers

Component	Technology	Purpose
Document CRDT log	Kafka (retention 7 days)	Immutable, replayable operation history
Persistent CRDT state	FoundationDB (with CRDT extension)	Current document state, supports atomic merges
Document metadata	PostgreSQL	Title, owner, sharing permissions, creation time
Blob storage (images)	S3 / GCS	Images, videos, large assets referenced in document
Presence	Redis (with expiry)	Live cursors, viewing users, typing indicators
Version history	FoundationDB snapshots + S3	Hourly snapshots + incremental logs for time travel

FoundationDB as the CRDT store

FoundationDB (FDB) provides ACID transactions over a distributed key‑value store. We store each document’s CRDT as a set of key‑value pairs:

doc:<docId>:char:<charId> → character metadata
doc:<docId>:style:<range> → formatting
doc:<docId>:tombstones → set of deleted IDs

FDB’s transactions allow us to merge multiple incoming operations atomically. When merging an operation from Kafka, we:

Read the current CRDT state from FDB.
Compute the new state by applying the operation (pure function).
Commit the new state in a transaction. If a concurrent merge happened, FDB’s optimistic concurrency control aborts and we retry.

This gives us serializable consistency for merges – no two merges can clobber each other.

📴 Offline Editing & Sync

The offline story is why we love CRDTs.

Local persistence – Browser’s IndexedDB stores all operations generated while offline (or unsynced). Also stores the last known CRDT snapshot.
When offline – User edits generate ops with local timestamps (HLC). These are applied to local CRDT and queued.
Reconnection – Client sends sync request with its highest known op timestamp per document.
Collaboration Service replies with all ops from the global log that the client missed (after that timestamp).
Client merges those ops into its local CRDT. Because CRDT operations are commutative and idempotent, merging is just applying them all.
Then client sends its queued offline ops. The server appends them to Kafka (with new HLC timestamps that now reflect the real time). Since the client already merged the server’s ops, there’s no double application – the server’s ops are idempotent.

Conflict example (offline sync)

User A changes “cat” to “dog” while offline. User B changes “cat” to “bat” while online. When A reconnects:

A’s local state: “dog”
Global state: “bat”
The server sends B’s op to A. A’s CRDT merges: “bat” (last write wins? No – CRDT rules decide).

For text, typical CRDTs like RGA use “insert after” positioning. If both replaced the same range, the merge depends on the causal order. In our design, the server’s timestamp wins (HLC). But the beauty of CRDTs is that the result is deterministic and conflict‑free without user intervention – no “merge conflict” popup for simple text edits.

⚡ Scaling to Millions of Users

Regional sharding

We have 4 regions (us-east-1, us-west-2, eu-west-1, ap-southeast-1). Each region has:

10 Collaboration Service pods (Kubernetes, HPA based on WebSocket connections)
1 Kafka broker for local writes (then replicated globally)
1 FoundationDB cluster (sharded by document ID)

Write path: When a user edits, their operation is written to the local Kafka broker, which replicates to other regions asynchronously. The local Collaboration Service does not wait for global replication before ACKing the user – that would add 200ms cross‑region latency. We accept eventual consistency: a user in Tokyo may not see a user in London’s edit for 300ms.

Read path: Each region reads from its local FoundationDB replica (FDB uses synchronous replication within region, async across regions). So eventual consistency again, but acceptable for editors.

Handling 50 collaborators on one document

When a document has many active editors, the state store can receive hundreds of ops per second. This is a hot partition problem – all ops target the same document ID.

Mitigations:

Batching – The Collaboration Service buffers ops for 10ms and submits them as a batch to Kafka (one message, many ops). This reduces Kafka overhead.
FoundationDB transaction merging – Instead of one transaction per op, we run a micro‑batch merge: every 20ms, all ops received for a document are merged in a single FDB transaction. This is safe because CRDT ops are commutative – order within batch doesn’t matter.
Conflict resolution – FDB transactions might still abort if two merges happen concurrently on the same document. We cap retries at 5, and if still failing, we split the batch into smaller groups.

WebSocket scaling

We use Centrifugo – a scalable WebSocket server that can handle 500k connections per node. It exposes a message bus (Nats) to route messages between Centrifugo nodes. Our Collaboration Service publishes to Centrifugo’s API, which fans out to all clients subscribed to a document channel.

For documents with 50+ editors, Centrifugo uses a broadcast per connection – not per room. This is still O(N²) network overhead if every client sends frequent ops. So we additionally:

Throttle cursor updates to 20 Hz.
Batch text operations at the client side (group keystrokes within 100ms).
Use delta compression – send only the changed range, not full operations.

🔁 Consistency Guarantees (Strong vs. Eventual)

Aspect	Guarantee	Rationale
Text content	Strong eventual convergence	All replicas eventually identical given same ops
Cursor positions	No guarantee (best effort)	Ephemeral, not persisted
Comments	Strong consistency (server‑ordered)	Comments are fewer, need ordering
Document metadata (title, shares)	Strong (via PostgreSQL)	Central source of truth
Offline sync	Eventual (within seconds of reconnection)	Good enough

For comments, we do not use CRDTs. Comments are fewer, they have threads, and ordering matters for readability. We store them in PostgreSQL with a simple created_at server timestamp. This is a centralized design – but comments are not real‑time critical; a 500ms delay is fine.

🧪 Lessons Learned (From Production)

1. Don’t underestimate garbage collection

CRDTs leave tombstones (deleted elements) forever unless you compact. After a few months, a document with 10k edits would accumulate 50k tombstones. We implemented a compaction process that runs on idle documents (no edits for 1 hour): it rewrites the CRDT into a fresh state without tombstones and creates a new log start point.

2. WebSocket reconnection storms

When a network glitch affects 10k clients (e.g., AWS AZ fail), all try to reconnect simultaneously. Our WebSocket gateway uses exponential backoff with jitter on the client (2s, 4s, 8s…) and connection throttling on the server (max 500 new connections per second). Without this, the gateway instantly hits file descriptor limits.

3. FoundationDB transaction size limits

FDB has a 10MB transaction limit. After a user pastes a 15MB novel, we hit this. We now detect large operations and store the content as a blob in S3, then insert a reference operation (pointer + SHA256). The CRDT stores the reference, and clients fetch the blob separately.

4. Presence DoS attacks (real)

One user joined a public doc with a script that updated cursor position 1000 times per second. Presence Redis CPU spiked. We implemented rate limiting per user per document: 20 updates per second. Larger spikes are silently dropped.

5. Offline sync with large operation logs

A user stayed offline for 3 weeks on a plane, generating 50k ops. On reconnect, the server tried to send all missed ops (global log ~200k ops) at once. Timeout. We changed to chunked sync: GET /sync?from=ts1&limit=5000 and the client pulls in chunks, merging incrementally.

🛡️ Security & Permissions

Document permissions are centralized in PostgreSQL, not embedded in the CRDT (never trust client state). On each operation:

The Collaboration Service checks if the user has write permission for that document (PostgreSQL lookup, cached in Redis for 10s).
For view‑only users, the WebSocket gateway rejects insert/delete operations outright.
Share links are encoded with a random 32‑byte token, stored in PostgreSQL with expiration.

We also support end‑to‑end encryption for enterprise documents. The client encrypts each operation before sending to the server. The server never sees plaintext – it just replicates encrypted blobs. The downside: search and version history become impossible. We offer this only as an opt‑in feature.

📊 Observability

Signal	Tools	What we watch
Metrics	Prometheus + Grafana	Op latency (p50/99), Kafka lag, WebSocket connections per node
Logs	Loki + Grafana	Error logs (correlation ID per op)
Tracing	Jaeger	Full RPC trace: Browser → WS → Collab → Kafka → FDB
Real‑user monitoring	Custom	On‑device merge time (time from keystroke to screen update)

We have a dashboard for each document that highlights conflicts: if two CRDT replicas diverge (detected via checksum), an alert fires. This has never happened in production – CRDTs work.

🚀 Future Roadmap

History branching – Allow users to fork a document from any point in time, inspired by Git. We already have the operation log; we just need a way to apply a subset of ops.
AI conflict resolution – For structured content (tables, lists), CRDTs sometimes produce unexpected merges (e.g., two people sorting a column simultaneously). We’ll use LLMs to suggest a “human‑friendly” merge and let users accept.
WebTransport – Replace WebSockets with WebTransport for even lower latency and better stream control.

🎯 Parting Thoughts

Building a real‑time collaborative editor is a beautiful intersection of distributed systems, data structures, and HCI. CRDTs give us the mathematical confidence that merges will never lose data. FoundationDB gives us the transactional backbone to make those merges durable. And a carefully throttled presence system gives users the feeling that they’re in the same room – even when they’re oceans apart.

If you take one thing away: never try to build your own CRDT from scratch unless you have a PhD and six months to spare. Use a well‑vetted library (e.g., yjs for JavaScript, automerge for backend) and focus on the orchestration layer instead.

Now go build something that lets people create together. 🚀

Questions? Did I just gloss over vector clocks? Find me on Mastodon @marcus_chen@hachyderm.io or open an issue on our CollabEdit Architecture repo.

Further reading: - [CRDTs: The hard parts – Martin Kleppmann (2020 talk)] - [FoundationDB: A distributed ACID database (Apple Engineering)] - [Figma’s multi‑player architecture (Figma Engineering Blog)] - [Yjs: A CRDT framework for real‑time collaboration (yjs.dev)]

The Architecture Behind Real‑Time Collaborative Editing: How Figma & Google Docs Sync Millions of Cursor