
Entitybase Backend 🏗️

An API-first, billion-scale backend for structured knowledge: items, properties, and statements without the wiki.

What Problem Does This Solve? 🤔

Imagine you're building the next Wikidata: a database that holds structured knowledge about everything in the world. Millions of items, billions of statements, and thousands of edits per second. 😱

The problem? Traditional databases overwrite data. When someone edits an item, the old version disappears. You can't easily:

  • See who changed what and when
  • Roll back to an old version
  • Know what changed between two edits
  • Trust that your data hasn't been silently modified

Entitybase solves this by treating every edit as an immutable snapshot, like Git for structured data. Once written, an edit can never be changed or deleted. Ever. This gives you perfect auditability, easy rollbacks, and rock-solid consistency.

Who Is This For? 👥

Entitybase is perfect for:

  • 🚀 App developers: Building apps that need structured knowledge without a full wiki
  • 🔬 Research platforms: Need a clean API for knowledge graphs and structured data
  • 🏛️ Wikimedia projects: Want just the data model (entities, statements) without MediaWiki
  • 📊 Data engineers: Need clean RDF exports of structured knowledge
  • 🛠️ Custom solutions: Building "knowledge base" apps without wiki overhead

The 3-Second Pitch 📣

Entitybase = Git for structured knowledge

Every edit creates an immutable snapshot in S3. Vitess handles fast lookups. REST API gives you full CRUD. Built for 1 billion+ entities and 1 trillion statements.

TL;DR Quick Facts ⚡

| Capability     | Value                            |
|----------------|----------------------------------|
| Interface      | REST API only (no wiki pages) 🏃 |
| Authentication | None yet (planned) 🔐            |
| Capacity       | 1B+ entities 💎                  |
| Statements     | 1T+ unique statements            |
| Revisions      | Immutable (never overwritten) 🔒 |
| API            | 122 REST endpoints               |
| Storage        | S3 (snapshots) + Vitess (indexing) |
| Exports        | JSON, Turtle (RDF) 🐢            |
| Version        | 0.1.0 🚧                         |

Architecture in 3 Boxes 📦

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    You      │───▶│   REST API  │───▶│    S3       │
│  (clients)  │    │  (FastAPI)  │    │  (storage)  │
└─────────────┘    └─────────────┘    └─────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │   Vitess    │
                   │  (indexing) │
                   └─────────────┘

  1. You (clients): API calls from your app, scripts, or frontend
  2. REST API (FastAPI): Validates, processes, and routes requests
  3. S3 (storage): Immutable snapshots of every entity revision
  4. Vitess (indexing): Lightning-fast lookups and queries

💡 Think of it like this: S3 is the permanent record (📜), Vitess is the index in the back of the book (📑), and the API is the librarian (👩‍🏫).
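To make the division of labour concrete, here is a minimal sketch of a write path, assuming plain in-memory dictionaries as stand-ins for S3 and Vitess. Every name in it (`SnapshotStore`, `RevisionIndex`, `write_revision`, the `entities/{id}/r{n}.json` key layout) is hypothetical and chosen for illustration; it is not Entitybase's actual code, and it ignores concurrent writers entirely.

```python
import json


class SnapshotExistsError(Exception):
    """Raised when a caller tries to overwrite an existing snapshot."""


class SnapshotStore:
    """In-memory stand-in for the S3 bucket: every key is write-once."""

    def __init__(self):
        self._objects = {}

    def put_once(self, key, body):
        if key in self._objects:
            raise SnapshotExistsError(f"{key!r} already exists and is immutable")
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


class RevisionIndex:
    """In-memory stand-in for the Vitess index: entity id -> latest revision."""

    def __init__(self):
        self._latest = {}

    def next_revision(self, entity_id):
        return self._latest.get(entity_id, 0) + 1

    def set_latest(self, entity_id, revision):
        self._latest[entity_id] = revision

    def latest(self, entity_id):
        return self._latest[entity_id]


def snapshot_key(entity_id, revision):
    # Hypothetical key layout; the real bucket layout is not documented here.
    return f"entities/{entity_id}/r{revision}.json"


def write_revision(store, index, entity_id, data):
    """Store a new snapshot first, then advance the index pointer."""
    revision = index.next_revision(entity_id)
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    store.put_once(snapshot_key(entity_id, revision), body)  # permanent record
    index.set_latest(entity_id, revision)                    # fast lookup
    return revision


def read_latest(store, index, entity_id):
    """Ask the index which revision is current, then fetch that snapshot."""
    revision = index.latest(entity_id)
    return json.loads(store.get(snapshot_key(entity_id, revision)))


if __name__ == "__main__":
    store, index = SnapshotStore(), RevisionIndex()
    write_revision(store, index, "Q1", {"label": "Earth"})
    write_revision(store, index, "Q1", {"label": "Earth", "description": "planet"})
    print(read_latest(store, index, "Q1"))
```

Writing the snapshot before touching the index means a failed write leaves, at worst, an unreferenced object rather than an index entry that points at nothing.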

Key Features ✨

  • Immutable Revisions: Every edit creates a snapshot you can never overwrite
  • 122 REST Endpoints: Full CRUD for entities, terms, statements, users, and more
  • Complete Lexeme Support: Forms, senses, lemmas, glosses, and lexical categories
  • Statement Deduplication: Hash-based storage can reduce storage by 50%+
  • RDF Export: Turtle format for semantic web integration 🐢
  • Entity Protection: Full locks, semi-protection, archiving, and mass-edit protection
  • User Features: Watchlists, endorsements, thanks, and activity tracking
  • Horizontal Scaling: Vitess sharding + S3 object storage for scale-out capacity

Why This Design? 🧠

You might ask: "Why not just use MySQL like everyone else?"

The Problem with Mutable Data

Traditional database:     Entitybase:
┌─────────────┐         ┌─────────────┐
│   Item Q1   │         │ Rev 1: Q1   │──▶ S3 snapshot
│  (current)  │         │ Rev 2: Q1   │──▶ S3 snapshot
└─────────────┘         │ Rev 3: Q1   │──▶ S3 snapshot
  (overwrites!)         └─────────────┘
                         (append-only!)

Traditional databases overwrite data. You lose history. Entitybase appends immutable snapshots. You gain:

  • ✅ Perfect auditability: Who changed what, when
  • ✅ Easy rollbacks: Just point to an old revision
  • ✅ No data loss: Old versions are never deleted
  • ✅ Event sourcing: Replay events to rebuild state
  • ✅ Conflict resolution: Compare revisions without locking
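The rollback and comparison benefits can be illustrated with a small append-only history. This is a sketch under stated assumptions: `EntityHistory` and `diff` are hypothetical names, snapshots are plain dictionaries, and a rollback is modelled as moving a head pointer to an older revision without deleting anything.

```python
import copy


class EntityHistory:
    """Append-only list of snapshots plus a pointer to the current one."""

    def __init__(self):
        self._revisions = []  # revision n lives at index n - 1
        self.head = 0         # 0 means "no revisions yet"

    def append(self, snapshot):
        """Add a new snapshot; earlier ones are never touched."""
        self._revisions.append(copy.deepcopy(snapshot))
        self.head = len(self._revisions)
        return self.head

    def get(self, revision):
        if not 1 <= revision <= len(self._revisions):
            raise ValueError(f"unknown revision {revision}")
        # Return a copy so callers cannot mutate stored history.
        return copy.deepcopy(self._revisions[revision - 1])

    def rollback(self, revision):
        """Point at an older revision; nothing is deleted."""
        self.get(revision)  # validates the revision number
        self.head = revision

    def current(self):
        return self.get(self.head)

    def __len__(self):
        return len(self._revisions)


def diff(old, new):
    """Map each changed top-level key to a (before, after) pair."""
    changed = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


if __name__ == "__main__":
    history = EntityHistory()
    history.append({"label": "Earth"})
    history.append({"label": "Earth", "population": "8.9 billion"})
    history.append({"label": "Erth", "population": "8.9 billion"})  # bad edit

    print(diff(history.get(2), history.get(3)))
    history.rollback(2)
    print(history.current(), len(history))
```

Because both snapshots being compared are immutable, the comparison needs no lock: neither side can change while it is being read.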

Statement Deduplication

Statements (like "population → 8.9 billion") appear millions of times across entities. Entitybase stores each unique statement once and references it by hash:

┌──────────────────┐      ┌──────────────────┐
│  Q1: Earth       │      │ Statement hash   │
│  population: X   │─────▶│ "population:8.9B"│
└──────────────────┘      └──────────────────┘
┌──────────────────┐      (stored once! 💾)
│  Q2: Country     │
│  population: X   │─────▶ Same hash, reuses storage
└──────────────────┘

This can reduce storage by 50%+ for typical Wikibase datasets!
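A minimal sketch of hash-based deduplication follows. The source does not say which hash function or canonical form Entitybase uses, so SHA-256 over sorted-key JSON is an assumption here, and `statement_hash` and `StatementStore` are hypothetical names.

```python
import hashlib
import json


def statement_hash(statement):
    """Hash a canonical JSON form so key order does not change the result."""
    # Assumption: SHA-256 over compact, sorted-key JSON.
    canonical = json.dumps(statement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StatementStore:
    """Stores each unique statement once, keyed by its content hash."""

    def __init__(self):
        self._by_hash = {}

    def add(self, statement):
        """Return the hash; store the body only if it is not already there."""
        digest = statement_hash(statement)
        self._by_hash.setdefault(digest, statement)
        return digest

    def get(self, digest):
        return self._by_hash[digest]

    def __len__(self):
        return len(self._by_hash)


if __name__ == "__main__":
    store = StatementStore()
    population = {"property": "population", "value": "8.9 billion"}

    # Entities keep only references (hashes), not statement bodies.
    q1 = {"id": "Q1", "statements": [store.add(population)]}
    q2 = {"id": "Q2", "statements": [store.add(dict(population))]}

    print(q1["statements"] == q2["statements"], len(store))
```

How much space this saves depends on how often statements repeat across entities in a given dataset.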

Comparison with Wikibase Suite

| Feature                 | Entitybase                              | Wikibase Suite                          |
|-------------------------|-----------------------------------------|-----------------------------------------|
| Interface               | REST API only                           | REST API + web UI + wiki pages          |
| Wiki pages              | ❌ None                                 | ✅ Article pages, talk pages, user pages |
| Frontend                | ❌ None                                 | ✅ Wikibase UI (Vue)                     |
| Authentication          | ❌ None (planned)                       | ✅ Full user system                      |
| Capacity                | 1B+ entities, 1T+ statements            | ~100M entities (Wikidata)               |
| Storage                 | Immutable S3 snapshots + Vitess         | MySQL + blob storage                    |
| Statement Deduplication | Hash-based (can reduce storage by 50%+) | None                                    |
| JSON Output             | /entities/{id}.json                     | wbgetentities API                       |
| TTL Output              | Turtle (alpha)                          | Full support                            |
| RDF/XML                 | Not planned                             | Supported                               |
| NTriples                | Not planned                             | Supported                               |
| Change Streaming        | RDF change events (alpha)               | None                                    |
| Batch Updates           | No (separate calls)                     | Yes (single API call)                   |

Note: Entitybase is an alternative backend for structured data, not a drop-in replacement. Wikibase Suite is production-ready for full wiki sites.
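As a rough illustration of the JSON output row, the sketch below builds the `/entities/{id}.json` URL and fetches it with the Python standard library. The base URL `http://localhost:8000` and the helper names are assumptions, and the shape of the response body is not documented here, so the code returns the parsed JSON as-is.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a local development instance; replace with your deployment.
BASE_URL = "http://localhost:8000"


def entity_json_url(entity_id, base_url=BASE_URL):
    """Build the JSON URL for one entity, escaping the id for safety."""
    return f"{base_url.rstrip('/')}/entities/{quote(entity_id, safe='')}.json"


def fetch_entity(entity_id, base_url=BASE_URL, timeout=10.0):
    """GET the entity and return the decoded JSON document."""
    with urlopen(entity_json_url(entity_id, base_url), timeout=timeout) as response:
        return json.load(response)


def fetch_entities(entity_ids, base_url=BASE_URL):
    """No batch endpoint is listed, so issue one call per entity."""
    return {entity_id: fetch_entity(entity_id, base_url) for entity_id in entity_ids}


if __name__ == "__main__":
    print(entity_json_url("Q1"))
```

Since no authentication exists yet, the request carries no credentials; that will presumably change once the planned authentication lands.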

Explore