Entitybase Backend
An API-first, billion-scale backend for structured knowledge: items, properties, and statements without the wiki.
What Problem Does This Solve?
Imagine you're building the next Wikidata, a database that holds structured knowledge about everything in the world. Millions of items, billions of statements, and thousands of edits per second.
The problem? Traditional databases overwrite data. When someone edits an item, the old version disappears. You can't easily:
- See who changed what and when
- Roll back to an old version
- Know what changed between two edits
- Trust that your data hasn't been silently modified
Entitybase solves this by treating every edit as an immutable snapshot, like Git for structured data. Once written, an edit can never be changed or deleted. Ever. This gives you perfect auditability, easy rollbacks, and rock-solid consistency.
Who Is This For?
Entitybase is perfect for:
- App developers: building apps that need structured knowledge without a full wiki
- Research platforms: need a clean API for knowledge graphs and structured data
- Wikimedia projects: want just the data model (entities, statements) without MediaWiki
- Data engineers: need clean RDF exports of structured knowledge
- Custom solutions: building "knowledge base" apps without wiki overhead
The 3-Second Pitch
Entitybase = Git for structured knowledge
Every edit creates an immutable snapshot in S3. Vitess handles fast lookups. REST API gives you full CRUD. Built for 1 billion+ entities and 1 trillion statements.
TL;DR Quick Facts
| Capability | Value |
|---|---|
| Interface | REST API only (no wiki pages) |
| Authentication | None yet (planned) |
| Capacity | 1B+ entities |
| Statements | 1T+ unique statements |
| Revisions | Immutable (never overwritten) |
| API | 122 REST endpoints |
| Storage | S3 (snapshots) + Vitess (indexing) |
| Exports | JSON, Turtle (RDF) |
| Version | 0.1.0 |
Architecture in 3 Boxes
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│     You     │───▶│  REST API   │───▶│     S3      │
│  (clients)  │    │  (FastAPI)  │    │  (storage)  │
└─────────────┘    └─────────────┘    └─────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │   Vitess    │
                   │ (indexing)  │
                   └─────────────┘
- You (clients): API calls from your app, scripts, or frontend
- REST API (FastAPI): validates, processes, and routes requests
- S3 (storage): immutable snapshots of every entity revision
- Vitess (indexing): lightning-fast lookups and queries
Think of it like this: S3 is the permanent record, Vitess is the index in the back of the book, and the API is the librarian.
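The division of labour between the three boxes can be sketched in a few lines of Python. This is a minimal illustration, not Entitybase's actual code: the two classes are in-memory stand-ins for S3 and Vitess, and the names `SnapshotStore`, `HeadIndex`, `write_entity`, `read_entity`, and the `{entity_id}/r{revision}.json` key layout are all assumptions made for the example.

```python
import json


class SnapshotStore:
    """In-memory stand-in for S3: write-once blobs addressed by key."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, body):
        # Refuse to overwrite: a snapshot is immutable once written.
        if key in self._blobs:
            raise ValueError(f"snapshot {key} already exists and is immutable")
        self._blobs[key] = body

    def get(self, key):
        return self._blobs[key]


class HeadIndex:
    """In-memory stand-in for Vitess: entity id -> latest revision number."""

    def __init__(self):
        self._heads = {}

    def head(self, entity_id):
        return self._heads.get(entity_id, 0)

    def advance(self, entity_id, revision):
        self._heads[entity_id] = revision


def write_entity(store, index, entity_id, data):
    """API-layer write: store a new snapshot, then move the head pointer."""
    revision = index.head(entity_id) + 1
    key = f"{entity_id}/r{revision}.json"  # hypothetical key layout
    store.put(key, json.dumps(data, sort_keys=True))
    index.advance(entity_id, revision)
    return revision


def read_entity(store, index, entity_id, revision=None):
    """API-layer read: look up the head in the index, fetch the snapshot."""
    revision = revision or index.head(entity_id)
    return json.loads(store.get(f"{entity_id}/r{revision}.json"))
```

Writing `"Q1"` twice yields revisions 1 and 2; `read_entity(store, index, "Q1")` returns the second snapshot, while passing `revision=1` still returns the first, because the older blob is never touched.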
Key Features
- Immutable Revisions: every edit creates a snapshot you can never overwrite
- 122 REST Endpoints: full CRUD for entities, terms, statements, users, and more
- Complete Lexeme Support: forms, senses, lemmas, glosses, and lexical categories
- Statement Deduplication: hash-based storage can reduce storage by 50%+
- RDF Export: Turtle format for semantic web integration
- Entity Protection: full locks, semi-protection, archiving, and mass-edit protection
- User Features: watchlists, endorsements, thanks, and activity tracking
- Horizontal Scaling: Vitess sharding plus S3 object storage, so capacity grows by adding shards rather than by scaling up a single database
Why This Design?
You might ask: "Why not just use MySQL like everyone else?"
The Problem with Mutable Data
Traditional database:     Entitybase:
┌─────────────┐           ┌─────────────┐
│   Item Q1   │           │  Rev 1: Q1  │──▶ S3 snapshot
│  (current)  │           │  Rev 2: Q1  │──▶ S3 snapshot
└─────────────┘           │  Rev 3: Q1  │──▶ S3 snapshot
 (overwrites!)            └─────────────┘
                           (append-only!)
Traditional databases overwrite data. You lose history. Entitybase appends immutable snapshots. You gain:
- Perfect auditability: who changed what, and when
- Easy rollbacks: just point to an old revision
- No data loss: old versions are never deleted
- Event sourcing: replay events to rebuild state
- Conflict resolution: compare revisions without locking
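To make the benefits above concrete, here is a small sketch of an append-only history with audit, diff, and rollback. It is an illustration under assumptions, not the project's implementation: `Revision` and `EntityHistory` are hypothetical names, and rollback is modelled here as committing a copy of an old revision (so the rollback itself is recorded), which is one possible reading of "point to an old revision".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    data: tuple  # sorted (key, value) pairs; frozen so it cannot be mutated


class EntityHistory:
    """Append-only list of revisions for a single entity (hypothetical)."""

    def __init__(self):
        self._revisions = []

    def commit(self, author, data):
        rev = Revision(len(self._revisions) + 1, author,
                       tuple(sorted(data.items())))
        self._revisions.append(rev)  # append only; nothing is overwritten
        return rev.number

    def get(self, number):
        if not 1 <= number <= len(self._revisions):
            raise KeyError(f"no revision {number}")
        return dict(self._revisions[number - 1].data)

    def head(self):
        return self.get(len(self._revisions))

    def authors(self):
        """Audit trail: (revision number, author) for every edit."""
        return [(r.number, r.author) for r in self._revisions]

    def rollback(self, author, number):
        """Roll back by committing a copy of an old revision."""
        return self.commit(author, self.get(number))

    def diff(self, old, new):
        """Map each changed key to its (old value, new value) pair."""
        a, b = self.get(old), self.get(new)
        return {k: (a.get(k), b.get(k))
                for k in sorted(a.keys() | b.keys())
                if a.get(k) != b.get(k)}
```

Because `diff` only reads two stored revisions, comparing them needs no lock on the entity, and a rollback leaves every intermediate revision available for later inspection.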
Statement Deduplication
Statements (like "population: 8.9 billion") appear millions of times across entities. Entitybase stores each unique statement once and references it by hash:
┌──────────────────┐      ┌──────────────────┐
│ Q1: Earth        │      │ Statement hash   │
│ population: X    │─────▶│ "population:8.9B"│
└──────────────────┘      └──────────────────┘
┌──────────────────┐       (stored once!)
│ Q2: Country      │
│ population: X    │─────▶ Same hash, reuses storage
└──────────────────┘
This can reduce storage by 50%+ for typical Wikibase datasets!
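The idea can be sketched as a content-addressed store. The source does not say which hash function or canonical form Entitybase uses, so SHA-256 over sorted-key JSON is an assumption here, as are the names `statement_hash` and `StatementStore`.

```python
import hashlib
import json


def statement_hash(statement):
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(statement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StatementStore:
    """Stores each unique statement once; entities hold only hashes."""

    def __init__(self):
        self.statements = {}  # hash -> statement body (stored once)
        self.entities = {}    # entity id -> list of statement hashes

    def add(self, entity_id, statement):
        digest = statement_hash(statement)
        # setdefault keeps the first copy and ignores later duplicates.
        self.statements.setdefault(digest, statement)
        self.entities.setdefault(entity_id, []).append(digest)
        return digest

    def resolve(self, entity_id):
        """Expand an entity's hashes back into full statements."""
        return [self.statements[d] for d in self.entities.get(entity_id, [])]
```

Adding the same population statement to `"Q1"` and `"Q2"` leaves a single entry in `statements` and two short hash references in `entities`. The actual saving depends on how often statements repeat in a given dataset.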
Comparison with Wikibase Suite
| Feature | Entitybase | Wikibase Suite |
|---|---|---|
| Interface | REST API only | REST API + web UI + wiki pages |
| Wiki pages | None | Article pages, talk pages, user pages |
| Frontend | None | Wikibase UI (Vue) |
| Authentication | None (planned) | Full user system |
| Capacity | 1B+ entities, 1T+ statements | ~100M entities (Wikidata) |
| Storage | Immutable S3 snapshots + Vitess | MySQL + blob storage |
| Statement Deduplication | Hash-based (~50%+ storage reduction) | None |
| JSON Output | /entities/{id}.json | wbgetentities API |
| TTL Output | Turtle (alpha) | Full support |
| RDF/XML | Not planned | Supported |
| NTriples | Not planned | Supported |
| Change Streaming | RDF change events (alpha) | None |
| Batch Updates | No (separate calls) | Yes (single API call) |
Note: Entitybase is an alternative backend for structured data, not a drop-in replacement. Wikibase Suite is production-ready for full wiki sites.
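As a rough illustration of the JSON output row, a client might fetch entities one request at a time. Only the `/entities/{id}.json` path comes from the table above. The base URL, the helper names `entity_json_url` and `fetch_entities`, and the assumption that each entity needs its own request are hypothetical, so check the API reference before relying on any of them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; use your deployment's URL


def entity_json_url(entity_id, base_url=BASE_URL):
    """Build the JSON export URL for one entity."""
    return f"{base_url.rstrip('/')}/entities/{quote(entity_id, safe='')}.json"


def fetch_entities(entity_ids, base_url=BASE_URL, opener=urlopen):
    """Fetch several entities with one GET each (assumed: no batch read)."""
    results = {}
    for entity_id in entity_ids:
        with opener(entity_json_url(entity_id, base_url)) as response:
            results[entity_id] = json.loads(response.read())
    return results
```

The `opener` parameter exists only so the function can be exercised without a running server; in normal use the default `urlopen` is enough.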
Explore
- Getting Started: quick start guide (5 minutes!)
- Tutorial: hands-on step-by-step walkthrough
- Setup: environment setup
- Project Structure: codebase overview
- Architecture: deep dive into system design
- Features: API endpoints, statement deduplication, bulk operations
- Wikidata: Wikidata integration
- Diagrams: visual architecture
- Glossary: domain terms explained
- Quick Reference: one-page command reference