Project Structure
Understanding how the Entitybase codebase is organized. Let's make sense of it all!
Architecture in 3 Sentences
- You (clients) talk to the REST API
- The API stores data in S3 (permanent storage) and indexes it in Vitess (fast lookups)
- Everything is built around immutable revisions: once written, never changed
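To make the three sentences concrete, here is a minimal sketch of a write path. It is not Entitybase's actual code: `RevisionStore`, `RevisionIndex`, `write_revision`, and the `entity_id/revision` key layout are hypothetical names, and plain dictionaries stand in for S3 and Vitess.

```python
import json


class RevisionStore:
    """In-memory stand-in for S3 (hypothetical): write-once snapshots."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, body):
        # Immutability rule: an existing revision is never overwritten.
        if key in self._blobs:
            raise ValueError(f"revision already exists: {key}")
        self._blobs[key] = body

    def get(self, key):
        return self._blobs[key]


class RevisionIndex:
    """In-memory stand-in for Vitess (hypothetical): entity id -> latest revision."""

    def __init__(self):
        self._head = {}

    def latest(self, entity_id):
        return self._head.get(entity_id, 0)

    def set_latest(self, entity_id, revision):
        self._head[entity_id] = revision


def write_revision(store, index, entity_id, data):
    """Store a new snapshot first, then point the index at it."""
    revision = index.latest(entity_id) + 1
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    store.put(f"{entity_id}/{revision}", body)  # assumed key layout
    index.set_latest(entity_id, revision)
    return revision


store, index = RevisionStore(), RevisionIndex()
write_revision(store, index, "Q1", {"labels": {"en": "universe"}})
write_revision(store, index, "Q1", {"labels": {"en": "Universe"}})
print(index.latest("Q1"))  # the first snapshot is still readable at "Q1/1"
```

An edit here never modifies an old snapshot. It adds a new one and moves the index pointer, which is what makes revisions safe to cache and replay.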
The 3 Main Parts
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  REST API   │───▶│     S3      │    │   Vitess    │
│  (FastAPI)  │    │  (storage)  │    │ (indexing)  │
└─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │
       │              Immutable          Fast lookups
       │              snapshots          + queries
       │
       ▼
(what you talk to)
Directory Layout
src/models/
├── config/                    # Configuration and settings
├── data/                      # Data models (Pydantic)
│   ├── config/                # Data config models
│   ├── infrastructure/        # Infra data models (S3, Vitess records)
│   ├── rest_api/              # API request/response models
│   └── workers/               # Worker data models
├── infrastructure/            # External service integrations
│   ├── s3/                    # S3 storage client
│   ├── stream/                # Event streaming
│   └── vitess/                # Database repositories
├── internal_representation/   # Core domain models
├── json_parser/               # JSON parsing (Wikidata format → internal)
├── rdf_builder/               # RDF generation (internal → Turtle/XML)
├── rest_api/                  # FastAPI endpoints and handlers
├── services/                  # Business logic layer
├── utils/                     # Shared utilities
├── validation/                # Input validation
└── workers/                   # Background workers
tests/                         # Test suite
docs/                          # Documentation
schemas/                       # JSON schemas for S3 data formats
Key Concepts
- Internal Representation - Domain models (Entity, Statement, Value)
- JSON Parser - Converts Wikidata JSON → Internal models
- RDF Builder - Converts Internal models → RDF Turtle/XML
- Repositories - Database access layer (Vitess)
- Services - Business logic between API and repositories
Stack
- API: FastAPI
- Database: Vitess (sharded MySQL)
- Storage: S3 (immutable revisions)
- Validation: Pydantic v2
What Each Part Does
src/models/rest_api/ - The Doorway
This is what clients talk to. It handles:
- HTTP requests and responses
- Input validation
- Error handling
src/models/services/ - The Brain
The business logic layer. It:
- Coordinates between API and storage
- Implements core features
- Contains the "rules" of the system
src/models/infrastructure/ - The Connectors
Integrations with external systems:
- s3/ - Talks to S3 for storing revisions
- vitess/ - Talks to Vitess for indexing
- stream/ - Event streaming (change notifications)
src/models/internal_representation/ - The Core
The domain models, the heart of Entitybase:
- Entity - Item, property, or lexeme
- Statement - Claims about entities
- Value - The actual data (strings, items, dates, etc.)
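As a rough picture of how these three models nest, here is a small sketch. The field names (`kind`, `content`, `property_id`, `statements`) are illustrative assumptions, not the project's real schema. The actual models are Pydantic classes, and frozen dataclasses stand in here only to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Value:
    """The actual data. `kind` is an assumed discriminator: "string", "item", "time", ..."""
    kind: str
    content: str


@dataclass(frozen=True)
class Statement:
    """A claim about an entity: a property paired with a value."""
    property_id: str
    value: Value


@dataclass(frozen=True)
class Entity:
    """An item, property, or lexeme with its statements."""
    id: str
    type: str  # "item", "property", or "lexeme"
    statements: Tuple[Statement, ...] = ()


# Example: Q42 (Douglas Adams) is an instance of (P31) human (Q5).
douglas = Entity(
    id="Q42",
    type="item",
    statements=(Statement("P31", Value("item", "Q5")),),
)
print(douglas.statements[0].value.content)
```

Freezing the models mirrors the immutable-revision idea: a change produces a new object rather than mutating an existing one.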
src/models/json_parser/ - The Translator
Converts Wikidata JSON format → Internal models
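The conversion might look roughly like the sketch below. The input follows the public Wikidata entity JSON layout (`labels`, `claims`, `mainsnak`, `datavalue`). The output types `ParsedEntity` and `ParsedStatement` and the function `parse_entity` are hypothetical and much simpler than the real internal models.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ParsedStatement:
    property_id: str
    value_type: str
    value: Any


@dataclass(frozen=True)
class ParsedEntity:
    id: str
    type: str
    labels: Dict[str, str]
    statements: Tuple[ParsedStatement, ...]


def parse_entity(doc):
    """Flatten a Wikidata-style entity document into simple internal objects."""
    labels = {lang: entry["value"] for lang, entry in doc.get("labels", {}).items()}
    statements = []
    for property_id, claims in doc.get("claims", {}).items():
        for claim in claims:
            snak = claim["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # this sketch skips "novalue" and "somevalue" snaks
            datavalue = snak["datavalue"]
            statements.append(
                ParsedStatement(property_id, datavalue["type"], datavalue["value"])
            )
    return ParsedEntity(doc["id"], doc["type"], labels, tuple(statements))


sample = {
    "id": "Q42",
    "type": "item",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
    "claims": {
        "P31": [
            {
                "type": "statement",
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q5"},
                    },
                },
            }
        ]
    },
}
parsed = parse_entity(sample)
print(parsed.labels["en"], parsed.statements[0].value["id"])
```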
src/models/rdf_builder/ - The Exporter
Converts Internal models → RDF (Turtle/XML) for the semantic web
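For a sense of what the export step produces, here is a deliberately small Turtle sketch. `to_turtle` and `escape_literal` are hypothetical helpers. The `wd:` and `wdt:` prefixes are the ones Wikidata publishes, and whether Entitybase uses the same namespaces is an assumption. The real builder emits far richer output (full statement nodes, qualifiers, references).

```python
PREFIXES = (
    "@prefix wd: <http://www.wikidata.org/entity/> .\n"
    "@prefix wdt: <http://www.wikidata.org/prop/direct/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(entity_id, labels, item_statements):
    """Render labels and item-valued statements as 'truthy' Turtle triples.

    labels:          mapping of language code -> label text
    item_statements: iterable of (property_id, target_entity_id) pairs
    """
    lines = [PREFIXES]
    for lang, text in sorted(labels.items()):
        lines.append(f'wd:{entity_id} rdfs:label "{escape_literal(text)}"@{lang} .')
    for property_id, target_id in item_statements:
        lines.append(f"wd:{entity_id} wdt:{property_id} wd:{target_id} .")
    return "\n".join(lines) + "\n"


turtle = to_turtle("Q42", {"en": "Douglas Adams"}, [("P31", "Q5")])
print(turtle)
```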
src/models/workers/ - The Background Helpers
Background jobs that run separately:
- ID generation (creating Q1, P1, etc.)
- Dump generation (exporting all entities)
- RDF streaming (generating RDF changes)
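ID generation, the first job in that list, can be pictured as one counter per entity type. The sketch below is an assumption about the general idea, not the worker's real design. `IdGenerator` and `PREFIX_BY_TYPE` are hypothetical, the `L` prefix for lexemes is borrowed from Wikidata, and a production worker would persist its counters rather than keep them in memory.

```python
import threading

# Assumed mapping from entity type to ID prefix.
PREFIX_BY_TYPE = {"item": "Q", "property": "P", "lexeme": "L"}


class IdGenerator:
    """Hands out sequential IDs per entity type (in-memory sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = {kind: 1 for kind in PREFIX_BY_TYPE}

    def allocate(self, kind):
        if kind not in PREFIX_BY_TYPE:
            raise ValueError(f"unknown entity type: {kind}")
        # The lock keeps two concurrent callers from receiving the same number.
        with self._lock:
            number = self._next[kind]
            self._next[kind] = number + 1
        return f"{PREFIX_BY_TYPE[kind]}{number}"


ids = IdGenerator()
print(ids.allocate("item"), ids.allocate("item"), ids.allocate("property"))
```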
Quick Mapping
| You want to... | Look in... |
|---|---|
| Add a new API endpoint | rest_api/ |
| Change how data is stored | infrastructure/s3/ |
| Change how data is indexed | infrastructure/vitess/ |
| Add a new entity type | internal_representation/ |
| Handle Wikidata JSON import | json_parser/ |
| Add RDF export format | rdf_builder/ |
| Add a background job | workers/ |
See Also
- Getting Started - Quick start
- Tutorial - Hands-on walkthrough
- Architecture - Deep dive