Project Structure 📁

This page explains how the Entitybase codebase is organized. Let's make sense of it all!


Architecture in 3 Sentences 🧠

  1. You (clients) talk to the REST API
  2. The API stores data in S3 (permanent storage) and indexes it in Vitess (fast lookups)
  3. Everything is built around immutable revisions: once written, never changed
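
The three sentences above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Entitybase implementation: `RevisionStore`, `RevisionIndex`, `write_revision`, and the key layout are hypothetical names chosen for this example, and plain dicts stand in for S3 and Vitess.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class RevisionStore:
    """Stand-in for S3: write-once storage keyed by object name (hypothetical)."""

    _objects: dict[str, str] = field(default_factory=dict)

    def put(self, key: str, body: str) -> None:
        if key in self._objects:
            # Immutable revisions: an existing key is never overwritten.
            raise ValueError(f"revision already exists: {key}")
        self._objects[key] = body

    def get(self, key: str) -> str:
        return self._objects[key]


@dataclass
class RevisionIndex:
    """Stand-in for Vitess: maps an entity ID to its latest revision number."""

    _head: dict[str, int] = field(default_factory=dict)

    def latest(self, entity_id: str) -> int:
        return self._head.get(entity_id, 0)

    def set_latest(self, entity_id: str, revision: int) -> None:
        self._head[entity_id] = revision


def write_revision(
    store: RevisionStore, index: RevisionIndex, entity_id: str, data: dict
) -> int:
    """Store a new snapshot first, then point the index at it."""
    revision = index.latest(entity_id) + 1
    store.put(f"{entity_id}/r{revision}.json", json.dumps(data, sort_keys=True))
    index.set_latest(entity_id, revision)
    return revision
```

Editing an entity in this sketch never touches an old snapshot: a second `write_revision` call for `"Q1"` creates `Q1/r2.json` and moves the index entry, while `Q1/r1.json` stays readable.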

The 3 Main Parts 🏗️

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   REST API  │───▶│     S3      │    │   Vitess    │
│  (FastAPI)  │    │  (storage)  │    │  (indexing) │
└─────────────┘    └─────────────┘    └─────────────┘
     │                   │                   │
     │              Immutable            Fast lookups
     │              snapshots            + queries
     │
     ▼
(what you talk to)

Directory Layout

src/models/
├── config/           # Configuration and settings
├── data/             # Data models (Pydantic)
│   ├── config/       # Data config models
│   ├── infrastructure/ # Infra data models (S3, Vitess records)
│   ├── rest_api/     # API request/response models
│   └── workers/      # Worker data models
├── infrastructure/   # External service integrations
│   ├── s3/           # S3 storage client
│   ├── stream/       # Event streaming
│   └── vitess/       # Database repositories
├── internal_representation/  # Core domain models
├── json_parser/      # JSON parsing (Wikidata format → internal)
├── rdf_builder/      # RDF generation (internal → Turtle/XML)
├── rest_api/         # FastAPI endpoints and handlers
├── services/         # Business logic layer
├── utils/            # Shared utilities
├── validation/       # Input validation
└── workers/          # Background workers

tests/                # Test suite
docs/                 # Documentation
schemas/              # JSON schemas for S3 data formats

Key Concepts

  • Internal Representation - Domain models (Entity, Statement, Value)
  • JSON Parser - Converts Wikidata JSON → Internal models
  • RDF Builder - Converts Internal models → RDF Turtle/XML
  • Repositories - Database access layer (Vitess)
  • Services - Business logic between API and repositories

Stack

  • API: FastAPI
  • Database: Vitess (MySQL sharding)
  • Storage: S3 (immutable revisions)
  • Validation: Pydantic v2

What Each Part Does 🎯

src/models/rest_api/ - The Doorway 🚪

This is what clients talk to. It handles:

  • HTTP requests and responses
  • Input validation
  • Error handling

src/models/services/ - The Brain 🧠

The business logic layer. It:

  • Coordinates between API and storage
  • Implements core features
  • Contains the "rules" of the system

src/models/infrastructure/ - The Connectors 🔌

Integrations with external systems:

  • s3/ - Talks to S3 for storing revisions
  • vitess/ - Talks to Vitess for indexing
  • stream/ - Event streaming (change notifications)

src/models/internal_representation/ - The Core 💎

The domain models, the heart of Entitybase:

  • Entity - Item, property, or lexeme
  • Statement - Claims about entities
  • Value - The actual data (strings, items, dates, etc.)
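
To make the three domain concepts concrete, here is a small sketch of how they could fit together. It assumes nothing about the real classes: the field names (`property_id`, `kind`, `content`) and the `EntityType` enum are hypothetical, and stdlib frozen dataclasses are used only to keep the example dependency-free.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The three entity kinds named above."""

    ITEM = "item"
    PROPERTY = "property"
    LEXEME = "lexeme"


@dataclass(frozen=True)
class Value:
    """The actual data carried by a statement (hypothetical fields)."""

    kind: str  # e.g. "string", "item", "time"
    content: str


@dataclass(frozen=True)
class Statement:
    """A claim about an entity: a property paired with a value."""

    property_id: str
    value: Value


@dataclass(frozen=True)
class Entity:
    """An item, property, or lexeme together with its statements."""

    id: str
    type: EntityType
    statements: tuple[Statement, ...] = ()


# Example: item Q42 with a single statement "P31 -> Q5".
example = Entity(
    id="Q42",
    type=EntityType.ITEM,
    statements=(Statement("P31", Value("item", "Q5")),),
)
```

Freezing the dataclasses mirrors the immutable-revision idea: a changed entity is a new object rather than a mutation of an existing one.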

src/models/json_parser/ - The Translator 🌐

Converts Wikidata JSON format → Internal models
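
As a rough illustration of that conversion, the sketch below reads the `claims` section of a Wikidata-style JSON document and flattens each claim's main snak into a simple statement. The `Entity` and `Statement` classes and the `parse_entity` function are hypothetical stand-ins for this example; the real parser is likely to handle much more (qualifiers, references, ranks, labels).

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    property_id: str
    value: object  # raw datavalue payload; None for novalue/somevalue snaks


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    statements: tuple[Statement, ...]


def parse_entity(raw: str) -> Entity:
    """Parse one Wikidata-format JSON document into the sketch models."""
    doc = json.loads(raw)
    statements = []
    # Wikidata JSON groups claims by property ID: {"P31": [claim, ...], ...}
    for property_id, claims in doc.get("claims", {}).items():
        for claim in claims:
            snak = claim["mainsnak"]
            # Snaks of type "novalue" or "somevalue" carry no datavalue.
            datavalue = snak.get("datavalue")
            value = datavalue["value"] if datavalue else None
            statements.append(Statement(property_id, value))
    return Entity(id=doc["id"], type=doc["type"], statements=tuple(statements))
```

Keeping the parser as a pure function from a JSON string to model objects makes it easy to test without S3 or Vitess in the loop.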

src/models/rdf_builder/ - The Exporter 🐢

Converts Internal models → RDF Turtle format for the semantic web
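
A simplified sketch of the export direction is shown below. It serializes item-valued and string-valued statements as "truthy" Turtle triples using the public Wikidata `wd:` and `wdt:` prefixes; whether Entitybase uses these exact prefixes is an assumption, and `to_turtle` and the `(property_id, kind, value)` tuple shape are hypothetical.

```python
from __future__ import annotations

# Public Wikidata namespaces, assumed here for illustration.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw in a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_turtle(entity_id: str, statements: list[tuple[str, str, str]]) -> str:
    """Render (property_id, kind, value) statements as Turtle triples."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]
    lines.append("")
    for property_id, kind, value in statements:
        if kind == "item":
            obj = f"wd:{value}"
        elif kind == "string":
            obj = f'"{escape_literal(value)}"'
        else:
            raise ValueError(f"unsupported value kind: {kind}")
        lines.append(f"wd:{entity_id} wdt:{property_id} {obj} .")
    return "\n".join(lines) + "\n"
```

For example, `to_turtle("Q42", [("P31", "item", "Q5")])` produces the two prefix declarations, a blank line, and the triple `wd:Q42 wdt:P31 wd:Q5 .`.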

src/models/workers/ - The Background Helpers ⚙️

Background jobs that run separately:

  • ID generation (creating Q1, P1, etc.)
  • Dump generation (exporting all entities)
  • RDF streaming (generating RDF changes)
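
The ID generation job can be pictured as a counter per entity prefix. The sketch below is an in-process illustration only: `IdGenerator` is a hypothetical class, and it assumes nothing about how the real worker persists or reserves ID ranges.

```python
import itertools
import threading


class IdGenerator:
    """Hands out sequential IDs such as Q1, Q2, ... for one prefix."""

    def __init__(self, prefix: str, start: int = 1) -> None:
        self._prefix = prefix
        self._counter = itertools.count(start)
        # The lock keeps IDs unique when several threads request them at once.
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            return f"{self._prefix}{next(self._counter)}"


# One generator per entity kind: items use "Q", properties use "P".
item_ids = IdGenerator("Q")
property_ids = IdGenerator("P")
```

A production worker would also need to survive restarts, for example by storing the last issued number durably, which this sketch deliberately leaves out.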


Quick Mapping 🔗

You want to...                  Look in...
Add a new API endpoint          rest_api/
Change how data is stored       infrastructure/s3/
Change how data is indexed      infrastructure/vitess/
Add a new entity type           internal_representation/
Handle Wikidata JSON import     json_parser/
Add RDF export format           rdf_builder/
Add a background job            workers/

See Also