RDF Builder Implementation
This document describes the RDF builder implementation for wikibase-backend, including algorithms, design decisions, and references to MediaWiki Wikibase source code.
Overview
The RDF builder converts Wikibase entities from JSON to RDF Turtle format, producing both full and truthy RDF exports that match Wikidata's output.
Blank Node Generation
Algorithm
Blank node IDs for owl:complementOf statements are generated using MD5 hashing to ensure stability across multiple exports.
MediaWiki Wikibase Source: mediawiki-extensions-Wikibase/repo/includes/Rdf/PropertySpecificComponentsRdfBuilder.php:218
$stableBNodeLabel = md5(implode('-', ['owl:complementOf', $repositoryName, $localName]));
Our Implementation:
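import hashlib

def _generate_blank_node_id(property_id: str) -> str:
    # Same input layout as the PHP implode() call: 'owl:complementOf-<repository>-<local name>'
    repository_name = "wikidata"
    hash_input = f"owl:complementOf-{repository_name}-{property_id}"
    return hashlib.md5(hash_input.encode()).hexdigest()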
Example: For property P31, the blank node ID is _:0b8bd71b926a65ca3fa72e5d9103e4d6
RDF Output:
wdno:P31 a owl:Class ;
    owl:complementOf _:0b8bd71b926a65ca3fa72e5d9103e4d6 .

_:0b8bd71b926a65ca3fa72e5d9103e4d6 a owl:Restriction ;
    owl:onProperty wdt:P31 ;
    owl:someValuesFrom owl:Thing .
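As a rough illustration of how the hash and the Turtle output fit together, the sketch below renders the no-value block for an arbitrary property. The helper name `render_novalue_block`, the `repository_name` default, and the four-space indentation are assumptions made for this example, not part of the implementation described here.

```python
import hashlib


def generate_blank_node_id(property_id: str, repository_name: str = "wikidata") -> str:
    """Hash 'owl:complementOf-<repository>-<property>' with MD5, as described above."""
    hash_input = f"owl:complementOf-{repository_name}-{property_id}"
    return hashlib.md5(hash_input.encode()).hexdigest()


def render_novalue_block(property_id: str) -> str:
    """Render the wdno: class and its owl:Restriction blank node as Turtle.

    Hypothetical helper: it only shows that the same blank node label
    must appear as the object of owl:complementOf and as a subject.
    """
    bnode = f"_:{generate_blank_node_id(property_id)}"
    return "\n".join(
        [
            f"wdno:{property_id} a owl:Class ;",
            f"    owl:complementOf {bnode} .",
            "",
            f"{bnode} a owl:Restriction ;",
            f"    owl:onProperty wdt:{property_id} ;",
            "    owl:someValuesFrom owl:Thing .",
        ]
    )
```

Calling `print(render_novalue_block("P31"))` should produce a block with the same shape as the RDF output above. Because the label depends only on the repository name and the property ID, repeated exports yield the same blank node.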
Property Predicate Structure
Predicate Types
| Prefix | Namespace | Purpose |
|---|---|---|
| wd: | http://www.wikidata.org/entity/ | Entity URIs |
| wdt: | http://www.wikidata.org/prop/direct/ | Direct claim properties |
| p: | http://www.wikidata.org/prop/ | Claim predicates |
| ps: | http://www.wikidata.org/prop/statement/ | Statement property values |
| psv: | http://www.wikidata.org/prop/statement/value/ | Statement value nodes |
| pq: | http://www.wikidata.org/prop/qualifier/ | Qualifier predicates |
| pqv: | http://www.wikidata.org/prop/qualifier/value/ | Qualifier value nodes |
| pr: | http://www.wikidata.org/prop/reference/ | Reference predicates |
| prv: | http://www.wikidata.org/prop/reference/value/ | Reference value nodes |
| wdno: | http://www.wikidata.org/prop/novalue/ | No-value classes for restrictions |
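The prefix table can be expressed as a plain mapping, which makes it straightforward to emit Turtle `@prefix` lines or expand a compact name into a full IRI. This is a minimal sketch: the names `PREFIXES`, `prefix_declarations`, and `expand_curie` are hypothetical and do not refer to the actual registry code.

```python
# Prefix table from above as a mapping of prefix -> namespace IRI.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "psv": "http://www.wikidata.org/prop/statement/value/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "pqv": "http://www.wikidata.org/prop/qualifier/value/",
    "pr": "http://www.wikidata.org/prop/reference/",
    "prv": "http://www.wikidata.org/prop/reference/value/",
    "wdno": "http://www.wikidata.org/prop/novalue/",
}


def prefix_declarations(prefixes: dict[str, str] = PREFIXES) -> str:
    """Return one Turtle @prefix line per known prefix, in table order."""
    return "\n".join(f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items())


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a compact name such as 'wdt:P31' into a full IRI."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_name
```

For example, `expand_curie("wdt:P31")` yields `http://www.wikidata.org/prop/direct/P31`.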
Predicate Declarations
All property predicates (p:, ps:, pq:, pr:, wdt:) are declared as owl:ObjectProperty.
MediaWiki Wikibase Source: mediawiki-extensions-Wikibase/repo/includes/Rdf/PropertySpecificComponentsRdfBuilder.php:125-158
Key Finding: psv:, pqv:, and prv: predicates are always written as owl:ObjectProperty, regardless of property datatype.
psv:P31 a owl:ObjectProperty .
pqv:P31 a owl:ObjectProperty .
prv:P31 a owl:ObjectProperty .
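The key finding can be captured in a small helper that takes no datatype argument at all. The sketch below is illustrative only; `value_predicate_declarations` and `VALUE_NODE_PREFIXES` are assumed names, not functions from the actual writers.

```python
# Prefixes whose predicates point at value nodes (see the prefix table).
VALUE_NODE_PREFIXES = ("psv", "pqv", "prv")


def value_predicate_declarations(property_id: str) -> list[str]:
    """Declare the value-node predicates of a property as owl:ObjectProperty.

    The property datatype is deliberately not a parameter: per the finding
    above, it has no effect on these three declarations.
    """
    return [
        f"{prefix}:{property_id} a owl:ObjectProperty ."
        for prefix in VALUE_NODE_PREFIXES
    ]
```

Joining the result for `"P31"` with newlines reproduces the three declarations shown above.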
Entity Metadata Loading
Required JSON Structure
Referenced entities (e.g., Q17633526 used as a statement value) must have metadata files with the following structure:
{
  "id": "Q17633526",
  "labels": {
    "en": {
      "language": "en",
      "value": "Wikinews article"
    }
  },
  "descriptions": {
    "en": {
      "language": "en",
      "value": "used with property P31"
    }
  }
}
Important: The id field is required. Without it, parse_entity extracts an empty string for the entity ID, causing invalid RDF output like wd: instead of wd:Q17633526.
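One way to catch this early is to validate the metadata before it reaches the parser. The following is a minimal sketch; `validate_entity_metadata` is a hypothetical helper, and the ID pattern (items, properties, and lexemes only) is an assumption rather than a rule taken from the codebase.

```python
import re

# Assumption: only Q (item), P (property), and L (lexeme) IDs are expected.
ENTITY_ID_PATTERN = re.compile(r"^[QPL][1-9]\d*$")


def validate_entity_metadata(metadata: dict) -> str:
    """Return the entity ID, or raise ValueError if it is missing or malformed.

    Failing here avoids silently emitting a bare 'wd:' in the RDF output.
    """
    entity_id = metadata.get("id", "")
    if not isinstance(entity_id, str) or not ENTITY_ID_PATTERN.match(entity_id):
        raise ValueError(f"entity metadata has a missing or invalid 'id': {entity_id!r}")
    return entity_id
```

A file without an `id` key then fails with a clear error at load time instead of producing invalid Turtle.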
Metadata Download
Use the provided script to download entity metadata from Wikidata:
# Find and download all referenced entities
source .venv/bin/activate
python scripts/download_missing_entity_metadata.py
Location: test_data/entity_metadata/{entity_id}.json
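To see which referenced entities still lack a metadata file, a check along the following lines may be useful. It is a sketch under the assumption that files follow the `{entity_id}.json` naming shown above; `metadata_path` and `missing_metadata` are hypothetical names and do not describe what the download script does internally.

```python
from pathlib import Path

# Directory layout taken from the location given above.
METADATA_DIR = Path("test_data/entity_metadata")


def metadata_path(entity_id: str, base_dir: Path = METADATA_DIR) -> Path:
    """Return the expected metadata file path for an entity ID."""
    return base_dir / f"{entity_id}.json"


def missing_metadata(entity_ids, base_dir: Path = METADATA_DIR) -> list[str]:
    """Return, de-duplicated and sorted, the IDs that have no metadata file."""
    return sorted(
        entity_id
        for entity_id in set(entity_ids)
        if not metadata_path(entity_id, base_dir).is_file()
    )
```

An empty result for the full set of referenced IDs suggests the metadata directory is complete.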
Implementation Files
| File | Purpose |
|---|---|
| src/models/rdf_builder/converter.py | Main entity to RDF conversion orchestration |
| src/models/rdf_builder/writers/property_ontology.py | Property metadata and ontology generation |
| src/models/rdf_builder/writers/triple.py | Triple writing for entities and statements |
| src/models/rdf_builder/property_registry/registry.py | Property datatype and predicate management |
| src/models/rdf_builder/entity_cache.py | Entity metadata loading from SPARQL |
References to MediaWiki Wikibase
Key Files Studied
- PropertySpecificComponentsRdfBuilder.php: Blank node generation and predicate declarations
- RdfBuilder.php: Main RDF builder architecture and factory pattern
- RdfVocabulary.php: Namespace and prefix management
Design Patterns Adopted
- Factory Pattern: Multiple entity-specific RDF builders for different entity types (items, properties, lexemes)
- Hash-based Deduplication: HashDedupeBag for efficient duplicate detection
- Stable Blank Nodes: MD5-based hashing for consistent RDF exports
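The hash-based deduplication idea can be sketched as a small, lossy bag keyed on a short prefix of each hash. This is an illustration loosely modeled on the pattern named above, not a port of the Wikibase class: the method name `already_seen`, the `cutoff` default, and the `namespace` parameter are assumptions made for the example.

```python
class HashDedupeBag:
    """Lossy duplicate detector keyed on a short prefix of each hash.

    Each slot remembers only the most recent hash with a given prefix, so
    memory stays bounded. A True result always means the hash was seen
    before; a False result may occasionally be wrong for a hash that was
    evicted by a colliding one, which only costs a redundant write.
    """

    def __init__(self, cutoff: int = 5) -> None:
        self._cutoff = cutoff  # number of leading hash characters used as the key
        self._bag: dict[str, str] = {}

    def already_seen(self, value_hash: str, namespace: str = "") -> bool:
        key = namespace + value_hash[: self._cutoff]
        if self._bag.get(key) == value_hash:
            return True
        # Remember this hash, overwriting any colliding entry in the slot.
        self._bag[key] = value_hash
        return False
```

A writer would typically call `already_seen` before emitting a shared value or reference node and skip the node when it returns `True`.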
Known Differences from MediaWiki Wikibase
Our implementation aims to produce RDF output identical to that of MediaWiki Wikibase. Any intentional differences will be documented here.
None currently.