Designing a Music Rights Data Model: Identifiers, Entities, and Relationship Patterns for Developers

A robust music rights data model is the foundation for accurate publishing, licensing, and royalty workflows in any production system. This guide gives developers, data architects, and music professionals a standards-aligned, production-ready blueprint for identifiers, canonical entities, relationship patterns, and practical techniques for reconciliation, provenance, and time-bounded rights. You will find concrete schemas, SQL and Neo4j examples, and mappings to DDEX, ISWC/ISRC, and IPI to help implement and validate real-world workflows.
1. Identifier Landscape and When to Trust Each Identifier
Direct point: Treat identifiers as scoped attributes, not absolute keys. Each identifier carries a bounded authority, reliability profile, and failure modes; your canonical record must record which source issued the identifier and how confident you are that it applies.
Core identifiers, authority, and typical producers
- ISWC - Issued by CISAC and registration agencies; scope is the musical work or composition. Use for canonical Work records. See CISAC ISWC docs.
- ISRC - Issued under IFPI guidance to labels and rights owners; scope is the sound recording. Use for Recording records and delivery reconciliation. See IFPI ISRC guide.
- IPI/CAE - Party identifiers assigned via PROs for songwriters and publishers; best for cross-system party matching.
- ISNI - Broader contributor identity registry useful for non-performance contributors and metadata linkage.
- GRid - Identifier for releases and release packages; used by distributors and aggregators.
- DDEX ERN / RIN - Message-level identifiers and resource references used in metadata exchanges; authoritative for feed provenance. See DDEX standards.
- CWR - Bulk works registration format used between publishers and PROs; source of bulk ISWC and share data when available.
How these fail in practice: Identifiers are missing, misapplied, or duplicated. Labels sometimes issue ISRCs to incorrect track versions. Works are often unregistered and lack ISWC. Parties are represented by conflicting local IDs across PRO exports. Relying blindly on any single identifier causes silent corruption of splits and payment flows.
Validation and normalization tips: Always validate format - for example ISWC has a check digit and prefix pattern - and store the issuing authority and timestamp. Enrich identifiers against authoritative registries where possible rather than trusting whatever came in the feed. Log the enrichment result and rate limit errors to avoid silent mismatches.
Canonical key strategy and fallback heuristics
Canonical key rule: Use a composite canonical key composed of ___CODE0 plus CODE1 and CODE_2___ for traceability. Do not make a single external identifier the sole primary key for your Work or Recording tables.
Fallbacks and matching hierarchy: Prefer exact identifier match. If absent, perform authoritative registry lookup, then high-confidence metadata match using normalized title + contributor IPI, then acoustic fingerprinting, then human review. Each step must write a confidence score and provenance pointer to the raw payload.
Concrete example: An incoming DDEX ERN arrives without ISWC. Your pipeline should attempt a CISAC/PRO lookup for ISWC by contributor IPI and title, then query MusicBrainz for matching recordings and fingerprints, and if still unresolved create a provisional Work record tagged ___CODE0 with the ERN as CODE1___. Flag the record for manual adjudication if confidence is below threshold.
Prioritize identifier provenance over identifier presence - recording who issued the ID and when is as important as the ID itself.
Takeaway: build your canonicalization around authority and provenance, not around the illusion of a universal id. Next consideration - design ownership and time-bounded assertions to reference those composite identifier records.
2. Canonical Entities to Model and Their Core Attributes
Start with entities, not documents. A robust music rights data model treats canonical entities as the source of truth for downstream licensing, reporting, and reconciliation—not the incoming file formats. Design each entity around its operational responsibilities: identification, provenance, and time-bounded assertions.
Core canonical entities (what to store and why)
- Work — ___CODE0, CODE1, *ISWCCODE2originalLanguageCODE3firstReleaseDateCODE4contributorsCODE5partyIdCODE6roleCODE7IPICODE_8___canonicalTitles` (normalized variants)
- Recording — ___CODE0, CODE1, *ISRCCODE2durationCODE3audioFingerprintCODE4derivedFromWorkIdsCODE5___releaseIds` (many)
- Release — ___CODE0, CODE1, CODE2/catalogNumberCODE3releaseDateCODE4labelIdCODE5trackListingsCODE_6___recordingId`)
- Party — ___CODE0, CODE1, CODE2 (person|organization), CODE3/CODE4, CODE5___ (writer, publisher, performer), contact and contractual pointers kept in local records
- Agreement — ___CODE0, CODE1, CODE2, CODE3, CODE4, CODE5 (links to CODE_6___)
- RightInstance — ___CODE0, CODE1 (performance|mechanical|synchronization|distribution), CODE2, CODE3 flag, CODE4, CODE5, CODE_6___
- OwnershipShare — ___CODE0, CODE1, CODE2, CODE3, CODE4, CODE5, CODE6, CODE7, CODE_8___
- Territory — ISO codes, aggregations, and common groupings used by business logic
- Event — usage or registration events (___CODE0, CODE1, CODE2, CODE3___) for provenance and audit
Practical insight: Model OwnershipShare as numerator/denominator, not a floating percentage. Fractional fields avoid rounding drift during split aggregation and preserve exact arithmetic for royalty calculations and audits.
Design trade-off: Normalize ___CODE0 and CODE1___ to avoid repeated contact and contract data, but denormalize a read-optimized view (cached canonical record) for fast payout queries. In practice we keep strict normalized tables for authoritative updates and a denormalized materialized view for batch royalty runs.
Concrete example: When ingesting a DDEX ERN, create or update a ___CODE0 with CODE1 and contributor CODE2 links, create CODE3 rows for each sound recording with CODE4 and CODE5, and emit CODE6 records from ERN contributor share fields referencing the ERN as CODE7. Sample minimal JSON: CODE_8___.
| Entity | Minimal key attributes |
|---|---|
| Work | workId, title, ISWC, contributors[], firstReleaseDate |
| Recording | recordingId, title, ISRC, duration, fingerprint, derivedFromWorkIds[] |
| OwnershipShare | ownershipShareId, ownerPartyId, numerator, denominator, effectiveFrom, sourceRecordId |
Final judgment: Most problems come from under-modeling time and provenance, or collapsing Release and Recording. Model temporal validity and source references from day one; you will thank yourself when you need to answer who owned what in a past reporting period.
3. Relationship Patterns and Cardinality Rules
Direct point: relationships in a music rights data model are rarely one-to-one; design for many-to-many by default and make the exceptions explicit and enforced.
Common relationship archetypes
| Relationship | Typical cardinality | Implementation note |
|---|---|---|
| Work ↔ Recording | Many-to-many | Use a join table work_recording_map with an edge_type (e.g., arrangement, cover, sample) and optional confidence |
| Recording → Release | Many-to-one (track appears on one release record per release instance) | release_track with track_position and release_catalogue_number to handle multiple releases |
| Party ↔ Work (ownership) | Many-to-many via OwnershipShare | Store fractional shares as numerator/denominator and make shares time-bounded (effectiveFrom/effectiveTo) |
| Agreement → RightInstance | One-to-many | RightInstance captures rightType, territory, exclusivity, and term; agreements reference those instances |
Cardinality rules you must codify: enforce that for any single RightInstance scoped to a work/territory/time window the set of active OwnershipShare records represents the authoritative split. Do not rely on free-text percentages in agreements as the source of truth.
- Active-sum rule: for each (workid, righttype, territory, time-window) active shares should sum to 1 (or the agreed total) — validate on ingest.
- Non-overlap rule: ownership shares that are exclusive must not overlap in time for the same right and territory; model overlaps as explicit transitions with provenance.
- Edge-type rule: relationship edges need a typed qualifier (e.g., derivedfrom, interpolated, containssample) so downstream royalty logic can treat them differently.
Trade-off: enforcing arithmetic and non-overlap at the database constraint level is safe but brittle when ingesting messy external feeds. In practice, validate and auto-reject only the clearly invalid records; flag ambiguous conflicts for human review and store raw payloads to allow later correction.
Concrete example: a medley recording contains three compositions. Represent that with three ___CODE0 rows linking Recording R123 to Works W1, W2, W3 and create OwnershipShare records for each songwriter with effective ranges tied to the recording's release date. When one component later reverts to a prior publisher, add a new OwnershipShare with a later CODE1___ and sourceRecordId pointing to the reversion agreement.
Graph vs relational judgment: use relational tables and JSONB for canonical, auditable records and constraints; add a graph index or Neo4j replica for fast traversals like find-all-derived-recordings or transitive publisher networks. Do not model provenance only as graph properties — keep authoritative time-bounded fields in the canonical store for accounting.
Important: treat territorial and exclusivity attributes as first-class parts of relationship keys — failing to do so creates silent double-payments and reconciliation headaches.
Next consideration: once you codify these cardinality rules, build automated tests that simulate common edge cases — medleys, samples, reverted publisher rights, and territory-limited exclusives — and assert both the arithmetic of shares and the provenance links back to source messages like DDEX ERN. For reference and mapping guidance see DDEX standards and store raw payloads as described on the canonical ingestion checklist at UniteSync.
4. Modeling Ownership Splits, Provenance, and Time-bounded Rights
Ownership must be modeled as immutable, auditable assertions that are valid for a time window and trace back to a source. Treat each share as a first-class record rather than a mutable field on a Work or Party.
Practical record layout: create an ___CODE0 record with CODE1, CODE2 (writer/publisher/owner), CODE3, CODE4, CODE5, CODE6, CODE7 (ISO codes or coverage list), CODE8, CODE9, CODE10, CODE11, and CODE_12___ linking to a change event. Using an integer numerator/denominator avoids repeated rounding errors when splitting further or aggregating across multiple rights owners.
Provenance and audit chain
Store raw source payloads but keep them separate from canonical joins. Persist the incoming DDEX ERN/CWR/PRO payload in a raw store or a compressed ___CODE0 archive and reference it with CODE1. Keep an CODE_2___ table that records who ingested or changed the share, why (agreement id, revocation, correction), and a hash of the raw payload so you can prove the exact message used to compute a split.
Trade-off to accept: raw payload retention increases storage and backs up privacy concerns; compress and tier older payloads to cold storage but never delete the pointer from the canonical row. Deleting provenance kills forensic reconciliation and legal defensibility.
Calculating payable splits: to compute a split for a given play date and territory, query shares where effectiveFrom <= playDate < effectiveTo and territory matches, then sum applicable fractions grouped by payee role. Prefer authoritative sources by right-type—use PRO-registered splits for performance royalties and publisher contracts for mechanicals when discrepancies appear.
Concrete example: a songwriter assigned publishing to Publisher A (exclusive) from ___CODE0 until a reversion on CODE1. You keep two CODE2 rows: one with CODE3 sourced to the original publishing agreement and a second starting CODE_4___ sourced to the reversion notice (ERN or agreement). A play on 2019-05-01 picks the first row; a play on 2021-03-01 picks the second.
Common failure mode: overwriting the previous share instead of closing it. In practice this causes historical royalty queries to be irreproducible and creates audit disputes. Always append a new time-bounded row and mark the old row closed.
- Conflict resolution: implement a ranked source authority map (PRO > Publisher > Label > Aggregator) and use
confidenceScoreplus source rank to choose which source to trust for each right type. - Indexing: build a composite index on ___CODE0 and consider partitioning by CODE1 or CODE_2___ for large catalogs.
- Derived fields: store a normalized decimal
computedSharefor fast payout math but recompute when provenance or numerator/denominator changes.
| Field | Purpose |
|---|---|
| shareNumerator / shareDenominator | Exact fractional ownership to avoid rounding cascade |
| effectiveFrom / effectiveTo | Time interval for which the ownership assertion applies |
| sourceSystem / sourceRecordId | Link to raw ERN/CWR/Agreement for audit and legal trace |
| confidenceScore | Operational flag for human review and reconciliation rules |
5. Mapping to Industry Message Standards and Ingest Patterns
Direct mapping to message standards is not optional — it is the most practical way to reduce reconciliation work and capture provenance. Design your ingest to treat DDEX ERN/RIN, CWR, and PRO exports as authoritative source documents, not just data rows to be parsed and discarded.
Key distinction: use ___CODE0 for resource registration and metadata, CODE1 for rights and usage updates, and CODE_2___/PRO exports for bulk works registration where DDEX is not available. In practice RIN carries rights-relevant assertions while ERN supplies the canonical resource metadata; both must be preserved as raw payloads for audits. See DDEX, CISAC ISWC guidance, and IFPI on recording identifiers at IFPI.
Practical ingest pipeline stages
- Validate: schema checks and message signature where available; reject malformed ERN/RIN early.
- Normalize: canonicalize identifier formats (strip separators, uppercase ISWC/ISRC), normalize names and territory codes.
- Enrich: lookup missing ISWC/ISRC/IPI against registries; attach acoustic fingerprints when recordings present.
- Match & score: identifier-first matching, fallback to metadata fuzzy match with confidence score.
- Merge rules: apply source priority and versioning; keep prior records as time-bounded assertions rather than overwrite.
- Persist raw payload: append-only raw store or JSONB column with ___CODE0, CODE1, and CODE_2___.
- Emit derived records: create canonical Work/Recording/OwnershipShare rows and index for queries.
Tradeoff and limitation: streaming RIN updates work well for near-real-time reconciliation but increase complexity around idempotency and ordering. Batch CWR files are simpler to process but arrive stale and lack recording-level granularity. Most production systems use hybrid: streaming for updates, periodic batch reconciliation against bulk files.
Concrete example: an aggregator sends an ERN lacking ISWC but with contributor roles and share percentages. Your pipeline should validate the ERN, enrich by querying CISAC/IPI and ISWC registries, create OwnershipShare records referencing the original ERN sourceRecordId, and store the ERN payload in a JSONB raw table so an auditor can replay the enrichment and matching decisions later.
Canonical field -> DDEX mapping (common fields)
| Canonical field | DDEX ERN/RIN path | Notes / handling |
|---|---|---|
| Work.title | WorkTitle / workTitle | Normalize whitespace and diacritics; store original in raw payload |
| Work.iswc | WorkReference / WorkID where WorkIDType = ISWC | Enrich when missing; validate checksum |
| Recording.isrc | SoundRecordingReference / SoundRecordingID where IDType = ISRC | ISRC often assigned by label; treat as high-confidence match |
| Contributor.role + IPI | ContributorList / Party / PartyId (IPI/ISNI) | Map to Party records; fallback to name-based match if IPI absent |
| OwnershipShare | ContributorList / Share / Percentage or Split | Convert to numerator/denominator; record source ___CODE0 and CODE1___ |
Do not trust percentage fields alone. Convert to numerator/denominator and retain the original percentage string from the message to avoid rounding disputes during royalty calculations.
Judgment: prioritise preserving source messages and provenance over attempting perfect automated resolution at ingest time. Misapplied IDs and inconsistent contributor roles are the usual failure modes; keeping the raw message and a deterministic score-based matching record lets you correct mappings without losing traceability. Next consideration: define the source priority table and idempotency keys before you process your first feed.
6. Implementation Patterns: Relational, Document, and Graph Models with Examples
Direct point: pick the storage model that matches the workload: use a relational core for accounting and canonical records, JSONB documents for ingest and provenance, and a graph for relationship exploration — and accept that a hybrid architecture is the realistic outcome for production music rights systems.
Relational core (what to store and why)
Relational strength: ACID guarantees, predictable joins, and auditability make RDBMS the right primary store for ___CODE0, CODE1, CODE2, CODE3, and ledger-like tables. Define CODE4 with numerator/denominator, CODE5/CODE6, CODE7 and CODE8 to preserve provenance. Example DDL line: CODE9___
Document layer (Postgres JSONB for ingest and denormalization)
Document strength: JSONB handles raw DDEX ERN, full source payloads, and denormalized canonical records for fast reads. Store raw payloads in ___CODE0 and canonical snapshots in CODE1 as JSONB. Use a CODE2 index on CODE3 and path indexes for CODE_4___ to accelerate enrichment. This keeps the canonical relational model small while preserving forensic detail.
Graph layer (Neo4j for traversal and discovery)
Graph strength: use a graph for transitive, multi-hop queries that are awkward in SQL — find all recordings that sample a work, discover publisher networks, or compute contributor reach. Store ownership splits on edges: ___CODE0. Example Cypher: CODE1___.
- Indexes to add: btree on ___CODE0, btree on CODE1, GIN on CODE2, partial index on CODE3 where CODE_4___ for current ownership lookups
- Sync pattern: write-to-relational first for financial operations, mirror denormalized JSONB for read APIs, and asynchronously push relationship changes to the graph with an event stream
| Model | When to use | Limitations |
|---|---|---|
| Relational (Postgres) | Accounting, reconciliation, canonical sources of truth | Complex relationship traversals become expensive at scale |
| Document (JSONB) | Raw payload retention, fast denormalized reads, schema-flexible fields | Harder to enforce cross-document constraints and share normalization |
| Graph (Neo4j) | Exploration, sampling chains, publisher networks, lineage queries | Not ideal for financial ledgers or bulk analytical aggregations |
Concrete example: to answer where royalties should flow for a sampled recording, join recordings -> work_recording_map -> ownership_shares in SQL to compute splits for the pay period, while using a graph traversal to surface all upstream sampled works and their publisher networks for human review. In practice this hybrid query pattern reduces reconciliation errors and keeps payouts auditable.
Next consideration: define clear sync contracts and idempotent events between systems before you implement the hybrid stack — mismatched ownership records across stores are the single biggest operational failure mode.
7. Reconciliation, Matching Heuristics, and Operational Considerations
Direct assertion: Reconciliation is not a one-off algorithm — it is an operational pipeline with measurable stages, failure modes, and human checkpoints. Treat matching as an engineered workflow: deterministic fast-paths, probabilistic slow-paths, and an auditable manual review loop.
Matching priority and thresholds
- Exact identifier match: ISWC for works, ISRC for recordings, IPI for parties. Auto-accept and link when identifiers and source authority match.
- High-confidence metadata match: Normalized title + contributor set (prefer IPI) + duration window. Use for auto-accept only when confidence > 0.95.
- Acoustic fingerprint: Confirm or disambiguate recordings when IDs are missing. Fingerprints are powerful but can conflate covers or remasters.
- Fuzzy / ML match: Use as a last-resort ranking signal; always surface confidence and provenance and queue below a strict threshold.
- Manual review: For medleys, samples, or conflicting shares; ensure reviewers see full source payloads (store raw ERN/CWR) and previous merge history.
Practical insight: Title normalization matters more than you think. Normalize unicode, remove textual noise (feat, remix tags), collapse whitespace, and compare with trigram similarity plus contributor-weighting. This reduces false positives from minor punctuation differences without turning on expensive ML.
| Heuristic | Typical confidence range | Action |
|---|---|---|
| Exact ID (ISWC/ISRC + IPI) | 0.99–1.00 | Auto-link; log provenance |
| Normalized metadata (title+contributors+duration) | 0.85–0.97 | Auto-link if >0.95; otherwise queue |
| Acoustic fingerprint | 0.80–0.98 | Use to confirm; require metadata match for auto-link |
| ML fuzzy match | 0.60–0.90 | Rank candidates; always human-validate under 0.90 |
Concrete example: An incoming DDEX ERN lacks ISWC but includes contributors with IPI and an audio file. Pipeline step 1 normalizes the title and matches contributors to internal party IDs (confidence 0.87). Step 2 runs an acoustic fingerprint which matches an existing recording at 0.93; because contributor mapping is present and fingerprint is high, the system marks for human review with suggested merge and pre-filled OwnershipShare candidates.
Trade-off and limitation: Relying heavily on fuzzy or ML matches increases match coverage but proportionally raises false positives and audit burden. In practice, prioritize high-precision heuristics for payable calculations, and allow lower-precision matches only for discovery or enrichment workflows.
Operational details that matter: Partition reconciliation by territory and effective date to avoid mixing current and historical rights. Use incremental pipelines: ingest -> enrich -> candidate generation -> score -> action. Indexes to prioritize: GIN on JSONB payloads, trigram on normalized titles, and specialized indexes for fingerprints or LSH buckets for fast nearest-neighbor.
Human-in-the-loop is not optional: for ambiguous splits, medleys, samples, or when sources disagree on share fractions; surface provenance and sourceRecordId in the review UI.
Judgment: Don’t trust a single canonical source blindly. Give priority by source authority (PRO/registry > label > aggregator > third-party metadata) and record the priority used in the merge. When in doubt, prefer conservative outcomes for royalty payments and expose the uncertainty in downstream accounting.
Next consideration: Implement reconciliation as observable software: versioned records, immutable raw payloads, confidence scores on every linkage, and dashboards that drive staffing for manual review rather than hoping automation will catch everything.
8. Sample End-to-End Example and Implementation Checklist
Direct assertion: implement ingestion as a small, auditable pipeline stage that preserves raw source payloads, performs identifier enrichment asynchronously when possible, and writes time-bounded OwnershipShare records that always reference the source message. Doing that keeps reconciliation deterministic and reduces manual cases later.
End-to-end example: DDEX ERN → canonical records → verification
Concrete Example: an incoming DDEX ERN arrives for a new composition with contributors but no ISWC. The pipeline stores the raw ERN, attempts ISWC/IPI enrichment, matches to an existing work by title+contributors (low confidence), creates a new canonical work when confidence is below threshold, and writes OwnershipShare rows with sourceRecordId pointing to the raw ERN. This keeps a clear audit trail even if the external registry later issues an ISWC.
Sample SQL inserts (Postgres style): insert raw payload, then canonical work and ownership share. Use jsonb for raw payloads and explicit numerator/denominator for shares.
INSERT INTO raw_payloads(id, source_system, payload, received_at) VALUES (uuid_generate_v4(), DDEX_ERN, $1::jsonb, now());
INSERT INTO canonical_works(id, title, iswc, first_release_date, canonical_payload) VALUES (uuid_generate_v4(), Midnight Sun, NULL, 2025-02-14, $canonical_json::jsonb) RETURNING id;
INSERT INTO ownership_shares(id, work_id, owner_party_id, role, share_numerator, share_denominator, effective_from, effective_to, source_system, source_record_id, confidence_score) VALUES (uuid_generate_v4(), , , writer, 1, 2, 2025-02-14, NULL, DDEX_ERN, , 0.7);
Sample canonical Work JSON (stored in canonical_payload): {title:Midnight Sun,iswc:null,contributors:[{name:Alex Rivera,ipi:00000012345,role:writer,share:{num:1,den:2}}],sourceReferences:[{system:DDEX_ERN,id:ern-2025-0001}]}
Verification Cypher (Neo4j) to check linkage and latest effective share: MATCH (w:Work {iswc:T-123.456.789-0})-[:HAS_RECORDING]->(r:Recording) OPTIONAL MATCH (p:Party)-[s:OWNS]->(w) WHERE s.effective_from <= date() AND (s.effective_to IS NULL OR s.effective_to > date()) RETURN w.title, r.isrc, p.name, s.share_numerator, s.share_denominator, s.source_record_id LIMIT 50;
- Checklist - mandatory items: raw source retention (___CODE0 append-only), normalized identifier columns (iswc, isrc, ipi), CODE1 with numerator/denominator, CODE2 and CODE3___, and confidence_score for human triage.
- Checklist - indexes & performance: btree on ___CODE0, CODE1, CODE2; partial index on active shares (CODE3); GIN index on CODE_4___ for ad-hoc lookups.
- Checklist - workflows: asynchronous enrichment queue for ISWC/IPI lookups, human review queue for low-confidence matches, and periodic reconciliation jobs against PRO exports.
- Checklist - tests & data: unit tests for splitting math (use integer arithmetic), integration tests with a synthetic DDEX ERN sample, and a small seed from MusicBrainz or DDEX sample payloads.
Trade-off to accept: synchronous enrichment gives immediate canonical keys but increases latency and external dependency surface — prefer async enrichment with an idempotent reconciliation pass and escrowed provisional payouts if business requires prompt payment.
effective_to and create a new row for changes. That preserves auditability and lets you re-run royalty calculations for past accounting periods.Operational judgment: implement numerator/denominator arithmetic and integer-based rollups; floating percentage fields introduce rounding drift across millions of payouts and create reconciliation headaches.
Next consideration: after deployment, instrument KPIs — match rate, unresolved records, enrichment latency, and reconciliation lag — and bake those into SLAs with your data providers and payout processes.
AUTHOR

Charly
Carlos Palop is a seasoned music publishing expert, adept in rights management and royalty distribution, ensuring artists' works are protected and profitably managed. Their strategic expertise and commitment to fair practices have made them a trusted figure in the industry.



