Trustworthy by default.
Nothing hidden.

How DjeedX transforms your data and the open record into Silver — and why every step is public. Where enterprise structured-intelligence platforms treat methodology as proprietary IP, DjeedX treats it as the trust signal that justifies the SaaS purchase. The same discipline applies to every connector and every topic dataset inside the workspace.

Provenance · Berkeley Protocol-ready · Signed receipts · Bronze → Silver → Gold

Principle 01

Provenance on every record.

Every record in every Djeed dataset carries a full audit trail: who created it, when, and every diff since. The history endpoint is read-only and exportable.
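Below is a minimal sketch of an append-only audit trail. It assumes that creation and each later edit are stored as per-field (old, new) diffs with an author and a UTC timestamp. `AuditedRecord` and `Revision` are hypothetical names. The sketch does not model Djeed's actual storage or its history endpoint.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Revision:
    """One history entry: who changed what, and when."""
    author: str
    at: datetime
    diff: dict[str, tuple[Any, Any]]  # field name -> (old value, new value)


@dataclass
class AuditedRecord:
    """A record whose history is append-only: revisions are added, never edited."""
    fields: dict[str, Any] = field(default_factory=dict)
    history: list[Revision] = field(default_factory=list)

    @classmethod
    def create(cls, author: str, fields: dict[str, Any]) -> "AuditedRecord":
        record = cls()
        record.edit(author, fields)  # creation is logged as the first diff
        return record

    def edit(self, author: str, changes: dict[str, Any]) -> None:
        # Diff against the current state before applying the change.
        diff = {k: (self.fields.get(k), v) for k, v in changes.items()
                if self.fields.get(k) != v}
        if not diff:
            return  # a no-op edit leaves no history entry
        self.history.append(Revision(author, datetime.now(timezone.utc), diff))
        self.fields.update(changes)

    def export_history(self) -> list[dict[str, Any]]:
        # Export copies, so callers cannot alter the stored log.
        return [{"author": r.author, "at": r.at.isoformat(), "diff": dict(r.diff)}
                for r in self.history]
```

Because each entry stores both the old and the new value, the record's state at any point can be rebuilt by replaying the diffs in order.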

AI-extracted records also store the upstream extraction payload — the raw response from the model, not just our parsed interpretation — so any disputed value can be re-checked against what the model actually returned.

provenance bundle
  source_url: https://…/article/123
  captured_at: 2026-04-26T08:14:02Z
  content_hash: sha256:a4f5…
  extracted_by: agent v3.2
  edits: 3 (last 2026-04-25)
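The sketch below shows how a bundle like this supports a later re-check. It rests on two assumptions: that `content_hash` is SHA-256 over the captured bytes, which the `sha256:` prefix suggests, and that the raw extraction payload is stored as JSON. `ProvenanceBundle`, `content_matches`, and `recheck_value` are hypothetical names, not a Djeed API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceBundle:
    source_url: str
    captured_at: str     # ISO 8601 UTC timestamp
    content_hash: str    # "sha256:<hex digest>" of the captured bytes (assumed)
    extracted_by: str
    raw_extraction: str  # the model's raw response, stored verbatim (assumed JSON)


def sha256_tag(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


def content_matches(bundle: ProvenanceBundle, content: bytes) -> bool:
    """True if the archived source bytes still match the recorded hash."""
    return sha256_tag(content) == bundle.content_hash


def recheck_value(bundle: ProvenanceBundle, field_name: str, parsed_value) -> bool:
    """Compare a parsed value with what the model actually returned."""
    raw = json.loads(bundle.raw_extraction)
    return raw.get(field_name) == parsed_value
```

A disputed value can then be settled in two steps. First, confirm that the archived source is unchanged. Second, compare the parsed value with the model's raw output, not with the parser's interpretation.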

Principle 02

Berkeley Protocol-ready outputs.

The Berkeley Protocol on Digital Open Source Investigations (2020, co-developed by the UC Berkeley Human Rights Center and the UN Office of the High Commissioner for Human Rights) is among the most rigorous open-source research methodologies in public circulation. It sets out how to collect, preserve, verify, and present open-source material so the work holds up under scrutiny. Its principles travel across sectors. The same discipline applies whether the work is tracking permit pipelines across municipalities, checking industrial emissions disclosures against open environmental data, following large-project financing flows through procurement portals, or building entity graphs from corporate registries for a parcel or asset. DjeedX connects that discipline to your own internal records.

Two things make structured intelligence at scale possible without losing the discipline, and Djeed combines them deliberately. AI extraction reads the open record at a volume no human team can match, turning unstructured prose into typed claims, line by line and source by source. That is the GenAI side: a reading machine. The Berkeley-aligned methodology is not a layer on top of it; it is embedded by construction. The original source URL, capture timestamp, content hash, raw extraction payload, and corroboration trail are captured at the moment of reading, which is what makes methodology and extraction inseparable. The OSINT methodology preserves the chain of custody; AI makes it feasible at the scale the modern public record requires. The investigator's judgement stays where it belongs, with a human. The mechanical reading does not.

Three Protocol principles carry through the pipeline:

  • Provenance captured at the source
  • Chain of custody preserved end-to-end
  • Corroboration across independent sources

DjeedX doesn't apply the methodology for you; you do. What DjeedX restructures into your workspace is material ready for you to work with under it.

Every Silver-tier record carries the chain-of-custody attributes the Protocol expects: original source URL, capture timestamp, content hash, raw extraction payload, deduplication trail, and version history. The analyst — you — runs the verification, source-assessment, corroboration, and review steps the methodology calls for.
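An analyst might start by confirming that the attributes listed above are present before beginning verification. The sketch below does this for a record held as a plain dict. The field names mirror the list above but are assumptions, not DjeedX's actual schema.

```python
# Hypothetical field names mirroring the chain-of-custody list above.
REQUIRED_CUSTODY_FIELDS = (
    "source_url",
    "captured_at",
    "content_hash",
    "raw_extraction",
    "dedup_trail",
    "version_history",
)


def missing_custody_fields(record: dict) -> list[str]:
    """Return the chain-of-custody attributes that are absent or empty."""
    return [name for name in REQUIRED_CUSTODY_FIELDS
            if record.get(name) in (None, "", [], {})]
```

An empty result only says the attributes are present. Checking that they are accurate is still the verification work the analyst does.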

We power the workflow with data and tools (DjeedX, the API, the per-dataset evidence trail) that make Berkeley Protocol-ready research possible at the speed and scale your work demands. That might be a planning team mapping zoning changes against developer filings, an environmental researcher comparing corporate commitments with what filings actually disclose, or a due-diligence analyst tracing ownership, litigation, and permit history across registries.

Principle 03

We transform into Silver. Gold is yours to build.

The DjeedX pipeline restructures inputs through three layers:

Bronze · Raw input

Your files, the open record, social-media signal — direct extraction with the original URL preserved. Internal layer; never surfaced as a finished dataset.

Silver · What DjeedX restructures into

Deduplicated, entity-resolved, cross-source corroborated, methodology-public. The form your records take inside the workspace, ready to query through table, graph, map, or pivot.

Gold · What you build

Verified in your context. Combine Silver with your own observations, partner reports, expert judgement. Built inside DjeedX, where the data already lives.

The handoff is intentional. DjeedX does the layered restructuring at scale. You do the analysis that combines Silver with what only you know. Neither side does the other's job.

Principle 04

Topic datasets are sourced, not opinions.

The topic datasets DjeedX maintains (urban-development activity, environmental change, infrastructure projects, public-sector decisions, and more) come from the same Bronze → Silver pipeline that restructures your own inputs, with explicit source URLs on every record. Each record carries a citation edge to the originating source. Edit a topic-dataset record in your DjeedX workspace and your edits stay in your workspace; the upstream isn't mutated. That's the path from Silver to your own Gold.
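One way to get the behaviour described above is a copy-on-write overlay: reads fall through to the upstream record, and writes land in a workspace-local layer. The sketch below uses a hypothetical `WorkspaceView` class built on the standard library's `ChainMap` and `MappingProxyType`. It does not describe how DjeedX actually stores workspace edits.

```python
from collections import ChainMap
from types import MappingProxyType


class WorkspaceView:
    """Read a topic-dataset record through a per-workspace overlay.

    Edits land in the overlay; the upstream copy is exposed read-only
    and is never mutated.
    """

    def __init__(self, upstream: dict):
        self._upstream = MappingProxyType(dict(upstream))  # frozen snapshot
        self._overlay: dict = {}
        self._view = ChainMap(self._overlay, self._upstream)  # overlay wins

    def edit(self, key, value) -> None:
        self._overlay[key] = value

    def __getitem__(self, key):
        return self._view[key]

    @property
    def upstream(self):
        return self._upstream

    @property
    def local_edits(self) -> dict:
        return dict(self._overlay)
```

Keeping the edits as a separate layer means the workspace's Gold view and the upstream Silver record can always be compared side by side.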

Principle 05

Privacy on cross-workspace entity matches.

DjeedX can tell you that a partner organisation in your workspace also appears in N other workspaces, but never which ones. A contact is disclosed only if the other workspace's owner opts in. The count and opt-in disclosure are the only ways cross-workspace insight is shared.
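The rule above can be sketched as a directory that answers counts freely but reveals identities only on opt-in. In this sketch, `EntityDirectory` is hypothetical, workspace identifiers stand in for contact details, and entity IDs are assumed to be canonical already after entity resolution.

```python
from collections import defaultdict


class EntityDirectory:
    """Maps canonical entity IDs to the workspaces they appear in.

    Only counts leave this class, except for workspaces whose owners
    have explicitly opted in for that entity.
    """

    def __init__(self):
        self._appearances: dict[str, set[str]] = defaultdict(set)
        self._opted_in: set[tuple[str, str]] = set()  # (workspace, entity)

    def add(self, workspace: str, entity_id: str) -> None:
        self._appearances[entity_id].add(workspace)

    def opt_in(self, workspace: str, entity_id: str) -> None:
        self._opted_in.add((workspace, entity_id))

    def other_workspace_count(self, asking: str, entity_id: str) -> int:
        # N other workspaces, excluding the one asking; names are not returned.
        return len(self._appearances.get(entity_id, set()) - {asking})

    def opted_in_workspaces(self, asking: str, entity_id: str) -> list[str]:
        # Only workspaces whose owners opted in for this entity.
        return sorted(ws for ws in self._appearances.get(entity_id, set())
                      if ws != asking and (ws, entity_id) in self._opted_in)
```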

Principle 06

What ships with every published dataset.

Each dataset on djeed.com comes with a signed methodology page that documents exactly what is inside and how it got there. The page is readable on the public catalog and exportable alongside any record export — it is the dataset's receipt.

Coring (unit of observation)

Every dataset declares its primary unit — what one row represents. Djeed datasets are cored at one of:

  • Event-centric — one row = one event (incident, decision, action), with linked actors, locations, dates, and sources.
  • Claim-centric — one row = one assertion by one source about a fact in the world. Atomic, never merged across sources.
  • Act-centric — one row = one action (often nested inside an event); granular for deep operational tracing.
  • Entity-centric — one row = one organisation, person, group, or asset, aggregated across all its appearances.
  • Indicator-centric — one row = one measure at one place at one time (statistical surface).
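One way to make a declared coring checkable is to give each coring a row key that must be unique. The keys below are hypothetical choices that follow the definitions above. For example, an indicator row is unique per measure, place, and period, and a claim row is unique per source and assertion.

```python
from enum import Enum


class Coring(Enum):
    """What one row of a dataset represents."""
    EVENT = "event"
    CLAIM = "claim"
    ACT = "act"
    ENTITY = "entity"
    INDICATOR = "indicator"


# Hypothetical key that should identify exactly one row under each coring.
ROW_KEY = {
    Coring.EVENT: ("event_id",),
    Coring.CLAIM: ("source_id", "claim_id"),  # one source, one assertion
    Coring.ACT: ("event_id", "act_id"),       # acts nest inside events
    Coring.ENTITY: ("entity_id",),
    Coring.INDICATOR: ("indicator", "place", "period"),
}


def duplicate_keys(rows: list[dict], coring: Coring) -> list[tuple]:
    """Return row keys that occur more than once; empty means the coring holds."""
    seen, dupes = set(), []
    for row in rows:
        key = tuple(row[k] for k in ROW_KEY[coring])
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes
```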

The choice depends on the buyer's use case. Each dataset's methodology page declares the coring up front and ties every record back to that decision.

Bronze → Silver migration

Records start at Bronze (raw extraction with the source URL preserved) and become Silver when they pass the promotion gate. The pipeline runs:

  1. URL dedup — already-processed sources are skipped before any extraction work happens.
  2. Spatial enrichment — lat / lon is reverse-geocoded to authoritative country / admin1 / admin2 codes via shapefile join, plus an H3 hex cell (resolution 7) for fast spatial blocking.
  3. Entity resolution — fuzzy name matching collapses spelling variants of the same actor across sources.
  4. Cross-source corroboration — multiple claims about the same event are linked together, with confidence scored from source reliability + claim agreement.
  5. Promotion gate — a record is promoted to Silver when corroboration confidence ≥ 0.7 AND at least three independent sources back it. Below the gate it stays Bronze and is not published.
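Steps 1 and 5 are the most mechanical, and the sketch below covers only those two. The 0.7 confidence and three-source thresholds come from the text. Three other details are assumptions: the URL normalisation rules, counting distinct publishing domains as a crude stand-in for "independent sources", and all function names. Steps 2 to 4 are not modelled.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, urlunsplit

CONFIDENCE_GATE = 0.7        # from the promotion gate in step 5
MIN_INDEPENDENT_SOURCES = 3  # from the promotion gate in step 5


def normalise_url(url: str) -> str:
    """Hypothetical canonical form: lowercase scheme and host, drop fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, ""))


def filter_new_sources(urls: list[str], seen: set[str]) -> list[str]:
    """Step 1: skip sources already processed, before any extraction runs."""
    fresh = []
    for url in urls:
        key = normalise_url(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh


@dataclass
class Candidate:
    confidence: float  # corroboration confidence in [0, 1]
    source_domains: set[str] = field(default_factory=set)  # independence proxy


def tier(candidate: Candidate) -> str:
    """Step 5: promote to Silver only when both gate conditions hold."""
    independent = len(candidate.source_domains)
    if candidate.confidence >= CONFIDENCE_GATE and independent >= MIN_INDEPENDENT_SOURCES:
        return "silver"
    return "bronze"  # stays internal and unpublished
```

The gate is a conjunction. A single highly confident source, or several agreeing sources with low confidence, both leave the record at Bronze.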

Deduplication scoring

At the dedup stage, every candidate pair is scored across five dimensions:

  • Spatial proximity — same H3 cell, or within a distance threshold in km.
  • Temporal proximity — same day, or within N days when date precision is fuzzy.
  • Semantic similarity — cosine similarity of claim-text embeddings above a threshold.
  • Category match — primary + secondary action categories must align.
  • Entity match — actor / victim names resolve to the same canonical entity.

Pairs above the auto-merge threshold (typically same-day, within 5 km, same category) merge automatically. Borderline pairs go to AI review; hard cases escalate to human review. Every merge keeps a trail back to the source records — provenance is never lost.
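The scoring and routing above can be sketched for a single candidate pair. Only the auto-merge rule (same day, within 5 km, same category) comes from the text. The cut-offs of 25 km, 3 days, and 0.85 cosine similarity, and the rule that three of the five agreeing dimensions make a pair "borderline", are hypothetical. A haversine distance stands in for the H3 cell check, which would require the `h3` library. Escalation from AI review to human review is not modelled.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    lat: float
    lon: float
    day: date
    category: str
    embedding: list[float]
    actor_id: str  # canonical entity ID after entity resolution


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cosine(u: list[float], v: list[float]) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def route_pair(a: Claim, b: Claim, *, near_km: float = 25.0, near_days: int = 3,
               sim_threshold: float = 0.85) -> str:
    """Return 'auto_merge', 'ai_review' or 'distinct' for a candidate pair."""
    km = haversine_km(a.lat, a.lon, b.lat, b.lon)
    days = abs((a.day - b.day).days)
    same_category = a.category == b.category
    # Auto-merge rule quoted in the text: same day, within 5 km, same category.
    if days == 0 and km <= 5.0 and same_category:
        return "auto_merge"
    # Otherwise count how many of the five dimensions agree (hypothetical cut-offs).
    agreeing = sum([
        km <= near_km,
        days <= near_days,
        same_category,
        cosine(a.embedding, b.embedding) >= sim_threshold,
        a.actor_id == b.actor_id,
    ])
    if agreeing >= 3:
        return "ai_review"  # borderline; a reviewer may escalate to a human
    return "distinct"
```

Whatever the routing, a merge in this design would record both source claim IDs so the trail back to the source records is kept.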

Final dataset output

A subscriber sees a Silver-tier dataset with:

  • Typed rows following the coring (one event / claim / act / entity / indicator per row).
  • Per-record provenance bundle — source URLs, capture timestamp, content hash, raw extraction payload, full edit history.
  • Spatial fields — lat / lon, country, admin1, admin2, H3 cell.
  • Temporal fields — event date, date precision, week / month / year buckets.
  • Confidence score per record.
  • Verification status field (unverified / verified / disputed).
  • Linked entities, events, sources — explorable in the per-dataset graph view, mappable in the map view, sortable in the table view.
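The page lists week, month, and year buckets but does not say how they are derived. The sketch below assumes ISO 8601 weeks and hypothetical precision values `day`, `month`, and `year`. Buckets finer than the known precision are left empty rather than guessed.

```python
from datetime import date


def temporal_buckets(event_date: date, precision: str) -> dict:
    """Derive week/month/year bucket labels from an event date.

    `precision` is assumed to be 'day', 'month' or 'year'; a bucket finer
    than the known precision is None.
    """
    iso_year, iso_week, _ = event_date.isocalendar()
    return {
        "event_date": event_date.isoformat(),
        "date_precision": precision,
        "week": f"{iso_year}-W{iso_week:02d}" if precision == "day" else None,
        "month": event_date.strftime("%Y-%m") if precision in ("day", "month") else None,
        "year": str(event_date.year),
    }
```

ISO weeks can belong to the neighbouring calendar year. For example, 30 December 2024 falls in week 2025-W01, so week and year buckets should not be assumed to nest.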

Output formats: CSV, JSON, GeoJSON, Excel. Live API + OData endpoints for Power BI and Tableau on higher-tier subscriptions.
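The exact GeoJSON layout DjeedX emits is not documented here. The sketch below turns Silver records into a FeatureCollection, assuming flat dict records with `lat` and `lon` keys. All other fields become feature properties.

```python
import json


def to_geojson(records: list[dict]) -> str:
    """Serialise records with lat/lon into a GeoJSON FeatureCollection.

    GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    Records without coordinates are skipped rather than guessed.
    """
    features = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue
        props = {k: v for k, v in rec.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": props,
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The [longitude, latitude] order is the most common source of misplaced points when moving between table exports and map tools.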

Roadmap

Open questions we're still working on.

Per-record confidence-decay over time for AI-extracted records.

Public methodology versioning — every breaking schema change tagged with a migration note + diff endpoint.

Independent third-party methodology audits on the highest-evidence datasets.

A formal Silver-to-Gold playbook — patterns for how teams cook Djeed Silver into decision-ready Gold inside their own workflow.

Per-dataset Berkeley-Protocol coverage matrix — for each Silver dataset, which methodology attributes are guaranteed, which are best-effort, which are out of scope.

Built to be inspected. Trusted by default.

Start a workspace and see the methodology in action — or talk to us about a tailored build.