
Data Sovereignty and AI: Why Consent, Not Control, Is the Real Issue


In 2023, a museum in the Netherlands discovered that thousands of images from its digitised collection had been used to train a commercial image generation model. Nobody had asked. Nobody had been told. The museum only found out because a researcher noticed that the model could reproduce visual elements unmistakably drawn from their archive.

This is not an isolated story. It has happened to photojournalists, authors, community archives, independent publishers, and cultural institutions on every continent. Between 2020 and 2024, the major generative AI companies built their training datasets by crawling the public internet at industrial scale. LAION-5B, widely used for image generation, contained 5.85 billion image-text pairs scraped from across the web. The economic logic was simple: if data is accessible, treat it as available. Copyright holders were not consulted. Licence terms were not checked.

The real problem is not scraping. It is who carries the burden.

In the absence of clear rules, an informal system of opt-out mechanisms has emerged. Robots.txt, a protocol from the 1990s, lets website operators signal which parts of their site should not be crawled. More recently, proposals for an ai.txt standard have appeared, designed specifically for AI training preferences.
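To make the opt-out mechanism concrete, here is what a robots.txt block for AI training crawlers can look like. The user-agent tokens below (GPTBot, CCBot, Google-Extended) are crawler names published by OpenAI, Common Crawl, and Google respectively; any real deployment should check each operator's current documentation, and compliance remains voluntary on the crawler's side:

```
# robots.txt — ask known AI training crawlers not to fetch anything
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that this file only expresses a preference. Nothing in the protocol enforces it, which is precisely the weakness the next section describes.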

The trouble with opt-out is that it puts the responsibility on the wrong side. It requires every rights holder to actively defend their work against extraction, rather than requiring AI companies to seek consent before using it.

Think about what that means in practice. A photographer in Lagos, a community archive in rural Wales, and a small-press publisher in Copenhagen all have to independently discover that their work is being scraped, understand the technical countermeasures available, and implement them correctly. They are defending against companies with billions in funding, teams of engineers, and legal departments built for exactly this kind of dispute. That is not a fair arrangement.

An opt-out regime also assumes that the default state of creative work is "available for extraction unless you say otherwise." That assumption is wrong. Copyright law in most jurisdictions grants rights holders control over how their work is used. The default should be that you do not use someone's work without permission. Opt-in, not opt-out, is the only framework that respects the people whose labour makes these systems possible.

If this kind of framing is unfamiliar, our glossary of AI terms covers the basics in plain language.

What the law is doing

The EU AI Act, which entered into force on 1 August 2024, with obligations phasing in through 2027, is the most substantial response so far. It requires providers of general-purpose AI models to document the content used in training, publish summaries of that content, and comply with copyright opt-out signals under the EU Copyright Directive. In practical terms, this means AI companies operating in the EU can no longer train on European content in the dark. The transparency obligations are real, and the compliance deadlines are close.

The UK has taken a slower path. A 2022 proposal for a broad text and data mining exception was withdrawn in 2024 after sustained opposition from creators and cultural organisations. A revised framework published in late 2024 proposes machine-readable opt-out mechanisms, but the detail remains under consultation. UK creators currently sit in a gap: relying on existing copyright protections without the transparency rules that the EU provides. For organisations navigating these choices, we have written separately about ethical technology decisions for social enterprises.

The cases that matter

Three major lawsuits have sharpened the debate. Getty Images is suing Stability AI over the alleged use of more than 12 million photographs to train its image generation model. The New York Times has brought a case against Microsoft and OpenAI alleging that millions of articles were used without licence and that the models can reproduce copyrighted text verbatim. The Authors Guild, representing writers including John Grisham and George R.R. Martin, has filed a class action against OpenAI for systematic use of copyrighted books. None has reached final judgment, but the direction is clear: courts are engaging with whether training on copyrighted content without consent constitutes infringement.

What you can actually do about it

One of the more practical responses is C2PA, the Coalition for Content Provenance and Authenticity. It is an open standard, developed by Adobe, Microsoft, the BBC, and others, that lets creators embed tamper-evident metadata directly into their files.

Here is what that looks like in practice. You take a photograph. Before you publish it, you sign it with C2PA Content Credentials. The signed file now carries a verifiable record of who created it, when, and what rights apply. You might assert "all rights reserved, no AI training permitted." That assertion is embedded in the file itself, machine-readable and tamper-evident. If an AI company later scrapes that image, they cannot claim ignorance. The metadata is there, and ignoring it is a conscious choice with potential legal consequences.
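As a sketch of what such an assertion can look like, here is a manifest fragment using the C2PA specification's training and data-mining assertion (`c2pa.training-mining`). The claim generator name is a placeholder, and the exact entry labels and `use` values should be checked against the current version of the spec:

```json
{
  "claim_generator": "ExampleSigner/1.0",
  "assertions": [
    {
      "label": "c2pa.training-mining",
      "data": {
        "entries": {
          "c2pa.ai_generative_training": { "use": "notAllowed" },
          "c2pa.ai_training": { "use": "notAllowed" },
          "c2pa.data_mining": { "use": "notAllowed" }
        }
      }
    }
  ]
}
```

With the open-source c2patool utility, a manifest along these lines can be embedded and signed with a command of the form `c2patool photo.jpg -m manifest.json -o photo-signed.jpg` (test certificates are used unless you configure your own signing credentials).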

Camera manufacturers including Nikon, Sony, and Leica are building C2PA signing into their hardware. News organisations are using it to verify photojournalism. The standard is gaining ground.

But there is a structural problem with most content protection tools available today. They are cloud-based: you upload your images to a third-party service, which processes or monitors them on your behalf. In order to protect your content from unauthorised use, you hand it to another company and trust them not to do the same thing.

Local-first architecture removes that contradiction. When processing happens entirely on your own device, the tool itself cannot become a vector for the problem it claims to solve.

This is the approach we have taken with Jura Trace. Embedding C2PA credentials, applying invisible watermarks, fingerprinting assets, running forensic checks: all of it happens on your machine. Your content does not leave your device. There is no account, no telemetry, and no server storing your files. For the organisations most affected by unconsented data extraction, that is not a technical nicety. It is the minimum condition for trust.
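To illustrate why fingerprinting does not require a server, here is a minimal sketch of a perceptual "average hash" computed entirely in memory, with no network access. This is the generic aHash technique, not Jura Trace's actual implementation, and a real tool would decode image files (e.g. with Pillow) rather than take a raw 8x8 grayscale grid as input:

```python
# Minimal local-first fingerprinting sketch: a 64-bit average hash
# computed from an 8x8 grayscale pixel grid, entirely on-device.

def average_hash(pixels: list[list[int]]) -> int:
    """Return a 64-bit hash: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")

# A synthetic gradient image and a copy with one pixel altered.
original = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
tweaked = [row[:] for row in original]
tweaked[0][0] = 255  # simulate a minor edit

h1, h2 = average_hash(original), average_hash(tweaked)
print(hamming_distance(h1, h2))  # small distance: likely the same asset
```

The point of the sketch is architectural rather than cryptographic: every step runs locally, so the fingerprint can be compared against suspect copies without the original ever leaving your machine.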

The question is not whether protections will arrive

The direction across every major jurisdiction is towards greater transparency, stronger rights-holder protections, and documented consent. The EU is furthest ahead, but the UK, US, Canada, and Australia are all moving.

The question is what you do in the meantime. Embed provenance in your content now. Assert your rights in machine-readable form. And choose tools that do not ask you to hand over the very thing you are trying to protect.

Data sovereignty is not about rejecting AI. It is about insisting that AI development happens with consent, transparency, and respect for the people whose work makes it possible. That is not a radical position. It is the baseline.



References

  1. European Parliament and Council, Regulation (EU) 2024/1689 (EU AI Act), Official Journal of the European Union, 12 July 2024. eur-lex.europa.eu/eli/reg/2024/1689/oj. Key provisions: Article 53(1)(c) (copyright compliance policy), Article 53(1)(d) (training data summaries).
  2. European Parliament and Council, Directive (EU) 2019/790 on Copyright in the Digital Single Market, Article 4(3) (text and data mining opt-out). eur-lex.europa.eu/eli/dir/2019/790/oj
  3. UK Intellectual Property Office, Artificial Intelligence and Copyright: Government Response, February 2024. gov.uk/government/consultations/artificial-intelligence-and-copyright
  4. UK Department for Science, Innovation and Technology, AI Opportunities Action Plan, December 2024. gov.uk/government/publications/ai-opportunities-action-plan
  5. C2PA, C2PA Technical Specification v2.1. c2pa.org/specifications
  6. Getty Images (US) Inc. v Stability AI Inc., Case No. 1:23-cv-00135 (D. Del. 2023).
  7. The New York Times Company v Microsoft Corporation and OpenAI, Case No. 1:23-cv-11195 (S.D.N.Y. 2023).
  8. Authors Guild v OpenAI Inc., Case No. 1:23-cv-08292 (S.D.N.Y. 2023).
  9. LAION, LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models, NeurIPS 2022. laion.ai/blog/laion-5b

Jura Trace protects your content from unauthorised AI extraction. Pilot launching June 2026.
