Getty Images Built the World's Largest Licensed Image Library. Image Identity Still Depends on Metadata.

by Nick Clark | Published March 28, 2026

Getty Images built the world's largest commercially licensed image library, spanning the Getty Images editorial and creative collections, the iStock royalty-free marketplace, the Unsplash+ premium tier, custom content commissions, and an emerging line of generative AI licensing with content credentials. The licensing infrastructure is mature, the editorial standards are widely cited, and the rights-management apparatus is the industry's reference point. But image identity in Getty's system depends on attached metadata: file names, IPTC and XMP fields, custom identifiers, EXIF blocks, and database records keyed to those identifiers. If that metadata is stripped after download, whether through sharing, re-encoding, or screenshotting, the image loses its provable connection to its license. The structural gap lies between metadata-based image identity and content identity derived from the image's own structural properties, and it widens as generative AI multiplies the number of near-duplicates in circulation.


What Getty Images provides

Getty Images operates several distinct catalog tiers under one rights-management umbrella. The flagship Getty Images catalog supplies editorial photography, news and sports imagery, and rights-managed creative content to publishers, broadcasters, agencies, and enterprise customers. iStock supplies royalty-free creative photography, illustration, and video to small and mid-market customers at price points well below the flagship tier. Unsplash+ adds a paid premium tier alongside the high-volume Unsplash community, giving commercial customers indemnification and model releases that the standard Unsplash License, which already permits free commercial and non-commercial use, does not include. Custom content commissioning brings Getty's contributor network to bear on bespoke shoots. And the recent generative AI licensing program, built on a training corpus drawn exclusively from Getty's own licensed library and attaching content credentials to its outputs, represents Getty's bet that provenance will become a marketable feature of synthetic media.

Each tier inherits the same identity architecture. Every asset is assigned a Getty asset identifier; that identifier is embedded in the file's IPTC and XMP metadata blocks at delivery; the identifier resolves through Getty's database to the rights record, the contributor, the model and property releases, and the license terms negotiated for the specific transaction. The licensing infrastructure and editorial standards are not in dispute. The gap described here is about image identity architecture, not about licensing quality.
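The resolution chain above can be sketched in a few lines. This is a minimal illustration, not Getty's implementation: `RightsRecord`, `RIGHTS_DB`, the `asset_id` key, and the sample identifier are hypothetical stand-ins for the IPTC/XMP field and the database record the text describes. What it shows is that every step is keyed to the embedded identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsRecord:
    asset_id: str
    contributor: str
    license_terms: str


# Hypothetical rights database keyed by asset identifier.
RIGHTS_DB = {
    "GI-0001": RightsRecord("GI-0001", "Example Contributor", "editorial, 12 months"),
}

# Hypothetical key standing in for the IPTC/XMP field that carries the identifier.
ASSET_ID_FIELD = "asset_id"


def resolve_license(metadata: dict) -> Optional[RightsRecord]:
    """Follow the metadata-keyed chain: embedded identifier -> rights record."""
    asset_id = metadata.get(ASSET_ID_FIELD)
    if asset_id is None:
        # Identifier stripped: nothing links these pixels to a rights record.
        return None
    return RIGHTS_DB.get(asset_id)
```

If the metadata dictionary arrives empty, as it would after a platform strips the file, `resolve_license` returns `None` even though the pixels are unchanged.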

Metadata-dependent identity is strippable identity

Getty's identity architecture depends on metadata that travels with the file. IPTC photo metadata, XMP rights expressions, and Getty's custom identifier fields are written into the file at delivery, alongside whatever EXIF camera data was recorded at capture, and all of it is intended to survive into downstream use. In practice, it often does not. Major social media platforms, including Facebook, Instagram, X, and TikTok, strip or rewrite metadata on upload. Content management systems regenerate derivative renditions and discard the source metadata. Image editing tools can save out new files without preserving the original blocks. Screenshots capture only rendered pixels, so the original metadata never enters the new file, and mobile camera-roll exports and messaging-app forwards often re-encode images in ways that drop it. By the time an image has moved two or three hops away from Getty's delivery endpoint, the embedded identifier is usually gone.
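Whether a given JPEG still carries these blocks can be checked directly, because EXIF and XMP live in APP1 segments and IPTC-IIM normally lives inside a Photoshop image resource block in APP13. The function below is a simplified, standard-library-only sketch with a hypothetical name: it walks the marker segments up to the start of scan data and reports which metadata blocks are present. It skips edge cases in the JPEG specification such as fill bytes between markers.

```python
import struct


def metadata_segments(jpeg: bytes) -> list[str]:
    """List the metadata-bearing APP segments found before the compressed image data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no further header segments
            break
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            # Photoshop resource block; this is where IPTC-IIM is normally stored.
            found.append("IPTC")
        pos += 2 + length
    return found
```

Called as `metadata_segments(open("photo.jpg", "rb").read())` on a file as delivered and again on the same image after a platform round trip, it would show the list shrinking to empty if the platform stripped the metadata, while the image data itself remains.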

When the metadata is gone, the image's connection to its license is also gone — not because the license expired or was revoked, but because the artifact that linked the pixels to the rights record has been stripped. The pixels still represent the same visual content. The licensed identity does not survive the round trip. Getty's enforcement teams know this; the entire reverse-image-search apparatus exists to compensate for it.

Reverse image search is probabilistic, not structural

Getty supplements metadata-based identity with reverse image search to detect unauthorized use across the open web. This is real enforcement, and it generates real takedowns. But reverse image search is probabilistic by construction: perceptual hashes, learned embeddings, and feature-matching pipelines find images that are similar enough to a reference, within a tunable threshold. They do not establish that two images share the same structural identity; they classify candidates by distance in a feature space. The threshold is a tradeoff. Loosen it, and the false-positive rate rises until enforcement turns against legitimate transformative use. Tighten it, and modified images (recompressions, crops, color grades, content-aware fills, generative inpainting) fall outside the threshold and slip past as different images.
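The threshold tradeoff can be made concrete with average hashing (aHash), one common perceptual hash. This is a generic sketch, not a description of the matcher Getty uses: it block-averages a grayscale image (a list of rows of intensities, at least 8 by 8) down to an 8 by 8 grid, sets one bit for each cell brighter than the grid mean, and compares hashes by Hamming distance. `threshold` plays the role of the tunable parameter described above.

```python
def average_hash(pixels: list[list[float]], size: int = 8) -> int:
    """aHash: block-average down to size x size, then set a bit where a cell exceeds the mean."""
    h, w = len(pixels), len(pixels[0])  # assumes h >= size and w >= size
    cells = []
    for i in range(size):
        for j in range(size):
            block = [pixels[r][c]
                     for r in range(i * h // size, (i + 1) * h // size)
                     for c in range(j * w // size, (j + 1) * w // size)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:  # row-major; the first cell ends up in the most significant bit
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_match(a: int, b: int, threshold: int) -> bool:
    """Declare a match when the hashes are within `threshold` bits of each other."""
    return hamming(a, b) <= threshold
```

On a 16 by 16 gradient (`pixel = 16 * row + col`), a uniform brightness shift leaves the hash unchanged, while painting a bright band across the top two rows moves it 16 bits away. A threshold of 10 misses that edit; a threshold of 20 catches it, at the cost of also accepting any unrelated image whose hash happens to land within 20 bits.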

Generative AI makes this worse on both axes. The volume of near-duplicates in circulation grows because the marginal cost of producing a variation has collapsed. The set of legitimate transformations grows because tools that were once edge cases are now defaults in every consumer photo app. A probabilistic matcher tuned for the pre-AI distribution of derivative work is not the right tool for the post-AI distribution. The deeper problem is not the threshold; it is that the image has no intrinsic identity for the matcher to verify against. Identity is being reconstructed from feature similarity at enforcement time, not carried by the content itself.

What content anchoring provides

Content anchoring derives image identity from the image's own structural variance: spatial frequency distributions, variance gradients across regions, and structural signatures that depend on the visual content rather than on appended metadata. The signature is computed from the pixels and is reproducible from the pixels. It survives the transformations that preserve visual content — metadata stripping, recompression, format conversion, modest crops and resizes, screenshotting through a platform pipeline — because those transformations preserve the structural properties the signature depends on. It diverges when the visual content itself diverges, which is the boundary that licensing actually cares about.
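The text does not specify how content anchoring computes its signature, so the following is only a toy stand-in for one ingredient it names, variance gradients across regions; it is not the author's method, and `region_variances` and `variance_gradient_signature` are hypothetical names. The sketch splits a grayscale image into a grid, computes each region's intensity variance exactly, and emits one bit per adjacent pair of regions recording whether variance falls. Because it reads only pixel values, stripping metadata cannot change it; and because a uniform brightness offset leaves every variance unchanged while a uniform contrast gain scales them all by the same factor, neither transformation changes any comparison.

```python
from fractions import Fraction


def region_variances(pixels: list[list[int]], grid: int = 4) -> list[Fraction]:
    """Exact intensity variance of each cell in a grid x grid partition (row-major)."""
    h, w = len(pixels), len(pixels[0])  # assumes h >= grid and w >= grid
    variances = []
    for i in range(grid):
        for j in range(grid):
            block = [pixels[r][c]
                     for r in range(i * h // grid, (i + 1) * h // grid)
                     for c in range(j * w // grid, (j + 1) * w // grid)]
            n, s1, s2 = len(block), sum(block), sum(v * v for v in block)
            # Var = E[v^2] - E[v]^2, kept as an exact fraction so comparisons never suffer rounding.
            variances.append(Fraction(n * s2 - s1 * s1, n * n))
    return variances


def variance_gradient_signature(pixels: list[list[int]], grid: int = 4) -> int:
    """One bit per adjacent region pair: 1 if variance falls from region k to region k + 1."""
    v = region_variances(pixels, grid)
    bits = 0
    for a, b in zip(v, v[1:]):
        bits = (bits << 1) | (1 if a > b else 0)
    return bits
```

This toy does not attempt the crop, resize, or recompression tolerance the text attributes to content anchoring; a shifted crop, for instance, moves the grid boundaries and can change bits.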

Applied to Getty's catalog, content anchoring closes the gap between licensing identity and content identity. A Getty asset would carry an intrinsic structural identity registered at the moment of ingest, alongside the existing metadata-based identifier. License verification at downstream use would compute the structural identity from the pixels in hand and check it against the registry, instead of relying on metadata that may or may not have survived the trip. The same registry would support content credentials for generative AI outputs, allowing a synthetic image's provenance to be verified from its structure rather than from a credential that travels alongside the file and is subject to the same stripping. Getty's licensing apparatus is mature; what content anchoring adds is an identity layer that the apparatus can rely on after the file has left Getty's delivery endpoint.
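The ingest-and-verify flow can be sketched as a registry keyed by signature. `ContentRegistry`, its methods, and the `GI-` identifiers are hypothetical, and signatures are treated as opaque integers produced by whatever structural function the registry adopts. `verify` does an exact lookup by default; the optional `max_distance` argument is an assumption added here to show where a tolerance would enter if the signature function were not perfectly stable under benign transformations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentRegistry:
    """Hypothetical registry mapping structural signatures to existing asset identifiers."""
    by_signature: dict[int, str] = field(default_factory=dict)

    def register(self, signature: int, asset_id: str) -> None:
        # Called once at ingest, alongside the existing metadata-based identifier.
        existing = self.by_signature.get(signature)
        if existing is not None and existing != asset_id:
            raise ValueError(f"signature already registered to {existing}")
        self.by_signature[signature] = asset_id

    def verify(self, signature: int, max_distance: int = 0) -> Optional[str]:
        # Downstream check: uses only the signature recomputed from the pixels in hand,
        # so it works whether or not the file's metadata survived.
        if signature in self.by_signature:
            return self.by_signature[signature]
        if max_distance == 0:
            return None
        best, best_distance = None, max_distance + 1
        for sig, asset_id in self.by_signature.items():
            distance = bin(sig ^ signature).count("1")
            if distance < best_distance:
                best, best_distance = asset_id, distance
        return best
```

A linear scan is adequate for a sketch; a catalog-scale registry would need an index to make tolerant lookups fast.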

Where adoption fits Getty's roadmap

Getty's strategic posture — the AI licensing program, the content credentials initiative, the public stance on training data provenance — is already aligned with structural identity. The remaining step is to back that posture with an identity primitive that does not depend on metadata surviving the open web. Existing customers gain enforcement that does not degrade as platforms strip metadata. The AI licensing tier gains a verifiable provenance signal that travels with the pixels. Contributor royalties gain a structural basis for usage attribution that does not require platforms to cooperate with Getty's metadata. The competitive position improves precisely where the current architecture is weakest: identity after the file leaves Getty's hands.
