320-Bit UID Construction: Multi-Segment Hashing for Negligible Collision Probability

Nick Clark

Mechanism

The unique identifier, or UID, is the structurally derived content identity at the center of the content anchoring platform. It is derived deterministically from a digital content artifact's internal variance and structural features rather than from its storage location, file name, cryptographic key, or transmission metadata. This article describes the quadrant decomposition and spatial sub-region fingerprinting process by which the 320-bit UID is assembled. The process runs after global variance vector extraction: it takes the artifact through a spatial decomposition pipeline that produces independent fingerprints for four non-overlapping quadrant regions, then emits the 320-bit UID together with a backward-compatible short-form identifier.

The construction proceeds through a defined pipeline: canonical normalization of the artifact, orientation canonicalization, extraction of four non-overlapping spatial quadrants, per-quadrant nine-dimensional variance extraction and hashing, rotation-invariant sorting of the four quadrant hashes, and combination of the global hash and the four sorted quadrant hashes into a five-segment 320-bit identifier. The quadrant decomposition enables detection of partial similarity, regional mutation, and spatial composition matching that is not captured by the global nine-dimensional vector alone. The result is stable under controlled transformations such as format conversion and resolution rescaling while diverging predictably as variance-shifting mutations occur.

Canonical Normalization and Orientation

The canonical normalization stage rescales the input artifact to a square canvas of 256 pixels by 256 pixels. It computes the ratio of the target dimension to the longest edge of the source artifact, applies uniform scaling along both dimensions, and centers the scaled image on a black background fill. This letterbox-style normalization preserves aspect ratio while ensuring that all artifacts occupy a consistent canonical coordinate space for quadrant extraction. Image smoothing is disabled during rescaling to prevent anti-aliasing from introducing artificial variance signals along rescaled edges.
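The letterbox step can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes a grayscale image held as a row-major list of lists, and it uses nearest-neighbor sampling as a stand-in for "smoothing disabled". The function name `letterbox` and its parameters are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical row-major grayscale representation


def letterbox(src: Image, target: int = 256, fill: float = 0.0) -> Image:
    """Scale src uniformly so its longest edge equals target, centered on a fill canvas."""
    src_h, src_w = len(src), len(src[0])
    scale = target / max(src_h, src_w)  # ratio of target size to longest edge
    new_h = min(target, max(1, round(src_h * scale)))
    new_w = min(target, max(1, round(src_w * scale)))
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas = [[fill] * target for _ in range(target)]  # black background
    for y in range(new_h):
        sy = min(src_h - 1, int(y / scale))  # nearest-neighbor: no interpolation
        for x in range(new_w):
            sx = min(src_w - 1, int(x / scale))
            canvas[top + y][left + x] = src[sy][sx]
    return canvas


# A 2x4 source on an 8x8 canvas fills rows 2..5 and leaves black bars above and below.
small = letterbox([[1.0] * 4, [1.0] * 4], target=8)
```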

The orientation canonicalization module then computes the dominant gradient orientation of the normalized artifact by extracting the peak bin of an eight-bin gradient histogram over the full canonical image. If the absolute value of the dominant orientation angle exceeds approximately 0.1 radians (approximately 5.7 degrees), the artifact is rotated by that angle about the center of the canvas prior to quadrant extraction. This step ensures that artifacts with predominant orientation structure are presented in a consistent spatial frame prior to spatial decomposition, reducing inter-format and inter-resolution UID drift attributable to minor rotational variations.
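One way the peak-bin test might look is sketched below. Several details are assumptions not stated above: gradients are taken by central differences, orientations are folded to a half-turn range, votes are weighted by gradient magnitude, and the eight bins are centered on multiples of pi/8 so that an axis-aligned artifact maps to an angle of zero. The rotation itself is not shown, and the function names are hypothetical.

```python
import math
from typing import List

BINS = 8
THRESHOLD_RAD = 0.1  # approximately 5.7 degrees


def dominant_orientation(img: List[List[float]]) -> float:
    """Return the center angle (radians) of the strongest gradient-orientation bin."""
    h, w = len(img), len(img[0])
    hist = [0.0] * BINS
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences (assumed)
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            theta = math.atan2(gy, gx)
            # Bins are centered on k*pi/8; opposite directions share a bin.
            hist[round(theta / (math.pi / BINS)) % BINS] += mag
    peak = max(range(BINS), key=hist.__getitem__)
    if peak > BINS // 2:  # map bins 5..7 to negative angles
        peak -= BINS
    return peak * math.pi / BINS


def needs_rotation(angle: float) -> bool:
    return abs(angle) > THRESHOLD_RAD


vertical_edge = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
diagonal_ramp = [[float(x + y) for x in range(8)] for y in range(8)]
```

Under these assumptions the vertical edge yields an angle of zero and is left alone, while the diagonal ramp peaks in the pi/4 bin and would be rotated.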

Quadrant Decomposition and Per-Quadrant Hashing

Four spatial quadrant extraction modules extract non-overlapping rectangular sub-images from the canonical image: the top-left, top-right, bottom-left, and bottom-right quadrants, each comprising one-quarter of the canonical image area. Per-quadrant variance vector computation applies the full nine-dimensional extraction pipeline to each quadrant sub-image, producing an independent X, Y, Z triplet for each spatial region. This spatial decomposition is what supplies the regional resolution that the global nine-dimensional vector alone does not provide.
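The quadrant split itself is simple slicing. The sketch below assumes a square, even-sided canonical image stored as a list of rows; the dictionary keys are illustrative names, not identifiers from the platform.

```python
from typing import Dict, List

Image = List[List[float]]


def split_quadrants(img: Image) -> Dict[str, Image]:
    """Split a square image into four non-overlapping quadrants of equal area."""
    n = len(img)
    if n % 2 or any(len(row) != n for row in img):
        raise ValueError("expected a square image with an even side length")
    half = n // 2
    return {
        "top_left": [row[:half] for row in img[:half]],
        "top_right": [row[half:] for row in img[:half]],
        "bottom_left": [row[:half] for row in img[half:]],
        "bottom_right": [row[half:] for row in img[half:]],
    }


# On the 256x256 canonical canvas each quadrant is 128x128.
quads = split_quadrants([[float(4 * y + x) for x in range(4)] for y in range(4)])
```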

Per-quadrant hashing converts each quadrant's nine-dimensional vector into a 256-bit hexadecimal hash using a coarsened quantization scheme, in which the X and Y axis components are quantized at a step of 1/32 and the Z axis components at a step of 1/8 prior to hashing. This coarser quantization absorbs JPEG compression noise and format conversion artifacts at the sub-image level, where localized compression introduces greater per-pixel variance than at the full-image level. The hash function applies multiple overlapping FNV (Fowler-Noll-Vo) variant hash functions at two quantization scales, 16 and 20, and XORs the resulting 128-bit outputs to produce the final 256-bit quadrant hash.
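The effect of the coarsened quantization can be illustrated with a small sketch. It assumes the nine components are ordered as three X values, three Y values, and three Z values, and it hashes the quantized vector with a single standard 64-bit FNV-1a pass. That single pass is a deliberate simplification for illustration; it does not reproduce the multi-variant, two-scale construction of the 256-bit quadrant hash.

```python
from typing import List, Sequence

FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes, seed: int = FNV64_OFFSET) -> int:
    h = seed
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def quantize(vec9: Sequence[float], xy_step: float = 1 / 32, z_step: float = 1 / 8) -> List[int]:
    """Snap X and Y components to a 1/32 grid and Z components to a 1/8 grid."""
    if len(vec9) != 9:
        raise ValueError("expected a nine-dimensional vector")
    xy = [round(v / xy_step) for v in vec9[:6]]  # assumed layout: X1..X3, Y1..Y3
    z = [round(v / z_step) for v in vec9[6:]]    # assumed layout: Z1..Z3
    return xy + z


def quadrant_digest(vec9: Sequence[float]) -> str:
    payload = ",".join(str(q) for q in quantize(vec9)).encode("ascii")
    return format(fnv1a_64(payload), "016x")


clean = [0.50, 0.25, 0.75, 0.125, 0.50, 0.375, 0.25, 0.50, 0.75]
noisy = [v + 0.004 for v in clean]   # small compression-like perturbation
shifted = [clean[0] + 0.1] + clean[1:]  # larger structural change
```

Here `clean` and `noisy` fall into the same quantization cells and therefore share a digest, while `shifted` lands in a different cell on its first component. Values close to a cell boundary can still flip under small noise, which is a general limitation of grid quantization.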

Rotation-Invariant Sorting and 320-Bit Combination

The rotation-invariant quadrant sorting module sorts the four quadrant hashes in lexicographic order of their hexadecimal string representations, then assigns the sorted hashes to canonical positions q0 through q3. Because the sorting is by hash value rather than by spatial position, the assignment of quadrant identifiers is independent of the spatial orientation of the artifact: a rotated or mirrored version of an artifact produces the same set of sorted quadrant hashes even if the individual spatial positions of the quadrants differ.
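The sorting step reduces to an ordinary lexicographic sort. The sketch below shows only the order-independence property: any permutation of the same four hash strings produces the same q0 through q3 assignment. The short placeholder strings stand in for real quadrant hashes.

```python
from itertools import permutations
from typing import Dict, Sequence


def canonical_quadrant_order(hashes: Sequence[str]) -> Dict[str, str]:
    """Assign four quadrant hashes to q0..q3 by lexicographic order of their hex strings."""
    if len(hashes) != 4:
        raise ValueError("expected exactly four quadrant hashes")
    ordered = sorted(h.lower() for h in hashes)
    return {f"q{i}": h for i, h in enumerate(ordered)}


spatial = ["c3d4", "00ff", "a1b2", "7e7e"]  # placeholder hashes in spatial order
reference = canonical_quadrant_order(spatial)
all_same = all(canonical_quadrant_order(p) == reference for p in permutations(spatial))
```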

The combined 320-bit unique identifier construction module applies a multi-segment FNV-64 hash combiner to the global 256-bit hash and the four sorted quadrant hashes, producing five 64-bit hash segments that are concatenated to form the 320-bit UID. The combiner applies five distinct initialization seeds to the same ordered concatenation of hash inputs, yielding five separately seeded 64-bit outputs that together span an address space of 2^320 possible UIDs. Under the idealized assumption that UIDs are uniformly distributed over this space, the birthday bound puts the probability of any collision among n registered artifacts at roughly n^2 / 2^321, which is on the order of 10^-67 for n = 10^15 and therefore negligible at any foreseeable scale of digital content deployment.
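A five-seed combiner of this shape might be sketched as follows. The seed constants are hypothetical placeholders, as is the choice to concatenate the lowercase hex strings before hashing; only the FNV prime and the first seed (the standard FNV-1a offset basis) are standard values.

```python
from typing import Sequence

FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1

# Hypothetical seeds: the first is the standard FNV-1a offset basis,
# the others are arbitrary distinct 64-bit constants chosen for illustration.
SEEDS = (
    0xCBF29CE484222325,
    0x84222325CBF29CE4,
    0x9E3779B97F4A7C15,
    0xC2B2AE3D27D4EB4F,
    0x165667B19E3779F9,
)


def fnv1a_64(data: bytes, seed: int) -> int:
    h = seed
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def combine_uid(global_hash: str, quadrant_hashes: Sequence[str]) -> str:
    """Hash one ordered concatenation under five seeds and join the 64-bit segments."""
    if len(quadrant_hashes) != 4:
        raise ValueError("expected exactly four quadrant hashes")
    ordered = sorted(h.lower() for h in quadrant_hashes)  # q0..q3
    payload = "".join([global_hash.lower(), *ordered]).encode("ascii")
    return "".join(format(fnv1a_64(payload, seed), "016x") for seed in SEEDS)


uid = combine_uid("ab" * 32, ["c3" * 32, "00" * 32, "a1" * 32, "7e" * 32])
```

The result is 80 hexadecimal characters, that is, five 16-character segments of 64 bits each.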

The first 16 hexadecimal characters of the 320-bit UID, representing 64 bits, serve as a backward-compatible short-form identifier suitable for human-readable display, database indexing, and bandwidth-constrained transmission. The full 320-bit representation is used for anchor node assignment, slope-band routing, and similarity comparison operations.
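The trade-off between the two forms can be made concrete with a birthday-bound estimate. The sketch below assumes identifiers are uniformly distributed, which is an idealization, and the population sizes are illustrative only. The helper names are hypothetical.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1)) for uniform identifiers."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)  # expm1 keeps precision when the result is tiny


def short_form(uid_hex: str) -> str:
    """Return the first 16 hex characters (64 bits) of an 80-character UID."""
    if len(uid_hex) != 80:
        raise ValueError("expected an 80-character hexadecimal UID")
    return uid_hex[:16]


p_short = collision_probability(10 ** 9, 64)    # a billion short-form identifiers
p_full = collision_probability(10 ** 15, 320)   # a quadrillion full UIDs
```

Under these assumptions a billion short-form identifiers already carry a collision probability of a few percent, which is consistent with reserving the short form for display and indexing and using the full 320-bit form where uniqueness matters.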

Spatial Locality and Quadrant-Level Comparison

The quadrant decomposition is what gives the UID its spatial resolution under comparison. The comparison framework operates on UID records and produces per-axis cosine similarity scores for the X, Y, and Z axes, an aggregate directional cosine similarity, a Euclidean distance-based similarity score, and per-quadrant similarity scores together with hash match flags. Because each quadrant carries its own hash and its own variance triplet, comparison can be resolved per region rather than only over the artifact as a whole.
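The global part of such a comparison might be sketched as below. The nine-component layout (three X, three Y, three Z values), the use of a cosine over the full vector as the aggregate score, and the mapping 1 / (1 + d) from Euclidean distance to a similarity are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare_vectors(u: Sequence[float], v: Sequence[float]) -> Dict[str, float]:
    """Per-axis cosine, aggregate cosine, and a distance-based similarity."""
    axes = {"x": slice(0, 3), "y": slice(3, 6), "z": slice(6, 9)}  # assumed layout
    scores = {name: cosine(u[s], v[s]) for name, s in axes.items()}
    scores["aggregate"] = cosine(u, v)
    scores["euclidean"] = 1.0 / (1.0 + math.dist(u, v))  # assumed mapping to (0, 1]
    return scores


a = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]  # differs only on the Y axis
scores = compare_vectors(a, b)
```

In this example the X and Z scores stay at 1 while the Y score drops to 0, so the per-axis breakdown localizes the difference that the aggregate score only summarizes.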

This supports localized mutation detection. A derivative artifact that modifies only one spatial region of a source exhibits quadrant similarity scores near 1.0 for the unchanged regions and reduced scores for the modified region, enabling spatially resolved attribution even when the global similarity score remains high. The same property allows adversarial recombinations to be surfaced: when a party attempts to register a derivative that suppresses attribution to one or more source UIDs by manipulating variance features, the slope profile of the composite UID exhibits measurable divergence from the weighted variance combination of its declared parents, and anchors may flag such entries and subject them to heightened governance review.
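A per-quadrant report of this kind could be sketched as follows. It assumes the UID record retains each quadrant's hash and variance vector keyed by spatial position, and it uses a hypothetical similarity threshold of 0.95 to mark a region as modified; neither detail is specified above.

```python
import math
from typing import Dict, List, Sequence, Tuple

QuadrantData = Dict[str, Tuple[str, List[float]]]  # position -> (hash, variance vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def quadrant_report(source: QuadrantData, derivative: QuadrantData,
                    threshold: float = 0.95) -> Dict[str, dict]:
    """Compare matching quadrants and flag those whose similarity falls below threshold."""
    report = {}
    for name, (src_hash, src_vec) in source.items():
        der_hash, der_vec = derivative[name]
        similarity = cosine(src_vec, der_vec)
        report[name] = {
            "similarity": similarity,
            "hash_match": src_hash == der_hash,
            "modified": similarity < threshold,
        }
    return report


base = [0.5, 0.2, 0.1, 0.4, 0.3, 0.2, 0.6, 0.1, 0.1]
edited = [0.1, 0.6, 0.5, 0.1, 0.1, 0.7, 0.1, 0.5, 0.6]
source = {p: ("aa11", base) for p in ("top_left", "top_right", "bottom_left", "bottom_right")}
derivative = dict(source, bottom_right=("ff99", edited))  # one region changed
report = quadrant_report(source, derivative)
modified = [name for name, row in report.items() if row["modified"]]
```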

Variance Band and the UID Record

The UID does not carry its governance class inside its 320 bits. The variance band classification derives from the global variance value of the artifact and is carried as a supplementary field in the UID record rather than encoded into the identifier itself. The global variance value places the artifact into one of five variance bands within the global slope continuum, and that band determines which anchor cluster governs the UID for registration, resolution, and caching.
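The separation between identifier and band can be illustrated with a small record type. The band boundaries here are purely hypothetical: the sketch assumes a global variance value normalized to the range 0 to 1 and divides it into five equal-width bands, which may not match the actual band definitions.

```python
from dataclasses import dataclass

BAND_COUNT = 5


def variance_band(global_variance: float) -> int:
    """Map a normalized global variance to one of five equal-width bands (assumed)."""
    if not 0.0 <= global_variance <= 1.0:
        raise ValueError("expected a global variance normalized to [0, 1]")
    return min(BAND_COUNT - 1, int(global_variance * BAND_COUNT))


@dataclass(frozen=True)
class UIDRecord:
    uid: str            # 80 hex characters; carries no band information
    variance_band: int  # supplementary field derived from the global variance


record = UIDRecord(uid="ab" * 40, variance_band=variance_band(0.5))
```

Because the band lives beside the identifier rather than inside it, a change in band granularity would alter the record field without touching the UID string.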

Lineage and alias bindings are likewise maintained outside the identifier. Aliases are treated as mutable, scoped symbolic references governed independently of the UID itself and resolved through slope-band-scoped anchor consensus, and lineage edges are recorded as weighted attribution annotations in a multi-root lineage graph held in the governing anchor's cache. The UID itself remains invariant under these governance and naming events, so the same content retains one structurally derived identity while its band assignment, aliases, and lineage edges evolve around it.

Application Across Modalities

The construction operates on any normalized scalar field representation of a digital artifact and is therefore not limited to raster images. The variance extraction pipeline that feeds quadrant decomposition runs identically once a modality has been reduced to a bounded two-dimensional scalar field. The audio normalization path computes a short-time Fourier transform, applies a mel filterbank, and produces a mel-spectrogram scalar field; the text normalization path applies TF-IDF weighting and positional grid mapping to produce a token frequency scalar field, supplemented by byte-level variance computed over sliding windows; and the binary object path reshapes the byte stream into a square or near-square matrix at a canonical resolution and computes per-cell statistics from sliding-window byte variance, byte-frequency entropy approximated as the variance of byte-frequency counts, and structural-section profile values.
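As one example of reducing a modality to a scalar field, the sliding-window byte variance of the binary object path might be sketched as below. The window length, the zero value used for cells past the end of the data, and the omission of the entropy and section-profile statistics and of resampling to a canonical resolution are all simplifying assumptions.

```python
import math
from typing import List


def byte_variance_field(data: bytes, window: int = 8) -> List[List[float]]:
    """Arrange per-offset sliding-window byte variance into a near-square matrix."""
    n = len(data)
    side = math.isqrt(n - 1) + 1 if n > 0 else 1  # smallest side with side*side >= n
    cells = []
    for i in range(side * side):
        chunk = data[i:i + window]
        if len(chunk) < 2:  # padding cells and the final lone byte carry no variance
            cells.append(0.0)
            continue
        mean = sum(chunk) / len(chunk)
        cells.append(sum((b - mean) ** 2 for b in chunk) / len(chunk))
    return [cells[r * side:(r + 1) * side] for r in range(side)]


flat = byte_variance_field(bytes([7]) * 10)           # constant bytes: zero everywhere
busy = byte_variance_field(bytes([0, 255] * 8), window=2)  # alternating extremes
```

The resulting matrix can then be treated like any other bounded two-dimensional scalar field.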

Video artifacts are handled at two levels: each frame is processed through the full extraction and quadrant pipeline, and a clip-level UID is derived from a temporal delta vector formed by computing the cosine similarity between consecutive frame variance vectors and treating the resulting one-dimensional temporal profile as a signal for the extraction pipeline. For real-time streaming content, the derivation operates over a sliding window of the stream to produce window-level UIDs, and consecutive window UIDs are compared by cosine similarity to detect structural continuity or discontinuity. In every case the same multi-axis extraction produces a variance vector directly comparable to vectors from any other modality through the same cosine similarity operator.
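The temporal profile used for clips, and the window-to-window comparison used for streams, both come down to cosine similarity between consecutive vectors. A minimal sketch, with made-up three-component vectors standing in for real frame variance vectors:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_profile(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity between each pair of consecutive frame variance vectors."""
    return [cosine(a, b) for a, b in zip(frame_vectors, frame_vectors[1:])]


frames = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],  # unchanged frame: similarity near 1
    [0.0, 1.0, 0.0],  # abrupt change (for example a scene cut): similarity near 0
]
profile = temporal_profile(frames)
```

A profile of N frames has N - 1 entries, and a sharp drop marks a structural discontinuity between neighboring frames or stream windows.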

Prior-Art Distinction

Conventional asset identifiers such as uniform resource locators, cryptographic hash pointers, and file-system paths derive from storage location or transmission metadata rather than from the internal structure of the content. They are invalidated by mutation, format conversion, resolution change, lossy compression, or replication, producing identity fragmentation across versions and derivatives. The UID construction described here derives the identifier from the artifact's own variance and spatial structure, so the identity travels with the content and persists across controlled transformations.

Existing perceptual hashing systems, including difference hash, average hash, and perceptual hash algorithms, produce low-dimensional binary signatures from downsampled image representations. They lack multi-scale structural analysis, directional orientation decomposition, spatial sub-region identity, and a continuously scaled similarity score suitable for slope-based banding or lineage tracing. The construction here combines multi-scale variance extraction, orientation canonicalization, four-quadrant sub-region fingerprinting, and rotation-invariant sorting to produce a 320-bit identifier that encodes spatial composition and supports cosine-similarity comparison, rather than a fixed-width binary digest.

Disclosure Scope

The UID construction described here, comprising the canonical normalization to a 256 by 256 canvas, the orientation canonicalization step, the four-quadrant spatial decomposition, the per-quadrant nine-dimensional variance extraction and coarsened FNV-variant hashing to 256-bit quadrant hashes, the rotation-invariant lexicographic sorting of the quadrant hashes, and the multi-segment FNV-64 combination of the global 256-bit hash and the four sorted quadrant hashes into a five-segment 320-bit unique identifier with a 64-bit short-form prefix, is disclosed in PCT International Application No. PCT/US26/28630.

The disclosure is non-limiting with respect to the modality-specific normalization procedures that produce the scalar field, the choice of quantization scales, and the band granularity carried in the UID record. Variations in these implementation details are within the scope of the disclosure provided that the structurally derived, variance-based identity, the four-quadrant sub-region fingerprinting, the rotation-invariant sorting, and the five-segment 320-bit combination are retained. This article is part of a series describing the content anchoring platform; related disclosures cover multi-axis variance vector extraction, variance-band anchor distribution, and multi-root composite lineage.