Loki, the Dog, and the Symbol Grounding Problem

Nick Clark

Two Problems Hiding in One Word

When a user says "my dog Loki," a system has to resolve two different things at once. It has to know that the symbol refers to something real and specific, the user's own pet, and it has to know that this sense, rather than the Norse figure, the streaming series, or a slang sense, is the one operative in this user's context. The first is the symbol grounding problem identified by Harnad in 1990: a symbol manipulated only against other symbols, with no anchoring to a referent, has no meaning of its own. The second is the indexical reference problem: words like "my," "here," and "now" mean different things depending on who utters them and when. Pure statistical language models leave both open. A model trained on a population averages across it and returns the modal sense, so it tends to resolve an ambiguous reference to the most common public meaning rather than to the speaker's specific one, and it has no principled place to carry the fact that this particular user's dog is named Loki.

Purely symbolic systems have the opposite failure. They can represent that a named entity is the user's pet, but they cannot track how meaning drifts across registers and communities, where the same token carries a culturally specific connotation that no hand-built ontology keeps current. Each approach owns half the problem and cannot reach the other half. The substrate removes the need to choose between them by carrying both halves in different layers and routing each resolution to the layer that owns it.

The Hybrid Split

Indexical and personal reference is resolved by the personal lineage layer. Each user maintains a record of their own accumulated traversals, so the fact that this user's prior traversals repeatedly resolved a given name to a specific personal entity is carried as that user's lineage. When the same user issues a query containing the name, the anchor's candidate transitions are reweighted by that lineage, so the personal referent is elevated for this user without being imposed on anyone else and without the system building a public dossier. "Loki is this user's dog" lives where it belongs, in the user's own portable, deletable cognitive asset, not in the model's weights.
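
The lineage reweighting described above can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation: the function name reweight_by_lineage, the strength parameter, the multiplicative weighting formula, and the candidate identifiers are all assumptions made for the example. It shows only the property the text relies on, namely that one user's history lifts their own referent while another user's ranking is left untouched, and that lineage entries outside the candidate set are never added to it.

```python
from collections import Counter


def reweight_by_lineage(candidates, lineage_counts, strength=10.0):
    """Reweight admissible candidates by one user's resolution history.

    candidates: dict mapping referent id -> base weight (admissible set only).
    lineage_counts: Counter mapping referent id -> how often this user's
        prior traversals resolved the name to that referent.
    strength: hypothetical tuning knob for how strongly history counts.
    """
    if not candidates:
        return {}
    total = sum(lineage_counts.values())
    reweighted = {}
    for referent, base in candidates.items():
        # Share of this user's past resolutions that went to this referent.
        share = lineage_counts.get(referent, 0) / total if total else 0.0
        reweighted[referent] = base * (1.0 + strength * share)
    norm = sum(reweighted.values())
    if norm == 0:
        return dict(candidates)
    # Normalize so the weights again sum to one.
    return {r: w / norm for r, w in reweighted.items()}


# Hypothetical candidate senses for the token "Loki" with population-level weights.
candidates = {"norse_figure": 0.5, "streaming_series": 0.4, "users_dog": 0.1}

# One user's lineage: nine of ten prior resolutions went to their dog.
dog_owner = Counter({"users_dog": 9, "streaming_series": 1})
# A different user with no relevant history.
stranger = Counter()

for_owner = reweight_by_lineage(candidates, dog_owner)
for_stranger = reweight_by_lineage(candidates, stranger)
print(max(for_owner, key=for_owner.get))
print(max(for_stranger, key=for_stranger.get))
```

With these toy numbers the dog owner's top candidate becomes users_dog, while the user with no history still gets the population mode, norse_figure. Nothing about the first user's history is visible in the second user's result.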

Idiomatic and cultural sense is resolved by the statistical inference engine at the anchor, operating over the anchor's published neighborhood. The distribution of how a token is used across a register or community is exactly what a statistical model represents well, and the model is invoked precisely to score candidate senses against the discovery object's structured intent and the neighborhood's content descriptor. Neither layer alone resolves the full reference. The personal lineage carries the specific referent; the statistical model carries the cultural sense; the substrate routes each resolution to the layer that holds the right kind of evidence.
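
As a rough illustration of scoring candidate senses against the structured intent and the neighborhood's content descriptor, the sketch below uses bag-of-words cosine similarity as a deliberately simple stand-in for the statistical inference engine. The function names, the term lists, and the representation of intent and descriptor as lists of words are assumptions for the example; the source does not specify how the engine computes its scores.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_senses(sense_glosses, intent_terms, descriptor_terms):
    """Score each candidate sense against intent plus neighborhood descriptor.

    sense_glosses: dict mapping sense id -> list of terms typical of that sense.
    intent_terms, descriptor_terms: lists of terms (hypothetical encodings of
        the structured intent and the neighborhood's content descriptor).
    """
    context = Counter(intent_terms) + Counter(descriptor_terms)
    return {s: cosine(Counter(g), context) for s, g in sense_glosses.items()}


# Hypothetical senses of the token "sick" in two registers.
senses = {
    "ill": ["illness", "symptom", "patient"],
    "excellent": ["slang", "praise", "skate", "trick"],
}

skate_scores = score_senses(senses, ["praise", "trick"], ["skate", "slang", "video"])
clinic_scores = score_senses(senses, ["symptom"], ["patient", "clinic"])
print(max(skate_scores, key=skate_scores.get))
print(max(clinic_scores, key=clinic_scores.get))
```

The same token resolves differently depending on which neighborhood it is scored against, which is the behavior the statistical layer is responsible for. A real engine would use a learned model rather than term overlap.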

Mechanism

Resolution happens during traversal, not as a preprocessing step. A discovery object carrying the user's identity arrives at an anchor whose neighborhood publication describes the semantic territory relevant to the query. The search step produces a candidate transition set over the senses and entities reachable from the anchor. Two influences then act on that set. The statistical inference engine scores the candidates for cultural and contextual fit against the structured intent, contributing the idiomatic resolution. The personal lineage layer reweights the same admissible candidates by the user's history, contributing the indexical resolution. Both operate strictly within admissibility: the personal layer can elevate the user's specific referent only if it is already an admissible candidate, and the statistical engine proposes but does not dispose, since the execution step still evaluates the selected transition for policy, lineage, entropy, and temporal validity before it is taken. The resolved meaning is recorded in the discovery object's memory and lineage, so the basis for the resolution is auditable rather than buried in a single model pass.
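
The sequence above (search, statistical scoring, lineage reweighting, execution gate, recording) can be sketched as a single function. Everything named here is hypothetical: the DiscoveryObject fields, resolve_at_anchor, the multiplicative combination of scores, and the gate callable that stands in for the policy, lineage, entropy, and temporal-validity checks. The behavior on a rejected transition (return nothing, record nothing) is also an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryObject:
    user_id: str
    intent: str
    memory: dict = field(default_factory=dict)   # resolved meanings by token
    lineage: list = field(default_factory=list)  # audit trail of resolutions


def resolve_at_anchor(obj, token, candidates, stat_score, lineage_weights, gate):
    """One resolution step at an anchor (illustrative only).

    candidates: admissible candidate ids produced by the search step.
    stat_score: callable (candidate, intent) -> float, the statistical engine.
    lineage_weights: dict candidate -> multiplier from the user's lineage.
    gate: callable (candidate) -> bool, standing in for the execution step.
    """
    if not candidates:
        return None
    # Statistical engine proposes: score every admissible candidate.
    stat = {c: stat_score(c, obj.intent) for c in candidates}
    # Personal layer reweights only candidates that are already admissible;
    # lineage entries for anything outside `candidates` are never consulted.
    combined = {c: stat[c] * lineage_weights.get(c, 1.0) for c in candidates}
    selected = max(combined, key=combined.get)
    # Assumption: a rejected transition yields no resolution and no record.
    if not gate(selected):
        return None
    # Record the basis for the resolution so it can be audited later.
    obj.memory[token] = selected
    obj.lineage.append(
        {"token": token, "selected": selected, "statistical": stat, "combined": combined}
    )
    return selected


obj = DiscoveryObject(user_id="u1", intent="book a vet appointment for Loki")
toy_scores = {"norse_figure": 0.5, "streaming_series": 0.4, "users_dog": 0.1}
result = resolve_at_anchor(
    obj,
    "Loki",
    candidates=list(toy_scores),
    stat_score=lambda c, intent: toy_scores[c],
    # "private_note" is not an admissible candidate, so its weight is ignored.
    lineage_weights={"users_dog": 10.0, "private_note": 99.0},
    gate=lambda c: True,
)
print(result, obj.lineage[-1]["combined"])
```

Here the personal referent wins only because it was already in the candidate set; the heavily weighted private_note never appears in the combined scores. The recorded lineage entry keeps both the statistical scores and the combined scores, so the reason for the choice can be inspected after the fact.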

Embodiments

In a personal-assistant embodiment, a reference to a named contact, place, or possession resolves to the user's own entity by their lineage, while the surrounding language is interpreted in its current cultural sense by the statistical engine. In a clinical embodiment, a term that carries a lay connotation in general usage and a precise meaning in a medical register resolves to the register appropriate to the practitioner's neighborhood and history, without the system having to be told which register applies. In a multilingual or cross-community embodiment, a token whose connotation differs across communities is resolved by the statistical engine against the relevant community's neighborhood, while a personal referent within it is still resolved by the user's own lineage. In each case the division of labor is the same: the world's shared and shifting meaning sits in the statistical layer over the shared substrate, and the user's specific meaning sits in their own portable lineage.

Prior-Art Distinction

Statistical language models ground reference by training distribution and therefore resolve ambiguous and indexical references toward population modes, with no principled, user-owned place to carry an individual's specific referent and no auditable record of why a sense was chosen. Symbolic and knowledge-graph systems can represent a specific referent but do not track cultural and register drift and depend on hand-maintained ontologies that fall out of date. Neuro-symbolic hybrids combine the two but typically fuse them inside one model or pipeline, without a structural separation between a user-owned indexical layer and a shared statistical layer, and without a governance boundary that keeps personal resolution from overriding admissibility. The distinguishing combination disclosed here is the routing of indexical and personal reference to a per-user, portable lineage layer and of idiomatic and cultural sense to a statistical engine over a shared, policy-scoped neighborhood, with both operating inside a single admissibility-gated traversal and the resolution recorded in lineage.

Disclosure Scope

The three operating modes of the discovery substrate, including answer synthesis, and the policy-scoped anchor neighborhood publication that the inference engine scores against, are disclosed in the cognition filing (U.S. Application No. 19/647,395 and its international counterpart) at Sections 10.8 and 10.4. The per-user personal lineage layer that carries indexical and personal reference, and its bounded application as a weighting overlay on admissible candidates, are disclosed in the companion article on the personal lineage layer. This article discloses their composition for reference resolution: the routing of indexical and personal reference to the user's lineage layer and of idiomatic and cultural sense to the statistical inference engine, within a single admissibility-gated traversal, with the resolved meaning recorded in the discovery object's memory and lineage. The scope extends to resolution policies not described here whose behavior reduces to this division between a user-owned indexical layer and a shared statistical layer operating inside admissibility.