Glean Enterprise Search and Work AI

Nick Clark

Published April 25, 2026

Glean operates one of the most widely deployed commercial enterprise-search and Work AI platforms, federating retrieval across SaaS connectors and feeding ranked context to large language models. The architectural element it does not natively provide — governed adaptive-traversal of a semantic graph with credentialed scope and lineage-bound results — is exactly what the semantic-discovery primitive supplies. As Glean's product surface migrates from "search the company" to "act across the company," the absence of a typed, credentialed traversal substrate becomes the load-bearing gap rather than a future-roadmap nicety.

1. Vendor and Product Reality

Glean was founded in 2019 by Arvind Jain (a co-founder of Rubrik and former Google search engineer) and a small team of ex-Google retrieval engineers, and it has reached one of the highest enterprise-software valuation trajectories of the decade. Its commercial footprint is built around two tightly coupled offerings: the underlying enterprise-search index and the Glean Assistant layered on top of it. The search product crawls and federates across the typical SaaS estate — Google Drive, Microsoft 365, Slack, Confluence, Jira, Salesforce, GitHub, Zendesk, ServiceNow, Notion, Box, Dropbox, and dozens of other connectors — building a unified semantic index that respects each source system's access-control list. Glean Assistant then composes ranked retrieval over that index with generative reasoning, producing answers, summaries, and increasingly agentic actions branded as Work AI. The platform is sold to large knowledge-worker organizations on the value proposition that employees recover meaningful working hours per week from time previously spent looking for information across fragmented systems.

The product is RBAC-aware by design, and that design choice is what distinguishes Glean from generic vector-database stacks and from the raw retrieval layers shipped by foundation-model vendors. Every document, message, ticket, and code artifact carries the permissions of its source, and Glean's permission-mirroring pipeline enforces them at query time so that retrieval cannot leak content outside a user's entitlements. This is a meaningful engineering achievement: it requires keeping a faithful, near-real-time replica of complex permission models from dozens of source systems, and it is the principal reason large enterprises deploy Glean over the build-it-yourself alternative. Glean has raised capital at multibillion-dollar valuations from investors including Sequoia, Kleiner Perkins, and Lightspeed, has expanded internationally, and is widely adopted across the Fortune 500 knowledge-worker market, including in regulated verticals such as financial services and life sciences, where the permission-mirror property is procurement-critical.
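A minimal sketch of query-time permission enforcement, in Python, assuming a hypothetical in-memory mirror in which each document carries the set of principals its source system grants read access to. The `Doc` type, the group-expansion helper, and the toy term-count scorer are illustrative assumptions, not Glean's pipeline; the sketch only shows the ordering described above, with the mirrored ACL applied before ranking so that nothing outside the user's entitlements can reach the result list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str          # connector the document came from
    text: str
    allowed: frozenset   # principals copied from the source ACL (hypothetical mirror)

def principals_for(user, groups):
    """Expand a user into itself plus every group that lists it as a member."""
    return {user} | {g for g, members in groups.items() if user in members}

def score(query, doc):
    """Toy relevance: number of query terms present in the document text."""
    words = set(doc.text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)

def search(query, user, corpus, groups, k=5):
    who = principals_for(user, groups)
    # Apply the mirrored ACL first, so ranking only ever sees readable documents.
    visible = [d for d in corpus if d.allowed & who]
    hits = [d for d in visible if score(query, d) > 0]
    hits.sort(key=lambda d: score(query, d), reverse=True)
    return [d.doc_id for d in hits[:k]]

groups = {"eng": {"alice"}, "finance": {"bob"}}
corpus = [
    Doc("d1", "confluence", "deploy runbook for billing service", frozenset({"eng"})),
    Doc("d2", "drive", "billing forecast q3", frozenset({"finance"})),
]
print(search("billing", "alice", corpus, groups))  # ['d1']
print(search("billing", "bob", corpus, groups))    # ['d2']
```

Both users issue the same query and receive disjoint results; the scorer never sees a document the caller cannot read.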

The Work AI roadmap extends well beyond retrieval. Glean is shipping agents, prompts, and actions that read from connected systems and, in a growing number of cases, write back into them — ticket creation in Jira, draft replies in Gmail, lookups and updates against Salesforce, code suggestions surfaced from internal repositories. The platform's developer-extensibility surface (Glean Actions, custom agents, the prompt library, and the emerging AgentRunner-style execution model) reflects a deliberate move from "answer this question from the index" to "execute this multi-step workflow across the index and its source systems." The center of gravity, however, remains a permission-aware index feeding ranked context windows to LLMs, with action capability layered on top through connector-specific write paths. That layering is rational from a product-evolution standpoint, but it leaves the agentic surface architecturally underweighted relative to the retrieval surface beneath it.

2. Architectural Gap

Permission-mirrored retrieval and ranked context are not the same architectural object as governed adaptive-traversal of a semantic graph. Glean's index treats each connector's content as a flat, scored corpus; relevance is computed against a query, and results are returned as a list of passages with citations back to source. There is no first-class structure that encodes which traversals are permitted across the relations between objects (customer to opportunity to ticket to commit to reviewer; employee to manager to compensation record to approval workflow), nor any lineage-bound proof that a given retrieval path was taken under a particular credential at a particular time. The graph that the agent is implicitly walking exists only in the prompts and tool calls of the assistant, not in the substrate that feeds it.

When Work AI moves from "answer this question" to "execute this multi-step workflow across CRM, ticketing, and code review," that gap becomes load-bearing. An agent reasoning over Glean's results is implicitly walking exactly this kind of graph, but the walk is unstructured, untyped, and unauditable. RBAC at the document boundary does not constrain the action types an agent may invoke at each hop, and ranking signals are not governance signals. A permission mirror tells the system whether a user can read a document; it does not tell the system whether the agent may traverse from that document to a related action surface, nor does it produce a lineage record showing which path was taken and which alternatives were considered and rejected.
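The distinction can be made concrete with a toy example. The node identifiers, relation names, and per-hop action sets below are invented for illustration and do not reflect Glean's data model. Every node on the path passes a document-level read check, yet whether the agent may take each hop for a given action type is a separate predicate that a read ACL does not encode.

```python
# Hypothetical typed relation graph: (src, relation, dst) -> action types allowed on that hop.
EDGES = {
    ("customer:acme", "owns", "opp:42"): {"read"},
    ("opp:42", "escalated_to", "ticket:JIRA-7"): {"read", "comment"},
    ("ticket:JIRA-7", "fixed_by", "commit:9f1c"): {"read"},
    ("commit:9f1c", "reviewed_by", "user:dana"): {"read"},
}

# Document-level ACL result for this user: every node on the path is readable.
READABLE = {"customer:acme", "opp:42", "ticket:JIRA-7", "commit:9f1c", "user:dana"}

def documents_readable(path):
    """The question a permission mirror answers: is every node readable?"""
    return all(node in READABLE for node in path)

def allowed_actions(src, dst):
    """Union of action types permitted on any typed edge from src to dst."""
    acts = set()
    for (s, _relation, d), edge_acts in EDGES.items():
        if s == src and d == dst:
            acts |= edge_acts
    return acts

def hops_admissible(path, action):
    """The question it does not answer: is every hop admissible for this action?"""
    for src, dst in zip(path, path[1:]):
        if action not in allowed_actions(src, dst):
            return False, (src, dst)
    return True, None

path = ["customer:acme", "opp:42", "ticket:JIRA-7", "commit:9f1c", "user:dana"]
print(documents_readable(path))          # True
print(hops_admissible(path, "read"))     # (True, None)
print(hops_admissible(path, "comment"))  # (False, ('customer:acme', 'opp:42'))
```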

The gap is also visible in the audit story. Customers in regulated industries can today reconstruct, with effort, that a particular user asked a particular question and received a particular set of citations. They cannot reconstruct, with cryptographic confidence, the traversal path that the assistant or agent followed across the connected systems, the credential under which each hop was admissible, or the alternative branches the agent declined to take. For internal-use generative AI this is uncomfortable; for regulated workflows, multi-tenant SaaS deployments where an embedded Glean instance is acting on behalf of a downstream customer, and any cross-domain agent composition that crosses entitlement boundaries, the absence is procurement-blocking.
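One common way to get tamper evidence for such a record is a hash chain, sketched here in Python with the standard library. The record fields (`src`, `dst`, `credential`, `ts`, `decision`) are hypothetical. A hash chain shows that recorded hops, including rejected branches, have not been edited or reordered after the fact; it does not by itself prove who wrote them, which would additionally require signing entries or the chain head under a key bound to the credential.

```python
import hashlib
import json

def _digest(record):
    # Canonical JSON (sorted keys, no whitespace) so the hash is reproducible.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def append_hop(log, src, dst, credential, ts, decision):
    """Append one traversal decision, chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = {"src": src, "dst": dst, "credential": credential,
            "ts": ts, "decision": decision, "prev": prev}
    log.append({**body, "hash": _digest(body)})

def verify(log):
    """Recompute every link. Editing, reordering, or deleting an interior entry
    breaks the chain; catching tail truncation needs an externally anchored head hash."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_hop(log, "opp:42", "ticket:JIRA-7", "cred:alice#2026-04-25", 1745568000, "admitted")
append_hop(log, "ticket:JIRA-7", "payroll:2026", "cred:alice#2026-04-25", 1745568001, "rejected")
print(verify(log))               # True
log[1]["decision"] = "admitted"  # rewrite a rejected branch after the fact
print(verify(log))               # False
```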

The result is a platform that is excellent at producing context and increasingly capable of producing action, but architecturally thin precisely where regulated industries, multi-tenant deployments, and cross-domain agent compositions require it to be thickest. That thinness cannot be patched by improving ranking, by adding more connectors, or by tightening permission mirroring; those improvements operate on the corpus dimension while the missing element is on the traversal dimension.

3. What the AQ Primitive Provides

The semantic-discovery primitive supplies governed adaptive-traversal of a semantic graph as a typed, credentialed, lineage-bound operation. Each traversal carries an action type, a scope derived from the requesting credential, and an emitted lineage record proving which nodes and edges were visited under which authority. Discovery is no longer a ranked list against a flat index; it is a structured walk whose shape is constrained by policy and whose history is reconstructible. The primitive does not replace ranking — ranking continues to operate over the candidate set produced by an admissible walk — but it places ranking inside a governance envelope rather than letting it determine the envelope.
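A minimal sketch of the walk-then-rank ordering, assuming a graph stored as an adjacency dict and a policy supplied as an `admissible(src, relation, dst)` callback; both shapes, and the lineage tuple format, are illustrative assumptions rather than a specification of the primitive. The policy decides which edges expand, every considered edge is logged with its verdict, and ranking can only reorder what the walk admitted.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Walk:
    candidates: list = field(default_factory=list)  # nodes reached under the policy
    lineage: list = field(default_factory=list)     # (src, dst, verdict) per edge considered

def governed_walk(graph, start, admissible, max_depth=3):
    """Breadth-first walk that only expands edges the policy admits.

    graph: dict mapping node -> list of (relation, neighbor)
    admissible: callable(src, relation, dst) -> bool, supplied by the policy layer
    """
    seen, walk = {start}, Walk()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for rel, nxt in graph.get(node, []):
            ok = admissible(node, rel, nxt)
            walk.lineage.append((node, nxt, "admitted" if ok else "rejected"))
            if ok and nxt not in seen:
                seen.add(nxt)
                walk.candidates.append(nxt)
                queue.append((nxt, depth + 1))
    return walk

def rank(candidates, relevance):
    # Ranking runs inside the envelope: it can only reorder admitted nodes.
    return sorted(candidates, key=relevance, reverse=True)

graph = {
    "customer:acme": [("owns", "opp:42"), ("billed_by", "invoice:17")],
    "opp:42": [("escalated_to", "ticket:JIRA-7")],
}

def policy(src, rel, dst):
    return not dst.startswith("invoice:")  # hypothetical rule: invoices are out of scope

w = governed_walk(graph, "customer:acme", policy)
print(w.candidates)  # ['opp:42', 'ticket:JIRA-7']
print(rank(w.candidates, relevance=lambda n: n.startswith("ticket:")))  # ['ticket:JIRA-7', 'opp:42']
```

The rejected `invoice:17` edge never enters the candidate set, but it does appear in `w.lineage`, which is what makes declined branches reconstructible later.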

Action-typed traversal means the same graph can answer a "read for summarization" query and a "stage a cross-system mutation" query with different admissible edge sets. A summarization action might admit edges across all linked documents the user can read; a mutation-staging action might admit only edges into systems where the user holds an explicit write credential and where the policy allows the requested mutation type at this moment. Credentialed traversal scope means the boundary of what is reachable is computed against the actor's authority at the moment of the call, not pre-baked into the index, and it is computed against the current state of that authority rather than the authority at index-build time. Lineage-bound results mean that downstream consumers — whether human reviewers, audit systems, or other agents — receive not just the retrieved content but the verifiable path that produced it, including which edges were considered and rejected and on what grounds.
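The per-action admission rule can be sketched as a single function. Everything here is hypothetical: the two action names, the `write_grants` table of expiring write credentials, and the `mutation_policy` callback standing in for a change-freeze rule. What the sketch shows is that authority is looked up at call time, so an expired or revoked grant takes effect on the next traversal without rebuilding any index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    system: str  # system the hop lands in, e.g. "jira"

def admits(action, edge, user, now, can_read, write_grants, mutation_policy):
    """Decide one hop for one action type, against the actor's authority as of `now`.

    can_read(user, node) -> bool          read entitlement, e.g. from a permission mirror
    write_grants[(user, system)] -> int   expiry time of an explicit write credential
    mutation_policy(system, now) -> bool  whether mutations are allowed there right now
    """
    if action == "summarize":
        # Summarization: any linked node the user can currently read.
        return can_read(user, edge.dst)
    if action == "stage_mutation":
        # Mutation staging: a live write credential AND an open policy window.
        expiry = write_grants.get((user, edge.system))
        return expiry is not None and now < expiry and mutation_policy(edge.system, now)
    return False  # unregistered action types are denied by default

def can_read(user, node):
    return True  # stand-in: assume the mirror grants read everywhere

def change_window(system, now):
    return not (1800 <= now < 1900)  # hypothetical change freeze from t=1800 to t=1900

edge = Edge("ticket:JIRA-7", "ticket:JIRA-8", "jira")
grants = {("alice", "jira"): 2000}  # alice's jira write credential expires at t=2000

print(admits("summarize", edge, "alice", 1500, can_read, grants, change_window))       # True
print(admits("stage_mutation", edge, "alice", 1500, can_read, grants, change_window))  # True
print(admits("stage_mutation", edge, "alice", 1850, can_read, grants, change_window))  # False: freeze
print(admits("stage_mutation", edge, "alice", 2500, can_read, grants, change_window))  # False: expired
```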

The primitive is technology-neutral with respect to the underlying graph store, the embedding model, and the ranking algorithm; it specifies the structural property that traversal is action-typed, credentialed, and lineage-bound, leaving the implementation choices to the deploying platform. It composes hierarchically: a tenant-scoped traversal can be embedded inside a multi-tenant SaaS provider's own traversal, with credentials and lineage at each level, so that an embedded Glean instance acting on behalf of a downstream customer can produce lineage that the downstream customer can independently verify.
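Hierarchical composition can be illustrated with hash commitments: the outer (provider) lineage record embeds a digest of the inner (tenant) record, and the downstream customer recomputes that digest locally. The record layout and identifiers are assumptions made for illustration. A commitment only proves that the two levels are consistent; a production design would also sign each level under its own credential, which this standard-library sketch omits.

```python
import hashlib
import json

def commit(record):
    """SHA-256 over canonical JSON, used as a commitment to a lineage record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

# Inner level: a traversal run inside the downstream customer's tenant scope.
tenant_lineage = {
    "tenant": "customer-b",
    "credential": "cred:customer-b/agent-3",
    "hops": [["opp:42", "ticket:JIRA-7", "admitted"]],
}

# Outer level: the SaaS provider's own traversal, carrying its own credential
# and hops, embeds only a commitment to the inner record.
provider_lineage = {
    "provider": "saas-vendor",
    "credential": "cred:vendor/embedded-search",
    "hops": [["router", "tenant:customer-b", "admitted"]],
    "children": [commit(tenant_lineage)],
}

def customer_verifies(received_inner, outer):
    """The downstream customer recomputes the commitment itself and checks it
    against the provider's record, rather than accepting a vendor assertion."""
    return commit(received_inner) in outer["children"]

print(customer_verifies(tenant_lineage, provider_lineage))  # True
forged = {**tenant_lineage, "hops": []}
print(customer_verifies(forged, provider_lineage))          # False
```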

4. Composition Pathway

Glean's existing connector fabric and permission-mirroring layer compose cleanly with semantic-discovery rather than competing with it. The connector layer continues to ingest and normalize content; the permission mirror continues to anchor source-system entitlements. Above those, semantic-discovery materializes the typed graph from the relations the connector layer already extracts, attaches credentialed traversal scope to each request originating from Glean Assistant or a Work AI agent, and emits lineage observations as a side effect of every walk. The composition is additive: nothing in the existing pipeline has to be torn out, and the existing investments in ranking quality, index freshness, and connector coverage retain their full value.
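A sketch of the relation-extraction step, assuming connector records have already been normalized into dicts. The per-connector field mapping (`RELATION_FIELDS`) and the `type:id` node naming are hypothetical; Glean's actual connector schemas are not described in this article. Each observation keeps the connector that produced it, so provenance is available when the edge is later traversed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    src: str
    relation: str
    dst: str
    connector: str  # provenance: which connector produced the edge

# Hypothetical mapping: connector -> {record field: (relation, target node type)}.
RELATION_FIELDS = {
    "salesforce": {"account_id": ("belongs_to", "customer")},
    "jira": {"opportunity": ("escalated_from", "opp"), "fix_commit": ("fixed_by", "commit")},
    "github": {"reviewer": ("reviewed_by", "user")},
}

def extract(connector, record):
    """Turn one normalized connector record into zero or more graph observations."""
    node = f"{record['type']}:{record['id']}"
    out = []
    for field_name, (relation, target_type) in RELATION_FIELDS.get(connector, {}).items():
        value = record.get(field_name)
        if value:
            out.append(Observation(node, relation, f"{target_type}:{value}", connector))
    return out

ticket = {"type": "ticket", "id": "JIRA-7", "opportunity": "42", "fix_commit": "9f1c"}
for obs in extract("jira", ticket):
    print(obs.src, obs.relation, obs.dst)
# ticket:JIRA-7 escalated_from opp:42
# ticket:JIRA-7 fixed_by commit:9f1c
```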

In practice this means Glean Assistant calls become discovery events rather than search calls. The assistant proposes a traversal intent (what is being looked for, under whose authority, and for which downstream action), and the primitive returns a scoped subgraph plus a lineage record. Existing ranking models still operate, now over a governance-bounded candidate set rather than the full index. Agents built on the AgentRunner-style pattern that Glean is moving toward gain a substrate where each hop is admissible by construction: the agent author no longer has to reason about whether a tool call is permissible from the agent's current position in the graph, because the primitive will refuse, defer, or partially admit traversals that exceed the credentialed scope.
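The refuse, defer, or partially admit behavior can be sketched as a decision over a proposed intent. The `TraversalIntent` fields mirror the three questions in the paragraph (what, under whose authority, for which downstream action); the scope lookup, the approval rule, and all identifiers are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ADMIT = "admit"
    PARTIAL = "partial"
    DEFER = "defer"
    REFUSE = "refuse"

@dataclass(frozen=True)
class TraversalIntent:
    goal: str       # what is being looked for
    actor: str      # under whose authority
    action: str     # which downstream action the result feeds
    targets: tuple  # nodes the assistant proposes to reach

def decide(intent, reachable, needs_approval):
    """Return (verdict, admitted_targets) for a proposed traversal."""
    scope = reachable(intent.actor, intent.action)
    admitted = tuple(t for t in intent.targets if t in scope)
    if not admitted:
        return Verdict.REFUSE, ()
    if needs_approval(intent.action):
        return Verdict.DEFER, admitted  # held until a human signs off
    if len(admitted) < len(intent.targets):
        return Verdict.PARTIAL, admitted
    return Verdict.ADMIT, admitted

# Hypothetical credentialed scopes, keyed by (actor, action).
SCOPES = {
    ("alice", "summarize"): {"opp:42", "ticket:JIRA-7"},
    ("alice", "stage_mutation"): {"ticket:JIRA-7"},
}

def reachable(actor, action):
    return SCOPES.get((actor, action), set())

def needs_approval(action):
    return action == "stage_mutation"

summary = TraversalIntent("Acme escalation status", "alice", "summarize",
                          ("opp:42", "ticket:JIRA-7", "payroll:2026"))
mutation = TraversalIntent("close escalation", "alice", "stage_mutation", ("ticket:JIRA-7",))
outsider = TraversalIntent("Acme escalation status", "bob", "summarize", ("opp:42",))

print(decide(summary, reachable, needs_approval)[0])   # Verdict.PARTIAL
print(decide(mutation, reachable, needs_approval)[0])  # Verdict.DEFER
print(decide(outsider, reachable, needs_approval)[0])  # Verdict.REFUSE
```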

The integration points are concrete. Connector ingestion emits relation-extraction events that the primitive admits as graph observations. Permission mirroring becomes one credential class among several rather than the entire access story. Glean Actions register their action types and required credentials with the primitive, which uses those declarations to gate traversal. Audit and compliance integrations consume lineage observations through the same enterprise SIEM and DLP fabrics customers already operate. For customers the upgrade path is non-disruptive: existing Glean deployments gain governed traversal and lineage as a substrate property without rebuilding their connector inventory.
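A sketch of how registered action types might gate on credential classes, with the permission mirror as just one class among several. The registry API, the class names (`acl_mirror`, `write_token`), and the action name `jira.create_ticket` are hypothetical and are not Glean Actions' actual interface. Each decision is emitted as a JSON line, the kind of record an existing SIEM pipeline can ingest.

```python
import json
from dataclasses import dataclass, field

@dataclass
class ActionRegistry:
    # action type -> credential classes that must all be presented
    required: dict = field(default_factory=dict)

    def register(self, action, credential_classes):
        self.required[action] = frozenset(credential_classes)

    def gate(self, action, presented, actor):
        """Admit only registered actions whose required credential classes are all
        presented. Returns (allowed, event), where event is a JSON line for a SIEM."""
        needed = self.required.get(action)
        missing = sorted(needed - presented) if needed is not None else ["<unregistered>"]
        allowed = not missing
        event = json.dumps({"actor": actor, "action": action,
                            "allowed": allowed, "missing": missing}, sort_keys=True)
        return allowed, event

reg = ActionRegistry()
reg.register("jira.create_ticket", {"acl_mirror", "write_token"})

ok, event = reg.gate("jira.create_ticket", {"acl_mirror"}, "alice")
print(ok)     # False
print(event)  # {"action": "jira.create_ticket", "actor": "alice", "allowed": false, "missing": ["write_token"]}
```

Read access mirrored from the source system is necessary but not sufficient here: the write path stays closed until the separate write credential class is also presented, and unregistered actions are denied outright.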

5. Commercial and Licensing Implication

For Glean the commercial implication is favorable on every dimension that matters for the next phase of growth. Adopting semantic-discovery as a substrate does not displace the connector economics, the index, the ranking stack, or the Assistant brand — it strengthens the regulated-industry, public-sector, and multi-tenant SaaS-vendor segments where the absence of governed traversal is currently a procurement blocker. It also positions Work AI agents as auditable from the first deployment rather than retrofitted later under regulatory pressure, which matters for the EU AI Act high-risk classification, for emerging SEC and banking-supervisor expectations on agentic systems, and for customer-side AI governance committees that have begun gating internal agent rollouts on substrate-level evidence rather than vendor attestation.

The competitive frame shifts as well. Microsoft Copilot, Google's enterprise AI, and a long tail of vector-database vendors all compete on retrieval quality and connector breadth. A primitive that adds governed traversal as a first-class object differentiates Glean in the procurement conversations where retrieval quality has already commoditized, and it does so on a dimension that the hyperscaler competitors cannot easily replicate without rearchitecting their own retrieval surfaces. For multi-tenant SaaS vendors who today consider embedding Glean but balk at the lineage-and-audit story, governed traversal is the missing piece that converts evaluation into commitment.

Because semantic-discovery is provided as an architectural primitive rather than an application-layer feature, its licensing slots beneath Glean's existing per-seat and per-connector commercial model. The primitive licenses the substrate; Glean continues to own the surface. A natural arrangement is an embedded substrate license in which Glean sublicenses participation in the primitive's credential and lineage chain to its enterprise customers as part of the platform subscription, with pricing aligned to credentialed-authority count or governed-traversal volume rather than to seat count. For Glean's enterprise customers this means access to governed traversal and lineage without re-papering the Glean contract itself; for Glean it means a defensible architectural moat that ranking and connector breadth alone cannot supply. Put plainly, the primitive does not replace Glean's product; it gives that product the substrate the Work AI roadmap requires and that retrieval engineering, however excellent, cannot produce by itself.