Search and indexing

SpecStar provides querying and search without requiring developers to work with a database directly.

Search is based on a simple model:

  • All queries run against resource-level metadata (ResourceMeta)
  • and against indexed_data, an extracted, searchable projection of the resource data

No revision scanning is performed.

What is indexed_data?

indexed_data is a dictionary stored in ResourceMeta:

indexed_data: dict[str, Any]

It is extracted from the resource data (T) during:

  • create()
  • update()
  • modify() (when data changes)

Important: indexed_data is not a copy of the full resource payload. Only the fields explicitly declared in indexed_fields are extracted into it.

The main purpose:

  • enable fast filtering / sorting without decoding or scanning revisions
  • keep search focused on “the current version” (HEAD) of a resource

Data types

indexed_data supports any JSON value (as stored by your encoding):

  • scalars: string / number / bool / null
  • objects / arrays

Storage backends may impose practical constraints depending on implementation, but the semantic contract is “any JSON”.

Flattened (shallow) keys

SpecStar treats indexed fields as flattened keys, e.g.:

  • "user.email" is stored directly as a key in indexed_data

So query comparison is shallow:

  • no nested traversal is required at query time
  • the index builder already produced the flattened key

This matches the Query Builder behavior:

  • QB["user.email"] targets the key "user.email" in indexed_data

If a configured path is missing or cannot be resolved, SpecStar skips it instead of failing the indexing step: the field is simply left out of indexed_data.
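
The flattening and skip-on-missing behavior described above can be sketched in a few lines. This is a minimal illustration, not SpecStar's actual index builder: the function names resolve_path and build_indexed_data are hypothetical, and the sketch assumes dotted paths that walk through nested dictionaries only.

```python
from typing import Any

# Sentinel that distinguishes "path not found" from a stored None value.
_MISSING = object()


def resolve_path(data: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts; return _MISSING if any step fails."""
    current = data
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def build_indexed_data(data: dict[str, Any], indexed_fields: list[str]) -> dict[str, Any]:
    """Extract only the declared fields into a flat dict keyed by the dotted path."""
    indexed: dict[str, Any] = {}
    for path in indexed_fields:
        value = resolve_path(data, path)
        if value is _MISSING:
            continue  # unresolved paths are skipped, not treated as errors
        indexed[path] = value  # the key is the flattened path itself
    return indexed


resource = {
    "name": "Alice",
    "profile": {"email": "alice@example.com"},
    "status": "active",
    "notes": "not declared, so never indexed",
}
indexed = build_indexed_data(
    resource, ["name", "profile.email", "status", "profile.phone"]
)
```

Here "profile.phone" is declared but absent from the data, so it is left out of the result, and "notes" is present but undeclared, so it is never copied.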

What does a search query cover?

A search query only considers:

  1. ResourceMeta fields (built-in)
  2. indexed_data fields (extracted from data)

It does not:

  • scan revision history
  • search inside old revisions
  • scan raw payload bytes

This makes performance predictable and aligns with the “current state” semantics of most APIs.

Small example

Imagine the current resource data looks like this:

{
  "name": "Alice",
  "profile": {
    "email": "alice@example.com"
  },
  "status": "active"
}

A practical indexed projection could look like this:

{
  "name": "Alice",
  "profile.email": "alice@example.com",
  "status": "active"
}

That is why a query such as QB["profile.email"] == "alice@example.com" can run efficiently without scanning the whole revision payload.
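
To make the "shallow comparison" point concrete, the sketch below filters a handful of projections with one dictionary lookup per resource. It is an in-memory stand-in written for illustration: the heads mapping and the find_equal helper are assumptions, not part of SpecStar, and a real backend would typically push this comparison down to its own index.

```python
from typing import Any

# Hypothetical stand-in for the HEAD projections (indexed_data) of three resources.
heads: dict[str, dict[str, Any]] = {
    "r1": {"name": "Alice", "profile.email": "alice@example.com", "status": "active"},
    "r2": {"name": "Bob", "profile.email": "bob@example.com", "status": "active"},
    "r3": {"name": "Carol", "status": "inactive"},  # "profile.email" was never indexed
}


def find_equal(
    projections: dict[str, dict[str, Any]], key: str, expected: Any
) -> list[str]:
    """Return the ids whose indexed_data contains `key` with a value equal to `expected`."""
    return [
        resource_id
        for resource_id, indexed in projections.items()
        # Shallow lookup: the flattened key is used as-is, with no nested traversal.
        if key in indexed and indexed[key] == expected
    ]


matches = find_equal(heads, "profile.email", "alice@example.com")
```

No payload is decoded and no revision is read: the only data touched is the flat projection of each resource's current version.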

Sorting

Sorting is supported on:

  • ResourceMeta fields (e.g. created_time, updated_time)
  • indexed_data fields (e.g. "user.email", "score")
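
A sort key can therefore come from either source. The sketch below shows one way to order records by a metadata field or by an indexed_data key. The Head dataclass, the META_FIELDS set and the "missing values sort last" rule are all assumptions made for this example; the source does not say how SpecStar orders resources that lack the sort key. The sketch also assumes that all values under one key are mutually comparable.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Head:
    """Hypothetical, simplified view of a resource's metadata plus its projection."""
    resource_id: str
    created_time: int  # epoch seconds, simplified for the example
    indexed_data: dict[str, Any] = field(default_factory=dict)


META_FIELDS = {"resource_id", "created_time"}


def sort_heads(heads: list[Head], key: str) -> list[Head]:
    """Sort ascending by a metadata field or an indexed_data key."""

    def sort_key(head: Head) -> tuple[bool, Any]:
        if key in META_FIELDS:
            return (False, getattr(head, key))
        if key in head.indexed_data:
            return (False, head.indexed_data[key])
        return (True, 0)  # assumption: records without the key go last

    return sorted(heads, key=sort_key)


records = [
    Head("a", created_time=300, indexed_data={"score": 10}),
    Head("b", created_time=100, indexed_data={"score": 30}),
    Head("c", created_time=200),  # no "score" in its projection
]
by_score = [h.resource_id for h in sort_heads(records, "score")]
by_created = [h.resource_id for h in sort_heads(records, "created_time")]
```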

Query Builder

SpecStar provides a Query Builder (“QB”) that makes filtering readable and safe.

Examples (conceptual):

QB.resource_id().eq("...")
QB.created_time().last_n_days(7)
QB.is_deleted() == False
QB["user.email"].eq("a@b.com")
QB.all(QB["age"] > 18, QB["status"] == "active")
QB.any(QB["tier"] == "gold", QB["tier"] == "platinum")

The Query Builder is the recommended way to build complex conditions, because:

  • it is expressive
  • it avoids hand-writing condition JSON
  • it matches the behavior of indexed_data flattening
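
To show why an expression like QB["age"] > 18 can produce a condition object rather than a boolean, here is a toy builder based on operator overloading. It is a sketch for intuition only and is not SpecStar's implementation: the Condition, Group and Field classes, the operator names "eq" and "gt", and the matches method are all invented for this example.

```python
import operator
from dataclasses import dataclass
from typing import Any

_OPS = {"eq": operator.eq, "gt": operator.gt}


@dataclass(frozen=True)
class Condition:
    key: str    # flattened key, e.g. "user.email"
    op: str     # "eq" or "gt" in this toy version
    value: Any

    def matches(self, indexed_data: dict[str, Any]) -> bool:
        if self.key not in indexed_data:
            return False  # a key that was never indexed cannot match
        return _OPS[self.op](indexed_data[self.key], self.value)


@dataclass(frozen=True)
class Group:
    mode: str      # "all" or "any"
    parts: tuple

    def matches(self, indexed_data: dict[str, Any]) -> bool:
        combine = all if self.mode == "all" else any
        return combine(part.matches(indexed_data) for part in self.parts)


class Field:
    """Comparison operators return Condition objects instead of booleans."""

    def __init__(self, key: str) -> None:
        self.key = key

    def eq(self, other: Any) -> Condition:
        return Condition(self.key, "eq", other)

    def __eq__(self, other: Any) -> Condition:  # type: ignore[override]
        return self.eq(other)

    def __gt__(self, other: Any) -> Condition:
        return Condition(self.key, "gt", other)


class _Builder:
    def __getitem__(self, key: str) -> Field:
        return Field(key)  # the dotted key is kept whole, never split

    def all(self, *parts: Any) -> Group:
        return Group("all", parts)

    def any(self, *parts: Any) -> Group:
        return Group("any", parts)


QB = _Builder()

adult_active = QB.all(QB["age"] > 18, QB["status"] == "active")
premium = QB.any(QB["tier"] == "gold", QB["tier"] == "platinum")
```

Because each condition is plain data, it can be serialized or translated for a backend without the caller writing condition JSON by hand, which is the benefit the list above describes.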

API surface

In HTTP APIs, search/list endpoints generally accept:

  • qb= (recommended)
  • or structured JSON parameters such as:
      • conditions=...
      • data_conditions=...
      • sorts=...
  • plus pagination (limit, offset)
  • plus response shaping (returns=..., partial=...)
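
As a rough illustration of the structured-parameter form, the sketch below assembles a query string with Python's standard library. Only the parameter names (data_conditions, sorts, limit, offset) come from the list above. The JSON shape of each condition and sort entry is a placeholder invented for this example and should be checked against the actual API reference before use.

```python
import json
from typing import Any
from urllib.parse import parse_qs, urlencode


def build_search_query(
    data_conditions: list[dict[str, Any]],
    sorts: list[dict[str, Any]],
    limit: int,
    offset: int,
) -> str:
    """Encode structured search parameters as a URL query string."""
    params = {
        # JSON-valued parameters are serialized first, then URL-encoded.
        "data_conditions": json.dumps(data_conditions, separators=(",", ":")),
        "sorts": json.dumps(sorts, separators=(",", ":")),
        "limit": limit,
        "offset": offset,
    }
    return urlencode(params)


# Placeholder shapes: the real condition/sort schema is an assumption here.
conditions = [{"field_path": "status", "operator": "eq", "value": "active"}]
ordering = [{"key": "score", "direction": "desc"}]

query = build_search_query(conditions, ordering, limit=20, offset=0)
decoded = parse_qs(query)  # round-trip check of what a server would receive
```

For anything beyond a trivial filter, the qb= form recommended above is likely to be easier to read and maintain than hand-built JSON.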

See also: