Backend setup: metadata, resources, blobs, and jobs¶
For most real projects, the first important adoption step is not route generation. It is choosing where your backend state will live.
This guide intentionally shows two setup levels:
- the higher-level `backend=` API for a unified backend configuration story
- the lower-level factory path for teams that want finer control over storage and queue wiring
Both are valid and fully supported. This guide starts with the higher-level backend API because it keeps the backend story unified, then moves to factories for cases that need more explicit control.
SpecStar separates backend setup into four concerns:
| Concern | What it stores | Typical choices |
|---|---|---|
| metadata store | resource IDs, revisions, search/index metadata, lifecycle state | memory, local disk, PostgreSQL, S3-backed SQLite |
| resource store | the structured resource payload itself | memory, disk, PostgreSQL, S3 |
| blob store | binary files, uploads, and job log artifacts | memory, local disk, S3 |
| message queue | background job delivery and retries | simple in-process, RabbitMQ, Celery |
If you choose these four pieces deliberately at the start, the rest of the stack is much easier to adapt later.
The golden rule: configure first¶
Configure your backend before registering models.
from specstar import BackendBinding, BackendConfig, ConnectionProfile, Schema, spec
spec.configure(
backend=BackendConfig(
connections={
"local": ConnectionProfile(
type="disk",
options={"rootdir": "./data"},
)
},
meta=BackendBinding(use="local"),
resource=BackendBinding(use="local"),
blob=BackendBinding(use="local"),
)
)
spec.add_model(Schema(User, "v1"))
spec.apply(app)
That order keeps metadata, resource data, blob behavior, and queue behavior aligned from the beginning. The lower-level `storage_factory=` and `message_queue_factory=` arguments still work well when you want more explicit composition in Python.
Two setup levels at a glance¶
| Level | Entry point | Best for | Tradeoff |
|---|---|---|---|
| higher-level | `spec.configure(backend=...)` | most projects, shared config files, easier onboarding | less explicit low-level wiring in user code |
| lower-level | `spec.configure(storage_factory=..., message_queue_factory=...)` | advanced deployments and precise backend composition | more setup detail and more concepts to manage |
Recommended starting points¶
| Situation | Recommended setup | Blob behavior | Queue choice |
|---|---|---|---|
| tests or throwaway demos | `backend=` with in-memory bindings | in memory | default simple queue is enough |
| local development / MVP | `backend=` with a disk connection | local filesystem under the same data root | simple queue if you use jobs |
| recommended production path | `backend=` with PostgreSQL metadata, disk resource storage, S3 blobs | S3 | RabbitMQ |
| object-storage-first production | `backend=` with PostgreSQL + S3 bindings | S3 | RabbitMQ or Celery |
| advanced custom composition | lower-level storage and queue factories | depends on your factory choice | depends on your queue factory |
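For the first row, an in-memory setup is a one-connection variant of the disk examples in this guide. The sketch below assumes a "memory" connection type exists alongside the "disk" type shown elsewhere; check your SpecStar version for the exact type name.
import os

from specstar import BackendBinding, BackendConfig, ConnectionProfile, spec

spec.configure(
    backend=BackendConfig(
        # Everything in memory: nothing survives a restart, which is ideal
        # for tests. Assumption: a "memory" connection type exists,
        # mirroring the "disk" type shown in this guide.
        connections={"mem": ConnectionProfile(type="memory", options={})},
        meta=BackendBinding(use="mem"),
        resource=BackendBinding(use="mem"),
        blob=BackendBinding(use="mem"),
    )
)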
A common progression is to begin with the unified backend API and move to the lower-level factories only when you need more explicit control.
1. Install the integrations you actually need¶
Common combinations:
- `specstar[postgresql]` for PostgreSQL metadata and resource storage
- `specstar[s3]` for S3-compatible data and blob storage
- `specstar[mq]` for RabbitMQ or Celery queue backends
If you only need local persistence, the base package plus `DiskStorageFactory` is often enough.
2. JSON config-file setup¶
If you want deployment-friendly backend setup, place the unified config in a JSON file and load it directly.
{
"version": 1,
"connections": {
"local": {
"type": "disk",
"options": {
"rootdir": "./data"
}
},
"jobs": {
"type": "simple",
"options": {
"max_retries": 3
}
}
},
"meta": {"use": "local"},
"resource": {"use": "local"},
"blob": {"use": "local"},
"mq": {"use": "jobs"}
}
This keeps connection information centralized and makes it easier to share the same backend setup across environments. JSON values also support environment-variable expansion such as ${POSTGRES_DSN}.
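For example, a connections entry can reference the DSN from the environment instead of hard-coding it (the `pg` profile name here simply mirrors the production example later in this guide):
"connections": {
  "pg": {
    "type": "postgres",
    "options": {"dsn": "${POSTGRES_DSN}"}
  }
}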
3. Local persistent setup for a real MVP¶
This is the simplest durable setup for a single-node deployment using the higher-level backend API.
from fastapi import FastAPI
from msgspec import Struct
from specstar import BackendBinding, BackendConfig, ConnectionProfile, Schema, spec
class User(Struct):
name: str
email: str
app = FastAPI()
spec.configure(
backend=BackendConfig(
connections={
"local": ConnectionProfile(
type="disk",
options={"rootdir": "./data"},
)
},
meta=BackendBinding(use="local"),
resource=BackendBinding(use="local"),
blob=BackendBinding(use="local"),
)
)
spec.add_model(Schema(User, "v1"))
spec.apply(app)
What this gives you:
- persistent metadata on local disk
- persistent resource payloads on local disk
- persistent blobs under the same local data area
- no extra infrastructure to operate
Use this when you want the fastest path from demo to something your team can restart safely.
4. Recommended production setup¶
The current recommended production shape is:
- PostgreSQL for searchable metadata
- Disk for resource payload storage
- S3 for durable blobs and uploaded files
- RabbitMQ for background workers
import os
from fastapi import FastAPI
from msgspec import Struct
from specstar import BackendBinding, BackendConfig, BackendDefaults, ConnectionProfile, Schema, spec
class Document(Struct):
title: str
content: str
app = FastAPI()
spec.configure(
backend=BackendConfig(
defaults=BackendDefaults(
table_prefix="app_",
blob_prefix="uploads/",
),
connections={
"pg": ConnectionProfile(
type="postgres",
options={"dsn": os.environ["POSTGRES_DSN"]},
),
"blob-s3": ConnectionProfile(
type="s3",
options={
"bucket": os.environ["S3_BUCKET"],
"access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
"secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
"endpoint_url": os.getenv("S3_ENDPOINT_URL"),
},
),
"jobs": ConnectionProfile(
type="rabbitmq",
options={"amqp_url": os.environ["RABBITMQ_URL"]},
),
},
meta=BackendBinding(use="pg"),
resource=BackendBinding(
type="disk",
options={"rootdir": "./data"},
),
blob=BackendBinding(use="blob-s3"),
mq=BackendBinding(use="jobs"),
)
)
spec.add_model(Schema(Document, "v1"))
spec.apply(app)
This production layout keeps:
- searchable metadata in PostgreSQL
- resource payloads on local or mounted disk
- blobs in S3-compatible storage
- RabbitMQ-backed job workers
If you prefer object storage for both resource payloads and blobs, use S3 for both the resource and blob bindings.
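A minimal sketch of that object-storage-first variant, reusing the connection names from the example above (the S3 credential options are trimmed here for brevity; see the full profile above):
import os

from specstar import BackendBinding, BackendConfig, ConnectionProfile, spec

spec.configure(
    backend=BackendConfig(
        connections={
            "pg": ConnectionProfile(
                type="postgres",
                options={"dsn": os.environ["POSTGRES_DSN"]},
            ),
            "blob-s3": ConnectionProfile(
                type="s3",
                options={"bucket": os.environ["S3_BUCKET"]},
            ),
        },
        meta=BackendBinding(use="pg"),
        # Object-storage-first: resource payloads and blobs share one S3 profile.
        resource=BackendBinding(use="blob-s3"),
        blob=BackendBinding(use="blob-s3"),
    )
)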
5. When to use the lower-level factory path¶
The factory-style configuration is still a strong option when you want explicit control over the storage and queue objects being wired into SpecStar.
from specstar import spec
from specstar.message_queue import RabbitMQMessageQueueFactory
from specstar.resource_manager import PostgresDiskS3StorageFactory
spec.configure(
storage_factory=PostgresDiskS3StorageFactory(
connection_string="postgresql://user:pass@host:5432/appdb",
rootdir="./data",
s3_bucket="my-blob-bucket",
),
message_queue_factory=RabbitMQMessageQueueFactory(),
)
Use this path when you want to:
- construct backend objects directly in Python
- expose advanced options through concrete factory classes
- control storage composition at a lower level than the unified config schema
If your team is deciding between the two styles, think of `backend=` as the easier unified entry point and factories as the deeper control surface.
6. Understand what each storage factory really does¶
The easiest way to avoid surprises is to map the factory to the four backend concerns.
| Factory | Metadata | Resource data | Blob data |
|---|---|---|---|
| `MemoryStorageFactory()` | memory | memory | memory |
| `DiskStorageFactory("./data")` | disk-backed metadata | local files | local files |
| `S3StorageFactory(...)` | SQLite synced to S3 | S3 | S3 |
| `PostgresStorageFactory(...)` | PostgreSQL | PostgreSQL | memory by default |
| `PostgresS3StorageFactory(...)` | PostgreSQL | S3 | S3 |
| `PostgresDiskStorageFactory(...)` | PostgreSQL | local disk | memory by default |
| `PostgresDiskS3StorageFactory(...)` | PostgreSQL | local disk | S3 |
Two important consequences:
- If your resource includes binary uploads, do not assume every PostgreSQL-based setup automatically persists blobs.
- The current recommended production shape is `PostgresDiskS3StorageFactory(...)` together with `RabbitMQMessageQueueFactory()` for workers.
7. Choose a queue only when jobs matter¶
If your app never uses `Job[...]` resources or background execution, you can keep the default simple setup.
When jobs matter:
- use `SimpleMessageQueueFactory()` for local development or same-process consumers
- use `RabbitMQMessageQueueFactory()` for a broker-backed worker fleet
- use `CeleryMessageQueueFactory()` if your platform already standardizes on Celery
A minimal local job setup looks like this:
from msgspec import Struct
from specstar import Schema, spec
from specstar.message_queue import SimpleMessageQueueFactory
from specstar.resource_manager import DiskStorageFactory

class TrainingJob(Struct):
    dataset: str

def training(job):
    # Handler stub; the exact signature SpecStar passes to job handlers
    # may differ by version, so treat this as a sketch.
    ...

spec.configure(
    storage_factory=DiskStorageFactory("./data"),
    message_queue_factory=SimpleMessageQueueFactory(),
)
spec.add_model(Schema(TrainingJob, "v1"), job_handler=training)
mgr = spec.get_resource_manager(TrainingJob)
mgr.start_consume(block=False)
If jobs stay in pending, check that:
- the queue backend is configured
- a consumer or worker is actually running
- the broker is reachable from the process that handles jobs
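For the second check, a dedicated worker process can be as small as the sketch below. It assumes the worker runs the same `spec.configure(...)` and `add_model(...)` setup as the app, and that `block=True` is the blocking counterpart of the `block=False` call shown above; both the shared module name and the flag are assumptions to adapt.
# worker.py: a dedicated consumer process.
# Assumptions: "app_setup" is a hypothetical module that performs the same
# spec.configure(...) and add_model(...) calls as the app, and block=True
# dedicates this process to consuming jobs.
from app_setup import TrainingJob, spec

mgr = spec.get_resource_manager(TrainingJob)
mgr.start_consume(block=True)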
8. First deployment checklist¶
Before calling your backend ready for adoption, verify all of the following:
- the app uses `spec.configure(...)` before `add_model(...)`
- restarts do not lose metadata or resource payloads
- binary uploads still exist after restart or redeploy
- any required worker process is running for job execution
- the chosen extras are installed for the selected backend
- one create, one search, and one blob upload succeed in the target environment
A quick persistence smoke test is simple:
- create one resource
- restart the app
- fetch the same resource again
- if you use blobs, upload one file and download it after restart
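Scripted, the same smoke test might look like this. It is a sketch only: the `/users` route, the request payload, and the `id` response field are assumptions, since the generated paths depend on your model names and versions; check the app's OpenAPI page for the real routes.
import httpx

BASE = "http://localhost:8000"

# 1. Create one resource. The /users path and payload shape are assumptions.
created = httpx.post(f"{BASE}/users", json={"name": "Ada", "email": "ada@example.com"})
created.raise_for_status()
user_id = created.json()["id"]  # assumed response field

# 2. Restart the app out of band, then confirm the resource survived:
fetched = httpx.get(f"{BASE}/users/{user_id}")
assert fetched.status_code == 200, "resource did not survive the restart"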