
Architecture at a glance

The shape

MerchOS is a pnpm + Turborepo monorepo with one API and three frontends, deployed to two clouds:

yourcustommerch/merchos
├── apps/
│ ├── api/ NestJS — single backend for everything
│ ├── storefront/ Next.js — store.yourcustommerch.ca
│ ├── company-admin/ Next.js — admin.yourcustommerch.ca
│ ├── internal-admin/ Next.js — internal.yourcustommerch.ca
│ └── docs/ Astro Starlight — this handbook
├── packages/
│ ├── types/ Shared TS types (Prisma → public types)
│ └── utils/ Cross-app helpers
├── prisma/ Schema + migrations + seed
├── docs/ ADRs, service notes, runbooks, PHASES.md
└── scripts/ Operator-run jobs (backfills, audits, etc.)

One API, three frontends, two clouds is the rule of thumb. If you’re about to write a fourth frontend or a second API, stop and ask whether you’re forcing a square peg into a round hole.

What runs where

| URL | Backend | Purpose |
|---|---|---|
| yourcustommerch.ca + www.* | Vercel (Astro) | Marketing site (separate repo) |
| store.yourcustommerch.ca | Vercel (Next.js) | Storefront (employees + anonymous) |
| admin.yourcustommerch.ca | Vercel (Next.js) | Company admin portal |
| internal.yourcustommerch.ca | Vercel (Next.js) | YCM internal admin |
| api.yourcustommerch.ca | Fly.io (Toronto / yyz) | NestJS API |
| cdn.yourcustommerch.ca | Cloudflare R2 | Image CDN (mockups, BrandKit assets) |
| staging-*.yourcustommerch.ca | mirrors above | Staging environment |

Two clouds: Vercel for frontends (fast global CDN, instant rollbacks, preview deploys per PR), Fly.io for the API (long-lived processes, cron jobs, region-pinned in Toronto for sub-millisecond Postgres hops). Each cloud’s strengths line up with the workload — splitting them isn’t accidental.
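As a quick illustration, the table above can be read as a lookup from hostname to hosting platform. This is a minimal sketch, not code from the repo: the names `HostedSurface`, `SURFACES`, and `cloudFor` are hypothetical, and staging hosts are left out because the table only gives them as a wildcard.

```typescript
// Hypothetical names throughout; only the hostnames and platforms come from the table.
type Cloud = "Vercel" | "Fly.io" | "Cloudflare R2";

interface HostedSurface {
  host: string;
  cloud: Cloud;
  purpose: string;
}

const SURFACES: readonly HostedSurface[] = [
  { host: "yourcustommerch.ca", cloud: "Vercel", purpose: "Marketing site" },
  { host: "store.yourcustommerch.ca", cloud: "Vercel", purpose: "Storefront" },
  { host: "admin.yourcustommerch.ca", cloud: "Vercel", purpose: "Company admin portal" },
  { host: "internal.yourcustommerch.ca", cloud: "Vercel", purpose: "YCM internal admin" },
  { host: "api.yourcustommerch.ca", cloud: "Fly.io", purpose: "NestJS API" },
  { host: "cdn.yourcustommerch.ca", cloud: "Cloudflare R2", purpose: "Image CDN" },
];

// Returns the platform serving a production hostname, or undefined if unknown.
// A leading "www." is stripped so www.yourcustommerch.ca maps to the marketing site.
function cloudFor(hostname: string): Cloud | undefined {
  const host = hostname.toLowerCase().replace(/^www\./, "");
  const match = SURFACES.find((s) => s.host === host);
  return match ? match.cloud : undefined;
}
```

Calling `cloudFor("api.yourcustommerch.ca")` returns `"Fly.io"`; every other production hostname in the table resolves to Vercel or Cloudflare R2.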

Why Fly + yyz specifically

Postgres is Fly Managed Postgres (flympg) in yyz. The API runs in yyz, and so does the pgbouncer between them. With all three colocated, latency from API → DB is sub-millisecond. Latency from API → user is dominated by Vercel’s edge, not by Fly.

There’s a second reason: Canadian data residency. Some YCM customers — churches, nonprofits, municipal departments — have policies against personal data leaving Canada. Hosting in yyz removes that as a conversation. CloudFront / IAD / EWR are off the table.

The dependency map

Browser
├──→ Vercel edge (static + SSR for the 3 Next.js apps + Astro marketing + Astro docs)
│ │
│ └──→ api.yourcustommerch.ca (Fly.io edge → Toronto machines)
│ │
│ └──→ Fly Managed Postgres (via pgbouncer, session mode)
└──→ cdn.yourcustommerch.ca (Cloudflare R2)

Every browser request goes through Vercel for HTML, then directly to the Fly API for data. The Fly API never talks to Vercel; it doesn’t know Vercel exists. R2 is for object storage (mockup PNGs, BrandKit logos): the API generates signed URLs and the browser fetches the objects directly from cdn.yourcustommerch.ca.
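One way to make the "API never talks to Vercel" rule checkable is to write the dependency map as a directed graph and ask what can reach what. The sketch below is an illustrative toy, not repo code: the `Hop` names and the `canReach` helper are hypothetical, and the Vercel edge → API edge is an assumption read from how the diagram nests the API under the Vercel line.

```typescript
// Hypothetical model of the dependency map; edge list is read off the diagram.
type Hop = "browser" | "vercel-edge" | "fly-api" | "pgbouncer" | "postgres" | "r2-cdn";

const EDGES: Record<Hop, readonly Hop[]> = {
  browser: ["vercel-edge", "fly-api", "r2-cdn"],
  "vercel-edge": ["fly-api"], // assumption: taken from the diagram's nesting
  "fly-api": ["pgbouncer"],
  pgbouncer: ["postgres"],
  postgres: [],
  "r2-cdn": [],
};

// Depth-first search: true if `to` is reachable from `from` along the edges.
function canReach(from: Hop, to: Hop, seen: Set<Hop> = new Set<Hop>()): boolean {
  if (from === to) return true;
  if (seen.has(from)) return false;
  seen.add(from);
  return EDGES[from].some((next) => canReach(next, to, seen));
}
```

With this edge list, `canReach("browser", "postgres")` is true, while `canReach("fly-api", "vercel-edge")` is false, which is the one-way relationship the paragraph above describes.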

Third-party integrations

Read these as switches the API flips. Every integration has a service module under apps/api/src/modules/; canonical reading is the relevant service note under docs/services/.

| Integration | Module | What it does | Service note |
|---|---|---|---|
| Stripe | billing/ | Subscriptions, Checkout, PaymentIntents, webhooks, Stripe Tax | billing.md |
| Postmark | notifications/ | Transactional email (intake, invites, password reset, order updates) | notifications.md |
| Gelato | suppliers/ | Apparel + non-apparel print-on-demand. CA fulfillment via gln-toronto hub. | suppliers.md |
| Printful | suppliers/ | Second supplier. EU/US-fulfilled with CA-friendly catalog subset. | suppliers.md |
| Anthropic Claude API | ai-classifier/, brand-scraper/ | Vertical-suggestion classification on intake + BrandKit synthesis. | ai-classifier.md, brand-scraper.md |
| Brandfetch | brand-scraper/ | Logo + brand-color discovery on intake. | brand-scraper.md |
| Cloudflare R2 | images/ | Object storage for mockups + BrandKit assets. | images.md |
| Cloudflare Turnstile | turnstile/ | Bot detection on the public intake form. | _small-modules.md §2 |
| Twilio | notifications/ | SMS (stubbed; not active pre-launch). | notifications.md |
| Google Places API | (intake flow only) | Address autocomplete on intake. | n/a |
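The Module and Service note columns follow the path convention stated above (apps/api/src/modules/ and docs/services/). A minimal sketch of resolving those paths, using three rows from the table; the `Integration` shape, `INTEGRATIONS`, and `pathsFor` are hypothetical names introduced only for illustration:

```typescript
const MODULES_ROOT = "apps/api/src/modules";
const SERVICE_NOTES_ROOT = "docs/services";

// Hypothetical record shape; the values are copied from the table above.
interface Integration {
  name: string;
  modules: readonly string[];
  serviceNotes: readonly string[];
}

const INTEGRATIONS: readonly Integration[] = [
  { name: "Stripe", modules: ["billing"], serviceNotes: ["billing.md"] },
  {
    name: "Anthropic Claude API",
    modules: ["ai-classifier", "brand-scraper"],
    serviceNotes: ["ai-classifier.md", "brand-scraper.md"],
  },
  { name: "Cloudflare R2", modules: ["images"], serviceNotes: ["images.md"] },
];

// Expands an integration name into repo-relative module and service-note paths.
function pathsFor(name: string): { modules: string[]; notes: string[] } | undefined {
  const found = INTEGRATIONS.find((i) => i.name === name);
  if (!found) return undefined;
  return {
    modules: found.modules.map((m) => `${MODULES_ROOT}/${m}/`),
    notes: found.serviceNotes.map((n) => `${SERVICE_NOTES_ROOT}/${n}`),
  };
}
```

An integration that spans two modules, such as the Anthropic Claude API row, resolves to two module paths and two service notes.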

Process architecture

Inside the Fly API process:

  • Express HTTP server on port 4000 — every request hits a NestJS controller.
  • Cron jobs registered via @Cron(...) decorators — currently a dozen, including nightly Gelato sync, weekly Gelato CA-availability refresh, every-30-min SkuVariantCellState refresh, Sunday weekly tier audit.
  • BullMQ workers for mockup composition + designer-queue events. Backed by Upstash Redis.

There’s one Fly app with two machines, both running the full process (cron + HTTP + workers). Since 2026-05-19 the fleet runs with min_machines_running = 2 so that detached operator scripts survive the autoscaler.

Data plane

Postgres is the durable store. There is no Redis-backed cache layer for application data, no Elasticsearch for search, no separate read replica. Everything flows through one Prisma client → one pgbouncer → one Postgres primary.

Two materialized views earn their keep:

  • raw_catalog_styles — aggregates ~177K SupplierVariantRaw rows into ~5K style-group rows so the Raw Catalog UI can render without OOM-killing the API.
  • master_sku_summary — denormalized fulfillment-readiness flags for the Master catalog list.

Both refresh on schedule (nightly) + on supplier-sync completion. See Catalog pipeline for why these exist and what they’re carrying.
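To make the raw_catalog_styles idea concrete, here is a rough in-memory analogue of what such a view computes: many variant rows collapsed into one row per style group. It is a sketch only. The real view is SQL over SupplierVariantRaw, and the field names below (`supplier`, `styleCode`, `color`, `size`) and the function `groupByStyle` are assumptions, since the actual columns are not listed in this chapter.

```typescript
// Hypothetical row shapes for illustration only.
interface VariantRow {
  supplier: string;
  styleCode: string;
  color: string;
  size: string;
}

interface StyleGroup {
  supplier: string;
  styleCode: string;
  variantCount: number;
  colors: string[];
  sizes: string[];
}

// Collapses variant rows into one summary row per (supplier, styleCode).
function groupByStyle(rows: readonly VariantRow[]): StyleGroup[] {
  const groups = new Map<
    string,
    { supplier: string; styleCode: string; variantCount: number; colors: Set<string>; sizes: Set<string> }
  >();
  for (const row of rows) {
    const key = `${row.supplier}:${row.styleCode}`;
    let group = groups.get(key);
    if (!group) {
      group = {
        supplier: row.supplier,
        styleCode: row.styleCode,
        variantCount: 0,
        colors: new Set<string>(),
        sizes: new Set<string>(),
      };
      groups.set(key, group);
    }
    group.variantCount += 1;
    group.colors.add(row.color);
    group.sizes.add(row.size);
  }
  // Map preserves insertion order, so groups come back in first-seen order.
  return Array.from(groups.values()).map((g) => ({
    supplier: g.supplier,
    styleCode: g.styleCode,
    variantCount: g.variantCount,
    colors: Array.from(g.colors).sort(),
    sizes: Array.from(g.sizes).sort(),
  }));
}
```

Precomputing this in a materialized view means the API reads the few thousand summary rows instead of loading and grouping every variant row on each request.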

Deployment cadence

  • Vercel auto-deploys every push to main for the four frontends.
  • Fly deploys manually via flyctl deploy --strategy=immediate per backend commit. Per CLAUDE.md §1.4.1: backend commits must be deployed + verified before the next commit. This rule is the painful lesson from Phase 0124 (10.5h outage) and Phase 0125 (stacked-commit migration cascade).
  • Migrations run as Fly release_command — a separate ephemeral machine that runs prisma migrate deploy before the new image starts taking traffic. If migrations fail, the deploy aborts and traffic stays on the old image.
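The migration gate in the last bullet can be pictured as a small piece of control flow. This is a toy model and not how Fly implements release_command: the `release` function, its parameters, and the `ReleaseResult` type are hypothetical, and the migration step is passed in as a plain callback standing in for prisma migrate deploy.

```typescript
// Toy model of "migrate first, switch traffic only on success".
type ReleaseResult =
  | { status: "released"; serving: string }
  | { status: "aborted"; serving: string; reason: string };

function release(
  currentImage: string,
  nextImage: string,
  runMigrations: () => void, // stand-in for the migration step; throws on failure
): ReleaseResult {
  try {
    runMigrations();
  } catch (err) {
    // Migration failed: the deploy aborts and traffic stays on the old image.
    const reason = err instanceof Error ? err.message : String(err);
    return { status: "aborted", serving: currentImage, reason };
  }
  // Migration succeeded: the new image starts taking traffic.
  return { status: "released", serving: nextImage };
}
```

The point of the ordering is that the old image keeps serving traffic until the schema change has been applied successfully.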

See Operations for the deploy runbook + rollback procedure.

What’s next

  • Domain model — the entities and their relationships.
  • Pricing models — the three families that drive billing + UX divergence.

Canonical sources

  • STACK.md — the source-of-truth list of dependencies + cloud accounts.
  • apps/api/fly.toml — Fly config (min_machines_running, memory, release_command).
  • CLAUDE.md §11 — production URL table.
  • docs/operations/db-connection.md — pgbouncer session-mode rationale.
  • docs/operations/fly-deploy-strategy.md — why --strategy=immediate.

Triggers for update

Update this chapter if you:

  • Add or retire a frontend (a new directory under apps/).
  • Move a service between clouds (Vercel ↔ Fly).
  • Add or retire a third-party integration listed in the table.
  • Change the pgbouncer / Postgres topology.
  • Add a third materialized view or a new cache layer.