Architecture at a glance
The shape
MerchOS is a pnpm + Turborepo monorepo with one API and three frontends, deployed to two clouds:
```
yourcustommerch/merchos
├── apps/
│   ├── api/              NestJS: single backend for everything
│   ├── storefront/       Next.js: store.yourcustommerch.ca
│   ├── company-admin/    Next.js: admin.yourcustommerch.ca
│   ├── internal-admin/   Next.js: internal.yourcustommerch.ca
│   └── docs/             Astro Starlight: this handbook
├── packages/
│   ├── types/            Shared TS types (Prisma → public types)
│   └── utils/            Cross-app helpers
├── prisma/               Schema + migrations + seed
├── docs/                 ADRs, service notes, runbooks, PHASES.md
└── scripts/              Operator-run jobs (backfills, audits, etc.)
```

One API, three frontends, two clouds is the rule of thumb. If you’re about to write a fourth frontend or a second API, stop and ask whether you’re trying to fit a square peg into a round hole.
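`packages/types` is described as turning Prisma types into public types. A minimal sketch of what that mapping might look like, assuming a hypothetical `UserRow` shape and a hypothetical `toPublicUser` helper; neither is the package's actual export. In the real repo the row type would be imported from `@prisma/client` rather than declared by hand.

```typescript
// Hypothetical stand-in for a Prisma-generated row type. In the real repo
// this would come from @prisma/client, not be declared by hand.
interface UserRow {
  id: string;
  email: string;
  name: string | null;
  passwordHash: string;
  companyId: string;
  createdAt: Date;
}

// Public shape shared with the frontends: server-only fields are dropped
// and Dates become ISO strings, matching what JSON actually delivers.
type PublicUser = Omit<UserRow, "passwordHash" | "createdAt"> & {
  createdAt: string;
};

// The single place where a DB row is narrowed to its public form.
function toPublicUser(row: UserRow): PublicUser {
  const { passwordHash: _omit, createdAt, ...rest } = row;
  return { ...rest, createdAt: createdAt.toISOString() };
}

const row: UserRow = {
  id: "u_1",
  email: "ada@example.org",
  name: "Ada",
  passwordHash: "not-for-the-browser",
  companyId: "c_1",
  createdAt: new Date("2025-01-01T00:00:00Z"),
};

console.log(toPublicUser(row));
```

Keeping the narrowing in one shared function means the API and the three frontends agree on the public shape at compile time.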
What runs where
| URL | Host | Purpose |
|---|---|---|
| yourcustommerch.ca + www.* | Vercel (Astro) | Marketing site (separate repo) |
| store.yourcustommerch.ca | Vercel (Next.js) | Storefront (employees + anonymous) |
| admin.yourcustommerch.ca | Vercel (Next.js) | Company admin portal |
| internal.yourcustommerch.ca | Vercel (Next.js) | YCM internal admin |
| api.yourcustommerch.ca | Fly.io (Toronto / yyz) | NestJS API |
| cdn.yourcustommerch.ca | Cloudflare R2 | Image CDN (mockups, BrandKit assets) |
| staging-*.yourcustommerch.ca | mirrors above | Staging environment |
Two clouds: Vercel for frontends (fast global CDN, instant rollbacks, preview deploys per PR), Fly.io for the API (long-lived processes, cron jobs, region-pinned in Toronto for sub-millisecond Postgres hops). Each cloud’s strengths line up with the workload — splitting them isn’t accidental.
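The URL table implies a simple naming rule: each app has one production hostname, and staging prefixes it with `staging-`. A sketch of a hypothetical helper that encodes that rule so apps don't hard-code hostnames; the app keys and the exact `staging-<subdomain>` pattern are assumptions read off the `staging-*` row, not taken from the codebase.

```typescript
// Hypothetical app keys; the subdomains come from the "What runs where" table.
const SUBDOMAINS = {
  storefront: "store",
  companyAdmin: "admin",
  internalAdmin: "internal",
  api: "api",
  cdn: "cdn",
} as const;

type App = keyof typeof SUBDOMAINS;
type Env = "production" | "staging";

const APEX = "yourcustommerch.ca";

// Assumes staging hosts follow staging-<subdomain>.yourcustommerch.ca.
function hostFor(app: App, env: Env): string {
  const sub = SUBDOMAINS[app];
  return env === "staging" ? `staging-${sub}.${APEX}` : `${sub}.${APEX}`;
}

function baseUrl(app: App, env: Env): string {
  return `https://${hostFor(app, env)}`;
}

console.log(baseUrl("api", "production")); // https://api.yourcustommerch.ca
console.log(baseUrl("storefront", "staging")); // https://staging-store.yourcustommerch.ca
```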
Why Fly + yyz specifically
Postgres is Fly Managed Postgres (flympg) in yyz. The API runs in yyz. The pgbouncer between them runs in yyz. All three colocated. Latency from API → DB is sub-millisecond. Latency from API → user is dominated by Vercel’s edge, not by Fly.
There’s a second reason: Canadian data residency. Some YCM customers — churches, nonprofits, municipal departments — have policies against personal data leaving Canada. Hosting in yyz removes that as a conversation. CloudFront / IAD / EWR are off the table.
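One way to keep the residency promise from silently regressing is a boot-time check. A minimal sketch, assuming the API reads Fly's `FLY_REGION` environment variable (set by Fly on each Machine); the `assertCanadianRegion` name and the allow-list are hypothetical, and this handbook doesn't say MerchOS runs such a check today.

```typescript
// Regions the API may run in. Only yyz (Toronto), per the residency rule.
const ALLOWED_REGIONS: ReadonlySet<string> = new Set(["yyz"]);

// Throws at boot if the process is on Fly outside the allowed regions.
// Outside Fly (local dev, CI) FLY_REGION is unset, so the check is skipped.
function assertCanadianRegion(env: Record<string, string | undefined>): void {
  const region = env.FLY_REGION;
  if (region === undefined) return;
  if (!ALLOWED_REGIONS.has(region)) {
    throw new Error(
      `Refusing to start in Fly region "${region}": personal data must stay in Canada ` +
        `(allowed: ${Array.from(ALLOWED_REGIONS).join(", ")})`,
    );
  }
}

// In main.ts this would be assertCanadianRegion(process.env), called before
// the HTTP server, cron jobs, or workers start.
assertCanadianRegion({ FLY_REGION: "yyz" }); // passes
```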
The dependency map
```
Browser
  │
  ├──→ Vercel edge (static + SSR for the 3 Next.js apps + Astro marketing + Astro docs)
  │       │
  │       └──→ api.yourcustommerch.ca (Fly.io edge → Toronto machines)
  │               │
  │               └──→ Fly Managed Postgres (via pgbouncer, session mode)
  │
  └──→ cdn.yourcustommerch.ca (Cloudflare R2)
```

Every browser request goes through Vercel for HTML, then directly to the Fly API for data. The Fly API never talks to Vercel; it doesn't know Vercel exists. R2 is for object storage (mockup PNGs, BrandKit logos): the API generates signed URLs and the browser fetches directly.
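R2 speaks the S3 API, so its signed URLs are standard AWS Signature Version 4 presigned URLs. The sketch below hand-rolls a presigned GET with `node:crypto` to show what the signature covers (method, path, expiry, host). In practice you'd normally use `getSignedUrl` from `@aws-sdk/s3-request-presigner` with an `S3Client` pointed at the R2 endpoint; the handbook doesn't say which library the `images/` module uses, and the bucket, key and credential names here are hypothetical.

```typescript
import { createHash, createHmac } from "node:crypto";

interface R2Credentials {
  accountId: string;
  accessKeyId: string;
  secretAccessKey: string;
}

const hmac = (key: string | Buffer, data: string): Buffer =>
  createHmac("sha256", key).update(data, "utf8").digest();

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data, "utf8").digest("hex");

// SigV4 wants RFC 3986 encoding; encodeURIComponent leaves !'()* unescaped.
const rfc3986 = (s: string): string =>
  encodeURIComponent(s).replace(/[!'()*]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());

// Builds a path-style SigV4 presigned GET URL for an R2 object.
function presignGet(
  creds: R2Credentials,
  bucket: string,
  key: string,
  expiresInSeconds: number,
  now: Date = new Date(),
): string {
  const host = `${creds.accountId}.r2.cloudflarestorage.com`;
  const region = "auto"; // R2 uses "auto" as the SigV4 region
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, ""); // YYYYMMDDTHHMMSSZ
  const dateStamp = amzDate.slice(0, 8);
  const scope = `${dateStamp}/${region}/s3/aws4_request`;

  // /<bucket>/<key>: each segment encoded, slashes between segments kept.
  const path = "/" + [bucket, ...key.split("/")].map(rfc3986).join("/");

  const params: [string, string][] = [
    ["X-Amz-Algorithm", "AWS4-HMAC-SHA256"],
    ["X-Amz-Credential", `${creds.accessKeyId}/${scope}`],
    ["X-Amz-Date", amzDate],
    ["X-Amz-Expires", String(expiresInSeconds)],
    ["X-Amz-SignedHeaders", "host"],
  ];
  const query = params
    .map(([k, v]): [string, string] => [rfc3986(k), rfc3986(v)])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${k}=${v}`)
    .join("&");

  // Canonical headers end with their own newline, hence "host:...\n".
  const canonicalRequest = ["GET", path, query, `host:${host}\n`, "host", "UNSIGNED-PAYLOAD"].join("\n");
  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");

  // Derive the signing key: date -> region -> service -> "aws4_request".
  const kDate = hmac(`AWS4${creds.secretAccessKey}`, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, "s3");
  const kSigning = hmac(kService, "aws4_request");
  const signature = createHmac("sha256", kSigning).update(stringToSign, "utf8").digest("hex");

  return `https://${host}${path}?${query}&X-Amz-Signature=${signature}`;
}
```

The browser can fetch the returned URL until `X-Amz-Expires` elapses, without the API ever proxying the bytes.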
Third-party integrations
Read these as switches the API flips. Every integration except Google Places has a service module under `apps/api/src/modules/`; the canonical reference for each is its service note under `docs/services/`.
| Integration | Module | What it does | Service note |
|---|---|---|---|
| Stripe | billing/ | Subscriptions, Checkout, PaymentIntents, webhooks, Stripe Tax | billing.md |
| Postmark | notifications/ | Transactional email (intake, invites, password reset, order updates) | notifications.md |
| Gelato | suppliers/ | Apparel + non-apparel print-on-demand. CA fulfillment via gln-toronto hub. | suppliers.md |
| Printful | suppliers/ | Second supplier. EU/US-fulfilled with CA-friendly catalog subset. | suppliers.md |
| Anthropic Claude API | ai-classifier/, brand-scraper/ | Vertical-suggestion classification on intake + BrandKit synthesis. | ai-classifier.md, brand-scraper.md |
| Brandfetch | brand-scraper/ | Logo + brand-color discovery on intake. | brand-scraper.md |
| Cloudflare R2 | images/ | Object storage for mockups + BrandKit assets. | images.md |
| Cloudflare Turnstile | turnstile/ | Bot detection on the public intake form. | _small-modules.md §2 |
| Twilio | notifications/ | SMS (stubbed; not active pre-launch). | notifications.md |
| Google Places API | (intake flow only) | Address autocomplete on intake. | n/a |
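Stripe webhooks arriving at the `billing/` module have to be authenticated before they're trusted. A minimal sketch of Stripe's documented `Stripe-Signature` check: HMAC-SHA256 of `"<t>.<raw body>"` with the endpoint secret, compared against each `v1` entry, plus a timestamp tolerance to reject replays. In the real module you'd normally call the official SDK's `stripe.webhooks.constructEvent`; `verifyStripeSignature` is a hypothetical name, and this only shows what that call verifies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Stripe's SDKs default to a 300-second tolerance. This sketch rejects
// timestamps more than 5 minutes away from now in either direction.
const TOLERANCE_SECONDS = 300;

// rawBody must be the exact payload Stripe sent, not re-serialized JSON,
// or the HMAC will not match.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  endpointSecret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Header looks like: t=1700000000,v1=<hex>[,v1=<hex>...]
  let t: string | undefined;
  const v1: string[] = [];
  for (const part of header.split(",")) {
    const [k, v] = part.split("=", 2);
    if (k === "t") t = v;
    else if (k === "v1" && v) v1.push(v);
  }
  if (t === undefined || v1.length === 0) return false;
  const timestamp = Number(t);
  if (!Number.isInteger(timestamp)) return false;
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false;

  // Sign the raw timestamp string exactly as it appeared in the header.
  const expected = createHmac("sha256", endpointSecret).update(`${t}.${rawBody}`, "utf8").digest();
  return v1.some((sig) => {
    const given = Buffer.from(sig, "hex");
    // Length check first: timingSafeEqual throws on mismatched lengths.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```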
Process architecture
Inside the Fly API process:
- Express HTTP server on port 4000; every request hits a NestJS controller.
- Cron jobs registered via `@Cron(...)` decorators. There are currently a dozen, including the nightly Gelato sync, the weekly Gelato CA-availability refresh, the every-30-minute `SkuVariantCellState` refresh, and the Sunday weekly tier audit.
- BullMQ workers for mockup composition and designer-queue events, backed by Upstash Redis.
There’s one Fly app with two machines, both running the full process (cron + HTTP + workers). Since 2026-05-19 the fleet has run at `min_machines_running = 2` so that detached operator scripts survive the autoscaler.
Data plane
Postgres is the durable store. There is no Redis-backed cache layer for application data, no Elasticsearch for search, no separate read replica. Everything flows through one Prisma client → one pgbouncer → one Postgres primary.
Two materialized views earn their keep:
- `raw_catalog_styles`: aggregates ~177K `SupplierVariantRaw` rows into ~5K style-group rows so the Raw Catalog UI can render without OOM-killing the API.
- `master_sku_summary`: denormalized fulfillment-readiness flags for the Master catalog list.
Both refresh on schedule (nightly) + on supplier-sync completion. See Catalog pipeline for why these exist and what they’re carrying.
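Both triggers (the nightly schedule and supplier-sync completion) can share one refresh routine. A minimal sketch, assuming an injected `exec` function that runs raw SQL (in the API, a wrapper around something like Prisma's `$executeRawUnsafe`); `refreshCatalogViews` and the trigger labels are hypothetical. Using `REFRESH MATERIALIZED VIEW CONCURRENTLY` keeps the views readable during the refresh but requires a unique index on each view; whether MerchOS does this isn't stated here, so treat it as an assumption.

```typescript
// The two materialized views named in this chapter, refreshed in this order.
const CATALOG_VIEWS = ["raw_catalog_styles", "master_sku_summary"] as const;

type RefreshTrigger = "nightly-schedule" | "supplier-sync-complete";

// Anything that can run one raw SQL statement. Injected so the sketch
// runs without a database.
type SqlExecutor = (sql: string) => Promise<unknown>;

async function refreshCatalogViews(
  exec: SqlExecutor,
  trigger: RefreshTrigger,
  log: (msg: string) => void = console.log,
): Promise<void> {
  // One view at a time, so two heavy refreshes never hit the primary at once.
  for (const view of CATALOG_VIEWS) {
    const started = Date.now();
    // CONCURRENTLY lets readers keep querying the old contents while the
    // refresh runs; Postgres requires a unique index on the view for this.
    await exec(`REFRESH MATERIALIZED VIEW CONCURRENTLY ${view}`);
    log(`[${trigger}] refreshed ${view} in ${Date.now() - started}ms`);
  }
}
```

A nightly `@Cron` handler and the end of the supplier sync would each call this with their own trigger label, so the log shows which path refreshed the views.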
Deployment cadence
- Vercel auto-deploys every push to `main` for the four frontends.
- Fly deploys manually via `flyctl deploy --strategy=immediate`, once per backend commit. Per CLAUDE.md §1.4.1, each backend commit must be deployed and verified before the next commit. The rule comes from two painful lessons: Phase 0124 (a 10.5h outage) and Phase 0125 (a stacked-commit migration cascade).
- Migrations run as the Fly `release_command`: a separate ephemeral machine runs `prisma migrate deploy` before the new image starts taking traffic. If migrations fail, the deploy aborts and traffic stays on the old image.
See Operations for the deploy runbook + rollback procedure.
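The "deployed and verified before the next commit" rule is easier to follow with a check that blocks until the new image is actually serving. A minimal sketch, assuming a hypothetical `/health` endpoint on the API that returns the running commit SHA as `{ "sha": "..." }`; neither that endpoint nor the `waitForDeployedSha` helper is documented in this handbook. `fetch` is injected so the sketch runs without network access (Node 18+ has a global `fetch` you could pass in).

```typescript
// Minimal subset of the fetch API this helper needs.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the hypothetical /health endpoint until it reports expectedSha.
// Resolves true once the new image is serving, false on timeout.
async function waitForDeployedSha(
  fetchFn: FetchLike,
  baseUrl: string,
  expectedSha: string,
  { timeoutMs = 300000, intervalMs = 5000 }: WaitOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetchFn(`${baseUrl}/health`);
      if (res.ok) {
        const body = (await res.json()) as { sha?: unknown };
        if (body.sha === expectedSha) return true;
      }
    } catch {
      // Errors are expected while machines restart; keep polling.
    }
    await sleep(intervalMs);
  }
  return false;
}

// e.g. await waitForDeployedSha(fetch, "https://api.yourcustommerch.ca", gitSha)
```

Only once this resolves `true` (and any manual smoke checks pass) would the next backend commit go out.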
What’s next
- Domain model — the entities and their relationships.
- Pricing models — the three families that drive billing + UX divergence.
Canonical sources
- `STACK.md`: the source-of-truth list of dependencies + cloud accounts.
- `apps/api/fly.toml`: Fly config (`min_machines_running`, memory, `release_command`).
- CLAUDE.md §11: production URL table.
- `docs/operations/db-connection.md`: pgbouncer session-mode rationale.
- `docs/operations/fly-deploy-strategy.md`: why `--strategy=immediate`.
Triggers for update
Update this chapter if you:
- Add or retire a frontend (a new directory under `apps/`).
- Move a service between clouds (Vercel ↔ Fly).
- Add or retire a third-party integration listed in the table.
- Change the pgbouncer / Postgres topology.
- Add a third materialized view or a new cache layer.