An AI-native insights engine that gives small-business sellers plain-English answers, verified metrics, and actionable recommendations — grounded in real data, never hallucinated.
Point-of-sale systems capture rich transaction data, but turning that data into decisions still requires time, expertise, and tools most sellers don't have.
Square sellers generate thousands of transactions daily. But raw dashboards don't tell you what to do. Sellers need answers, not charts — "Should I reorder oat milk?" not "Here's a line graph."
Generic AI chatbots will confidently fabricate sales numbers. A seller asking "What was my revenue last week?" deserves a number computed from their actual data — not a plausible guess.
A coffee shop's Tuesday revenue is shaped by weather, a nearby farmers market, and a bad Yelp review — signals invisible to traditional POS analytics. Sellers operate in neighborhoods, not spreadsheets.
Seller Ops Copilot turns natural-language questions into data-grounded insights with actionable recommendations — no SQL, no dashboards, no guesswork.
Ask "How were sales this week?" or "Am I staffed properly for peak hours?" in plain English. The copilot understands context, handles ambiguity, and asks clarifying questions when needed.
Every number displayed comes from a tool call against your actual data. Revenue, order counts, top items, staffing ratios — all computed, never generated. Zod schema validation ensures structural integrity.
Up to 3 ranked action cards with confidence levels (high/med/low), rationale grounded in data, expected impact estimates, and explicit assumptions. You know exactly what the copilot is confident about and why.
Flip one switch to overlay weather forecasts, local events, and review sentiment onto your analysis. The copilot weaves external signals into recommendations while clearly labeling what comes from where.
Every response includes a sources array citing which tools were called, what parameters were used, and which heuristic rules were applied. Full traceability from insight back to data.
Bookmark any insight for later reference. Saved insights persist locally and can be reviewed on a dedicated page — building an institutional memory for the business.
A clean three-layer architecture: React UI → Express API → LLM Agent with typed tool calls against structured data.
┌─────────────────────────────┐
│ React + Vite UI │ Port 5173
│ Chat interface, Metric & │ (Tailwind CSS)
│ Action cards, Save/Review │
└──────────┬──────────────────┘
│ POST /api/chat
│ { message, neighborhoodContextEnabled }
▼
┌─────────────────────────────┐
│ Express API Server │ Port 3001
│ │
│ ┌───────────────────────┐ │
│ │ LLM Agent Loop │ │ GPT-4o (tool_choice: auto)
│ │ System Prompt + Zod │ │ Max 8 rounds
│ │ Schema Validation │ │ Temperature: 0.2
│ └───────────┬───────────┘ │
│ │ │
│ ┌───────────▼───────────┐ │
│ │ Tool Execution Layer │ │ 6 typed functions
│ │ getSalesSummary │ │ ├── Internal data (5 tools)
│ │ getHourlySales │ │ └── Neighborhood context (1 tool, gated)
│ │ getTopItems │ │
│ │ getInventoryStatus │ │
│ │ getStaffingSignals │ │
│ │ getNeighborhoodContext│ │
│ └───────────┬───────────┘ │
│ │ │
│ ┌───────────▼───────────┐ │
│ │ Stub Data (JSON) │ │ 30 days of deterministic
│ │ sales, inventory, │ │ coffee-shop data
│ │ staffing, weather, │ │ (seed.cjs generator)
│ │ events, reviews │ │
│ └───────────────────────┘ │
└─────────────────────────────┘
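The request/response contract implied by the diagram can be sketched in TypeScript. The request fields come straight from the diagram; the response fields mirror the output schema described later in this document. Exact shapes are assumptions, not the project's source:

```typescript
// Sketch of the /api/chat contract (assumed shapes).
interface ChatRequest {
  message: string;
  neighborhoodContextEnabled: boolean;
}

interface ChatResponse {
  answer_markdown: string;
  metrics: unknown[]; // up to 6 metric cards
  actions: unknown[]; // up to 3 action cards
  sources: { tool: string; params?: Record<string, unknown> }[];
  warnings: string[];
  followups: string[];
}

function makeRequest(message: string, neighborhood = false): ChatRequest {
  return { message, neighborhoodContextEnabled: neighborhood };
}
```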
| Layer | Technology | Why |
|---|---|---|
| Frontend | React 19 · TypeScript · Vite · Tailwind CSS | Fast iteration, type safety, modern DX |
| Backend | Express · TypeScript · tsx | Minimal footprint, easy to extend |
| AI | OpenAI GPT-4o function calling | Best-in-class tool use, structured output |
| Validation | Zod | Runtime schema enforcement on LLM output |
| Data | Stubbed JSON + deterministic seed | Offline-first, reproducible demos |
| Routing | React Router v7 | Client-side SPA routing for Chat/Saved pages |
The core insight: AI should call tools, not generate numbers. Every architectural choice reinforces data integrity.
Instead of asking the LLM to "write SQL" or "estimate revenue," we give it typed tool functions. The model decides which tools to call and with what parameters — but the actual computation happens in deterministic TypeScript code against real data. This means:
// The model calls this — it never computes revenue itself
export function getSalesSummary(args: {
  startDate: string;
  endDate: string;
}) {
  const sales = loadJSON("sales.json");
  const range = filterByRange(sales, args.startDate, args.endDate);
  // Field names (revenue, orders) are illustrative
  const totalRevenue = range.reduce((sum, day) => sum + day.revenue, 0);
  const orderCount = range.reduce((sum, day) => sum + day.orders, 0);
  return {
    total_revenue: round2(totalRevenue),
    order_count: orderCount,
    avg_order_value: round2(orderCount > 0 ? totalRevenue / orderCount : 0),
  };
}
// Zod validates every LLM response at runtime
export const OutputSchema = z.object({
answer_markdown: z.string().trim().min(1).max(2500),
metrics: z.array(MetricCardSchema).max(6),
actions: z.array(ActionCardSchema).max(3),
sources: z.array(SourceSchema).min(1),
warnings: z.array(z.string()).max(10),
followups: z.array(z.string()).max(3),
});
// If the LLM returns invalid JSON, we throw immediately
const result = OutputSchema.safeParse(parsed);
if (!result.success) throw new Error(`Schema validation failed`);
The LLM's final output is parsed as JSON and validated against a strict Zod schema at runtime. This is not optional — if the model produces malformed output, extra fields, or missing required properties, the request fails immediately rather than surfacing garbage to the seller.
The schema enforces constraints like: metrics arrays capped at 6 cards, confidence must be
one of ["low","med","high"], sources must contain at least 1 entry, and
the answer must be under 2,500 characters. This keeps outputs predictable and UI-safe.
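The constraints above can be sketched without the Zod dependency as plain runtime checks. This is illustrative only (the project uses Zod), but it shows exactly what "UI-safe" means:

```typescript
// Hand-rolled sketch of the constraints the Zod schema enforces.
function validateOutput(o: {
  answer_markdown: string;
  metrics: unknown[];
  actions: { confidence: string }[];
  sources: unknown[];
}): string[] {
  const errors: string[] = [];
  if (o.answer_markdown.trim().length === 0) errors.push("answer empty");
  if (o.answer_markdown.length > 2500) errors.push("answer too long");
  if (o.metrics.length > 6) errors.push("too many metric cards");
  if (o.actions.length > 3) errors.push("too many action cards");
  for (const a of o.actions)
    if (!["low", "med", "high"].includes(a.confidence))
      errors.push(`bad confidence: ${a.confidence}`);
  if (o.sources.length < 1) errors.push("at least one source required");
  return errors; // empty array means the output is UI-safe
}
```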
Gated Neighborhood Context. The getNeighborhoodContext tool is physically removed from the tool list when the toggle is off. The model cannot call what it cannot see — a defense-in-depth approach that doesn't rely on prompt compliance alone.
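The gating can be sketched as conditional tool-list assembly. The tool names come from the architecture diagram; the wiring shown here is an assumption about how the gate works, not the project's source:

```typescript
// Sketch: the tool list is assembled per request, so a disabled
// getNeighborhoodContext is never visible to the model at all.
const INTERNAL_TOOLS = [
  "getSalesSummary",
  "getHourlySales",
  "getTopItems",
  "getInventoryStatus",
  "getStaffingSignals",
];

function buildToolList(neighborhoodContextEnabled: boolean): string[] {
  return neighborhoodContextEnabled
    ? [...INTERNAL_TOOLS, "getNeighborhoodContext"]
    : [...INTERNAL_TOOLS];
}
```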
Deterministic seed data. The seed.cjs script uses a mulberry32 PRNG seeded with a fixed value, producing identical data on every run for reproducible demos.

Mandatory citations. sources.min(1) — the model must cite at least one tool or rule. Combined with the system prompt, this creates accountability for every claim.

Small-business sellers don't need more dashboards. They need someone to tell them what to do — and why.
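The determinism hinges on mulberry32, a tiny 32-bit PRNG. A self-contained sketch (the actual seed.cjs may differ in details):

```typescript
// mulberry32: same seed → same sequence, which is what makes
// the generated demo data reproducible run after run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}
```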
A seller currently spends 30+ minutes per week manually reviewing dashboards, exporting CSVs, and trying to spot patterns. The copilot delivers the same insight in a single conversational turn — typically under 10 seconds.
"Your revenue was $8,200 this week" is informational. "Add a morning barista shift to capture ~$200/day in unmet demand during your 7-9 AM peak" is actionable. Every recommendation includes confidence, rationale, and assumptions.
Research from the San Francisco Fed shows weather alone can shift retail spending by ~3.25%. Local events can drive 10-30% foot traffic spikes within 0.3 miles. The copilot weaves these signals into recommendations automatically.
The primary persona is a small-business seller (coffee shop owner, bakery operator, boutique retail manager) who uses Square as their POS system.
The BI market is massive and growing, but SMBs remain underserved. AI-native interfaces can bridge the gap.
Three converging trends make this the right moment for AI-native seller tools:
1. LLM tool-calling maturity. GPT-4o's function calling is reliable enough for production use. Models can reason about which tools to invoke, handle multi-step queries, and produce structured output. This wasn't possible 18 months ago.
2. Square's data moat. With ~289K online stores and millions of in-person sellers, Square already has the transaction data. The missing piece is an AI layer that makes this data conversationally accessible — especially for micro-merchants who will never open a BI tool.
3. Agentic commerce is here. BCG reports that AI-powered retail agents are redefining commerce, with 54% of retailers citing faster decision-making as their top AI priority. Sellers who don't have AI-powered insights will fall behind sellers who do.
Existing solutions fall short for the small-business seller:
| Approach | Limitation |
|---|---|
| Square Dashboard | Charts and tables, not conversational. No recommendations. No neighborhood context. |
| Generic ChatGPT | No access to seller data. Hallucinates numbers. No structured output. |
| Enterprise BI (Tableau, Looker) | Expensive, complex, requires data engineering. Overkill for a coffee shop. |
| Vertical SaaS analytics | Pre-built dashboards, not conversational. No AI-native interaction model. |
Seller Ops Copilot occupies a unique position: conversational + data-grounded + neighborhood-aware + actionable. No existing product combines all four.
This MVP demonstrates the core interaction pattern. Here's how it evolves into a production-grade feature within Square's ecosystem.
Fully functional prototype with stubbed data. Demonstrates the complete AI loop: natural-language input → tool calling → validated structured output → rendered UI. Runs locally in under 3 minutes of setup.
Connect to Square's APIs (Transactions, Catalog, Labor, Inventory) behind a feature flag. Add real weather (OpenWeather), events (PredictHQ), and review (Yelp Fusion) integrations. Implement OAuth 2.0 flow for seller authentication.
Build an evaluation harness with golden-answer test cases for tool-calling accuracy. Add streaming responses (SSE) for perceived latency improvement. Implement conversation memory for multi-turn follow-ups. Add guardrail monitoring and LLM output logging.
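A golden-answer test case for that harness might look like this. The shape is hypothetical (the harness is future work); the questions and tool names are from this document:

```typescript
// Hypothetical golden-case shape for evaluating tool-calling accuracy.
interface GoldenCase {
  question: string;
  expectedTools: string[]; // tools the model should call
}

const goldenCases: GoldenCase[] = [
  {
    question: "How were my sales this week?",
    expectedTools: ["getSalesSummary"],
  },
  {
    question: "Am I properly staffed for peak hours?",
    expectedTools: ["getHourlySales", "getStaffingSignals"],
  },
];
```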
Embed as a native feature within Square Dashboard. Proactive alerts ("Your espresso beans run out tomorrow"). Cross-seller benchmarking with privacy-preserving aggregation. Export insights to email, Slack, or PDF. Mobile-optimized interface for on-the-go sellers.
Move from recommendations to execution. Auto-generate purchase orders when inventory hits reorder points. Suggest and publish shift schedule adjustments. Dynamic pricing recommendations for perishable items. The copilot becomes an operating system for the business.
Prerequisites: Node.js ≥ 18 and an OpenAI API key with GPT-4o access.
git clone https://github.com/rithwikgokhale/Seller-Ops-Copilot.git && cd Seller-Ops-Copilot
This installs root, server, and client packages automatically.
npm install
Copy the environment template, then add your OpenAI API key to the new .env file.
cp .env.example .env && open .env
Launches Express on :3001 and Vite on :5173 concurrently.
npm run dev
Navigate to http://localhost:5173 and ask your first question.
Demo tip: Try these questions to explore all capabilities: "How were my sales this week?" · "What are my top-selling items?" · "Do I have inventory issues?" · "Am I properly staffed for peak hours?" · Toggle Neighborhood Context on and ask "How might weather and events affect my business?"
Scoping is a product skill. Here's what I deliberately excluded from v1 and why.
Stub JSON keeps the demo self-contained and offline. The tool interface is identical whether reading JSON or querying Postgres — swapping is a one-file change.
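That one-file swap works because the tools can depend on a narrow data-source interface rather than a file format. A sketch, with assumed field names:

```typescript
// Sketch: tools consume a SalesSource interface, so swapping stub
// JSON for Postgres means writing one new adapter. Field names are
// assumptions, not the project's actual schema.
interface SalesRecord {
  date: string;
  revenue: number;
  orders: number;
}

interface SalesSource {
  load(): SalesRecord[];
}

const stubSource: SalesSource = {
  load: () => [{ date: "2025-01-01", revenue: 812.5, orders: 41 }],
};

// A Postgres-backed source would implement the same interface;
// callers like this one never change:
function totalRevenue(source: SalesSource): number {
  return source.load().reduce((sum, r) => sum + r.revenue, 0);
}
```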
Each question is independent. This simplifies the demo while avoiding context window management complexity. Phase 3 adds conversation memory.
A demo doesn't need OAuth. The architecture is auth-ready: the API layer naturally supports middleware injection for Square's OAuth flow.
This runs locally. Production would add rate limiting, error monitoring, LLM output logging, and cost controls — all straightforward with the current architecture.
This project was built as a work sample for Block/Square's AI-native Associate Product Manager program. It demonstrates product thinking (problem framing, scoping, roadmap), technical depth (AI tool-calling architecture, schema validation, guardrails), and shipping quality (polished UI, easy setup, comprehensive documentation).