Work Sample · Block / Square APM

Neighborhood-Aware
Seller Ops Copilot

An AI-native insights engine that gives small-business sellers plain-English answers, verified metrics, and actionable recommendations — grounded in real data, never hallucinated.

GPT-4o Tool Calling · Zod Schema Validation · Neighborhood Context · Offline-First Demo
6 typed data tools · 30 days of stub data · 0 hallucinated numbers · <3 min setup time

Small businesses are drowning in data they can't use

Point-of-sale systems capture rich transaction data, but turning that data into decisions still requires time, expertise, and tools most sellers don't have.

Data Overload, Insight Poverty

Square sellers generate thousands of transactions daily. But raw dashboards don't tell you what to do. Sellers need answers, not charts — "Should I reorder oat milk?" not "Here's a line graph."

AI Hallucination Risk

Generic AI chatbots will confidently fabricate sales numbers. A seller asking "What was my revenue last week?" deserves a number computed from their actual data — not a plausible guess.

🌎

Missing Neighborhood Context

A coffee shop's Tuesday revenue is shaped by weather, a nearby farmers market, and a bad Yelp review — signals invisible to traditional POS analytics. Sellers operate in neighborhoods, not spreadsheets.

Ask questions. Get verified answers.

Seller Ops Copilot turns natural-language questions into data-grounded insights with actionable recommendations — no SQL, no dashboards, no guesswork.

💬

Natural-Language Queries

Ask "How were sales this week?" or "Am I staffed properly for peak hours?" in plain English. The copilot understands context, handles ambiguity, and asks clarifying questions when needed.

📈

Verified Metric Cards

Every number displayed comes from a tool call against your actual data. Revenue, order counts, top items, staffing ratios — all computed, never generated. Zod schema validation ensures structural integrity.

🎯

Actionable Recommendations

Up to 3 ranked action cards with confidence levels (high/med/low), rationale grounded in data, expected impact estimates, and explicit assumptions. You know exactly what the copilot is confident about and why.

🌦

Neighborhood Context Toggle

Flip one switch to overlay weather forecasts, local events, and review sentiment onto your analysis. The copilot weaves external signals into recommendations while clearly labeling what comes from where.

🔖

Source Citations

Every response includes a sources array citing which tools were called, what parameters were used, and which heuristic rules were applied. Full traceability from insight back to data.
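As an illustration of the idea, the sources array might look like the sketch below — the field names and values here are hypothetical, not the project's exact schema:

```typescript
// Illustrative shape of a sources array: which tool ran, with what
// parameters, and any heuristic rule applied. Names are assumptions.
type Source = {
  tool: string;                   // tool that was called
  params: Record<string, string>; // parameters it was called with
  rule?: string;                  // heuristic rule applied, if any
};

const sources: Source[] = [
  { tool: "getSalesSummary", params: { startDate: "2024-05-01", endDate: "2024-05-07" } },
  {
    tool: "getStaffingSignals",
    params: { period: "peak" },
    rule: "understaffed-when-orders-per-staff-exceeds-threshold",
  },
];
```

Because every insight carries this array, a seller (or a reviewer) can trace any claim back to the exact tool call that produced it.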

💾

Save & Review Insights

Bookmark any insight for later reference. Saved insights persist locally and can be reviewed on a dedicated page — building an institutional memory for the business.

System design & technical stack

A clean three-layer architecture: React UI → Express API → LLM Agent with typed tool calls against structured data.

  ┌─────────────────────────────┐
  │        React + Vite UI      │     Port 5173
  │  Chat interface, Metric &   │     (Tailwind CSS)
  │  Action cards, Save/Review  │
  └──────────┬──────────────────┘
             │  POST /api/chat
             │  { message, neighborhoodContextEnabled }
             ▼
  ┌─────────────────────────────┐
  │       Express API Server    │     Port 3001
  │                             │
  │  ┌───────────────────────┐  │
  │  │     LLM Agent Loop    │  │     GPT-4o (tool_choice: auto)
  │  │  System Prompt + Zod  │  │     Max 8 rounds
  │  │  Schema Validation    │  │     Temperature: 0.2
  │  └───────────┬───────────┘  │
  │              │              │
  │  ┌───────────▼───────────┐  │
  │  │  Tool Execution Layer │  │     6 typed functions
  │  │  getSalesSummary      │  │     ├── Internal data (5 tools)
  │  │  getHourlySales       │  │     └── Neighborhood context (1 tool, gated)
  │  │  getTopItems          │  │
  │  │  getInventoryStatus   │  │
  │  │  getStaffingSignals   │  │
  │  │ getNeighborhoodContext│  │
  │  └───────────┬───────────┘  │
  │              │              │
  │  ┌───────────▼───────────┐  │
  │  │   Stub Data (JSON)    │  │     30 days of deterministic
  │  │  sales, inventory,    │  │     coffee-shop data
  │  │  staffing, weather,   │  │     (seed.cjs generator)
  │  │  events, reviews      │  │
  │  └───────────────────────┘  │
  └─────────────────────────────┘

Tech Stack

Layer | Technology | Why
Frontend | React 19, TypeScript, Vite, Tailwind CSS | Fast iteration, type safety, modern DX
Backend | Express, TypeScript, tsx | Minimal footprint, easy to extend
AI | OpenAI GPT-4o function calling | Best-in-class tool use, structured output
Validation | Zod | Runtime schema enforcement on LLM output
Data | Stubbed JSON + deterministic seed | Offline-first, reproducible demos
Routing | React Router v7 | Client-side SPA routing for Chat/Saved pages

How we prevent hallucination and ensure trust

The core insight: AI should call tools, not generate numbers. Every architectural choice reinforces data integrity.

Tool-Calling Over Raw Generation

Instead of asking the LLM to "write SQL" or "estimate revenue," we give it typed tool functions. The model decides which tools to call and with what parameters — but the actual computation happens in deterministic TypeScript code against real data. This means:

  • Every number is traceable to a specific tool call
  • The model can never fabricate a sales figure
  • Tool outputs are included in the sources array
  • If data is missing, the tool returns empty — not the model

  // The model calls this — it never computes revenue itself
  export function getSalesSummary(args: {
    startDate: string;
    endDate: string;
  }) {
    const sales = loadJSON("sales.json");
    const range = filterByRange(sales, args.startDate, args.endDate);
    return {
      total_revenue: round2(range.reduce(...)),
      order_count: range.reduce(...),
      avg_order_value: round2(...),
    };
  }

  // Zod validates every LLM response at runtime
  export const OutputSchema = z.object({
    answer_markdown: z.string().trim().min(1).max(2500),
    metrics: z.array(MetricCardSchema).max(6),
    actions: z.array(ActionCardSchema).max(3),
    sources: z.array(SourceSchema).min(1),
    warnings: z.array(z.string()).max(10),
    followups: z.array(z.string()).max(3),
  });

  // If the LLM's output fails schema validation, we throw immediately
  const result = OutputSchema.safeParse(parsed);
  if (!result.success) throw new Error(`Schema validation failed`);

Zod Schema as a Guardrail

The LLM's final output is parsed as JSON and validated against a strict Zod schema at runtime. This is not optional — if the model produces malformed output, extra fields, or missing required properties, the request fails immediately rather than surfacing garbage to the seller.

The schema enforces constraints like: metrics arrays capped at 6 cards, confidence must be one of ["low","med","high"], sources must contain at least 1 entry, and the answer must be under 2,500 characters. This keeps outputs predictable and UI-safe.

🔒

Gated Neighborhood Context. The getNeighborhoodContext tool is physically removed from the tool list when the toggle is off. The model cannot call what it cannot see — a defense-in-depth approach that doesn't rely on prompt compliance alone.
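The gating pattern can be sketched in a few lines — the identifiers below are illustrative stand-ins, not the repo's actual code:

```typescript
// Sketch of tool gating: the neighborhood tool is simply absent from
// the list the model sees when the toggle is off. (Names and handlers
// here are placeholders, not the project's real implementations.)
type ToolDef = { name: string; handler: (args: unknown) => unknown };

const internalTools: ToolDef[] = [
  { name: "getSalesSummary", handler: () => ({}) },
  { name: "getHourlySales", handler: () => ({}) },
  // ...getTopItems, getInventoryStatus, getStaffingSignals
];

const neighborhoodTool: ToolDef = {
  name: "getNeighborhoodContext",
  handler: () => ({}),
};

function toolsFor(neighborhoodContextEnabled: boolean): ToolDef[] {
  // The model only ever receives this list — an unlisted tool cannot be called.
  return neighborhoodContextEnabled
    ? [...internalTools, neighborhoodTool]
    : internalTools;
}
```

The key property: gating happens in deterministic code before the model is invoked, so no amount of prompt injection can surface the hidden tool.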

Design Decisions Summary

Tool-calling vs. raw SQL
Model calls typed functions; never writes SQL. Eliminates injection risk and hallucination of query results.
Temperature 0.2
Low temperature reduces creative drift while allowing the model enough flexibility to synthesize tool results into natural language.
Max 8 tool rounds
Safety circuit breaker prevents infinite loops if the model keeps requesting tools. In practice, most queries resolve in 1-2 rounds.
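The round cap can be sketched as a plain loop — `callModel` and `runTool` below are hypothetical stand-ins for the real OpenAI call and tool dispatcher, not the repo's actual functions:

```typescript
// Minimal sketch of the tool-round circuit breaker.
const MAX_ROUNDS = 8;

type ModelTurn =
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "final"; output: string };

function agentLoop(
  callModel: (history: unknown[]) => ModelTurn,
  runTool: (name: string, args: unknown) => unknown
): string {
  const history: unknown[] = [];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const turn = callModel(history);
    if (turn.type === "final") return turn.output;
    // Execute the requested tool and feed the result back to the model.
    history.push({ tool: turn.name, result: runTool(turn.name, turn.args) });
  }
  // Circuit breaker: never loop forever on repeated tool requests.
  throw new Error(`Exceeded ${MAX_ROUNDS} tool rounds`);
}
```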
Markdown fence stripping
Even when instructed to output raw JSON, models sometimes wrap in code fences. The parser strips these defensively.
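A defensive stripper of this kind can be a one-liner regex — this is a sketch of the pattern, not the repo's exact implementation:

```typescript
// Strips a ```json ... ``` or bare ``` ... ``` wrapper if present;
// otherwise returns the trimmed input unchanged.
function stripMarkdownFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
  return match ? match[1].trim() : trimmed;
}
```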
Deterministic seed data
The seed.cjs script uses a mulberry32 PRNG seeded with a fixed value, producing identical data on every run for reproducible demos.
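mulberry32 is a tiny, well-known public-domain 32-bit PRNG; a TypeScript sketch of the pattern (seed.cjs itself is CommonJS, but the algorithm is the same):

```typescript
// mulberry32: seeded 32-bit PRNG. The same seed always produces the
// identical sequence, which is what makes the demo data reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return function (): number {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Uniform float in [0, 1)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

Two generators created with the same seed draw the exact same numbers, so every `npm run` of the seed script emits byte-identical stub data.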
Source citation requirement
The schema mandates sources.min(1) — the model must cite at least one tool or rule. Combined with the system prompt, this creates accountability for every claim.

Why this matters for sellers

Small-business sellers don't need more dashboards. They need someone to tell them what to do — and why.

Minutes, Not Hours

A seller currently spends 30+ minutes per week manually reviewing dashboards, exporting CSVs, and trying to spot patterns. The copilot delivers the same insight in a single conversational turn — typically under 10 seconds.

💡

Actionable, Not Informational

"Your revenue was $8,200 this week" is informational. "Add a morning barista shift to capture ~$200/day in unmet demand during your 7-9 AM peak" is actionable. Every recommendation includes confidence, rationale, and assumptions.

🌎

Neighborhood-Aware Decisions

Research from the San Francisco Fed shows weather alone can shift retail spending by ~3.25%. Local events can drive 10-30% foot traffic spikes within 0.3 miles. The copilot weaves these signals into recommendations automatically.

Who It Serves

The primary persona is a small-business seller (coffee shop owner, bakery operator, boutique retail manager) who uses Square as their POS system. They're typically:

  • Time-constrained (wearing multiple hats)
  • Data-curious but not data-literate
  • Making daily operational decisions (staffing, ordering, promotions)
  • Hyper-local (their business is shaped by their physical neighborhood)

Seller Impact Framework

Inventory waste reduction
Low-stock alerts + days-until-stockout prevent both stockouts and over-ordering
Labor cost optimization
Staffing signals flag under/overstaffed shifts against actual peak traffic
Revenue opportunity capture
Unmet demand during peaks, event-driven traffic, and weather-adjusted planning
Decision confidence
Explicit confidence levels + assumptions give sellers warranted trust in recommendations

The opportunity in AI-native seller tools

The BI market is massive and growing, but SMBs remain underserved. AI-native interfaces can bridge the gap.

$31.8B Global BI & analytics market (2026)
2x Revenue growth for SMBs using analytics vs. those that don't
8% Of retailers have scaled AI analytics beyond pilots
3.25% Revenue impact of weather on retail spending (SF Fed)

Why Now

Three converging trends make this the right moment for AI-native seller tools:

1. LLM tool-calling maturity. GPT-4o's function calling is reliable enough for production use. Models can reason about which tools to invoke, handle multi-step queries, and produce structured output. This wasn't possible 18 months ago.

2. Square's data moat. With ~289K online stores and millions of in-person sellers, Square already has the transaction data. The missing piece is an AI layer that makes this data conversationally accessible — especially for micro-merchants who will never open a BI tool.

3. Agentic commerce is here. BCG reports that AI-powered retail agents are redefining commerce, with 54% of retailers citing faster decision-making as their top AI priority. Sellers who don't have AI-powered insights will fall behind sellers who do.

Competitive Landscape

Existing solutions fall short for the small-business seller:

Approach | Limitation
Square Dashboard | Charts and tables, not conversational. No recommendations. No neighborhood context.
Generic ChatGPT | No access to seller data. Hallucinates numbers. No structured output.
Enterprise BI (Tableau, Looker) | Expensive, complex, requires data engineering. Overkill for a coffee shop.
Vertical SaaS analytics | Pre-built dashboards, not conversational. No AI-native interaction model.

Seller Ops Copilot occupies a unique position: conversational + data-grounded + neighborhood-aware + actionable. No existing product combines all four.

From MVP to platform feature

This MVP demonstrates the core interaction pattern. Here's how it evolves into a production-grade feature within Square's ecosystem.


Phase 1 — MVP (Current)

Fully functional prototype with stubbed data. Demonstrates the complete AI loop: natural-language input → tool calling → validated structured output → rendered UI. Runs locally in under 3 minutes of setup.

Shipped: 6 tools · Zod validation · Neighborhood toggle · Offline-first

Phase 2 — Real Data Integration

Connect to Square's APIs (Transactions, Catalog, Labor, Inventory) behind a feature flag. Add real weather (OpenWeather), events (PredictHQ), and review (Yelp Fusion) integrations. Implement OAuth 2.0 flow for seller authentication.

Square API · OAuth 2.0 · Feature flags · Real-time data

Phase 3 — Evaluation & Reliability

Build an evaluation harness with golden-answer test cases for tool-calling accuracy. Add streaming responses (SSE) for perceived latency improvement. Implement conversation memory for multi-turn follow-ups. Add guardrail monitoring and LLM output logging.

Eval harness · SSE streaming · Multi-turn context · Guardrail monitoring

Phase 4 — Platform Integration

Embed as a native feature within Square Dashboard. Proactive alerts ("Your espresso beans run out tomorrow"). Cross-seller benchmarking with privacy-preserving aggregation. Export insights to email, Slack, or PDF. Mobile-optimized interface for on-the-go sellers.

Square Dashboard embed · Proactive alerts · Cross-seller benchmarks · Mobile-first

Phase 5 — Autonomous Actions

Move from recommendations to execution. Auto-generate purchase orders when inventory hits reorder points. Suggest and publish shift schedule adjustments. Dynamic pricing recommendations for perishable items. The copilot becomes an operating system for the business.

Auto-reorder · Shift scheduling · Dynamic pricing · Agentic workflows

Run it locally in under 3 minutes

Prerequisites: Node.js ≥ 18 and an OpenAI API key with GPT-4o access.

Clone the repository

git clone https://github.com/rithwikgokhale/Seller-Ops-Copilot.git && cd Seller-Ops-Copilot

Install all dependencies

This installs root, server, and client packages automatically.

npm install

Add your OpenAI API key

cp .env.example .env && open .env

Start development servers

Launches Express on :3001 and Vite on :5173 concurrently.

npm run dev

Open the app

Navigate to http://localhost:5173 and ask your first question.

💡

Demo tip: Try these questions to explore all capabilities: "How were my sales this week?" · "What are my top-selling items?" · "Do I have inventory issues?" · "Am I properly staffed for peak hours?" · Toggle Neighborhood Context on and ask "How might weather and events affect my business?"

What this MVP intentionally does not do

Scoping is a product skill. Here's what I deliberately excluded from v1 and why.

No real database

Stub JSON keeps the demo self-contained and offline. The tool interface is identical whether reading JSON or querying Postgres — swapping is a one-file change.
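Why the swap is cheap can be sketched with a narrow data interface — the names and numbers below are illustrative assumptions, not the repo's actual code:

```typescript
// Tools depend on a narrow store interface, not on where rows come from.
interface SalesStore {
  loadSales(): Array<{ date: string; total: number }>;
}

// Demo implementation: in the real project this would read the stubbed
// JSON file; rows are inlined here so the sketch is self-contained.
const jsonStore: SalesStore = {
  loadSales: () => [
    { date: "2024-05-01", total: 812.5 },
    { date: "2024-05-02", total: 790.0 },
  ],
};

// A Postgres-backed store would implement the same interface;
// the tool functions built on top of it never change.
function totalRevenue(store: SalesStore): number {
  return store.loadSales().reduce((sum, row) => sum + row.total, 0);
}
```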

No multi-turn memory

Each question is independent. This simplifies the demo while avoiding context window management complexity. Phase 3 adds conversation memory.

No authentication

A demo doesn't need OAuth. The architecture is auth-ready: the API layer naturally supports middleware injection for Square's OAuth flow.

No production deployment

This runs locally. Production would add rate limiting, error monitoring, LLM output logging, and cost controls — all straightforward with the current architecture.

Built by Rithwik Gokhale

This project was built as a work sample for Block/Square's AI-native Associate Product Manager program. It demonstrates product thinking (problem framing, scoping, roadmap), technical depth (AI tool-calling architecture, schema validation, guardrails), and shipping quality (polished UI, easy setup, comprehensive documentation).