POC · End-to-end personalisation

Recoshop: a personalised recommendation system, built end to end.

Three rec surfaces, two ML models, content embeddings, MMR diversity, LLM-written explanations, an admin dashboard with live KPIs — all wired up against a real catalogue and 50 simulated shoppers.

Frontend: Next.js 14 · Tailwind · recharts · TypeScript
Backend: FastAPI · SQLAlchemy · APScheduler · Pydantic
Data: PostgreSQL · Redis · Synthetic Amazon-style catalogue
Recommenders: implicit ALS · mlxtend FP-Growth · sentence-transformers
LLM: Groq · llama-3.3-70b-versatile · Pre-generated explanations
Infra: Docker · Railway (API + DB + cache) · Vercel (UI)

Dataset & simulated shoppers

The loader first tries the McAuley Amazon Electronics 5-core ratings file. If that download fails (firewalled environments, Railway sandboxes), it falls back to a synthetic catalogue. Either path produces realistic shapes that the models can actually learn from.

  • · ~75 products across Electronics, Home & Kitchen, Fitness, Books, Clothing
  • · 5 named demo users (Ava, Ben, Cleo, Dan, Erin) + 45 background users so co-purchase signal exists
  • · Each user gets 1 dominant + 1 secondary category so ALS picks up clustering
  • · Interaction weights: purchase 5 / cart 3 / view 1, ratings override when present
  • · Cart pre-seeded for the first two demo users so recovery has something to fire on
# load_dataset.py
Seeding synthetic catalogue + interactions...
Catalogue:      75 products across 5 categories
Users:           5 demo + 45 background
Interactions:   ~1,100 weighted (purchase/cart/view)
Cart seeds:      2 demo users
Training:
  ALS              factors=32 · iter=20
  FP-Growth        min_support=0.02
  Content (MiniLM) 75 product embeddings
Pre-generating Groq explanations (top 50 × 20)...
Done.
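
The weighting rule from the list above (purchase 5 / cart 3 / view 1, ratings override) can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the event tuple layout, the names `interaction_weight` and `build_matrix`, and the choice to keep the strongest signal per (user, product) pair are all assumptions.

```python
from typing import Optional

# Event weights quoted in the text: purchase 5 / cart 3 / view 1.
EVENT_WEIGHTS = {"purchase": 5.0, "cart": 3.0, "view": 1.0}


def interaction_weight(event_type: str, rating: Optional[float] = None) -> float:
    """Return the implicit-feedback weight for one event.

    An explicit rating, when present, overrides the event-type weight.
    """
    if rating is not None:
        return float(rating)
    return EVENT_WEIGHTS[event_type]


def build_matrix(events):
    """Collapse (user, product, event_type, rating) tuples into a sparse dict.

    Assumption: the strongest signal per (user, product) pair wins.
    """
    weights = {}
    for user, product, event_type, rating in events:
        w = interaction_weight(event_type, rating)
        key = (user, product)
        weights[key] = max(weights.get(key, 0.0), w)
    return weights


# Illustrative events only; not taken from the real dataset.
events = [
    ("ava", "p1", "view", None),
    ("ava", "p1", "purchase", None),
    ("ben", "p2", "cart", None),
    ("ben", "p3", "view", 4.0),  # rating present, so it overrides the view weight
]
matrix = build_matrix(events)
```

A dict keyed by (user, product) converts easily into the sparse user-item matrix an implicit-feedback model expects.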

The recommendation engine

Two models cover three placements. A content-embedding model fills the gap when collaborative signal is too sparse, and an MMR pass keeps the feed from collapsing into a single category.

#1
Personalised homepage feed

ALS scores every catalogue item against the user's latent vector. The top candidates are re-ranked with MMR (λ = 0.7) over content embeddings so the feed spans categories instead of stacking lookalikes.

ALS + MMR + Groq
#2
You might also like

ALS item-item cosine similarity gives the base ranking. A content-embedding similarity is blended in (0.6 ALS + 0.4 content) so cold items still surface, and a same-category boost keeps the rail coherent.

ALS + content blend
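
As a rough sketch of the blend described above: the 0.6 / 0.4 weights come from the text, while the size of the same-category boost (`CATEGORY_BOOST`), the item dict layout and the function names are assumptions made for illustration. The vectors are toy 2-dimensional stand-ins for the real ALS factors and MiniLM embeddings.

```python
import math

ALS_WEIGHT = 0.6      # from the text
CONTENT_WEIGHT = 0.4  # from the text
CATEGORY_BOOST = 0.1  # assumed value; the text does not state the boost size


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(anchor, items, top_k=3):
    """Rank items against an anchor by blended ALS/content cosine similarity."""
    scored = []
    for item in items:
        if item["id"] == anchor["id"]:
            continue
        score = (ALS_WEIGHT * cosine(anchor["als"], item["als"])
                 + CONTENT_WEIGHT * cosine(anchor["content"], item["content"]))
        if item["category"] == anchor["category"]:
            score += CATEGORY_BOOST
        scored.append((score, item["id"]))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


catalogue = [
    {"id": "headphones", "category": "Electronics", "als": [1.0, 0.0], "content": [1.0, 0.0]},
    {"id": "earbuds", "category": "Electronics", "als": [0.9, 0.1], "content": [0.9, 0.2]},
    # Cold item: zero ALS vector, so only the content term contributes.
    {"id": "speaker", "category": "Electronics", "als": [0.0, 0.0], "content": [0.8, 0.3]},
    {"id": "novel", "category": "Books", "als": [0.0, 1.0], "content": [0.0, 1.0]},
]
ranking = similar_items(catalogue[0], catalogue)
```

In this toy catalogue the cold `speaker` still ranks above `novel`, because the content term and the category boost carry it even with no collaborative signal.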
#3
Customers who bought this also bought

FP-Growth mines purchase baskets for frequent itemsets, then association rules surface co-purchase candidates by confidence × lift. Same-category boost again to keep the suggestions relevant.

FP-Growth rules
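
To make the confidence × lift ranking concrete, here is a small sketch that computes both metrics directly from baskets for single-item rules. It is not FP-Growth itself (which mines the itemsets far more efficiently); the basket data, the rule dict layout and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def pair_rules(baskets):
    """Compute confidence and lift for every single-item rule A -> B."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        # Ordered pairs, so both A -> B and B -> A are counted.
        pair_count.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_count.items():
        confidence = count / item_count[a]       # P(B | A)
        lift = confidence / (item_count[b] / n)  # P(B | A) / P(B)
        rules.append({"antecedent": a, "consequent": b,
                      "confidence": confidence, "lift": lift})
    return rules


def also_bought(product, rules, top_k=3):
    """Return consequents for a product, ranked by confidence x lift."""
    matches = [r for r in rules if r["antecedent"] == product]
    matches.sort(key=lambda r: r["confidence"] * r["lift"], reverse=True)
    return [r["consequent"] for r in matches[:top_k]]


# Illustrative baskets only.
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "sd_card", "mug"],
    ["mug", "kettle"],
    ["mug", "tripod"],
]
rules = pair_rules(baskets)
suggestions = also_bought("camera", rules)
```

Multiplying by lift demotes items that are merely popular overall: `mug` appears in as many baskets as `sd_card`, but its lift against `camera` is below 1, so it ranks last.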
#4
Cart recovery

Uses the same FP-Growth rule book, but rule antecedents are now matched against the items already in the cart. If no rule fires, it falls back to category-popular items, so the slot is never empty.

FP-Growth + popularity fallback
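
The rule-matching and fallback logic might look something like the sketch below. The rule field names, the `popular_by_category` and `category_of` lookups, and the reuse of confidence × lift as the score are assumptions for illustration, not the project's actual data structures.

```python
def cart_recovery(cart, rules, popular_by_category, category_of, top_k=3):
    """Suggest add-ons for a cart, falling back to category-popular items."""
    in_cart = set(cart)
    scored = {}
    for rule in rules:
        # A rule fires when all of its antecedent items are already in the cart.
        if set(rule["antecedents"]) <= in_cart:
            score = rule["confidence"] * rule["lift"]
            for item in rule["consequents"]:
                if item not in in_cart:
                    scored[item] = max(scored.get(item, 0.0), score)
    if scored:
        return sorted(scored, key=scored.get, reverse=True)[:top_k]

    # No rule fired: use popular items from the categories present in the cart.
    fallback = []
    for item in cart:
        for candidate in popular_by_category.get(category_of[item], []):
            if candidate not in in_cart and candidate not in fallback:
                fallback.append(candidate)
    return fallback[:top_k]


# Illustrative data only.
rules = [
    {"antecedents": ["camera"], "consequents": ["sd_card"], "confidence": 1.0, "lift": 1.6},
    {"antecedents": ["camera", "tripod"], "consequents": ["bag"], "confidence": 0.5, "lift": 2.0},
]
category_of = {"camera": "Electronics", "tripod": "Electronics", "kettle": "Home & Kitchen"}
popular_by_category = {
    "Electronics": ["camera", "earbuds"],
    "Home & Kitchen": ["kettle", "toaster", "mug"],
}

with_rule = cart_recovery(["camera"], rules, popular_by_category, category_of)
without_rule = cart_recovery(["kettle"], rules, popular_by_category, category_of)
```

The second rule does not fire for a cart holding only `camera`, because its antecedent also requires `tripod`.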
#5
Groq explanations

Every recommended card gets a single 15-word sentence written by Groq's llama-3.3-70b-versatile, grounded in the user's last 5 viewed titles. The top 50 products × 20 most active users are pre-generated after each retrain so the first page load is instant.

Groq · pre-cached in Redis
#6
Background retraining + caching

APScheduler retrains all three models every 24 h. Redis caches each surface for 30 min and each LLM explanation for an hour. The admin Retrain button busts the cache on demand.

APScheduler · Redis
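
The two cache lifetimes and the on-demand bust can be illustrated with a small in-memory stand-in for Redis. The TTL values come from the text; the `TTLCache` class, the key naming scheme and the injected clock are hypothetical and exist only so the expiry behaviour can be shown without a Redis server or real waiting.

```python
import time

SURFACE_TTL_S = 30 * 60      # 30 min per surface, from the text
EXPLANATION_TTL_S = 60 * 60  # 1 h per LLM explanation, from the text


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def set(self, key, value, ttl_s):
        self._store[key] = (value, self._clock() + ttl_s)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def bust(self, prefix):
        """Drop every key under a prefix, as a manual retrain would."""
        for key in [k for k in self._store if k.startswith(prefix)]:
            del self._store[key]


# Fake clock (seconds) so expiry can be demonstrated without sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("rec:homepage:ava", ["p1", "p2"], SURFACE_TTL_S)
cache.set("explain:ava:p1", "Pairs with your recent camera views.", EXPLANATION_TTL_S)

now[0] = 31 * 60  # 31 minutes later
feed_after_31_min = cache.get("rec:homepage:ava")        # surface entry has expired
explanation_after_31_min = cache.get("explain:ava:p1")   # explanation is still cached
cache.bust("explain:")
explanation_after_bust = cache.get("explain:ava:p1")
```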

Under the hood

ALS

Implicit-feedback matrix factorisation. Replaced TruncatedSVD because ALS is purpose-built for weighted view/cart/purchase data.

FP-Growth

Faster than Apriori: tree-based mining of frequent itemsets, with no candidate explosion. When a run yields no rules, min_support is halved and mining is retried, so sparse data still produces rules.
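
The adaptive halving can be sketched as a simple retry loop. To stay self-contained, the example uses a naive pair counter as a stand-in miner rather than FP-Growth; the function names and the `floor` safeguard (which stops the loop on data with no co-occurrence at all) are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Stand-in miner: item pairs whose support meets the threshold."""
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def mine_adaptive(baskets, min_support=0.02, floor=0.001):
    """Halve min_support until the miner returns something or the floor is hit."""
    support = min_support
    while True:
        found = frequent_pairs(baskets, support)
        if found or support <= floor:
            return found, support
        support /= 2


# Illustrative baskets; the start threshold is set high on purpose
# so that the first attempt finds nothing and one halving is needed.
baskets = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
found, used_support = mine_adaptive(baskets, min_support=0.8)
```

Here the first pass at 0.8 finds nothing, and the second pass at 0.4 keeps only the pair seen in half of the baskets.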

Sentence Transformers

all-MiniLM-L6-v2 encodes title + category as 384-dim vectors. Used both for the similar-items blend and the MMR diversity pass.

MMR

Maximal Marginal Relevance balances per-item relevance against similarity to already-picked items. λ = 0.7 keeps the top end relevant.
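
A minimal sketch of greedy MMR selection follows. At each step it picks the item maximising λ · relevance − (1 − λ) · (max similarity to anything already picked). λ = 0.7 is from the text; the function names, the toy 2-dimensional embeddings and the assumption that relevance scores are already scaled to roughly [0, 1] are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, relevance, embeddings, lam=0.7, top_k=3):
    """Greedy Maximal Marginal Relevance over a candidate list."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        def mmr_score(item):
            # Penalise similarity to the closest already-selected item.
            max_sim = max(
                (cosine(embeddings[item], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1 - lam) * max_sim

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: earbuds are nearly identical to headphones, the novel is unrelated.
relevance = {"headphones": 0.95, "earbuds": 0.90, "novel": 0.80}
embeddings = {
    "headphones": [1.0, 0.0],
    "earbuds": [0.99, 0.14],
    "novel": [0.0, 1.0],
}
feed = mmr_rerank(["headphones", "earbuds", "novel"], relevance, embeddings)
```

Pure relevance order would put `earbuds` second. With λ = 0.7 the near-duplicate is pushed behind `novel`, while the most relevant item still leads.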

What the admin dashboard surfaces

Everything the storefront generates flows into a single dashboard with a sidebar, status banner, date-range filter and optional 30-second auto-refresh.

  • · Model health banner — ALS / FP-Growth / Content fitted + last-trained relative time
  • · Six live KPIs with Δ% vs previous period and tiny inline sparklines
  • · Served per day line chart, range-aware
  • · Surface breakdown — impressions, clicks, converts and CTR per surface (homepage / similar / co-purchase / cart)
  • · Conversion funnel with drop-off % between each stage
  • · Top recommended products with CTR, category filter
  • · User segments — new / active / dormant by 14-day window
  • · Retrain button — fires ALS + FP-Growth + Content + Groq pre-gen, toasts on completion
Served (7d)        1,284    +22%
CTR                4.6%     +0.8 pp
Conversion         1.9%     +0.3 pp
Abandonment        62%      −4 pp
ALS items          75       fitted
FP-Growth rules    126      fresh

Sample shape — live numbers shown on the actual dashboard.
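
For readers unfamiliar with the two delta styles on the tiles, here is a rough sketch of how a relative change (Δ%) and a percentage-point change (pp) differ. The function names, the dict keys and the input counts are made up for illustration and are not the dashboard's real queries or data.

```python
def rate(numerator, denominator):
    """Safe ratio that returns 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def kpi_deltas(current, previous):
    """Compare one period against the previous one.

    Counts use a relative change in percent; rates use a
    percentage-point difference.
    """
    ctr_now = rate(current["clicks"], current["served"])
    ctr_prev = rate(previous["clicks"], previous["served"])
    served_change = rate(current["served"] - previous["served"], previous["served"])
    return {
        "served_delta_pct": round(served_change * 100, 1),
        "ctr_pct": round(ctr_now * 100, 1),
        "ctr_delta_pp": round((ctr_now - ctr_prev) * 100, 1),
    }


# Made-up counts for two consecutive periods.
kpis = kpi_deltas(
    current={"served": 1200, "clicks": 60},
    previous={"served": 1000, "clicks": 40},
)
```

In this example CTR moves from 4% to 5%, which is +1 pp, even though the click count itself rose by 50%.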

Want to poke at it?

Switch between five demo shoppers, browse the storefront, then flip over to the admin dashboard and watch the KPIs tick up.