Three recommendation surfaces, two ML models, content embeddings, MMR diversity, LLM-written explanations and an admin dashboard with live KPIs, all wired up against a real catalogue and 50 simulated shoppers.
The loader first tries the McAuley Amazon Electronics 5-core ratings file. If that download fails (firewalled environments, Railway sandboxes), it falls back to a synthetic catalogue. Either path yields realistically shaped data that the models can actually learn from.
Seeding synthetic catalogue + interactions...
  Catalogue:     75 products across 5 categories
  Users:         5 demo + 45 background
  Interactions:  ~1,100 weighted (purchase/cart/view)
  Cart seeds:    2 demo users
Training:
  ALS               factors=32 · iter=20
  FP-Growth         min_support=0.02
  Content (MiniLM)  75 product embeddings
Pre-generating Groq explanations (top 50 × 20)...
Done.
Two models cover three placements. A content-embedding model fills the gap when collaborative signal is too sparse, and an MMR pass keeps the feed from collapsing into a single category.
ALS scores every catalogue item against the user's latent vector. The top candidates are re-ranked with MMR (λ = 0.7) over content embeddings so the feed spans categories instead of stacking lookalikes.
ALS item-item cosine similarity gives the base ranking. A content-embedding similarity is blended in (0.6 ALS + 0.4 content) so cold items still surface, and a same-category boost keeps the rail coherent.
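As a rough illustration of the blend described above, here is a minimal pure-Python sketch. The 0.6 / 0.4 weights come from the text; the size of the category boost (`CATEGORY_BOOST`), the function names and the dictionary keys are assumptions made for the example, not the project's actual code.

```python
from math import sqrt

ALS_WEIGHT = 0.6       # from the text
CONTENT_WEIGHT = 0.4   # from the text
CATEGORY_BOOST = 0.1   # assumption: the size of the boost is not stated


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(target, items, top_k=3):
    """Rank the other items by 0.6 * ALS cosine + 0.4 * content cosine + category boost.

    `items` maps item id -> dict with hypothetical keys "als", "content", "category".
    """
    t = items[target]
    scored = []
    for item_id, it in items.items():
        if item_id == target:
            continue
        score = (ALS_WEIGHT * cosine(t["als"], it["als"])
                 + CONTENT_WEIGHT * cosine(t["content"], it["content"]))
        if it["category"] == t["category"]:
            score += CATEGORY_BOOST
        scored.append((item_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


catalogue = {
    "laptop":   {"als": [1.0, 0.0], "content": [1.0, 0.0], "category": "computers"},
    "mouse":    {"als": [0.9, 0.1], "content": [0.8, 0.2], "category": "computers"},
    # Cold item: no interactions yet, so its ALS vector is all zeros.
    "keyboard": {"als": [0.0, 0.0], "content": [0.9, 0.1], "category": "computers"},
    "blender":  {"als": [0.0, 1.0], "content": [0.0, 1.0], "category": "kitchen"},
}
ranking = similar_items("laptop", catalogue)
print([item_id for item_id, _ in ranking])
```

The cold `keyboard` has no collaborative signal at all, yet it still ranks above `blender` because the content term and the category boost carry it.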
FP-Growth mines purchase baskets for frequent itemsets, then association rules surface co-purchase candidates ranked by confidence × lift. A same-category boost is applied again to keep the suggestions relevant.
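To make the confidence × lift ranking concrete, here is a small sketch. It deliberately swaps FP-Growth for naive pair counting so it stays self-contained, which means it only produces single-item rules; the function names and the toy baskets are made up for the example, and the same-category boost is left out.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.02):
    """Build single-item rules A -> B with support, confidence, lift and score.

    Naive pair counting, used here as a stand-in for FP-Growth.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = count / item_counts[ante]      # P(cons | ante)
            lift = confidence / (item_counts[cons] / n)  # confidence / P(cons)
            rules.append({
                "antecedent": ante,
                "consequent": cons,
                "support": support,
                "confidence": confidence,
                "lift": lift,
                "score": confidence * lift,
            })
    return rules


def bought_together(item, rules, top_k=3):
    """Return consequents of rules fired by `item`, best confidence × lift first."""
    matches = [r for r in rules if r["antecedent"] == item]
    matches.sort(key=lambda r: r["score"], reverse=True)
    return [r["consequent"] for r in matches[:top_k]]


baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "sd_card"],
    ["camera", "tripod"],
    ["laptop", "mouse"],
    ["laptop", "sd_card"],
]
rules = pair_rules(baskets)
print(bought_together("camera", rules))  # ['sd_card', 'tripod']
```

Here `camera -> sd_card` scores 0.75 × 1.125 and `camera -> tripod` scores 0.5 × 1.5, so the more reliable rule wins even though the tripod rule has the higher lift.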
Same FP-Growth rule book, but antecedents are now matched against items already in the cart. If no rule fires, it falls back to category-popular items, so the slot is never empty.
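The cart logic can be sketched along these lines. The rule tuples, the `category_popular` and `item_category` lookups and the function name are all hypothetical; the real rule book and popularity tables may be shaped differently.

```python
def cart_recommendations(cart, rules, category_popular, item_category, top_k=3):
    """Suggest add-ons for a cart, falling back to category-popular items.

    rules: iterable of (antecedent_items, consequent, score) tuples (assumed shape).
    """
    cart_set = set(cart)
    best = {}
    for antecedent, consequent, score in rules:
        # A rule fires only when every antecedent item is already in the cart.
        if set(antecedent) <= cart_set and consequent not in cart_set:
            best[consequent] = max(score, best.get(consequent, 0.0))
    if best:
        return sorted(best, key=best.get, reverse=True)[:top_k]

    # Fallback: popular items from the categories already present in the cart.
    picks = []
    for item in cart:
        for candidate in category_popular.get(item_category.get(item), []):
            if candidate not in cart_set and candidate not in picks:
                picks.append(candidate)
    return picks[:top_k]


rules = [
    (("camera",), "sd_card", 0.84),
    (("camera", "sd_card"), "tripod", 0.90),
    (("laptop",), "mouse", 0.70),
]
item_category = {"camera": "photo", "sd_card": "photo", "blender": "kitchen"}
category_popular = {"kitchen": ["blender", "kettle", "toaster"]}

with_rule = cart_recommendations(["camera", "sd_card"], rules, category_popular, item_category)
fallback = cart_recommendations(["blender"], rules, category_popular, item_category)
print(with_rule, fallback)
```

With `camera` and `sd_card` in the cart the two-item rule fires and suggests the tripod; a cart holding only a `blender` matches no rule and gets the kitchen best-sellers instead.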
Every recommended card gets a single 15-word sentence written by Groq's llama-3.3-70b-versatile, grounded in the user's last 5 viewed titles. The top 50 products × 20 most active users are pre-generated after each retrain so the first page load is instant.
APScheduler retrains all three models every 24 h. Redis caches each surface for 30 min and each LLM explanation for an hour. The admin Retrain button busts the cache on demand.
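The caching behaviour can be illustrated with a tiny in-memory stand-in. This is not the Redis client or the project's cache layer: `TTLCache`, the key names and the injectable clock are assumptions for the example, and only the two TTL values come from the text.

```python
import time

SURFACE_TTL = 30 * 60      # seconds, from the text
EXPLANATION_TTL = 60 * 60  # seconds, from the text


class TTLCache:
    """Minimal in-memory stand-in for the Redis cache (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, ttl, compute):
        """Return the cached value, or call `compute` and cache it for `ttl` seconds."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + ttl, value)
        return value

    def bust(self, prefix=""):
        """Drop every key starting with `prefix`; return how many were removed."""
        doomed = [k for k in self._store if k.startswith(prefix)]
        for k in doomed:
            del self._store[k]
        return len(doomed)


# A fake clock makes expiry visible without actually waiting 30 minutes.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
calls = []


def compute_feed():
    calls.append("recomputed")
    return ["item-1", "item-2"]


cache.get_or_compute("feed:user-1", SURFACE_TTL, compute_feed)  # miss, computes
cache.get_or_compute("feed:user-1", SURFACE_TTL, compute_feed)  # hit, served from cache
now[0] = SURFACE_TTL + 1
cache.get_or_compute("feed:user-1", SURFACE_TTL, compute_feed)  # expired, recomputes
removed = cache.bust("feed:")                                   # what a Retrain click would do
cache.get_or_compute("feed:user-1", SURFACE_TTL, compute_feed)  # miss again after the bust
print(len(calls), removed)
```

In this sketch a retrain would call `bust` once the new models are in place, so stale recommendations never outlive the models that produced them.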
Implicit-feedback matrix factorisation. Replaced TruncatedSVD because ALS is purpose-built for weighted view/cart/purchase data.
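To show what "weighted" means in practice, here is a minimal sketch of turning raw events into the confidence values an implicit-feedback ALS model consumes. The specific weights are assumptions (the text only says purchases, cart adds and views are weighted), as are the function and variable names.

```python
from collections import defaultdict

# Assumed weights: stronger intent gets a larger weight. The real values are not stated.
EVENT_WEIGHTS = {"view": 1.0, "cart": 3.0, "purchase": 5.0}


def build_confidence(events):
    """Sum event weights per (user, item) pair.

    events: iterable of (user_id, item_id, event_type) tuples.
    Returns a dict that could be loaded into a sparse user × item matrix.
    """
    matrix = defaultdict(float)
    for user, item, event in events:
        matrix[(user, item)] += EVENT_WEIGHTS[event]
    return dict(matrix)


events = [
    ("u1", "camera", "view"),
    ("u1", "camera", "view"),
    ("u1", "camera", "purchase"),
    ("u2", "camera", "cart"),
    ("u2", "tripod", "view"),
]
confidence = build_confidence(events)
print(confidence[("u1", "camera")])  # 7.0: two views plus one purchase
```

Missing pairs stay at zero, which an implicit-feedback model treats as "no evidence" rather than as a dislike.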
Faster than Apriori: tree-based mining of frequent itemsets with no candidate explosion. min_support is halved adaptively whenever a pass finds nothing, so sparse data still yields a rule book.
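The adaptive halving can be sketched as a simple retry loop. The miner below is naive pair counting rather than FP-Growth, and the `floor` guard is an addition for the example so the loop cannot run forever on empty data; neither is taken from the project.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs meeting min_support (stand-in miner)."""
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def mine_with_adaptive_support(baskets, start=0.02, floor=0.001):
    """Halve min_support until something is found or the floor is reached."""
    support = start
    while True:
        found = frequent_pairs(baskets, support)
        if found or support <= floor:
            return support, found
        support = max(support / 2, floor)


# Eight baskets, each with its own unique pair, so every pair has support 1/8.
baskets = [[f"item{2 * i}", f"item{2 * i + 1}"] for i in range(8)]
support, found = mine_with_adaptive_support(baskets, start=0.5)
print(support, len(found))  # 0.125 8
```

Starting at 0.5 the first two passes (0.5 and 0.25) find nothing, and the third pass at 0.125 picks up all eight pairs.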
all-MiniLM-L6-v2 encodes title + category as 384-dim vectors. Used both for the similar-items blend and the MMR diversity pass.
Maximal Marginal Relevance balances per-item relevance against similarity to already-picked items. λ = 0.7 keeps the top end relevant.
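A minimal greedy MMR pass looks roughly like this. It assumes relevance scores are already scaled to a range comparable with cosine similarity (roughly 0 to 1), which raw ALS scores may not be; the function names and the toy data are illustrative only.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, relevance, embeddings, k, lam=0.7):
    """Greedily pick k items maximising lam * relevance - (1 - lam) * redundancy.

    Redundancy is the highest cosine similarity to any item already picked.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            redundancy = max(
                (cosine(embeddings[item], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"phone_a": 0.95, "phone_b": 0.93, "headphones": 0.60}
embeddings = {
    "phone_a": [1.0, 0.0],
    "phone_b": [1.0, 0.0],     # near-duplicate of phone_a
    "headphones": [0.0, 1.0],  # different category
}
feed = mmr_rerank(list(relevance), relevance, embeddings, k=3)
print(feed)  # ['phone_a', 'headphones', 'phone_b']
```

Sorting by relevance alone would put the two phones first. With λ = 0.7 the duplicate phone is penalised by 0.3 × 1.0, which is enough for the less relevant but distinct headphones to take the second slot.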
Everything the storefront generates flows into a single dashboard with a sidebar, status banner, date-range filter and optional 30-second auto-refresh.
Sample shape — live numbers shown on the actual dashboard.
Switch between five demo shoppers, browse the storefront, then flip over to the admin dashboard and watch the KPIs tick up.