AI Shopping Assistant

Completed

A conversational agent using RAG, hybrid search, and SSE streaming

Vision

The goal at Bestomer was to combine our commercial data and product search capabilities into a unified, conversational interface. We wanted a chatbot that could understand a user's intent and retrieve relevant products or past purchase history to answer specific questions.

Problem Statement

  • Context Window Limits: We couldn't feed a user's entire history or the product catalog into an LLM context window.
  • Latency: Conversational interfaces need to feel instantaneous, but fetching data and running inference are slow.
  • Accuracy: Users need reliable product information, but early LLMs required extensive grounding to avoid hallucinations.

Methodology

I architected and built the real-time RAG infrastructure:

  • Hybrid Search: Leveraged Weaviate's native hybrid search to combine vector embeddings with BM25 keyword scoring, ensuring precise product retrieval.
  • Data Orchestration: Integrated structured user data from PostgreSQL with unstructured product embeddings from Weaviate to dynamically construct the LLM context.
  • Real-time Streaming: Built Server-Sent Events (SSE) pipelines in Python and TypeScript to stream LLM tokens directly to the Android and iOS (Swift) mobile apps, minimizing perceived latency.
  • Model Evaluation: Used OpenRouter to iterate on different models and prompting strategies, optimizing for response quality and speed.
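The hybrid retrieval step relies on Weaviate fusing vector and BM25 keyword scores, weighted by an alpha parameter (1.0 is pure vector search, 0.0 is pure keyword). A minimal sketch of that fusion logic in plain Python; the document scores and alpha value below are illustrative, not production data:

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.5):
    """Blend min-max-normalized vector and BM25 scores per document.
    alpha=1.0 -> pure vector ranking; alpha=0.0 -> pure keyword ranking."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(bm25_scores)
    docs = set(v) | set(k)
    return {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
            for doc in docs}

# Illustrative scores for three product documents
vec = {"p1": 0.91, "p2": 0.40, "p3": 0.85}
kw = {"p1": 2.1, "p2": 7.3, "p3": 0.5}
ranked = sorted(hybrid_scores(vec, kw).items(), key=lambda x: -x[1])
# "p1" ranks first: strong on both signals, unlike the one-sided p2 and p3
```

This is why hybrid search beats vector-only retrieval on specific product queries: an exact keyword match (e.g. a model number) can rescue a document the embedding alone would rank too low.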
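The data-orchestration step amounts to assembling a grounded prompt from the two sources: structured rows from PostgreSQL plus retrieved product documents from Weaviate. A simplified prompt builder; the field names and system instruction are illustrative, not the production schema:

```python
def build_context(user_rows, products, question, max_products=3):
    """Assemble a grounded LLM prompt from purchase-history rows
    (PostgreSQL) and retrieved product documents (Weaviate)."""
    history = "\n".join(
        f"- {r['item']} (purchased {r['date']})" for r in user_rows
    )
    catalog = "\n".join(
        f"- {p['name']}: {p['description']}" for p in products[:max_products]
    )
    return (
        "You are a shopping assistant. Answer ONLY from the data below.\n\n"
        f"Purchase history:\n{history}\n\n"
        f"Relevant products:\n{catalog}\n\n"
        f"Question: {question}\n"
    )

prompt = build_context(
    [{"item": "espresso machine", "date": "2023-04-02"}],
    [{"name": "Burr grinder", "description": "Conical burr, 40 settings"}],
    "What grinder pairs with my machine?",
)
```

Capping the retrieved products keeps the assembled context well inside the model's window, which is the point of retrieving instead of dumping the whole catalog.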
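The SSE pipeline, in essence, wraps each LLM token in the text/event-stream wire format and flushes it immediately, so the client renders text as it is generated. A simplified Python sketch; the token list stands in for a real model stream:

```python
from typing import Iterable, Iterator

def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in the Server-Sent Events wire format.
    SSE clients (EventSource on web, SSE parsers on Android/iOS)
    receive each token as soon as it is flushed."""
    for token in tokens:
        # Each event is a "data: <payload>" line followed by a blank line.
        yield f"data: {token}\n\n"
    # Sentinel event so the client knows to close the connection.
    yield "data: [DONE]\n\n"

# Stand-in for an LLM token stream
fake_tokens = ["Our", " best", " seller", " is", "..."]
stream = "".join(sse_events(fake_tokens))
```

In production this generator would be served with a `Content-Type: text/event-stream` response; streaming per token is what collapses the perceived latency, since time-to-first-token is far shorter than time-to-full-response.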

Impact

  • Scalable Architecture: Established a robust streaming RAG pipeline that successfully bridged disparate data sources (Postgres, Weaviate) with mobile experiences.
  • Improved Retrieval: The hybrid search implementation significantly outperformed naive vector-only retrieval on specific product queries.
  • Strategic Direction: The implementation provided critical insights into the capabilities of LLMs for high-trust commercial tasks, shaping the platform's future interaction models.