Vision
The goal at Bestomer was to combine our commercial data and product search capabilities into a unified, conversational interface. We wanted a chatbot that could understand a user's intent and retrieve relevant products or past purchase history to answer specific questions.
Problem Statement
- Context Window Limits: We couldn't feed a user's entire purchase history or the full product catalog into an LLM context window.
- Latency: Conversational interfaces need to feel instantaneous, but fetching data and running inference both add noticeable delay.
- Accuracy: Users need reliable product information, but early LLMs required extensive grounding to avoid hallucinations.
Methodology
I architected and built the real-time RAG infrastructure:
- Hybrid Search: Leveraged Weaviate's native hybrid search to combine vector embeddings with BM25 keyword scoring for precise product retrieval (first sketch after this list).
- Data Orchestration: Integrated structured user data from PostgreSQL with unstructured product embeddings from Weaviate to dynamically construct the LLM context (second sketch below).
- Real-time Streaming: Built Server-Sent Events (SSE) pipelines in Python and TypeScript to stream LLM tokens directly to the Android and iOS (Swift) apps, minimizing perceived latency (third sketch below).
- Model Evaluation: Used OpenRouter to iterate on different models and prompting strategies, optimizing for response quality and speed (final sketch below).
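A minimal sketch of the hybrid retrieval call, using the Weaviate Python client (v4 syntax). The `Product` collection, its properties, and the example query are illustrative, not the production schema:

```python
import weaviate

# Connect to the Weaviate instance (local connection shown for illustration).
client = weaviate.connect_to_local()

try:
    products = client.collections.get("Product")  # hypothetical collection name

    # Hybrid search fuses BM25 keyword scoring with vector similarity.
    # alpha=0 is pure keyword, alpha=1 is pure vector; 0.5 weights them equally.
    response = products.query.hybrid(
        query="waterproof trail running shoes",
        alpha=0.5,
        limit=5,
    )

    for obj in response.objects:
        print(obj.properties.get("name"), obj.properties.get("price"))
finally:
    client.close()
```

In practice, `alpha` is the main tuning knob: lower values favor exact keyword matches (SKUs, brand names), while higher values favor semantic similarity for vaguer queries.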
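The orchestration step looked roughly like the following: pull structured history from Postgres, merge it with the retrieved product text, and emit a compact grounding context. The table names, columns, and DSN here are hypothetical:

```python
import psycopg2

def build_context(user_id: str, retrieved_products: list[dict]) -> str:
    """Merge structured purchase history with retrieved product text
    into a single grounding context for the LLM. Schema is illustrative."""
    conn = psycopg2.connect("dbname=bestomer")  # hypothetical DSN
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT product_name, purchased_at
                FROM purchases
                WHERE user_id = %s
                ORDER BY purchased_at DESC
                LIMIT 10
                """,
                (user_id,),
            )
            history = cur.fetchall()
    finally:
        conn.close()

    history_lines = [f"- {name} (purchased {ts})" for name, ts in history]
    product_lines = [f"- {p['name']}: {p['description']}" for p in retrieved_products]

    # Keep the context compact: recent history plus the top retrieved
    # products gives enough grounding without blowing the token budget.
    return (
        "Recent purchases:\n" + "\n".join(history_lines)
        + "\n\nRelevant products:\n" + "\n".join(product_lines)
    )
```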
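A condensed version of the Python streaming endpoint, shown here with FastAPI; the token generator is a stand-in for the actual model call:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def stream_llm_tokens(prompt: str):
    """Stand-in for the real model call; yields tokens as they arrive."""
    for token in ["Sure", ",", " here", " are", " some", " options", "."]:
        yield token

@app.get("/chat")
async def chat(q: str):
    async def event_stream():
        # Each SSE event is a "data: ..." line followed by a blank line.
        async for token in stream_llm_tokens(q):
            yield f"data: {token}\n\n"
        # Sentinel so the mobile clients know the stream is complete.
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

Relaying tokens the moment they arrive, rather than buffering the full completion, is what makes the response feel instantaneous even when total generation time is unchanged.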
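OpenRouter exposes an OpenAI-compatible API, so comparing models reduces to swapping an identifier. A sketch of the evaluation loop; the model IDs and prompt are examples, not the exact set we tested:

```python
from openai import OpenAI

# OpenRouter is a drop-in for the OpenAI client via base_url.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # OpenRouter key, redacted
)

CANDIDATES = [  # example model identifiers
    "openai/gpt-4o-mini",
    "anthropic/claude-3.5-sonnet",
    "meta-llama/llama-3.1-70b-instruct",
]

prompt = "Recommend a waterproof trail running shoe from this catalog: ..."

# Run the same request against each candidate; in practice we also
# logged latency and scored responses against a quality rubric.
for model in CANDIDATES:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(model, "->", completion.choices[0].message.content[:80])
```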
Impact
- Scalable Architecture: Established a robust streaming RAG pipeline that bridged disparate data sources (Postgres, Weaviate) with the mobile apps.
- Improved Retrieval: The hybrid search implementation significantly outperformed naive vector-only retrieval on specific product queries.
- Strategic Direction: The implementation provided critical insights into the capabilities of LLMs for high-trust commercial tasks, shaping the platform's future interaction models.