Running Billion-Scale Similarity Search at LLM Production Systems

Search databases are integral to information retrieval and similarity-based content discovery, and they play a pivotal role in the generative AI tech stack. Within this stack, combining lexical and vector search algorithms makes it possible to draw on both structured and unstructured data, improving the precision, diversity and efficiency of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. As a result, the demand for powerful search infrastructure is higher than ever: organizational data is growing exponentially, collections of millions to billions of documents are becoming standard, and search queries are becoming more resource-intensive, presenting significant performance, scalability and cost challenges for production-grade systems. In this session we will explore various techniques for addressing those challenges, from space reduction and approximation methods to a range of acceleration technologies that are highly parallelizable and designed for large-scale, real-time production systems. The transition from lab to production is not always seamless, which contributes to the failure of numerous AI projects. This session aims to provide insights and solutions that directly impact the efficiency and success of future Gen AI applications.

Details

Tuesday, September 24 3:05pm-3:45pm in Continental BR 1-3

Track: Search

Speakers

Ohad Levi

CEO and Co-Founder at Hyperspace