I am building a RAG application with Streamlit and use Chroma DB to store my collections. Depending on the `persist_collection` parameter, I use either `chromadb.PersistentClient` or `chromadb.EphemeralClient`:
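```python
# Imports assumed from context: Chroma and OllamaEmbeddings may come from
# langchain_community instead, depending on the installed packages.
from chromadb import EphemeralClient, PersistentClient
from chromadb.config import Settings
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings


def __get_chroma_client(self) -> Chroma:
    # Return a Chroma store backed by an on-disk (persistent) client
    if self.persist_collection:
        return Chroma(
            collection_name=self.collection_name,
            embedding_function=OllamaEmbeddings(model="all-minilm"),
            client=PersistentClient(
                path="./data/chroma_db",
                settings=Settings(
                    anonymized_telemetry=False,
                ),
            ),
        )
    # Otherwise return a Chroma store backed by an in-memory (ephemeral) client
    return Chroma(
        collection_name=self.collection_name,
        embedding_function=OllamaEmbeddings(model="all-minilm"),
        client=EphemeralClient(
            settings=Settings(
                anonymized_telemetry=False,
            ),
        ),
    )
```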
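Streamlit re-executes the script on every interaction, so `__get_chroma_client` may run again on each rerun, and in the ephemeral branch that would create a brand-new in-memory client each time (an assumption about how my object is constructed, not something I have verified). The sketch below shows the general idea of building the store once and reusing it, using only the standard library: `FakeEphemeralStore` and `get_store` are hypothetical stand-ins, not Chroma or LangChain APIs, and `functools.lru_cache` takes the place that `st.cache_resource` would fill in a real Streamlit app.

```python
from functools import lru_cache


class FakeEphemeralStore:
    """Hypothetical stand-in for an in-memory vector store (not a Chroma class)."""

    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name
        self.documents: list[str] = []

    def add_documents(self, documents: list[str]) -> None:
        self.documents.extend(documents)


@lru_cache(maxsize=None)
def get_store(collection_name: str, persist: bool) -> FakeEphemeralStore:
    # Built once per (collection_name, persist) pair; later calls, e.g. from a
    # Streamlit rerun, get the same object instead of a fresh, empty store.
    # `persist` only participates in the cache key in this stand-in.
    return FakeEphemeralStore(collection_name)


first_run = get_store("docs", False)
first_run.add_documents(["chunk 1"])

second_run = get_store("docs", False)
second_run.add_documents(["chunk 2"])

print(second_run is first_run, second_run.documents)
# True ['chunk 1', 'chunk 2']
```

The point of caching at the factory level is that every caller shares one in-memory backend, so documents added in an earlier run are still there when the retriever is built later.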
The collection is then initialized with:

```python
self.__collection = self.__get_chroma_client()
```
Every time there are new documents, I update the collection asynchronously:

```python
await self.__collection.aadd_documents(documents=chunks)
```
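If `aadd_documents` falls back to LangChain's default async implementation (an assumption; I have not confirmed that the Chroma integration does not override it), the blocking write runs in a thread-pool worker rather than in the thread that later calls `retriever.invoke(query)`. The standard-library sketch below only illustrates that threading behavior: `add_documents` and `aadd_documents` here are hypothetical functions operating on a plain list, not the LangChain methods.

```python
import asyncio
import threading


def add_documents(store: list[str], documents: list[str]) -> int:
    """Blocking write; returns the id of the thread it ran on."""
    store.extend(documents)
    return threading.get_ident()


async def aadd_documents(store: list[str], documents: list[str]) -> int:
    # Offload the blocking call to a worker thread, as a default async
    # wrapper around a synchronous method typically does.
    return await asyncio.to_thread(add_documents, store, documents)


store: list[str] = []
writer_thread = asyncio.run(aadd_documents(store, ["chunk 1", "chunk 2"]))

# The write happened on a different thread than the one running this code.
print(writer_thread != threading.get_ident(), store)
# True ['chunk 1', 'chunk 2']
```

This matters only if the backend keeps per-thread or per-connection state; with a plain list, as here, the data is visible from both threads.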
Then I use the `as_retriever` method to get the context for my LLM:

```python
retriever = self.__collection.as_retriever(
    search_type=self.search_function,
    search_kwargs=self.search_params,
)
# Concatenate the retrieved chunks into a single context string
context = " ".join(
    [doc.page_content for doc in retriever.invoke(query)]
)
```
The problem only occurs with `chromadb.EphemeralClient`:

- After the collection is first created and populated with documents, I can retrieve the context.
- In the next run, new documents are added to the collection and the retriever is created, but calling its `invoke` method throws the following exception:

  ```
  sqlite3.OperationalError: no such table: collections
  ```
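One mechanism that produces exactly this message can be shown with the standard library alone: a shared-cache in-memory SQLite database exists only while at least one connection to it is open, and a connection opened afterwards starts from an empty database. Whether `EphemeralClient` stores its system tables this way, and whether anything in my app releases the last connection between runs, are assumptions I have not verified; the URI `file:demo_mem?mode=memory&cache=shared` and the table contents are made up for the demo.

```python
import sqlite3

# A named, shared-cache in-memory database: every connection opened with this
# URI sees the same data, but only while at least one of them stays open.
URI = "file:demo_mem?mode=memory&cache=shared"

first = sqlite3.connect(URI, uri=True)
first.execute("CREATE TABLE collections (name TEXT)")
first.execute("INSERT INTO collections VALUES ('docs')")
first.commit()

# A second connection opened while the first is alive sees the table.
second = sqlite3.connect(URI, uri=True)
rows = second.execute("SELECT name FROM collections").fetchall()
print(rows)  # [('docs',)]
second.close()

# Closing the last connection makes SQLite discard the in-memory database...
first.close()

# ...so a fresh connection with the same URI starts from an empty database.
third = sqlite3.connect(URI, uri=True)
error_message = ""
try:
    third.execute("SELECT name FROM collections")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
finally:
    third.close()

print(error_message)  # no such table: collections
```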
I haven't found a solution yet. Does anyone know the root cause? As mentioned above, it does NOT happen with `PersistentClient`.