Cannot load persisted db using Chroma / Langchain

Question

I ingested all docs and created a collection / embeddings using Chroma. I have a local directory db. Within db there is chroma-collections.parquet and chroma-embeddings.parquet. These are not empty. Chroma-collections.parquet when opened returns a collection name, uuid, and null metadata.

When I load it up later using langchain, nothing is here.

from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from chromadb.config import Settings

embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)  # same model as at ingestion
CHROMA_SETTINGS = Settings(
        chroma_db_impl='duckdb+parquet',
        persist_directory='db',
        anonymized_telemetry=False
)

db = Chroma(persist_directory='db', embedding_function=embeddings, client_settings=CHROMA_SETTINGS)

db.get() returns {'ids': [], 'embeddings': None, 'documents': [], 'metadatas': []}

I've tried many other approaches found online, e.g.:

import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                    persist_directory='./db'))
coll = client.get_or_create_collection("langchain", embedding_function=embeddings)

coll.count() returns 0.

I'm expecting all the docs and embeddings to be available. What am I missing?
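One thing worth ruling out first is a chromadb version mismatch. The files described above (chroma-collections.parquet, chroma-embeddings.parquet) belong to the older duckdb+parquet backend; as far as I know, chromadb 0.4 and later store everything in a single chroma.sqlite3 file and do not read the parquet files directly, so a newer client pointed at this directory would see an empty store. The helper below is a hypothetical, stdlib-only sketch (detect_chroma_layout is not part of any library) that reports which layout a directory holds; the file names it checks are assumptions based on those two storage formats.

```python
import os


def detect_chroma_layout(persist_dir: str) -> str:
    """Guess which on-disk Chroma format `persist_dir` contains.

    Assumption: chromadb 0.3.x (duckdb+parquet) writes the two parquet files
    named in the question, while chromadb >= 0.4 writes chroma.sqlite3.
    """
    if not os.path.isdir(persist_dir):
        return "missing"
    names = set(os.listdir(persist_dir))
    # If both formats are present, a newer client has already written here.
    if "chroma.sqlite3" in names:
        return "sqlite"
    if {"chroma-collections.parquet", "chroma-embeddings.parquet"} <= names:
        return "duckdb+parquet"
    return "unknown"


print(detect_chroma_layout("db"))  # the question's persist directory
```

If this reports duckdb+parquet while the installed chromadb is 0.4 or newer, that mismatch alone would explain the empty results.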

Directly remove chroma_db_impl from CHROMA_SETTINGS. However, I still have the problem that the database files aren't created after db.persist(). – Fenix Lam, Commented Oct 27, 2023 at 3:07

Another alternative is to downgrade to langchain==0.0.322 and chromadb==0.3.29 and keep duckdb==0.7.1 installed. – Fenix Lam, Commented Oct 27, 2023 at 4:16

deepak walia · Accepted Answer · 2024-01-15 05:30:01Z


You need to pass the same collection_name both when saving and when loading the Chroma DB.

save to disk

from langchain.vectorstores import Chroma

db2 = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db", collection_name='v_db')
db2.persist()  # flush to ./chroma_db
docs = db2.similarity_search(query)

load from disk

db3 = Chroma(collection_name='v_db', persist_directory="./chroma_db", embedding_function=embedding_function)
docs = db3.similarity_search(query)
print(docs[0].page_content)
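This matters because LangChain's Chroma wrapper falls back to a collection named "langchain" when collection_name is omitted and opens it with get_or_create_collection, so loading with a different name silently gives you a new, empty collection instead of an error. If you don't remember which name you saved under, chromadb's client.list_collections() is the supported way to check. The stdlib-only sketch below instead peeks at the SQLite file directly; it assumes chromadb 0.4+ and its internal collections table with a name column, so treat it as a hypothetical debugging aid, not an API.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_collection_names(sqlite_path: str) -> List[str]:
    """Return the collection names stored in a Chroma SQLite file.

    Assumption: chromadb >= 0.4 keeps a `collections` table with a `name`
    column in chroma.sqlite3. That is internal schema, not a public API.
    """
    # Open read-only through a file: URI so inspecting never modifies the DB.
    uri = Path(sqlite_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute("SELECT name FROM collections ORDER BY name").fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]


# Example (hypothetical path): pass one of these names as collection_name.
# print(list_collection_names("./chroma_db/chroma.sqlite3"))
```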

edited Jan 15, 2024 at 5:30

answered Jan 15, 2024 at 5:22

deepak walia


Per Feldvoss Olsen · Accepted Answer · 2024-03-31 11:28:18Z


It looks like the LangChain documentation was wrong: https://github.com/langchain-ai/langchain/issues/19807

You can change

from langchain_community.vectorstores import Chroma

to

from langchain_community.vectorstores.chroma import Chroma

answered Mar 31, 2024 at 11:28

Per Feldvoss Olsen


1 Comment

Satyaprakash Nayak Over a year ago

Thanks buddy. I had been looking all over the internet for the root cause. The moment I switched to this import it started working, and now it all makes sense.

hu zang · Accepted Answer · 2023-12-19 07:45:40Z

I hit the same problem and found it was because my program ran chromadb in JupyterLab (the same applies to Jupyter Notebook).

The example in the official chromadb Git repo says:

In a notebook, we should call persist() to ensure the embeddings are written to disk. This isn't necessary in a script - the database will be automatically persisted when the client object is destroyed.

So if your program also runs in a Jupyter environment, the best approach is to call client.persist() every time you need to save your modifications to chromadb's local persistence. Example code:

import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                    persist_directory='./db'))
coll = client.get_or_create_collection("langchain", embedding_function=embeddings)

# ... any modifications on chromadb, including create, upsert, delete ...

client.persist()  # save the modifications above to chroma's local persistence
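In a notebook it is easy to forget that call, especially after a cell fails partway through. A small context manager can make the flush automatic. This is a hypothetical helper (persisting is not a chromadb function) that assumes only a client with a no-argument persist() method, like the duckdb+parquet Client used above.

```python
from contextlib import contextmanager


@contextmanager
def persisting(client):
    """Yield `client`, then call client.persist() when the block exits.

    Assumption: `client` exposes a no-argument persist() method, as the
    0.3.x duckdb+parquet chromadb Client does.
    """
    try:
        yield client
    finally:
        # Runs on normal exit and when an exception propagates, so whatever
        # was written before the error still gets flushed to disk.
        client.persist()
```

Usage would be `with persisting(client) as c:` followed by `coll = c.get_or_create_collection("langchain")` and your add/upsert calls inside the block. This only matters for clients that actually need an explicit persist().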

ListenSoftware Louise Ai Agent · Accepted Answer · 2024-07-09 23:11:39Z


If you use PersistentClient, the collection is automatically saved to the database on add, update, or upsert:

import chromadb

client = chromadb.PersistentClient(path=r"C:\Users\me\python_files\python-deep-learning-master")
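A note on Windows paths in snippets like this one: a string such as "C:\\Users\me\\..." only works because \m is not a recognized escape sequence, so Python keeps the backslash (recent Python versions warn about it). A raw string or forward slashes avoid the issue entirely. The sketch below uses pathlib.PureWindowsPath, which runs on any OS, to show the spellings are equivalent and why a non-raw "\n" breaks a path; same_windows_dir is a hypothetical helper for illustration.

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows directory.
escaped = "C:\\Users\\me\\python_files\\python-deep-learning-master"
raw = r"C:\Users\me\python_files\python-deep-learning-master"
forward = "C:/Users/me/python_files/python-deep-learning-master"


def same_windows_dir(a: str, b: str) -> bool:
    # PureWindowsPath accepts both "\" and "/" as separators and
    # compares paths case-insensitively.
    return PureWindowsPath(a) == PureWindowsPath(b)


# Pitfall: in a non-raw string "\n" is a newline character,
# so this is NOT the directory C:\new_folder.
broken = "C:\new_folder"
```

Any of escaped, raw, or forward can then be passed as PersistentClient(path=...).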

answered Jul 9, 2024 at 23:11

ListenSoftware Louise Ai Agent


C L · Accepted Answer · 2025-04-17 16:40:26Z


Your question was posted 18 months ago and I ran into the same trouble today. You may have already solved it, but I'm writing my solution down here anyway:

When you create a Chroma database with something like this:

persist_folder = "D:\\collection"
vector_db2 = Chroma.from_documents(
    documents=split_docs,
    embedding=embeddings,
    persist_directory=persist_folder,
    collection_name="latest_knowledge",
)

And you load it with something like this:

vector_db = Chroma(persist_directory="D:\\collection", 
    embedding_function=embeddings,
    collection_name="latest_knowledge")

Loading fails.

You have to use:

vector_db = Chroma(persist_directory=persist_folder,
    embedding_function=embeddings,
    collection_name="latest_knowledge")

See? You have to reference the persist folder in exactly the SAME WAY (through a string variable OR a hardcoded string) when you create the DB and when you load it.

I didn't read LangChain's source code, but I guess the trouble comes from a bug in how it handles the persist_directory parameter.

Hope this helps.
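Building on that advice, one way to make sure the create and load calls can never disagree is to define the directory once and resolve it to an absolute path. A relative path such as "./db" is resolved against the current working directory, which can differ between a notebook and a script. PERSIST_DIR and resolve_persist_dir below are hypothetical names for illustration, not part of LangChain or chromadb.

```python
import os
from typing import Optional

# Hypothetical constant: define it once and import it from both the
# indexing code and the loading code.
PERSIST_DIR = "chroma_db"


def resolve_persist_dir(path: str, base: Optional[str] = None) -> str:
    """Turn `path` into an absolute, normalized directory path.

    Relative paths are joined to `base`, or to the current working
    directory when `base` is None; "." and ".." segments are collapsed.
    """
    if not os.path.isabs(path):
        path = os.path.join(base if base is not None else os.getcwd(), path)
    return os.path.normpath(path)
```

You would then pass persist_directory=resolve_persist_dir(PERSIST_DIR) to both Chroma.from_documents(...) and Chroma(...).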

edited Apr 17 at 16:40

answered Apr 17 at 16:38

C L


1 Comment

ctek Sep 8 at 19:01

I don't have enough reputation points to upvote but this saved me a ton of trouble! I would never guess there could be such a bug! Thank you very much for sharing this knowledge!
