Why am I getting an error in HuggingFace.py for SentenceTransformerEmbeddings with ChromaDB?

  • Thread starter: Ken Tola (Guest)
I am writing a Python program that imports JSON files into ChromaDB using LangChain. The import code looks like this:

Code:
chroma_db = Chroma(
    persist_directory=db_directory,
    collection_name=collection_name,
    embedding_function=embedding_function,
    collection_metadata={"hnsw:space": "cosine"},
    relevance_score_fn=lambda distance: 1.0 - distance / 2,
)
docs = None
try:
    text_splitter = RecursiveJsonSplitter(max_chunk_size=2000)
    docs = text_splitter.create_documents(json_object)
except:
    docs = None
    logging.error("Failed to parse document with JSON - attempting regular text splitter")
    state["persistent_logs"].append("Failed to parse document with JSON - attempting regular text splitter")

if docs is None:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    docs = text_splitter.create_documents(json_object)

if doc_ids is None:
    doc_ids = [str(uuid.uuid4()) for i in range(1, len(docs) + 1)]
else:
    # We look to see if the document exists:
    result = chroma_db.get(doc_ids)
    if result is not None and len(result) > 0:
        # This is an update:
        chroma_db.update_documents(doc_ids, docs)
        return doc_ids

chroma_db.from_documents(docs, embedding_function, ids=doc_ids)
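
A side note on the existence check in the code above, offered as a sketch rather than a confirmed diagnosis. As far as I know, `chroma_db.get(...)` returns a dict with keys such as `"ids"` and `"documents"`, so `len(result) > 0` counts dict keys and may be true even when nothing matched. The helper name `has_existing_ids` and the sample dict below are hypothetical and written by hand for illustration; they are not real ChromaDB output.

```python
from typing import Any, Mapping, Optional


def has_existing_ids(result: Optional[Mapping[str, Any]]) -> bool:
    """Return True only if the lookup result lists at least one stored id."""
    if not result:
        return False
    return bool(result.get("ids"))


# Hand-written stand-in for a lookup that matched nothing (assumed shape).
empty_lookup = {"ids": [], "documents": [], "metadatas": [], "embeddings": None}

# The original check counts the dict's keys, so it passes here.
print(len(empty_lookup) > 0)
# Checking the "ids" list instead reports that nothing was found.
print(has_existing_ids(empty_lookup))
```

It may also be worth checking whether `from_documents`, which I understand to be a class-level constructor, writes to the same collection as `chroma_db`; calling `chroma_db.add_documents(docs, ids=doc_ids)` on the existing instance is the usual alternative.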

The logs show that the RecursiveJsonSplitter almost always fails, even though I have manually verified that the JSON object is valid, so the documents end up being added through the RecursiveCharacterTextSplitter instead.
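
One possible cause, stated as an assumption: to my understanding, `RecursiveJsonSplitter.create_documents` expects a list of dicts and `RecursiveCharacterTextSplitter.create_documents` expects a list of strings. If `json_object` is a single dict, iterating over it yields its keys, not the object itself. The sketch below uses only the standard library; the helper names `as_dict_list` and `as_text_list` are hypothetical.

```python
import json
from typing import Any, Dict, List


def as_dict_list(json_object: Any) -> List[Dict[str, Any]]:
    """Wrap a single JSON object so it can be passed where a list of dicts is expected."""
    if isinstance(json_object, dict):
        return [json_object]
    if isinstance(json_object, list) and all(isinstance(item, dict) for item in json_object):
        return json_object
    raise TypeError("expected a dict or a list of dicts")


def as_text_list(json_object: Any) -> List[str]:
    """Serialize a JSON value to text for a character-based splitter."""
    if isinstance(json_object, str):
        return [json_object]
    return [json.dumps(json_object, ensure_ascii=False)]


sample = {"name": "example", "tags": ["a", "b"]}  # hypothetical input

print(list(sample))          # iterating a dict yields only its keys
print(as_dict_list(sample))  # one-element list containing the whole object
print(as_text_list(sample))  # one-element list containing JSON text
```

With these helpers, the two calls would become `create_documents(as_dict_list(json_object))` for the JSON splitter and `create_documents(as_text_list(json_object))` for the character splitter.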

When I then attempt to obtain a similarity score using:

Code:
results = chroma_db.similarity_search_with_relevance_scores(query_object, k=1)

I get the following error:

Code:
File "/venv/lib/python3.12/site-packages/langchain_community/embeddings/huggingface.py", line 99, in <lambda>
    texts = list(map(lambda x: x.replace("\n", " "), texts))
                               ^^^^^^^^^
AttributeError: 'dict' object has no attribute 'replace'

I have been looking everywhere for a solution to this problem but, thus far, nothing has helped. I am using sentence-transformers, version 3.0.1.

Can somebody please help?
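
A hedged reading of the traceback: the failing line calls `x.replace("\n", " ")` on every item it is asked to embed, so each item has to be a string. The error would follow if `query_object` is a dict, which the post does not state and is assumed here. The sketch below reproduces the failure with the standard library only. `normalize_texts` copies the line from the traceback, while `to_query_text` and the sample query are hypothetical.

```python
import json
from typing import Any, List


def normalize_texts(texts: List[Any]) -> List[str]:
    # Same expression as the line shown in the traceback.
    return list(map(lambda x: x.replace("\n", " "), texts))


def to_query_text(query_object: Any) -> str:
    """Turn a query into a string; dicts and lists are serialized as JSON text."""
    if isinstance(query_object, str):
        return query_object
    return json.dumps(query_object, ensure_ascii=False, sort_keys=True)


query_object = {"title": "example", "body": "line one\nline two"}  # hypothetical

try:
    normalize_texts([query_object])
except AttributeError as exc:
    # A dict has no .replace method, which matches the reported error.
    print(exc)

# Once the query is a string, the same expression runs without error.
print(normalize_texts([to_query_text(query_object)]))
```

Under that assumption, the search call would be `chroma_db.similarity_search_with_relevance_scores(to_query_text(query_object), k=1)`. Whether JSON text is the best query representation depends on how the stored documents were serialized.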