148 questions
Best practices
0 votes · 1 reply · 103 views
RAG for telephony with Deepgram
I'm building a voice-based calling system where users can create AI agents that make outbound phone calls.
The agent uses Deepgram for real-time transcription and ElevenLabs/Cartesia for speech ...
0 votes · 0 answers · 22 views
How to exclude metadata from embedding?
I'm using LlamaIndex 0.14.7. I would like to embed document text without concatenating metadata, because I store long text in the metadata. Here's my code:
table_vec_store: SimpleVectorStore = ...
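LlamaIndex documents do expose `excluded_embed_metadata_keys` (paired with `MetadataMode.EMBED`) for exactly this case. The idea behind that setting can be sketched library-free; the helper below is illustrative, not LlamaIndex's implementation:

```python
def build_embed_text(text, metadata, excluded_keys):
    """Concatenate metadata into the embedding text, skipping excluded keys.

    Loosely mirrors what LlamaIndex does when a key is listed in
    `excluded_embed_metadata_keys`; this helper is a sketch, not library code.
    """
    kept = {k: v for k, v in metadata.items() if k not in excluded_keys}
    meta_block = "\n".join(f"{k}: {v}" for k, v in kept.items())
    return f"{meta_block}\n\n{text}" if meta_block else text

doc_text = "Quarterly revenue grew 12%."
metadata = {"source": "report.pdf", "full_table": "very long text ..."}
# Only `source` reaches the embedding model; `full_table` is excluded.
print(build_embed_text(doc_text, metadata, excluded_keys={"full_table"}))
```

In LlamaIndex itself the equivalent is setting `doc.excluded_embed_metadata_keys = ["full_table"]` before indexing.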
1 vote · 1 answer · 115 views
Why does answer_relevancy return NaN when evaluating RAG with Ragas?
I’m trying to evaluate my Retrieval-Augmented Generation (RAG) pipeline using Ragas.
Here’s a complete version of my code:
"""# RAG Evaluation"""
from datasets import ...
1 vote · 0 answers · 50 views
Why does my LangChain RAG chatbot sometimes miss relevant chunks in semantic search?
I built a RAG chatbot using LangChain + ChromaDB + OpenAI embeddings. The pipeline works, but sometimes the chatbot doesn’t return the most relevant PDF content, even though it exists in the vector DB....
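A common cause of misses like this is a chunking/embedding mismatch rather than missing data; inspecting raw scores (LangChain vector stores expose `similarity_search_with_score`) and widening `k` usually reveals it. A minimal, library-free sketch of the ranking step, with toy vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy vectors standing in for query/chunk embeddings.
query = [1.0, 0.0, 1.0]
chunks = {"chunk_a": [1.0, 0.0, 0.9], "chunk_b": [0.0, 1.0, 0.1]}

ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
print(ranked)  # chunk_a ranks first
```

Printing the scores this way for a failing query shows whether the relevant chunk is genuinely ranked low (an embedding/chunking problem) or just outside the retriever's `k`.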
0 votes · 0 answers · 24 views
RAG Pipeline Memory Leak - Vector Embeddings Not Releasing After Context Switch in Memo AI
Question:
I'm building a memory-augmented AI system using RAG with persistent vector storage, but I'm facing memory leaks and context contamination between sessions.
Problem:
Vector embeddings aren't ...
0 votes · 0 answers · 44 views
Why does LanceDB's full-text-search fail to find matches where the exact text is present?
I am trying to use LanceDB to perform FTS, but I am getting spurious results.
Here is a minimal example:
# Data generation
import lancedb
import polars as pl
from string import ascii_lowercase
words = [...
0 votes · 0 answers · 54 views
How to send extra headers from RAGFlow Agent to a Spring Boot MCP server tool call?
I am using RAGFlow connected to a Spring Boot MCP server.
My agent flow is simple:
Begin node → collects inputs (auth_token, tenant_id, x_request_status)
Agent (gpt-4o) → connected to MCP Tool (server)...
0 votes · 0 answers · 75 views
How to accelerate embedding my corpus into ChromaDB
I have a corpus.jsonl file of about 6.5 GB. I am using a single H100 GPU to embed the corpus into ChromaDB, but it is very slow. I want to find out how to accelerate the process (GPU, CPU, I/O)....
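Throughput in this setup usually comes from batching: one GPU encode call and one DB insert per batch, instead of one per line. Below is the batching half in plain Python; the commented `model.encode` / `collection.add` calls are assumptions about sentence-transformers- and ChromaDB-style APIs, not tested against them here:

```python
import json
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def read_jsonl(path):
    """Stream records from a JSONL file without loading 6.5 GB into RAM."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Sketch (assumed APIs): one encode call and one add call per batch.
# for batch in batched(read_jsonl("corpus.jsonl"), 1024):
#     texts = [rec["text"] for rec in batch]
#     embs = model.encode(texts, batch_size=256)   # sentence-transformers style
#     collection.add(ids=[rec["id"] for rec in batch],
#                    documents=texts, embeddings=embs.tolist())
```

Streaming the file and sizing `batch_size` to keep the GPU saturated typically addresses all three bottlenecks the question lists (GPU, CPU, I/O).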
2 votes · 1 answer · 164 views
Why is FAISS document retrieval slow and inconsistent on EC2 t3.micro instance?
I'm building a document Q&A system using FAISS for vector search on an AWS EC2 t3.micro instance (1 vCPU, 1GB RAM). My FAISS index is relatively small (8.4MB .faiss + 1.4MB .pkl files), but I'm ...
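On an instance this small, inconsistent latency often comes from reloading the index (and its pickle) on every request, on top of t3 CPU-credit throttling. One mitigation is loading once per process; a sketch where `load_faiss_index` is a hypothetical stand-in for something like `faiss.read_index("index.faiss")`:

```python
from functools import lru_cache

LOAD_CALLS = 0  # instrumentation to show the load happens once

@lru_cache(maxsize=1)
def get_index():
    """Load the FAISS index once per process and reuse it on every request."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return load_faiss_index()

def load_faiss_index():
    # Placeholder for the real faiss.read_index(...) call.
    return object()

a, b = get_index(), get_index()
print(LOAD_CALLS)  # 1 — the disk load ran only once
```

With 1 GB of RAM an 8.4 MB index fits comfortably, so keeping it resident removes the disk round-trip from the request path.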
0 votes · 1 answer · 118 views
How do I prevent duplicate messages in the context window when using RAG and memory?
When using RAG and memory, multiple identical copies of the same information are sent to the AI when asking related questions.
I have
import java.util.ArrayList;
import java.util.List;
import dev....
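Whatever the memory backend, one guard is deduplicating retrieved and remembered snippets by normalized content before assembling the prompt. A library-free sketch (the question's own code is Java/langchain4j; the same idea applies there):

```python
import hashlib

def dedupe_context(snippets):
    """Keep the first occurrence of each snippet, comparing normalized text."""
    seen, unique = set(), []
    for s in snippets:
        # Normalize whitespace and case so near-identical copies collapse.
        key = hashlib.sha256(" ".join(s.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

ctx = ["Paris is the capital.", "Paris  is the capital.", "Berlin is the capital."]
print(dedupe_context(ctx))  # the whitespace variant is dropped
```

Running this over the combined memory + retrieval list just before prompt assembly keeps each fact in the context window once.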
-1 votes · 1 answer · 250 views
ImportError: cannot import name 'Client' from 'pinecone' (unknown location)
The problem with this piece of code is that I am unable to import Client from the pinecone library. I tried uninstalling and reinstalling different versions; none of them worked. I also tried it ...
1 vote · 0 answers · 185 views
How to handle follow-up confirmations in Spring AI 1.0.0 without losing context during tool selection using RAG?
I'm building a web application using Spring Boot 3.4.5 and Spring AI 1.0.0 with Llama 3.2 (Ollama) model integration. I've implemented tool calling, and because I have many tools in the application, I'm ...
0 votes · 1 answer · 558 views
AttributeError: 'LlmAgent' object has no attribute 'invoke'
I am trying to call a Flask API that is already running on port 5000 on my system. I am designing agentic AI code that will invoke GET and then POST based on some condition, using google-adk. I ...
1 vote · 0 answers · 55 views
Scaling RAG QA with Large Docs, Tables, and 30K+ Chunks (No LangChain)
I'm building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages ...
0 votes · 0 answers · 56 views
Using llama-index with the deployed LLM
I wanted to make a web app that uses llama-index to answer queries over specific documents using RAG. I have locally set up the Llama3.2-1B-Instruct LLM and am using it locally to create indexes of the ...
0 votes · 0 answers · 169 views
Llamaindex returns "Empty Response"
I have a RAG system using llamaindex. I am upgrading the library from 0.10.44 to 0.12.33.
I see different behaviour now.
Before, when there were no results from the vector store, it seems it called the LLM ...
1 vote · 1 answer · 134 views
Embedding model `all-mpnet-base-v2` not able to classify user prompt properly
I am using this model to embed a product catalog for RAG. In the product catalog there are no red shirts for men, but there are red shirts for women. How can I make sure the model doesn't output ...
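Embeddings alone cannot encode the *absence* of red shirts for men: the women's red shirt will still be the nearest neighbor. A hard metadata filter on the retrieved candidates prevents such near-misses from leaking into the answer. A sketch with a toy catalog (the field names are illustrative):

```python
catalog = [
    {"name": "Red shirt", "gender": "women", "color": "red"},
    {"name": "Blue shirt", "gender": "men", "color": "blue"},
]

def filter_candidates(candidates, **required):
    """Drop retrieved items whose metadata contradicts the query constraints."""
    return [c for c in candidates
            if all(c.get(k) == v for k, v in required.items())]

hits = filter_candidates(catalog, gender="men", color="red")
print(hits)  # [] — nothing matches, so the answer should say "not available"
```

An empty result after filtering is the signal to have the LLM answer "we don't carry that" instead of describing the closest embedding match.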
0 votes · 2 answers · 74 views
SitemapLoader(sitemap_url).load() hangs
from langchain_community.document_loaders import SitemapLoader

def crawl(self):
    print("Starting crawler...")
    sitemap_url = "https://gringo.co.il/sitemap.xml"
...
1 vote · 0 answers · 44 views
How to deal with evolving information in RAG?
I'm trying to index a series of articles for a RAG knowledge base, but I cannot find any documented best practice or recommendation about dealing with information that changes or evolves over time.
...
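A common pattern for evolving sources is to version chunks with metadata and retrieve only the latest version per source, keeping older versions available for "as of" queries. A minimal sketch (the `doc_id`/`version` fields are illustrative, not a standard schema):

```python
def latest_versions(chunks):
    """Keep only the newest chunk per doc_id, using a `version` field."""
    best = {}
    for c in chunks:
        cur = best.get(c["doc_id"])
        if cur is None or c["version"] > cur["version"]:
            best[c["doc_id"]] = c
    return list(best.values())

chunks = [
    {"doc_id": "pricing", "version": 1, "text": "Plan costs $10."},
    {"doc_id": "pricing", "version": 2, "text": "Plan costs $12."},
    {"doc_id": "faq",     "version": 1, "text": "Support via email."},
]
print(latest_versions(chunks))  # stale $10 pricing chunk is dropped
```

In practice the same filter is usually pushed into the vector store as a metadata filter (e.g. `is_latest = true`) at index-update time, so stale chunks never reach the retriever.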
0 votes · 0 answers · 70 views
Firebase Genkit onCall function provide context from client app
I'm following along with the Firebase Genkit docs covering context. From reading the docs it seems as though I should be able to pass context to the flow from where I call the function in my client ...
-4 votes · 1 answer · 141 views
How to locate files uploaded through the RAGFlow system on Windows
I use Ollama and RAGFlow to manage my own knowledge files. I uploaded some files to a knowledge base, and they work well in the system. I start RAGFlow with Docker commands.
Can anyone help me find the ...
1 vote · 1 answer · 257 views
RegexTextSplitter does not exist in langchain_text_splitters?
Trying to import RegexTextSplitter using
from langchain.text_splitter import RegexTextSplitter ,RecursiveCharacterTextSplitter
And I get the error
from langchain.text_splitter import RegexTextSplitter ...
0 votes · 1 answer · 223 views
What to include in context precision/recall for RAG LLM evaluation
I am evaluating my RAG LLM application using ragas. I pass the prompt instruction describing some rules, the retrieved content from my retriever, and the chat history together for the LLM to ...
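Ragas computes its context metrics with an LLM judge, but the underlying set definitions are simple, and a plain version is a useful sanity check when deciding what should count as "context" (retrieved chunks only, not the prompt instruction or chat history). A library-free sketch:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved contexts that are actually relevant."""
    if not retrieved:
        return 0.0
    rel = set(relevant)
    return sum(1 for c in retrieved if c in rel) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant contexts that were retrieved."""
    if not relevant:
        return 0.0
    got = set(retrieved)
    return sum(1 for c in relevant if c in got) / len(relevant)

retrieved = ["c1", "c2", "c3"]   # what the retriever returned
relevant = ["c1", "c4"]          # ground-truth contexts
print(context_precision(retrieved, relevant))  # 1/3
print(context_recall(retrieved, relevant))     # 1/2
```

Keeping only retriever output in the `contexts` field (and the rules/history in the question or prompt) is what makes these two numbers attribute failures to retrieval rather than to prompt design.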
0 votes · 0 answers · 40 views
I have a pipeline for a RAG application for CSV query search
from llama_index.core.query_pipeline import (
QueryPipeline as QP,
Link,
InputComponent,
)
from llama_index.experimental.query_engine.pandas import (
PandasInstructionParser,
)
from ...
1 vote · 1 answer · 516 views
Python request to LM Studio model fails but curl succeeds
I tried to request a local model using Python with the code below:
import requests
import json
url = 'http://localhost:1234/v1/chat/completions'
headers = {
'Content-Type': 'application/json'
}
...