61 questions
0 votes · 1 answer · 324 views
How to properly install llama-cpp-python on Windows 11 with GPU support
I have been trying to install llama-cpp-python for windows 11 with GPU support for a while, and it just doesn't work no matter how I try. I installed the necessary visual studio toolkit packages, ...
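A minimal sketch of the usual CUDA-enabled build on Windows, assuming the CUDA Toolkit and the Visual Studio C++ build tools are already installed; the CMake flag name depends on the llama-cpp-python version (-DGGML_CUDA=on on newer releases, -DLLAMA_CUBLAS=on on older ones):
# PowerShell: force a source rebuild with CUDA support
$env:CMAKE_ARGS = "-DGGML_CUDA=on"
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade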
1 vote · 0 answers · 187 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
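A sketch of the usual diagnosis, assuming the x86_64 build comes from an x86_64 Python running under Rosetta: the wheel architecture follows the interpreter, not the machine.
python -c "import platform; print(platform.machine())"   # want: arm64
# If this prints x86_64, install a native arm64 Python first, then rebuild:
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
  pip install --no-cache-dir --force-reinstall llama-cpp-python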
0 votes · 0 answers · 99 views
llama-cpp and transformers with PyInstaller when creating a .exe file
I am attempting to bundle a RAG agent into a .exe.
However, when running the .exe I keep hitting the same two problems.
The first problem, locating llama-cpp, I have fixed.
The ...
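A sketch of one common fix, assuming the failures come from PyInstaller missing llama_cpp's bundled shared library and transformers' data files; --collect-all pulls in everything a package ships (app.py is a placeholder for the entry script):
pyinstaller --onefile app.py \
    --collect-all llama_cpp \
    --collect-all transformers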
0 votes · 0 answers · 90 views
Generating an n-gram dataset based on an LLM
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
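A sketch of scoring an n-gram's log likelihood with llama-cpp-python, assuming the completion call echoes prompt-token logprobs the way the OpenAI completions API does; the model path is hypothetical and logits_all=True is needed for prompt logprobs:
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", logits_all=True)

def ngram_logprob(text: str) -> float:
    out = llm(text, max_tokens=1, echo=True, logprobs=1)
    lps = out["choices"][0]["logprobs"]["token_logprobs"]
    # lps covers the echoed prompt tokens plus one generated token;
    # the first prompt token has no context, so its entry is None.
    return sum(lp for lp in lps[:-1] if lp is not None)

print(ngram_logprob("the quick brown fox"))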
0 votes · 0 answers · 258 views
Why Does Running LLaMA 13B Model with llama_cpp on CPU Take Excessive Time and Produce Poor Outputs?
I'm experiencing significant performance and output quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
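A sketch of the settings that usually dominate CPU inference speed; values are illustrative, not tuned. A 13B model at 4-bit quantization needs roughly 8 GB of RAM, so if the machine starts swapping, generation slows drastically:
from llama_cpp import Llama

llm = Llama(
    model_path="llama-13b.Q4_K_M.gguf",  # hypothetical path
    n_threads=8,    # physical cores, not hyperthreads
    n_batch=512,    # prompt-processing batch size
    n_ctx=2048,
)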
2 votes · 1 answer · 1k views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is built with llama-cpp-python and LangChain; it has conversation memory for the present chat, but obviously ...
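A minimal sketch of cross-run persistence, assuming the history is kept as a list of {"role", "content"} dicts: dump it to JSON on exit, reload it on start, and replay it into the chain's memory. The file name is hypothetical:
import json, os

HISTORY_FILE = "chat_history.json"

def load_history() -> list:
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            return json.load(f)
    return []

def save_history(messages: list) -> None:
    with open(HISTORY_FILE, "w") as f:
        json.dump(messages, f)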
0 votes · 0 answers · 180 views
Unable to set top_k value in Llama cpp Python server
I start the llama-cpp Python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script which ...
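A sketch of passing top_k per request, assuming the script talks to the server through the openai client: top_k is a llama-cpp extension rather than part of the stock OpenAI schema, so it has to travel in extra_body instead of as a named argument:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local",   # name is ignored by the local server
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"top_k": 10},   # forwarded to the sampler
)
print(resp.choices[0].message.content)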
0 votes · 2 answers · 856 views
How do I stream output as it is being generated by an LLM in Streamlit?
code:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain import PromptTemplate
from langchain_community.llms import ...
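A minimal sketch, assuming a LangChain LlamaCpp LLM (model path hypothetical): st.write_stream consumes any iterator of text chunks, and .stream() yields tokens as they are generated:
import streamlit as st
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(model_path="model.gguf", streaming=True)

prompt = st.text_input("Ask a question")
if prompt:
    # Render tokens incrementally instead of waiting for the full answer.
    st.write_stream(llm.stream(prompt))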
0 votes · 1 answer · 596 views
Does langchain with llama-cpp-python fail to work with very long prompts?
I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
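One common cause, sketched below: LangChain's LlamaCpp wrapper defaults to a small context window (n_ctx=512), so long prompts get truncated or rejected even though the underlying model supports far more. The model path is hypothetical:
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="llama3-70b.Q4_K_M.gguf",
    n_ctx=8192,        # raise from the 512 default
    max_tokens=1024,
)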
0 votes · 1 answer · 642 views
Unable to send multiple inputs using Llama CPP and Llama-index
I am using the Mistral 7B-instruct model with llama-index and load the model using LlamaCPP. When I try to run multiple inputs or prompts (open 2 websites and send 2 prompts), it gives me ...
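A sketch of one workaround, assuming the error comes from two requests reaching the same model instance at once: a single llama.cpp context is not safe for concurrent calls, so serialize access with a lock (llm here is assumed to be llama-index's LlamaCPP):
import threading

lock = threading.Lock()

def safe_complete(llm, prompt: str) -> str:
    # Only one request may use the shared model context at a time.
    with lock:
        return llm.complete(prompt).text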
0 votes · 1 answer · 244 views
(Windows) Setting environment variables with spaces in text
I am trying to install llama-cpp-python on Windows 11. I have installed and set up the CMAKE_ARGS environment variable to point to the MinGW gcc.exe and g++.exe to compile C and C++, but am struggling ...
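A PowerShell sketch with hypothetical paths: wrap the whole value in single quotes and each compiler path in double quotes, so the embedded spaces survive both PowerShell and CMake parsing:
$env:CMAKE_ARGS = '-DCMAKE_C_COMPILER="C:\Program Files\mingw64\bin\gcc.exe" -DCMAKE_CXX_COMPILER="C:\Program Files\mingw64\bin\g++.exe"'
pip install llama-cpp-python --no-cache-dir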
1 vote · 0 answers · 866 views
How can I get just the main answer from llama-3-8B-Instruct so it doesn't talk to itself?
I want to use llama-3 with llama-cpp-python and get just the main answer to user questions, the way llama-2 did.
But the answers llama-3 generates are not a single main answer like llama-2's:
Output: Hey! 👋 What can I help you ...
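A sketch of the usual fix, assuming the rambling comes from a llama-2-style prompt being fed to a llama-3 model: select the llama-3 chat format and pass its end-of-turn token as a stop sequence (model path hypothetical; the "llama-3" format needs a reasonably recent llama-cpp-python):
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    chat_format="llama-3",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hey!"}],
    stop=["<|eot_id|>"],   # llama-3's end-of-turn token
)
print(out["choices"][0]["message"]["content"])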
3 votes · 1 answer · 4k views
Llama.cpp GPU Offloading Issue - Unexpected Switch to CPU
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
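A sketch of the first things to check, assuming the silent fallback comes from a wheel rebuilt without GPU support or n_gpu_layers left at its CPU-only default. With verbose=True the load log should show layers being assigned to the GPU backend; if it doesn't, the package needs reinstalling with the GPU CMake flags:
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # hypothetical path
    n_gpu_layers=-1,           # -1 offloads every layer
    verbose=True,              # prints the backend/offload report at load
)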
1 vote · 0 answers · 65 views
Chat model provides answers without source docs
I created embeddings for only one document so far. But when I ask questions which might be in the context but are definitely not part of this single document, I would expect an answer like "I ...
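A sketch of one mitigation, assuming the chain's prompt never instructs the model to refuse out-of-context questions; the template wording is illustrative:
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Answer using ONLY the context below. If the answer is not in the "
    "context, reply exactly: 'I don't know based on the provided documents.'\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)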
0 votes · 0 answers · 842 views
llama-cpp-python with metal acceleration on Apple silicon failing
I am following the instructions from the official documentation on how to install llama-cpp with GPU support on an Apple silicon Mac.
Here is my Dockerfile:
FROM python:3.11-slim
WORKDIR /code
RUN pip ...
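A sketch acknowledging the underlying limitation: Docker on macOS runs containers inside a Linux VM with no access to the host's Metal GPU, so a containerized build can only target the CPU, and Metal acceleration requires a native (non-Docker) install. A CPU-only Dockerfile that at least builds cleanly might look like this:
FROM python:3.11-slim
WORKDIR /code
# llama-cpp-python compiles from source, so a toolchain is required
RUN apt-get update && apt-get install -y build-essential cmake
RUN pip install --no-cache-dir llama-cpp-python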
1 vote · 2 answers · 7k views
Failed to install llama-cpp-python with Metal on M2 Ultra
I followed the instruction on https://llama-cpp-python.readthedocs.io/en/latest/install/macos/.
My macOS version is Sonoma 14.4, and xcode-select is already installed (version: 15.3.0.0.1.1708646388).
...
0 votes · 2 answers · 4k views
Loading embedding model from Hugging Face in Llama Index throws an attribute error
I am trying to load embeddings like this. I changed the code to reflect the current version of LlamaIndex, but it raises an attribute error.
from llama_index.embeddings.huggingface import ...
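A sketch of the post-0.10 LlamaIndex pattern, assuming the attribute error comes from mixing old ServiceContext-style code with the new package layout; it needs the llama-index-embeddings-huggingface package installed:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"   # illustrative model choice
)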
1 vote · 0 answers · 340 views
Langserve Streaming with Llamacpp
I have built a RAG app with Llamacpp and Langserve and it generally works. However, I can't find a way to stream my responses, which would be very important for the application. Here is my code:
from ...
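A sketch under two assumptions: the LLM must be constructed with streaming=True, and the client must consume the chain's /stream endpoint (LangServe exposes it automatically for runnables that stream). Model path and server URL are hypothetical:
from langchain_community.llms import LlamaCpp
from langserve import RemoteRunnable

llm = LlamaCpp(model_path="model.gguf", streaming=True)
# ... build the RAG chain with this llm and serve it via add_routes ...

# Client side: consume chunks as they arrive instead of calling /invoke.
chain = RemoteRunnable("http://localhost:8000/rag")
for chunk in chain.stream("What does the document say about X?"):
    print(chunk, end="", flush=True)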
0 votes · 2 answers · 751 views
TypeError in Python 3.11 when Using BasicModelRunner from llama-cpp-python
I'm currently taking the DeepAI's Finetuning Coursera course and encountered a bug while trying to run one of their demonstrations locally in a Jupyter notebook.
Environment:
Python version: 3.11
...
1 vote · 3 answers · 6k views
Enable GPU for Python programming with VS Code on Windows 10 (llama-cpp-python)
I struggled a lot while enabling the GPU on my 32 GB Windows 10 machine with a 4 GB Nvidia P100 GPU for Python programming. My LLMs did not use the machine's GPU during inference. After spending a few ...
0 votes · 2 answers · 2k views
CMAKE in requirements.txt file: Install llama-cpp-python for Mac
I have put my application into a Docker container and therefore created a requirements.txt file. Now I need to install llama-cpp-python for Mac, as I am loading my LLM with from langchain.llms import ...
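A sketch of the usual workaround: requirements.txt cannot set environment variables, so pass CMAKE_ARGS in the Dockerfile layer that runs pip. Inside a Linux container Metal flags are moot, so the example shows an OpenBLAS CPU build; flag names vary by llama-cpp-python version:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential cmake libopenblas-dev
COPY requirements.txt .
RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
    pip install --no-cache-dir -r requirements.txt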
1 vote · 0 answers · 1k views
llama-cpp-python on GPU: Delay between prompt submission and first token generation with longer prompts
I've been building a RAG pipeline using the llama-cpp-python OpenAI compatible server functionality and have been working my way up from running on just a laptop to running this on a dedicated ...
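A sketch of the knobs that usually shorten that delay, assuming it is prompt processing (prefill), which grows with prompt length: make sure every layer is offloaded and raise the batch size so the prompt is ingested in fewer, larger chunks. Flag values are illustrative:
python -m llama_cpp.server --model model.gguf \
    --n_gpu_layers -1 --n_batch 1024 --n_ctx 8192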
1 vote · 0 answers · 347 views
Llama-2, Q4-Quantized model's response time on different CPUs
I am running a quantized llama-2 model from here. I am using 2 different machines.
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz
16.0 GB (15.8 GB usable)
Inference time on this machine is ...
4 votes · 1 answer · 3k views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
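A sketch of two common approaches; neither is poetry-specific magic, just making sure the build-time environment variables reach pip's build step:
# Option 1: let poetry build the sdist with the env vars inherited
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 poetry add llama-cpp-python

# Option 2: install with pip inside poetry's virtualenv
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python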
1 vote · 1 answer · 1k views
PandasQueryEngine from llama-index is unable to execute code with the following error: invalid syntax (, line 0)
I have the following code. I am trying to use the local llama2-chat-13B model. The instructions appear to be good, but the final output errors out.
import logging
import sys
from IPython.display ...
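A minimal sketch of the engine in isolation; the import path varies by version (newer releases keep it in the llama-index-experimental package). The "invalid syntax (, line 0)" error typically means the local model emitted something other than bare Python for the engine to execute, and verbose=True prints the generated code so you can see what the llama2 model actually produced:
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

df = pd.DataFrame({"city": ["Toronto", "Tokyo"], "population": [2.9, 13.9]})
engine = PandasQueryEngine(df=df, verbose=True)   # prints the generated pandas code
print(engine.query("Which city has the larger population?"))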