61 questions
0 votes · 1 answer · 324 views
How to properly install llama-cpp-python on Windows 11 with GPU support
I have been trying to install llama-cpp-python for windows 11 with GPU support for a while, and it just doesn't work no matter how I try. I installed the necessary visual studio toolkit packages, ...
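A minimal sketch of the usual CUDA-enabled build on Windows, assuming the CUDA Toolkit and the Visual Studio C++ build tools are already installed; the CMake flag name depends on the llama-cpp-python version (-DGGML_CUDA=on on newer releases, -DLLAMA_CUBLAS=on on older ones):
# PowerShell: force a source rebuild with CUDA support
$env:CMAKE_ARGS = "-DGGML_CUDA=on"
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade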
1 vote · 0 answers · 187 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
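A sketch of the usual diagnosis, assuming the x86_64 build comes from an x86_64 Python running under Rosetta: the wheel architecture follows the interpreter, not the machine.
python -c "import platform; print(platform.machine())"   # want: arm64
# If this prints x86_64, install a native arm64 Python first, then rebuild:
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
  pip install --no-cache-dir --force-reinstall llama-cpp-python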
0 votes · 0 answers · 99 views
llama-cpp and transformers with PyInstaller when creating a .exe file
I am attempting to bundle a RAG agent into a .exe.
However, when running the .exe I keep hitting the same two problems.
The first problem, locating llama-cpp, I have fixed.
The ...
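A sketch of one common fix, assuming the failures come from PyInstaller missing llama_cpp's bundled shared library and transformers' data files; --collect-all pulls in everything a package ships (app.py is a placeholder for the entry script):
pyinstaller --onefile app.py \
    --collect-all llama_cpp \
    --collect-all transformers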
0 votes · 0 answers · 90 views
Generating an n-gram dataset based on an LLM
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
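A sketch of scoring an n-gram's log likelihood with llama-cpp-python, assuming the completion call echoes prompt-token logprobs the way the OpenAI completions API does; the model path is hypothetical and logits_all=True is needed for prompt logprobs:
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", logits_all=True)

def ngram_logprob(text: str) -> float:
    out = llm(text, max_tokens=1, echo=True, logprobs=1)
    lps = out["choices"][0]["logprobs"]["token_logprobs"]
    # lps covers the echoed prompt tokens plus one generated token;
    # the first prompt token has no context, so its entry is None.
    return sum(lp for lp in lps[:-1] if lp is not None)

print(ngram_logprob("the quick brown fox"))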
0 votes · 0 answers · 258 views
Why Does Running LLaMA 13B Model with llama_cpp on CPU Take Excessive Time and Produce Poor Outputs?
I'm experiencing significant performance and output quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
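A sketch of the settings that usually dominate CPU inference speed; values are illustrative, not tuned. A 13B model at 4-bit quantization needs roughly 8 GB of RAM, so if the machine starts swapping, generation slows drastically:
from llama_cpp import Llama

llm = Llama(
    model_path="llama-13b.Q4_K_M.gguf",  # hypothetical path
    n_threads=8,    # physical cores, not hyperthreads
    n_batch=512,    # prompt-processing batch size
    n_ctx=2048,
)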
2 votes · 1 answer · 1k views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is built with llama-cpp-python and LangChain; it has conversation memory for the present chat, but obviously ...
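A minimal sketch of cross-run persistence, assuming the history is kept as a list of {"role", "content"} dicts: dump it to JSON on exit, reload it on start, and replay it into the chain's memory. The file name is hypothetical:
import json, os

HISTORY_FILE = "chat_history.json"

def load_history() -> list:
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            return json.load(f)
    return []

def save_history(messages: list) -> None:
    with open(HISTORY_FILE, "w") as f:
        json.dump(messages, f)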
0 votes · 0 answers · 180 views
Unable to set top_k value in Llama cpp Python server
I start the llama-cpp Python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script which ...
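A sketch of passing top_k per request, assuming the script talks to the server through the openai client: top_k is a llama-cpp extension rather than part of the stock OpenAI schema, so it has to travel in extra_body instead of as a named argument:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local",   # name is ignored by the local server
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"top_k": 10},   # forwarded to the sampler
)
print(resp.choices[0].message.content)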
0 votes · 2 answers · 856 views
How do I stream output as it is being generated by an LLM in Streamlit?
code:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain import PromptTemplate
from langchain_community.llms import ...
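A minimal sketch, assuming a LangChain LlamaCpp LLM (model path hypothetical): st.write_stream consumes any iterator of text chunks, and .stream() yields tokens as they are generated:
import streamlit as st
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(model_path="model.gguf", streaming=True)

prompt = st.text_input("Ask a question")
if prompt:
    # Render tokens incrementally instead of waiting for the full answer.
    st.write_stream(llm.stream(prompt))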
0 votes · 1 answer · 596 views
Does langchain with llama-cpp-python fail to work with very long prompts?
I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
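One common cause, sketched below: LangChain's LlamaCpp wrapper defaults to a small context window (n_ctx=512), so long prompts get truncated or rejected even though the underlying model supports far more. The model path is hypothetical:
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="llama3-70b.Q4_K_M.gguf",
    n_ctx=8192,        # raise from the 512 default
    max_tokens=1024,
)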
0 votes · 1 answer · 642 views
Unable to send multiple inputs using Llama CPP and Llama-index
I am using the Mistral 7B-instruct model with llama-index and load the model using LlamaCPP. When I try to run multiple inputs or prompts (open 2 websites and send 2 prompts), it gives me ...
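A sketch of one workaround, assuming the error comes from two requests reaching the same model instance at once: a single llama.cpp context is not safe for concurrent calls, so serialize access with a lock (llm here is assumed to be llama-index's LlamaCPP):
import threading

lock = threading.Lock()

def safe_complete(llm, prompt: str) -> str:
    # Only one request may use the shared model context at a time.
    with lock:
        return llm.complete(prompt).text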
0 votes · 1 answer · 244 views
(Windows) Setting environment variables with spaces in text
I am trying to install llama-cpp-python on Windows 11. I have installed and set up the CMAKE_ARGS environment variable to point to the MinGW gcc.exe and g++.exe to compile C and C++, but am struggling ...
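A PowerShell sketch with hypothetical paths: wrap the whole value in single quotes and each compiler path in double quotes, so the embedded spaces survive both PowerShell and CMake parsing:
$env:CMAKE_ARGS = '-DCMAKE_C_COMPILER="C:\Program Files\mingw64\bin\gcc.exe" -DCMAKE_CXX_COMPILER="C:\Program Files\mingw64\bin\g++.exe"'
pip install llama-cpp-python --no-cache-dir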
1 vote · 0 answers · 866 views
How can I get just the main answer from llama-3-8B-Instruct so it doesn't talk to itself?
I want to use llama-3 with llama-cpp-python and get just the main answer to user questions, the way llama-2 did.
But the answers llama-3 generates are not a single main answer like llama-2's:
Output: Hey! 👋 What can I help you ...
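A sketch of the usual fix, assuming the rambling comes from a llama-2-style prompt being fed to a llama-3 model: select the llama-3 chat format and pass its end-of-turn token as a stop sequence (model path hypothetical; the "llama-3" format needs a reasonably recent llama-cpp-python):
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    chat_format="llama-3",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hey!"}],
    stop=["<|eot_id|>"],   # llama-3's end-of-turn token
)
print(out["choices"][0]["message"]["content"])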
3 votes · 1 answer · 4k views
Llama.cpp GPU Offloading Issue - Unexpected Switch to CPU
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
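A sketch of the first things to check, assuming the silent fallback comes from a wheel rebuilt without GPU support or n_gpu_layers left at its CPU-only default. With verbose=True the load log should show layers being assigned to the GPU backend; if it doesn't, the package needs reinstalling with the GPU CMake flags:
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # hypothetical path
    n_gpu_layers=-1,           # -1 offloads every layer
    verbose=True,              # prints the backend/offload report at load
)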
1 vote · 0 answers · 65 views
Chat model provides answers without source docs
I created embeddings for only one document so far. But when I ask questions which might be in the context but are definitely not part of this single document, I would expect an answer like "I ...
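A sketch of one mitigation, assuming the chain's prompt never instructs the model to refuse out-of-context questions; the template wording is illustrative:
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Answer using ONLY the context below. If the answer is not in the "
    "context, reply exactly: 'I don't know based on the provided documents.'\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)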
0 votes · 0 answers · 842 views
llama-cpp-python with metal acceleration on Apple silicon failing
I am following the instructions from the official documentation on how to install llama-cpp with GPU support on an Apple silicon Mac.
Here is my Dockerfile:
FROM python:3.11-slim
WORKDIR /code
RUN pip ...
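A sketch acknowledging the underlying limitation: Docker on macOS runs containers inside a Linux VM with no access to the host's Metal GPU, so a containerized build can only target the CPU, and Metal acceleration requires a native (non-Docker) install. A CPU-only Dockerfile that at least builds cleanly might look like this:
FROM python:3.11-slim
WORKDIR /code
# llama-cpp-python compiles from source, so a toolchain is required
RUN apt-get update && apt-get install -y build-essential cmake
RUN pip install --no-cache-dir llama-cpp-python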
1 vote · 2 answers · 7k views
Failed to install llama-cpp-python with Metal on M2 Ultra
I followed the instruction on https://llama-cpp-python.readthedocs.io/en/latest/install/macos/.
My macOS version is Sonoma 14.4, and xcode-select is already installed (version: 15.3.0.0.1.1708646388).
...
0 votes · 2 answers · 4k views
Loading embedding model from Hugging Face in Llama Index throws an attribute error
I am trying to load embeddings like this. I changed the code to reflect the current version of LlamaIndex, but it raises an attribute error.
from llama_index.embeddings.huggingface import ...
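A sketch of the post-0.10 LlamaIndex pattern, assuming the attribute error comes from mixing old ServiceContext-style code with the new package layout; it needs the llama-index-embeddings-huggingface package installed:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"   # illustrative model choice
)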
1 vote · 0 answers · 340 views
Langserve Streaming with Llamacpp
I have built a RAG app with Llamacpp and Langserve and it generally works. However, I can't find a way to stream my responses, which would be very important for the application. Here is my code:
from ...
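A sketch under two assumptions: the LLM must be constructed with streaming=True, and the client must consume the chain's /stream endpoint (LangServe exposes it automatically for runnables that stream). Model path and server URL are hypothetical:
from langchain_community.llms import LlamaCpp
from langserve import RemoteRunnable

llm = LlamaCpp(model_path="model.gguf", streaming=True)
# ... build the RAG chain with this llm and serve it via add_routes ...

# Client side: consume chunks as they arrive instead of calling /invoke.
chain = RemoteRunnable("http://localhost:8000/rag")
for chunk in chain.stream("What does the document say about X?"):
    print(chunk, end="", flush=True)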
0 votes · 2 answers · 751 views
TypeError in Python 3.11 when Using BasicModelRunner from llama-cpp-python
I'm currently taking the DeepAI's Finetuning Coursera course and encountered a bug while trying to run one of their demonstrations locally in a Jupyter notebook.
Environment:
Python version: 3.11
...
1 vote · 3 answers · 6k views
Enable GPU for Python programming with VS Code on Windows 10 (llama-cpp-python)
I struggled a lot while enabling the GPU on my 32 GB Windows 10 machine with a 4 GB Nvidia P100 GPU for Python programming. My LLMs did not use the machine's GPU during inference. After spending a few ...
0 votes · 2 answers · 2k views
CMAKE in requirements.txt file: Install llama-cpp-python for Mac
I have put my application into a Docker container and therefore created a requirements.txt file. Now I need to install llama-cpp-python for Mac, as I am loading my LLM with from langchain.llms import ...
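A sketch of the usual workaround: requirements.txt cannot set environment variables, so pass CMAKE_ARGS in the Dockerfile layer that runs pip. Inside a Linux container Metal flags are moot, so the example shows an OpenBLAS CPU build; flag names vary by llama-cpp-python version:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential cmake libopenblas-dev
COPY requirements.txt .
RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
    pip install --no-cache-dir -r requirements.txt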
1 vote · 0 answers · 1k views
llama-cpp-python on GPU: Delay between prompt submission and first token generation with longer prompts
I've been building a RAG pipeline using the llama-cpp-python OpenAI compatible server functionality and have been working my way up from running on just a laptop to running this on a dedicated ...
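A sketch of the knobs that usually shorten that delay, assuming it is prompt processing (prefill), which grows with prompt length: make sure every layer is offloaded and raise the batch size so the prompt is ingested in fewer, larger chunks. Flag values are illustrative:
python -m llama_cpp.server --model model.gguf \
    --n_gpu_layers -1 --n_batch 1024 --n_ctx 8192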
1 vote · 0 answers · 347 views
Llama-2, Q4-Quantized model's response time on different CPUs
I am running a quantized llama-2 model from here. I am using 2 different machines.
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz
16.0 GB (15.8 GB usable)
Inference time on this machine is ...
4 votes · 1 answer · 3k views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
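A sketch of two common approaches; neither is poetry-specific magic, just making sure the build-time environment variables reach pip's build step:
# Option 1: let poetry build the sdist with the env vars inherited
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 poetry add llama-cpp-python

# Option 2: install with pip inside poetry's virtualenv
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python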
1 vote · 1 answer · 1k views
PandasQueryEngine from llama-index is unable to execute code with the following error: invalid syntax (, line 0)
I have the following code. I am trying to use the local llama2-chat-13B model. The instructions appear to be good, but the final output errors out.
import logging
import sys
from IPython.display ...
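A minimal sketch of the engine in isolation; the import path varies by version (newer releases keep it in the llama-index-experimental package). The "invalid syntax (, line 0)" error typically means the local model emitted something other than bare Python for the engine to execute, and verbose=True prints the generated code so you can see what the llama2 model actually produced:
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

df = pd.DataFrame({"city": ["Toronto", "Tokyo"], "population": [2.9, 13.9]})
engine = PandasQueryEngine(df=df, verbose=True)   # prints the generated pandas code
print(engine.query("Which city has the larger population?"))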