Newest 'inference' Questions

0 votes

0 answers

27 views

Does SGLang’s OpenAI-compatible API support async/await non-streaming calls?

I’m using SGLang’s OpenAI-compatible server (e.g., --port 30000, /v1/chat/completions) and calling it via the openai SDK with an async client: from openai import AsyncOpenAI client = AsyncOpenAI(...

Erfan Mhi

95

asked Oct 28 at 15:06

0 votes

0 answers

47 views

How do I create a multitask GPyTorch model with a user-specified noise covariance matrix?

I've implemented standard homoskedastic multitask Gaussian process regression using GPyTorch as follows: class MyModel(gpytorch.models.ExactGP): def __init__(self, X, Y, likelihood): super(...

SirAndy3000

1

asked Oct 13 at 0:14

0 votes

0 answers

33 views

Databricks group cluster fails to read CSV (TextFileFormatEdge$.disabled) while personal cluster works

I have a PySpark function that reads a reference CSV file inside a larger ETL pipeline. On my personal Databricks cluster, this works fine. On the group cluster, it returns an empty DataFrame, the same ...

Codie

1

asked Sep 25 at 14:32

0 votes

0 answers

102 views

Why is `mul_mat` in ggml slower than in llama.cpp?

I use the following command to compile an executable file for Android: cmake \ -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ -DANDROID_ABI=arm64-v8a \ -...

XUHAO77

11

asked May 13 at 6:49

0 votes

0 answers

36 views

How to fit truncated distributions to truncated data? [duplicate]

I am familiar with the fitdistrplus package, which offers relevant tools for statistical inference. However, I have trouble understanding what needs to be done when facing a sample that is left and/...

yeahman269

779

asked Apr 11 at 7:23

0 votes

0 answers

26 views

In .yaml files for tracking algorithm, what is the use of the EPOCH parameter in the TEST section?

I am currently studying tracking algorithms such as SeqTrack, ARTrack and ODTrack. For the configuration, I found this .yaml file in the experiments folder: https://github.com/microsoft/VideoX/...

Chloé c

1

asked Mar 21 at 16:56

1 vote

0 answers

110 views

Streaming write using ray's write_parquet for vllm inference

I need to run inference using vLLM on a large dataset. The code structure is as below: ds = ray.data.read_parquet(my_input_path) ds = input_data.map_batches( VLLMPredictor, concurrency=ray_concurrency, ...

cnmdestroyer

21

asked Mar 12 at 21:09

0 votes

0 answers

75 views

Trueskill with teams in Infer.Net

I'm trying to build a TrueSkill model with two teams of 5 players each using Infer.NET. However, when inferring the skills, the means of the distributions get far too large or too small. Below is the code of my ...

Ranersss

1

asked Feb 18 at 11:32

0 votes

0 answers

35 views

How to perform model inference with delayed inputs while ensuring real-time performance?

I need to perform model inference using a deep learning model on a stream of data. However, the challenge I’m facing is that the inputs to the model might not arrive continuously, but rather with some ...

conmeobeo

1

asked Dec 21, 2024 at 12:33

0 votes

1 answer

68 views

What is the source of this error in a time series inference model?

Problem: I have created an encoder-decoder model to forecast time series. The model trains well, but I am struggling with an error in the inference model and I don't know how to troubleshoot it: WARNING:...

Art

11

asked Nov 19, 2024 at 23:15

-2 votes

1 answer

624 views

Why do BF16 models have slower inference on Mac M-series chips compared to F16 models?

I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1): All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...

Franck Dernoncourt

84.7k

asked Nov 7, 2024 at 17:32

0 votes

2 answers

4k views

ONNX Runtime Inference using GPU : libcublasLt.so.11 not found

I'm trying to run inference using ONNX Runtime on my server GPU. However, I'm getting this error: 2024-08-10 23:53:29.404983674 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA]...

Mhmdfad

11

asked Aug 10, 2024 at 21:06

2 votes

1 answer

329 views

Saving a Fine-tuned Falcon HuggingFace LLM Model

I'm trying to save my model so that it won't need to re-download the base model every time I want to use it, but nothing seems to work for me. I would love your help with it. The following parameters are ...

Lidor Eliyahu Shelef

1,362

asked Jul 15, 2024 at 14:20

1 vote

0 answers

96 views

Why does type inference fail in this Java program?

Let's consider the following Java program: import java.util.*; import java.util.stream.Collectors; public class Main { record Foo(String id, List<Bar> bars) {} record Bar(String id) {}...

Robin Dos Anjos

369

asked Jun 26, 2024 at 22:15

1 vote

0 answers

465 views

How to Deploy a Hugging Face Transformers Model for Inference Using KServe (without KServe 0.13v)?

I'm working on deploying a pre-trained Hugging Face Transformers model for inference using KServe, but my Kubernetes environment does not support KServe 0.13v. I've researched the topic and found ...

Reehan

11

asked Jun 22, 2024 at 6:20

0 votes

0 answers

28 views

How to Fix Issues in R Code for Metropolis-Hastings Algorithm Applied to Gumbel Type II Distribution?

I am trying to implement the Metropolis-Hastings (M-H) algorithm in R to sample from the posterior distribution of a Gumbel Type II distribution. However, I'm encountering issues with my ...

Carlos Souto Dos Santos Filho

1

asked Jun 7, 2024 at 12:02

0 votes

1 answer

48 views

SageMaker does not recognize training job to launch inference

I successfully launched a training job in SageMaker. However, when I try to use the model to run inference, SageMaker is unable to find the model. import sagemaker from sagemaker.transformer import ...

Cyrus Mohammadian

5,213

asked May 17, 2024 at 20:29

0 votes

1 answer

424 views

How to save a keras model just for inference?

I trained a CNN model and saved it as a .keras file. Now I want other people to use it for making predictions. I am planning on deploying it using a Flask server and packaging the whole thing in an exe. ...

CuriousRabbit

9

asked Apr 29, 2024 at 16:16

0 votes

0 answers

313 views

Pytorch ViT inference on GPU A100 is very slow

I'm using a GPU server which has 4 A100 chips. I'm studying how to use ViT (in timm). My local GPU is a GTX 1650 Super, but it is faster than an A100. The A100 takes almost 1 hour to finish the ...

Hyeongjun Cho

1

asked Apr 16, 2024 at 8:23

-3 votes

2 answers

1k views

Minesweeper AI - A problem with some kind of edge case of inferring knowledge about safe cell [closed]

I am taking CS50’s Introduction to Artificial Intelligence with Python course and I enjoy it very much. When I run my script, it seems it's all working well, but the CS50 checker finds some kind of edge ...

Maciej Zamojski

13

asked Apr 6, 2024 at 8:08

1 vote

1 answer

216 views

SageMaker batchTransform MultiRecord error - Unable to parse data as JSON. Make sure the Content-Type header is set to "application/json"

I am trying to invoke a SageMaker batch transform. Input file example.jsonl: {"number":"0060540745","brand_name":"XYZ","generic_keywords":"123"} ...

Jeya Kumar

1,112

asked Mar 23, 2024 at 6:45

0 votes

1 answer

51 views

YOLOv7 Weights Trained on a Remote Server Work There Only

The weights work on the server only. When I download the weights and run them on my local PC, I notice that they don't detect any objects at all. Commands used: python train.py --epochs 10 --...

Aditya Kushal

11

asked Mar 19, 2024 at 10:52

0 votes

1 answer

199 views

Fuseki config.ttl file for inference using the TransitiveReasoner with TDB2

I am able to run Apache Jena Fuseki 4.6.1 under Windows 10 with no problems when using a config file that includes the following: <#service1> rdf:type fuseki:Service ; # . . . fuseki:dataset &...

Ted

11

asked Mar 4, 2024 at 16:31

1 vote

1 answer

2k views

llama-cpp-python Log printing on Ubuntu

I use llama-cpp-python to run LLMs locally on Ubuntu. While generating responses, it prints its logs. How can I stop the logs from printing? I found a way to stop log printing for llama.cpp but not for llama-...

San Vik

11

asked Jan 29, 2024 at 3:22

1 vote

0 answers

62 views

High Latency Issue with 4 GPUs on Mixtral 8x7B Model During Inference

I'm working with a machine that has four A100 GPUs, and I'm using them for inference on the Mixtral 8x7B model with text-generation-inference. Strangely, I've noticed that using all 4 GPUs increases ...

doNothing

11

asked Jan 25, 2024 at 13:15
