637 questions
0 votes, 0 answers, 27 views
Does SGLang’s OpenAI-compatible API support async/await non-streaming calls?
I’m using SGLang’s OpenAI-compatible server (e.g., --port 30000, /v1/chat/completions) and calling it via the openai SDK with an async client:
from openai import AsyncOpenAI
client = AsyncOpenAI(...
0 votes, 0 answers, 47 views
How do I create a multitask GPyTorch model with a user-specified noise covariance matrix?
I've implemented standard homoskedastic multitask Gaussian process regression using GPyTorch as follows:
class MyModel(gpytorch.models.ExactGP):
    def __init__(self, X, Y, likelihood):
        super(...
0 votes, 0 answers, 33 views
Databricks group cluster fails to read CSV (TextFileFormatEdge$.disabled) while personal cluster works
I have a PySpark function that reads a reference CSV file inside a larger ETL pipeline.
On my personal Databricks cluster, this works fine. On the group cluster, it returns an empty DataFrame, the same ...
0 votes, 0 answers, 102 views
Why is `mul_mat` in ggml slower than in llama.cpp?
I use the following command to compile an executable file for Android:
cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-...
0 votes, 0 answers, 36 views
How to fit truncated distributions to truncated data? [duplicate]
I am familiar with the fitdistrplus package, which offers relevant tools for statistical inference.
However, I have trouble understanding what needs to be done when facing a sample that is left and/...
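For reference, the standard way to handle a sample truncated at known points a (left) and/or b (right) is to fit the truncated density rather than the untruncated one. This assumes the truncation points are known constants, which the cut-off question may or may not state; f and F denote the untruncated density and CDF with parameters θ:

```latex
f_T(x \mid \theta) = \frac{f(x \mid \theta)}{F(b \mid \theta) - F(a \mid \theta)}, \qquad a \le x \le b

\ell(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta) \;-\; n \log\bigl(F(b \mid \theta) - F(a \mid \theta)\bigr)
```

For left truncation only, set F(b | θ) = 1; for right truncation only, set F(a | θ) = 0. Maximizing ℓ(θ) numerically then gives the fitted parameters.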
0 votes, 0 answers, 26 views
In .yaml files for tracking algorithms, what is the use of the EPOCH parameter in the TEST section?
I am currently studying tracking algorithms such as SeqTrack, ARTrack and ODTrack. For the configuration, I have found this .yaml file in the experiments folder: https://github.com/microsoft/VideoX/...
1 vote, 0 answers, 110 views
Streaming writes using Ray's write_parquet for vLLM inference
I need to run inference with vLLM on a large dataset; the code structure is as follows:
ds = ray.data.read_parquet(my_input_path)
ds = input_data.map_batches(
    VLLMPredictor,
    concurrency=ray_concurrency,
    ...
0 votes, 0 answers, 75 views
TrueSkill with teams in Infer.NET
I'm trying to build a TrueSkill model with two teams of 5 players each in Infer.NET. However, when inferring the skills, the means of the distributions become far too large or too small.
Below is the code of my ...
0 votes, 0 answers, 35 views
How to perform model inference with delayed inputs while ensuring real-time performance?
I need to perform model inference using a deep learning model on a stream of data. However, the challenge I’m facing is that the inputs to the model might not arrive continuously, but rather with some ...
0 votes, 1 answer, 68 views
What is the source of this error in my time series inference model?
Problem: I have created an encoder-decoder model to forecast time series. The model trains well, but I am struggling with an error in the inference model and don't know how to troubleshoot it:
WARNING:...
-2 votes, 1 answer, 624 views
Why do BF16 models have slower inference on Mac M-series chips compared to F16 models?
I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1):
All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...
0 votes, 2 answers, 4k views
ONNX Runtime Inference using GPU : libcublasLt.so.11 not found
I'm trying to run inference using ONNX Runtime on my server's GPU. However, I'm getting this error:
2024-08-10 23:53:29.404983674 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA]...
2 votes, 1 answer, 329 views
Saving a Fine-Tuned Falcon Hugging Face LLM
I'm trying to save my model so that I don't need to re-download the base model every time I want to use it, but nothing seems to work. I would appreciate your help.
The following parameters are ...
1 vote, 0 answers, 96 views
Why does type inference fail in this Java program?
Let's consider the following Java program:
import java.util.*;
import java.util.stream.Collectors;
public class Main {
    record Foo(String id, List<Bar> bars) {}
    record Bar(String id) {}...
1 vote, 0 answers, 465 views
How to Deploy a Hugging Face Transformers Model for Inference Using KServe (without KServe 0.13v)?
I'm working on deploying pre-trained Hugging Face Transformer models for inference using KServe, but my Kubernetes environment does not support KServe 0.13v. I've researched the topic and found ...
0 votes, 0 answers, 28 views
How to Fix Issues in R Code for Metropolis-Hastings Algorithm Applied to Gumbel Type II Distribution?
I am trying to implement the Metropolis-Hastings (M-H) algorithm in R to sample from the posterior distribution of a Gumbel Type II distribution. However, I'm encountering issues with my ...
0 votes, 1 answer, 48 views
SageMaker does not recognize the training job when launching inference
I successfully launched a training job in SageMaker. However, when I try to use the model to run inference, SageMaker is unable to find the model.
import sagemaker
from sagemaker.transformer import ...
0 votes, 1 answer, 424 views
How to save a Keras model just for inference?
I trained a CNN model and saved it as a .keras file. Now I want other people to use it for making predictions. I am planning on deploying it using a Flask server and packaging the whole thing as an exe. ...
0 votes, 0 answers, 313 views
PyTorch ViT inference on an A100 GPU is very slow
I'm using a GPU server which has 4 A100 chips.
I'm studying how to use ViT (in timm).
My local GPU is a GTX 1650 Super, but it is faster than an A100.
The A100 takes almost 1 hour to finish the ...
-3 votes, 2 answers, 1k views
Minesweeper AI - A problem with some kind of edge case of inferring knowledge about safe cells [closed]
I am taking CS50’s Introduction to Artificial Intelligence with Python course and enjoying it very much. When I run my script, everything seems to work well, but the CS50 checker finds some kind of edge ...
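For context, the knowledge-base inference this project revolves around can be sketched as follows. This is a minimal sketch, assuming a "sentence" is a set of cells plus a count of how many of them are mines, in the spirit of the course's Sentence class; the names `Sentence`, `infer_subset`, `known_mines` and `known_safes` here are illustrative, and the sketch does not diagnose the asker's specific edge case, which is cut off.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Sentence:
    """Illustrative statement: exactly `count` of `cells` are mines."""
    cells: FrozenSet[Cell]
    count: int

    def known_mines(self) -> FrozenSet[Cell]:
        # If the number of cells equals the mine count, every cell is a mine.
        if self.cells and len(self.cells) == self.count:
            return self.cells
        return frozenset()

    def known_safes(self) -> FrozenSet[Cell]:
        # A mine count of zero means every cell in the sentence is safe.
        if self.count == 0:
            return self.cells
        return frozenset()


def infer_subset(s1: Sentence, s2: Sentence) -> Optional[Sentence]:
    """Subset rule: if s1.cells is a proper subset of s2.cells, then the
    cells in s2 but not in s1 contain exactly s2.count - s1.count mines."""
    if s1.cells and s1.cells < s2.cells:
        return Sentence(s2.cells - s1.cells, s2.count - s1.count)
    return None


if __name__ == "__main__":
    s1 = Sentence(frozenset({(0, 0), (0, 1)}), 1)
    s2 = Sentence(frozenset({(0, 0), (0, 1), (0, 2)}), 1)
    new = infer_subset(s1, s2)
    print(sorted(new.known_safes()))  # prints [(0, 2)]
```

Using frozen sets keeps sentences hashable, so duplicate inferred sentences can be deduplicated in a set-based knowledge base.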
1 vote, 1 answer, 216 views
SageMaker batchTransform MultiRecord error - Unable to parse data as JSON. Make sure the Content-Type header is set to "application/json"
I am trying to invoke a SageMaker batch transform job.
Input file example.jsonl
{"number":"0060540745","brand_name":"XYZ","generic_keywords":"123"}
...
0 votes, 1 answer, 51 views
YOLOv7 Weights Trained on a Remote Server Only Work There
The weights work only on the server. When I download the weights and run them on my local PC, I notice that the model doesn't detect any objects at all.
Commands used
python train.py --epochs 10 --...
0 votes, 1 answer, 199 views
Fuseki config.ttl file for inference using the TransitiveReasoner with TDB2
I am able to run Apache Jena Fuseki 4.6.1 under Windows 10 with no problems when using a config file that includes the following:
<#service1> rdf:type fuseki:Service ;
# . . .
fuseki:dataset &...
1 vote, 1 answer, 2k views
llama-cpp-python log printing on Ubuntu
I use llama-cpp-python to run LLMs locally on Ubuntu. While generating responses, it prints its logs.
How can I stop it from printing logs?
I found a way to stop log printing for llama.cpp but not for llama-...
1 vote, 0 answers, 62 views
High Latency Issue with 4 GPUs on Mixtral 8x7B Model During Inference
I'm working with a machine that has four A100 GPUs, and I'm using them for inference on the Mixtral 8x7B model with text-generation-inference. Strangely, I've noticed that using all 4 GPUs increases ...