495 questions
0
votes
0
answers
26
views
Assistance with Data Processing Insurance Premiums
My manager has asked me to predict insurance premiums based on categories such as job description, number of people employed, and turnover. I am comparing K-Nearest ...
0
votes
1
answer
58
views
Multivalued column cannot be transformed
I'm working with the Stack Overflow 2024 survey. In the CSV file there are several multivalued variables (separated by ;). I want to apply one-hot encoding to the variables Employment and LanguageAdmire by ...
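A minimal sketch of one way to one-hot encode a semicolon-separated multivalued column, written in plain Python so it stands alone. The sample values and the helper name multi_hot are hypothetical and not taken from the survey file. In pandas, Series.str.get_dummies(sep=";") produces the same kind of indicator columns.

```python
def multi_hot(values, sep=";"):
    """Return (categories, rows) where each row holds 0/1 indicators per category."""
    # Split every cell into its individual items, dropping empty fragments.
    split_rows = [
        [item.strip() for item in value.split(sep) if item.strip()]
        for value in values
    ]
    # The sorted set of all items seen defines the output columns.
    categories = sorted({item for row in split_rows for item in row})
    # One indicator per category for every input row.
    encoded = [[1 if cat in row else 0 for cat in categories] for row in split_rows]
    return categories, encoded


# Hypothetical sample values standing in for a multivalued survey column.
languages = ["Python;Rust", "Go", "Python;Go"]
categories, encoded = multi_hot(languages)
```

Each row keeps one indicator per distinct item, so a respondent who listed several values gets several 1s instead of a single combined category.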
0
votes
0
answers
15
views
Does Modifying an Attribute of a Custom Dataset Affect Both Subsets After random_split in PyTorch?
I am working on a binary classification task using an audio dataset, which is already divided into training and testing sets. However, I also need a validation set, so I split the training set into ...
0
votes
1
answer
47
views
Is there a way to set the data_min and the data_max in MinMaxScaler()?
I'm currently using MinMaxScaler() on my dataset. However, because my dataset is large, I'm doing a first pass in batches to compute the min and max values for my scaler. I'm using ...
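A rough sketch of the batch-wise idea described above: track the per-column minimum and maximum over all batches, then scale with those values. It uses plain Python lists, and the helper names and sample batches are hypothetical. scikit-learn's MinMaxScaler also provides partial_fit, which updates data_min_ and data_max_ incrementally and may remove the need for a manual first pass.

```python
def running_min_max(batches):
    """Track the per-column minimum and maximum across batches of rows."""
    col_min, col_max = None, None
    for batch in batches:
        for row in batch:
            if col_min is None:
                # First row seen initialises both trackers.
                col_min, col_max = list(row), list(row)
                continue
            col_min = [min(m, v) for m, v in zip(col_min, row)]
            col_max = [max(m, v) for m, v in zip(col_max, row)]
    if col_min is None:
        raise ValueError("no rows were provided")
    return col_min, col_max


def min_max_scale(row, col_min, col_max):
    """Scale one row to [0, 1] using precomputed column minima and maxima."""
    scaled = []
    for value, lo, hi in zip(row, col_min, col_max):
        span = hi - lo
        # Constant columns have zero span; map them to 0.0 to avoid dividing by zero.
        scaled.append((value - lo) / span if span else 0.0)
    return scaled


# Hypothetical data: two batches of two-column rows.
batches = [
    [[1.0, 10.0], [3.0, 30.0]],
    [[5.0, 20.0], [2.0, 50.0]],
]
col_min, col_max = running_min_max(batches)
scaled_row = min_max_scale([3.0, 30.0], col_min, col_max)
```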
0
votes
0
answers
17
views
How to combine columns with nested lists with each other using pandas? [duplicate]
I'm working on a pandas DataFrame that contains columns with lists and am currently trying the explode method, but I'm not getting the desired output. Instead, it produces a Cartesian product, combining all ...
2
votes
0
answers
65
views
Kernel dies when I run: dataset = Dataset.from_dict(data_dict)
I am fine-tuning the SAM model on my dataset containing train_images and train_masks. I am able to create the dict, but when calling the last command, i.e. loading the dataset from the dict, the kernel dies. It happened ...
0
votes
1
answer
62
views
Varying embedding dim due to changing padding in batch size
I want to train a simple neural network, which has embedding_dim as a parameter:
class BoolQNN(nn.Module):
    def __init__(self, embedding_dim):
        super(BoolQNN, self).__init__()
        self....
-1
votes
1
answer
189
views
Capitalized words in sentiment analysis
I'm currently working with customer review data on products from Sephora. My task is to classify the reviews by sentiment: negative, neutral, or positive.
A common technique of text preprocessing is to ...
1
vote
0
answers
23
views
How can I transform the categorical data entered by the user using Target Encoding?
When fitting the model in Google Colab there doesn't seem to be any problem. However, when I try to create an interface using Streamlit and pickle, the target encoder doesn't work and I am unable to solve ...
0
votes
0
answers
52
views
How can I preprocess a feature that contains a list of number codes?
I have to preprocess a feature which is basically a list of number codes encoded as a string, and I want to encode it such that the output is an array of frequencies of each of these numbers. The ...
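A small sketch of the frequency encoding described above. The excerpt does not show the string format, so this assumes comma-separated integer codes and a known, fixed vocabulary of codes. Both are assumptions, and the helper name code_frequencies is hypothetical.

```python
from collections import Counter


def code_frequencies(raw, vocabulary, sep=","):
    """Count how often each code in `vocabulary` occurs in the string `raw`."""
    # Parse the string into integers; blank fragments are skipped.
    counts = Counter(int(token) for token in raw.split(sep) if token.strip())
    # Emit counts in vocabulary order so every row has the same length.
    return [counts.get(code, 0) for code in vocabulary]


# Hypothetical vocabulary and input string.
vocabulary = [101, 205, 330]
freqs = code_frequencies("101, 330, 101, 330, 330", vocabulary)
```

Fixing the vocabulary up front keeps the output array the same length for every row, which most estimators require.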
1
vote
2
answers
682
views
How can I create a custom sigmoid function?
I am trying to build a custom sigmoid-shaped function because I want to scale my data during preprocessing. Basically, the goal is to obtain a sigmoid-shaped function that outputs from 0 to 1 and only ...
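One possible shape for such a function is a logistic curve with an adjustable midpoint and steepness. The excerpt is truncated before the full requirements, so the parameter names midpoint and steepness below are assumptions for illustration and do not come from the question.

```python
import math


def scaled_sigmoid(x, midpoint=0.0, steepness=1.0):
    """Logistic curve with outputs in [0, 1] and value 0.5 at `midpoint`."""
    z = steepness * (x - midpoint)
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical feature ranging from 0 to 100, centred at 50.
low = scaled_sigmoid(0.0, midpoint=50.0, steepness=0.1)
mid = scaled_sigmoid(50.0, midpoint=50.0, steepness=0.1)
high = scaled_sigmoid(100.0, midpoint=50.0, steepness=0.1)
```

A larger steepness makes the transition around the midpoint sharper, while a smaller one spreads it over a wider input range.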
1
vote
0
answers
85
views
How do I ensure unique non-overlapping values in each column?
I have the following input:
data = {
    'Group_A': ['0&1', '1&5', '0&5', '1&7', '3&8', '4&8', '3&5', '4&4'],
    'Group_B': ['1&0', '5&7', '0&5'...
0
votes
1
answer
838
views
SageMaker Processing Job permission denied to save csv file under /opt/ml/processing/<folder>
I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, and I tried adding steps such as processing, model training,...
-4
votes
1
answer
64
views
Is there an Excel function to assign a binary result to a predefined data cell?
Sorry for the title, I know it is rather broad and not very informative. I am facing a problem regarding the analysis of a data set. The participants of my experiments were randomly assigned ...
0
votes
1
answer
374
views
Filtering Pandas DataFrame by Substring Match at Start of Strings [duplicate]
I'm trying to filter out rows in which the values of a specific column start with a given substring.
I have a pandas.DataFrame as shown below (simplified):
price  DRUG_CODE
123    A12D958
234    B564F3C
...    ...
I'm ...
0
votes
1
answer
33
views
Sklearn Column Transformer not working for mixed data types
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import ...
0
votes
0
answers
39
views
Failed to convert a NumPy array to a Tensor for LSTM
I'm trying to run an LSTM model where the data is split across a few columns in a CSV, and I'm trying to prepare data from such CSVs.
I get the following error:
ValueError: Failed to convert a NumPy array to a ...
1
vote
1
answer
2k
views
Why is my GPU not being used despite having turned it on in Kaggle?
I've uploaded a dataset to Kaggle (approx. 73 GB), and I'm trying to preprocess this data for model training purposes. This dataset has a large number of missing values, which I am trying to interpolate ...
0
votes
1
answer
620
views
Issue when padding and packing sequences in LSTM networks using PyTorch
I'm trying to make a simple LSTM neural network. I've got time series data which I am splitting into sequences and batches using PyTorch's Dataset and DataLoader. To account for the variable lengths ...
0
votes
0
answers
55
views
TypeError: Cannot do positional indexing on RangeIndex with these indexers of type DataFrame
I'm new to Python, so I'm sorry if this is a basic question. After I ran the code, I got this:
TypeError: cannot do positional indexing on RangeIndex with these indexers [ Year Average of PM ...
0
votes
1
answer
102
views
Feature Scaling with MinMaxScaler()
I have 31 features to be input into an ML algorithm. Of these, 22 features already have values in the range 0 to 1. The remaining 9 features vary between 0 and 750. My question is: if I choose to apply ...
1
vote
1
answer
38
views
Using sklearn where the label is a combination of multiple inputs [closed]
I'm performing data analysis on a dataset whose categorical labels are interrelated.
My labels track experimental conditions.
In my case, labels track concentrations of combinations of two chemicals ...
0
votes
0
answers
95
views
Sklearn inverse_transformation does not work as expected, any alternatives?
from sklearn.preprocessing import MinMaxScaler
values = df[['Close']]  # values are floats ranging from 0.06 to 190.08
sc = MinMaxScaler()
scaled_values = sc.fit_transform(values)
descaled_values = sc....
0
votes
0
answers
57
views
Is there a faster method to process pandas list of string values
There are approximately 13,000 values in a given column. The function below takes a list of strings as input and performs NER tagging for each word in the list. On average there ...
0
votes
0
answers
93
views
Worse performance with increased direct_num_workers when running preprocessing of DLRM with Apache Beam
I am trying to run the DLRM preprocessing tasks with Apache Beam: https://github.com/tensorflow/models/tree/master/official/recommendation/ranking/preprocessing. The dataset is Criteo Kaggle 10GB ...