1,775 questions
2
votes
1
answer
96
views
How to fix "AttributeError: 'Series' object has no attribute 'codes'" using pandas.Categorical
I am trying to convert a string that is a categorical data type into a numeric. I found out that I can use pandas.Categorical,
unfortunately, accessing the codes attribute gives me an error.
Here is a ...
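A hedged sketch of one likely cause of the error in the title, assuming the column was converted to a category-dtype Series: the `codes` attribute lives directly on a `pd.Categorical` object, while a Series exposes it through the `.cat` accessor. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical example data; the original question's data is not shown.
colors = pd.Series(["red", "green", "red", "blue"])

# A pd.Categorical object carries .codes directly.
cat = pd.Categorical(colors)
codes_from_categorical = cat.codes.tolist()

# A Series with category dtype has no .codes attribute of its own;
# the integer codes are reached through the .cat accessor instead.
as_category = colors.astype("category")
codes_from_series = as_category.cat.codes.tolist()

print(codes_from_categorical)
print(codes_from_series)
```

Both routes give the same integer codes, because categories inferred from strings are sorted lexically in each case.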
5
votes
5
answers
194
views
Assign a number for every matching value in list [duplicate]
I have a long list of items that I want to assign a number to that increases by one every time the value in the list changes. Basically I want to categorize the values in the list.
It can be assumed ...
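A minimal sketch of the numbering described above, assuming the counter should start at 1 and increase whenever a value differs from the one before it. The helper name `number_runs` and the sample list are hypothetical, and the truncated part of the question may add constraints not handled here.

```python
from itertools import groupby


def number_runs(items):
    """Give each run of equal consecutive values the next integer, starting at 1."""
    numbers = []
    for run_index, (_, run) in enumerate(groupby(items), start=1):
        # Every element in the same consecutive run shares one number.
        numbers.extend(run_index for _ in run)
    return numbers


labels = number_runs(["a", "a", "b", "b", "b", "c", "a"])
print(labels)
```

Note that a value reappearing later (the final "a" here) starts a new run and gets a new number, since the count follows changes and not distinct values.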
0
votes
0
answers
32
views
The ML.NET wizard fails to build the model when the scenario is set to data classification, and the selected label column has a high cardinality
When a label column of type string with high cardinality (>500) is selected in the data classification scenario, the model fails to train even with extended training time. Are there any solutions ...
1
vote
1
answer
113
views
Can polars have a boolean in a 'with_columns' statement?
I am using polars to hash some columns in a data set. One column contains lists of strings and the other column contains strings. My approach is to cast each column as type string and then hash the columns....
0
votes
0
answers
86
views
Pandas concat not retaining category dtype for large DataFrames
I am working with Pandas 1.5.3 and using pd.concat to merge two large DataFrames that contain categorical columns. Initially, everything works fine, but after running continuously at scale, the ...
-1
votes
1
answer
57
views
How to find the date a categorical variable was last active?
I have this data frame and I want to create an additional column that tells me the date the category was previously active.
DF <- data.frame(
Date = rep(c("10-12-2024", "10-17-2024"...
2
votes
1
answer
103
views
How to assess and analyze different clinical symptoms at the same time in R
I am currently working on a study, in which we aim to compare the post-surgical complications of patients who underwent a specific type of brain surgery. In one of our analyses, we would like to ...
2
votes
1
answer
85
views
Difference between factor and category in R
In the Hmisc::describe documentation (at page 76) there is written:
This function determines whether the variable is character, factor, category, binary, discrete numeric, and continuous numeric, and ...
1
vote
2
answers
620
views
How to approach classification with many categorical features
I'm new to ML and would like to know more about classification. I have a small dataset of n=600 scored samples and thousands of potential metrics, all categorical (True or False). Basically, I would ...
0
votes
1
answer
37
views
How to analyse nominal and categorical variables containing more than one response [duplicate]
I am new to R programming.
I have tried several pieces of code to analyse the data below so that each question has its responses stacked on each other in a bar chart, to no avail.
Cookies ...
2
votes
1
answer
200
views
Finding confidence interval for response variables in ordinal logistic model
While working with private data, I noticed that the ordinal logistic model fitted using the polr function from the MASS package, along with the confidence intervals provided by broom::tidy, does not ...
0
votes
1
answer
117
views
How to change state border color in R Plotly choropleth map?
I have replicated the choropleth map for discrete colors in R using the method suggested in this link: How to create a chloropleth map in R Plotly based on a Categorical variable?
However, as you will ...
1
vote
1
answer
92
views
Why can't concatenation handle Nones in categorical columns when the DF can hold them in the first place?
I have 2 DFs with object type columns, which work fine with concatenation.
Code
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', None]})
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': [None, None]})
...
0
votes
1
answer
64
views
Errors in result when performing logistic regression with one dependent and one independent binary variable
I have a dataset, df, with one dependent variable with levels "0" and "1" and one independent variable with levels "1" and "2". On performing logistic ...
0
votes
3
answers
58
views
Sorting a Python dataframe by values in a column based on a list
I have a pandas dataframe which I am trying to sort on the basis of values in a column, but the sorting is not alphabetical. The sorting is based on a "sorter" list (i.e. a list which gives ...
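One common way to sort by a custom order, sketched here under assumptions: the column is converted to an ordered `pd.Categorical` whose categories follow the sorter list. The column names, data, and `sorter` contents are hypothetical, since the question's own data is not shown.

```python
import pandas as pd

# Hypothetical data and sorter list.
df = pd.DataFrame({"size": ["M", "L", "S", "M"], "qty": [3, 1, 2, 5]})
sorter = ["S", "M", "L"]

# An ordered categorical sorts by category position, not alphabetically.
df["size"] = pd.Categorical(df["size"], categories=sorter, ordered=True)

# A stable sort keeps the original row order within each category.
sorted_df = df.sort_values("size", kind="stable").reset_index(drop=True)
print(sorted_df)
```

Values absent from `sorter` would become missing under this approach, so the list should cover every value in the column.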
1
vote
0
answers
38
views
What happens if the Friedman test is used on a dependent variable that is nominal data?
I have a group of 30 participants who took a test three times. Each time, the independent variable was changed and the participants reported their answers (dependent variable). The independent ...
0
votes
0
answers
35
views
In boxplot, one of the categories (x-axis) doesn't show the range, it only shows median and outliers
I created a boxplot. I have two categorical variables, 'class' (ENVST 3210, ENVST 2050, and ENVST 555# - x-axis) and 'High Impact' (these are basically number of high impact learning experiences, viz, ...
0
votes
2
answers
175
views
Pandas categorical columns to factorize tables
I am working on a huge denormalized table on a SQL server (10 columns x 130m rows). Take this as data example :
import pandas as pd
import numpy as np
data = pd.DataFrame({
'status' : ['pending', ...
0
votes
1
answer
24
views
Package for category overlines on scatterplot in ggplot
I have data organised as in this example:
data1 <- tibble(seq = factor(1:20),
value = rnorm(20, 10, 2),
par_a = c(rep("S1", 6), rep("S2", 14)),
...
0
votes
0
answers
98
views
Excluding within-category interactions with step_interact()
I am having some trouble getting step_interact() from tidymodels to produce the desired set of predictor variables. I want to include pairwise interactions, but exclude all interactions which are ...
0
votes
1
answer
97
views
Upsampling categorical variable in time series data in R
I apologize if this is redundant, but I have tried to look for solutions, and have not found anything that appears to be the answer to my question. So, I have time series data for a bunch of variables....
0
votes
1
answer
104
views
How to make a line and dot chart to represent frequencies of categorical variables?
I would like to make a chart like in this picture instead of a barplot to represent frequencies of several categorical variables.
This is a snippet of my data for the variable of interest:
c("...
1
vote
1
answer
45
views
Error while creating a proportions table in R: Error in table(st2.affect) : attempt to make a table with >= 2^31 elements
I got this error:
Error in table(st2.affect) : attempt to make a table with >= 2^31 elements
when I tried to use function (or any other proportions function) such as:
proportions(table(st2.affect)...
0
votes
1
answer
939
views
ValueError: When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True` , XGBoost Regression
I'm trying to use the categorical variable support of XGBoost. I'm following XGBoost's own documentation for categorical data. (linked here : https://xgboost.readthedocs.io/en/stable/tutorials/categorical....
1
vote
4
answers
596
views
dplyr mutate with conditional values AND OR to create a group category
I have a dataset with a variable called individuals that has many options, and it looks like this.
I have observations for a given Day on different individuals (Individual_ID)
The different ...