13,846 questions
4
votes
5
answers
366
views
How to compute the union and intersection of time intervals with covariates by group id?
I am looking for an efficient way to compute the union and intersection of time intervals (start–stop format) by group (id), while keeping the covariates associated with each interval.
Each patient (...
1
vote
1
answer
111
views
Loading R6 objects that have data.table fields
This is very closely related to: Adding new columns to a data.table by-reference within a function not always working
How do you make setalloccol work on data tables (data.table_1.16.0) that are a ...
0
votes
0
answers
60
views
How to select rows of a data.table using a vector when vector name and column name are the same? [duplicate]
I have a data.table with a column name that matches a vector with the same name:
library(data.table)
dt <- data.table("colA" = c(1:5), "colB" = LETTERS[1:5])
colA = c(1,3)
I ...
3
votes
4
answers
237
views
Fast unnest complex column with data.table
I have a dataset where the column to unnest contains data with unequal rows and columns rather than data with equal dimensions. I'm looking for a fast approach to unnest this dataset using data.table.
...
5
votes
5
answers
342
views
R: Sum with multiple criteria for each row. Processing time is huge
I am trying to do a sum over a vector CountB by "filtering" a DTX with multiple criteria
same location1,
same location2, and
only rows with CountA strictly less than the CountA in that ...
4
votes
6
answers
446
views
How to get the highest values for two unrelated columns?
Let's say I have a data.table that looks like this:
library(data.table)
dt <-
rowwiseDT(
group=, a=, b=,
"a", 1, 10,
"a", 10, 1,
"a", 9, 9,
...
1
vote
0
answers
89
views
Is it normal that filtering rows of a data.table with a variable named as one of its columns does not work? [duplicate]
For instance, if I do this:
library(data.table)
foo <- data.table(a=c(1,2))
foo[a==1,,]
I get
> a
> <num>
> 1: 1
If I do this instead:
a <- 1
foo[a==a,,]
I get
...
4
votes
0
answers
251
views
Writing simple functions that take data.table column name(s) as arguments
I am new to data.table and would like to figure out the best way of doing the following:
I would like to write a function that takes multiple column names as arguments. I am fine needing to pass the ...
3
votes
2
answers
116
views
Filter grouped data with filter function and if else condition in R
Consider the data frame
id <-c(1,1,1,2,2,3,3,3,3)
x1 <-c("no","yes","yes","no","no","no","no","no","yes")...
1
vote
1
answer
140
views
How to track which observations are new in each year?
I have a dataset that looks like this:
library(data.table)
library(ggplot2)
set.seed(123)
years <- 2010:2020
max_colors <- 50
data <- data.frame()
for (year in years) {
n_colors <- ...
0
votes
0
answers
42
views
R LLM classification using tidyllm with data.tables doesn't work in j? [duplicate]
I'm using R to classify consultation notes on a number of criteria. My code works when I run it row by row, but not when I try to run it with data.table operators (I've tried data.frame transform ...
4
votes
2
answers
146
views
Bin Granges with Gaps
I am trying to split a Granges object into a specific number of bins. Usually, GenomicRanges::tile would work for this. However, my Granges has some gaps, for example:
# if (!require("BiocManager", quietly = ...
2
votes
2
answers
91
views
How can I duplicate groups of rows in a data.table when each group needs duplication a different number of times?
I have two data.tables. The first (dt1) has N sets of observations per individual. The second (dt2) contains pairings of two individuals. I want the output (dt3) to contain columns of observations for ...
16
votes
1
answer
503
views
Is there an equivalent of dplyr data pronouns in data.table?
Is there a way to tell data.table to look for an external variable instead of a column name, just like what you can do with the .env pronoun in dplyr?
Imagine you have a dataframe with the column name ...
5
votes
1
answer
121
views
Why don't ITime columns in data.frame get the column name from vector?
library(data.table)
DateTime<-as.POSIXct(c("2025-05-16 00:00:02 CDT", "2025-05-16 00:00:03 CDT", "2025-05-16 00:00:06 CDT", "2025-05-16 00:00:07 CDT"))
...
1
vote
1
answer
126
views
What Happens When a Raster File Is Transformed into a data.table?
I have previously used the as.data.frame(xy = TRUE) function in terra to convert raster data to a data frame.
Suddenly I realised that I could use it in conjunction with the data.table package. I'm ...
4
votes
3
answers
143
views
Getting mean of multiple rows based on interval dataframe in R
Let's say I have the following dataframe
df1=read.table(text="ID POSITION S1 S2
1 1 10 10
1 2 20 0
1 3 10 0
1 4 20 0
1 5 10 50
2 1 10 0
2 2 20 10
2 3 20 10
2 4 20 10
2 5 20 ...
5
votes
3
answers
218
views
data.table two‐dot pronoun (..) in i or tidyverse bang bang !! equivalent in data.table
I'm trying to filter a data.table by comparing a column to an external R variable using the "two‐dot" pronoun (..), but I keep getting
Error in `[.data.table`(dt, reg == ..reg) : object '.....
1
vote
2
answers
168
views
Efficient rolling, non-equi joins
Looking for the most efficient current approach in R, Python, or C++ (with Rcpp).
Taking an example with financial data,
df
time bid ask time_msc ...
1
vote
2
answers
82
views
Unify two columns skipping NAs [duplicate]
Having a data.table like the following:
a <- data.table(col1 = c(1, 2, 3, NA, NA),
col2 = c(NA, NA, NA, 4, 5),
col3 = c("a", "b", "c", ...
3
votes
1
answer
148
views
Reading through multiple files in chunks in R
I'm trying to read through multiple compressed tables that are 5GB+ in size in R. Because I don't have enough memory to load them all at once, I need to process them one chunk at a time,...
5
votes
1
answer
239
views
LOESS on very large dataset
I'm working with a very large dataset containing CWD (Cumulative Water Deficit) and EVI (Enhanced Vegetation Index) measurements across different landcover types. The current code uses LOESS ...
1
vote
1
answer
103
views
Merge tabular data with raster based on key value in raster cell ("left join")
I'd like to join tabular data to a raster using the current cell values as a key. Is there a way to do this with large rasters (100M-1B cells)? Maybe there's something obvious in terra:: but nothing ...
6
votes
2
answers
223
views
Comparing the values of a certain number previous rows with the current row [closed]
In a database containing firm and patent class values, I want to calculate the following variables:
Technological abandonment: Number of previously active technological patent classes abandoned ...
0
votes
1
answer
158
views
Setting a flag based on two samples' dates
I have written the following code in R, which adds a date (dp_date) and creates a flag (dp_flag) in the dt.all sample based on columns from the data tables dt.all and Info. The issue is that I cannot create ...