Performing multiple operations on multiple data.tables

Question

I have 30 tables I created. Their names are structured as follows:
mdl_(race)_(wage quartile).
(race) is one of the following: whites, blacks, hispanics, asians, others, or all.
(wage quartile) is one of the following: Q1, Q2, Q3, Q4, and allQ.
Since I have 6 race categories and 5 wage quartiles, I have 6*5 = 30 objects!

Ex: Linear model that includes only hispanics in the 1st quartile of wage distribution => mdl_hispanics_Q1
Ex: Linear model that includes all races and all wage quartiles => mdl_all_allQ

All tables are formatted identically, with different values of course:

          Variables     Estimate   Std. Error    t value      Pr(>|t|)
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189

What I want to do is get a numeric vector with 30 values, where each value is the estimate for the variable "forborn" if it is statistically significant (Pr(>|t|) < 0.1) and zero otherwise. I am a beginner in R and only know how to do this table by table, which is painfully tedious and takes a lot of code. Is there a way to take advantage of the fact that the tables are named similarly and loop this operation in one sweep?

As a beginner, I very much appreciate your help. This might seem like an easy task for you, but it is a monumental one for me!
– AaronSzcz, Commented Dec 4, 2021 at 19:55
There is a function called tables() in the data.table package that summarizes all the existing data.tables; you can use it with mget.
– David Arenburg, Commented Dec 5, 2021 at 9:03
Using p-values in this way does not represent good statistical practice.
– Frank Harrell, Commented Dec 5, 2021 at 14:53
@FrankHarrell, I am aware. However, I am doing this more as an exercise in coding.
– AaronSzcz, Commented Dec 5, 2021 at 16:54
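
A minimal sketch of the tables() plus mget suggestion from the comments, assuming the data.table package is installed. The three mdl_* tables, their values, and the helper make_tbl are hypothetical stand-ins for the real objects; the 0.1 cutoff is the one from the question.

```r
library(data.table)

# Hypothetical toy tables; the real ones have 11 rows and 5 columns.
make_tbl <- function(est, p) {
  data.table(Variables = c("Intercept", "forborn"),
             Estimate = c(1, est),
             `Pr(>|t|)` = c(0, p))
}
mdl_whites_Q1 <- make_tbl(-0.61, 0.001)
mdl_blacks_Q1 <- make_tbl(0.25, 0.40)
mdl_all_allQ  <- make_tbl(-0.30, 0.05)

env <- environment()  # the environment that holds the tables

# tables() reports only data.table objects, so anything else is skipped.
dt_names <- tables(env = env, silent = TRUE)$NAME
dt_names <- grep("^mdl_", dt_names, value = TRUE)

# mget() turns the character vector of names into a named list of tables.
forborn_by_table <- vapply(mget(dt_names, envir = env), function(x) {
  i <- x$Variables == "forborn"
  if (x[["Pr(>|t|)"]][i] < 0.1) x[["Estimate"]][i] else 0
}, numeric(1))
forborn_by_table
```

Because the result is a named vector, each value can be traced back to its race and quartile without relying on the order of the objects.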

Andre Wildberg · Accepted Answer · 2021-12-04 21:16:15Z

You can use mget to collect the tables into a named list, then extract the value you need from each one with sapply.

EDIT, changed the data frame names to match your description.

ls()
#[1] "mdl_hispanics_..."  "mdl_blacks_..." etc.

# Parentheses, not brackets: [a|b] is a character class, (a|b) is alternation
pat <- "^mdl_(whites|blacks|hispanics|asians|others|all)_(Q[1-4]|allQ)$"

as.vector(sapply(mget(grep(pat, ls(), value = TRUE)), function(x) {
  i <- x$Variables == "forborn"
  # Estimate if significant at the 0.1 level, otherwise 0
  ifelse(x[["Pr(>|t|)"]][i] < 0.1, x[["Estimate"]][i], 0)
}))
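
As an alternative to matching names with a regular expression, the 30 names can be built directly from the two sets of categories. The following is a base R sketch under stated assumptions: the three toy tables, their values, and the helper make_tbl are hypothetical, and only names that actually exist in the current environment are looked up.

```r
# Hypothetical toy tables standing in for the real mdl_* objects.
make_tbl <- function(est, p) {
  data.frame(Variables = c("Intercept", "forborn"),
             Estimate = c(1, est),
             `Pr(>|t|)` = c(0, p),
             check.names = FALSE)
}
mdl_whites_Q1 <- make_tbl(-0.61, 0.001)
mdl_blacks_Q1 <- make_tbl(0.25, 0.40)
mdl_all_allQ  <- make_tbl(-0.30, 0.05)

env <- environment()  # the environment that holds the tables

# Build every race/quartile combination (6 * 5 = 30 names).
races     <- c("whites", "blacks", "hispanics", "asians", "others", "all")
quartiles <- c("Q1", "Q2", "Q3", "Q4", "allQ")
grid      <- expand.grid(race = races, q = quartiles, stringsAsFactors = FALSE)
wanted    <- paste("mdl", grid$race, grid$q, sep = "_")

# Keep only the names that exist, so a missing table does not stop mget().
present <- wanted[vapply(wanted, exists, logical(1),
                         envir = env, inherits = FALSE)]

# vapply() guarantees exactly one numeric value per table.
forborn_est <- vapply(mget(present, envir = env), function(x) {
  i <- x$Variables == "forborn"
  if (x[["Pr(>|t|)"]][i] < 0.1) x[["Estimate"]][i] else 0
}, numeric(1))
forborn_est
```

The result keeps the table names, and the order follows the grid (races vary fastest within each quartile) rather than the alphabetical order returned by ls().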

edited Dec 4, 2021 at 21:16

answered Dec 4, 2021 at 20:46


langtang · Accepted Answer · 2021-12-04 23:33:23Z

This might be considered a better way: it returns a vector holding the Estimate for forborn where the p-value is below 0.1, and 0 otherwise (not the p-value itself).

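library(data.table)
# Stack all mdl_* tables, then take Estimate for forborn where p < 0.1, else 0
rbindlist(lapply(ls(pattern = "^mdl_"), get))[
  Variables == "forborn", fifelse(`Pr(>|t|)` < 0.1, Estimate, 0)
]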

Note: adjust the pattern argument in ls() if you need to be more specific about which objects are matched.
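
One limitation of stacking the tables is that the resulting vector no longer says which table each value came from. A possible variation, sketched here with hypothetical toy tables and a hypothetical helper make_tbl, passes a named list to rbindlist() and uses its idcol argument to carry the object name along, then splits that name into race and quartile.

```r
library(data.table)

# Hypothetical toy tables standing in for the real mdl_* objects.
make_tbl <- function(est, p) {
  data.table(Variables = c("Intercept", "forborn"),
             Estimate = c(1, est),
             `Pr(>|t|)` = c(0, p))
}
mdl_whites_Q1 <- make_tbl(-0.61, 0.001)
mdl_blacks_Q1 <- make_tbl(0.25, 0.40)
mdl_all_allQ  <- make_tbl(-0.30, 0.05)

env <- environment()  # the environment that holds the tables
nms <- ls(envir = env, pattern = "^mdl_")

# mget() returns a named list; idcol stores each name in a "model" column.
stacked <- rbindlist(mget(nms, envir = env), idcol = "model")

# One row per table: the forborn estimate, zeroed when p is not below 0.1.
res <- stacked[Variables == "forborn",
               .(model, forborn = fifelse(`Pr(>|t|)` < 0.1, Estimate, 0))]

# Split "mdl_<race>_<quartile>" into two descriptive columns.
res[, c("race", "quartile") := tstrsplit(sub("^mdl_", "", model), "_",
                                         fixed = TRUE)]
res
```

The plain numeric vector asked for in the question is then just res$forborn.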

answered Dec 4, 2021 at 23:33


Rui Barradas · Accepted Answer · 2021-12-04 20:37:12Z

Write a function that extracts the forborn Estimate conditional on its p-value, and sapply it over the list of tables (dt_list, built under "Test data" below).

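library(data.table)

# Return the forborn Estimate if its p-value is below 0.1, otherwise 0.
# No `:=` is used, so the input table is not modified by reference.
fextrac <- function(x){
  i <- x$Variables == "forborn"
  ifelse(x[["Pr(>|t|)"]][i] < 0.1, x[["Estimate"]][i], 0)
}

Estimates_list <- sapply(dt_list, fextrac)
Estimates_list
#[1] -0.6129412 -0.6129412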

Test data

dt1 <- read.table(text = "
         Variables     Estimate   'Std. Error'    't value'      'Pr(>|t|)'
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189
", header = TRUE, check.names = FALSE)

set.seed(2021)
dt2 <- dt1
# Set a random third of the p-values to 0.1, which fails the `< 0.1` test
dt2$`Pr(>|t|)`[sample(nrow(dt2), nrow(dt2) %/% 3)] <- 0.1

setDT(dt1)
setDT(dt2)
dt_list <- list(dt1, dt2)
