I have created 30 tables. Their names are structured as follows:
mdl_(race)_(wage quartile).
(race) is one of: whites, blacks, hispanics, asians, others, or all.
(wage quartile) is one of: Q1, Q2, Q3, Q4, or allQ.
Since I have 6 race categories and 5 wage quartiles, I have 6*5 = 30 objects!

  • Ex: Linear model that includes only hispanics in the 1st quartile of wage distribution => mdl_hispanics_Q1
  • Ex: Linear model that includes all races and all wage quartiles => mdl_all_allQ
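Because the names follow a fixed race × quartile pattern, one option is to build all 30 names explicitly instead of searching for them. A minimal sketch, assuming the exact spellings listed above; `races`, `quartiles` and `mdl_names` are illustrative names, not objects from the question:

```r
# Hypothetical sketch: rebuild all 30 object names from the two naming factors.
races     <- c("whites", "blacks", "hispanics", "asians", "others", "all")
quartiles <- c("Q1", "Q2", "Q3", "Q4", "allQ")

# outer() pairs every race with every quartile (rows = races, columns = quartiles);
# as.vector() reads the result column by column, so races vary fastest.
mdl_names <- as.vector(outer(races, quartiles,
                             function(r, q) paste0("mdl_", r, "_", q)))

length(mdl_names)   # 30
mdl_names[1:3]      # "mdl_whites_Q1" "mdl_blacks_Q1" "mdl_hispanics_Q1"
```

Passing `mdl_names` to `mget()` then returns the tables in this fixed order (unlike `ls()`, which sorts alphabetically), and `mget()` stops with an error if any of the 30 objects does not exist.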

All tables are formatted identically, with different values of course:

          Variables     Estimate   Std. Error    t value      Pr(>|t|)
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189

What I want is a numeric vector of 30 values, where each value is the estimate for the variable "forborn" if it is statistically significant (Pr(>|t|) < 0.1) and zero otherwise. I am a beginner in R and only know how to do this table by table, which is painfully tedious and takes a lot of code. Is there a way to take advantage of the fact that the tables are named similarly and do this in a single loop?
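The per-table step can be isolated in a small function, so that the same code can then be applied to every table in a loop. A sketch, assuming each table has columns named exactly `Variables`, `Estimate` and `Pr(>|t|)` and exactly one `forborn` row; `forborn_or_zero`, `toy` and `toy_ns` are hypothetical:

```r
# Hypothetical helper: forborn Estimate if its p-value is below alpha, else 0.
# Uses only $ and [[ ]], so it works on plain data.frames and on data.tables.
forborn_or_zero <- function(tab, alpha = 0.1) {
  is_fb <- tab$Variables == "forborn"
  if (sum(is_fb) != 1) stop("expected exactly one 'forborn' row")
  p <- tab[["Pr(>|t|)"]][is_fb]
  if (p < alpha) tab[["Estimate"]][is_fb] else 0
}

# Toy table built from two rows of the output shown above
toy <- data.frame(Variables  = c("Intercept", "forborn"),
                  Estimate   = c(37.231178895, -0.612941167),
                  `Pr(>|t|)` = c(0, 2.300944e-32),
                  check.names = FALSE)
forborn_or_zero(toy)              # returns -0.612941167

toy_ns <- toy
toy_ns[["Pr(>|t|)"]][2] <- 0.25   # made-up non-significant p-value
forborn_or_zero(toy_ns)           # returns 0
```

Raising an error when the `forborn` row is missing or duplicated makes a malformed table fail loudly instead of silently contributing a wrong value.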

  • I am a beginner, so your help is very much appreciated. This might seem like an easy task for you, but it is a monumental one for me! Commented Dec 4, 2021 at 19:55
  • There is a function called tables() in the data.table package that summarizes all existing data.tables; you can use it with mget. Commented Dec 5, 2021 at 9:03
  • Using p-values in this way does not represent good statistical practice. Commented Dec 5, 2021 at 14:53
  • @FrankHarrell, I am aware. However, I am doing this more as a coding exercise. Commented Dec 5, 2021 at 16:54

3 Answers

You can use mget to gather the data frames into a list, then extract the value from each one with sapply.

EDIT: changed the data frame names to match your description.

ls()
#[1] "mdl_hispanics_..."  "mdl_blacks_..." etc.

as.vector( sapply( mget(
  grep("^mdl_(whites|blacks|hispanics|asians|others|all)_(Q[1-4]|allQ)$",
       ls(), value = TRUE) ), function(x) {
  # select the forborn row, then keep its Estimate only if p < 0.1
  fb <- x[x$Variables == "forborn", ]
  ifelse(fb[["Pr(>|t|)"]] < 0.1, fb[["Estimate"]], 0) } ) )
#[1] -0.6129412 -0.6129412  0.0000000
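One caution about the grep() pattern: in a regular expression, square brackets such as `[whites|blacks|...]` define a character class, which matches any single one of the enclosed characters, not one of the words. A grouped alternation such as `(whites|blacks|...)` matches whole words. A small check with made-up names (`mdl_test_tmp` is hypothetical):

```r
# Compare a bracketed character class with a grouped alternation
nms <- c("mdl_hispanics_Q1", "mdl_all_allQ", "mdl_test_tmp")

class_pat <- "mdl_.*[whites|blacks|hispanics|asians|others|all]"
alt_pat   <- "^mdl_(whites|blacks|hispanics|asians|others|all)_(Q[1-4]|allQ)$"

# TRUE TRUE TRUE: "mdl_test_tmp" matches because it contains letters (t, e, s) from the class
grepl(class_pat, nms)
# TRUE TRUE FALSE: only names built from the listed races and quartiles match
grepl(alt_pat, nms)
```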

This may be a cleaner approach: it returns a vector containing the forborn Estimate where its p-value is < 0.1, and 0 otherwise (not the p-value itself).

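library(data.table)
# stack all mdl_* tables, keep the forborn rows, and zero out estimates with p >= 0.1
rbindlist(lapply(ls(pattern = "^mdl_"), get))[
  Variables == "forborn", fifelse(`Pr(>|t|)` < 0.1, Estimate, 0)
  ]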

Note: adjust the pattern argument of ls() if you need to match the objects more precisely.
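For comparison, roughly the same stack-then-filter idea works in base R without data.table, with `do.call(rbind, ...)` standing in for `rbindlist()`. A sketch using toy tables held in a separate environment `tabs_env` (all names and values here are made up); keeping them out of the global environment also stops `ls(pattern = ...)` from picking up unrelated objects whose names happen to start with `mdl_`:

```r
# Toy tables in their own environment (hypothetical names and values)
tabs_env <- new.env()
base_tab <- data.frame(Variables  = c("Intercept", "forborn"),
                       Estimate   = c(37.2, -0.61),
                       `Pr(>|t|)` = c(0, 1e-30),
                       check.names = FALSE)
ns_tab <- base_tab
ns_tab[["Pr(>|t|)"]][2] <- 0.3          # made-up non-significant p-value
assign("mdl_asians_Q1", base_tab, envir = tabs_env)
assign("mdl_blacks_Q1", ns_tab,   envir = tabs_env)

# Stack every mdl_* table (ls() returns names in alphabetical order)
tabs    <- mget(ls(tabs_env, pattern = "^mdl_"), envir = tabs_env)
stacked <- do.call(rbind, tabs)

# Keep the forborn rows; Estimate if p < 0.1, else 0
fb     <- stacked[stacked$Variables == "forborn", ]
result <- ifelse(fb[["Pr(>|t|)"]] < 0.1, fb$Estimate, 0)
result   # -0.61 for mdl_asians_Q1, 0 for mdl_blacks_Q1
```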

Write a function that extracts the Estimate column conditional on the p-value, and sapply it over the list of tables.

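library(data.table)

# Return the forborn Estimate if its p-value is < 0.1, otherwise 0.
# Computed in j without :=, so the input tables are not modified by reference.
fextrac <- function(x){
  y <- x[, ifelse(`Pr(>|t|)` < 0.1, Estimate, 0)]
  y[x$Variables == "forborn"]
}

Estimates_list <- sapply(dt_list, fextrac)
Estimates_list
#[1] -0.6129412 -0.6129412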
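sapply() simplifies its result only when every call returns exactly one value; if some table had no forborn row (or several), it would quietly return a list instead. A variant of the same idea with `vapply()` declares the expected result (one number per table) and stops with an error otherwise; it also keeps list names, so each value stays labelled by its model. A base-R sketch with hypothetical toy tables; `extract_fb`, `tab1` and `tab2` are illustrative names, not from the answer:

```r
# Toy tables (hypothetical values)
tab1 <- data.frame(Variables  = c("Intercept", "forborn"),
                   Estimate   = c(37.23, -0.6129412),
                   `Pr(>|t|)` = c(0, 2.3e-32),
                   check.names = FALSE)
tab2 <- tab1
tab2[["Pr(>|t|)"]][2] <- 0.1   # not < 0.1, so this one becomes 0

# forborn Estimate if p < 0.1, else 0 (length 0 if there is no forborn row)
extract_fb <- function(x) {
  keep <- x$Variables == "forborn"
  ifelse(x[["Pr(>|t|)"]][keep] < 0.1, x$Estimate[keep], 0)
}

# FUN.VALUE = numeric(1): each call must return exactly one number
est <- vapply(list(mdl_whites_Q1 = tab1, mdl_blacks_Q1 = tab2),
              extract_fb, FUN.VALUE = numeric(1))
est   # named vector: mdl_whites_Q1 = -0.6129412, mdl_blacks_Q1 = 0
```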

Test data

dt1 <- read.table(text = "
         Variables     Estimate   'Std. Error'    't value'      'Pr(>|t|)'
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189
", header = TRUE, check.names = FALSE)

set.seed(2021)
dt2 <- dt1
dt2$`Pr(>|t|)`[sample(nrow(dt2), nrow(dt2)/3)] <- 0.1

setDT(dt1)
setDT(dt2)
dt_list <- list(dt1, dt2)
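The test data above covers two tables. To exercise the full race × quartile setup from the question, one option is to generate 30 mock tables under the question's names in a separate environment. A sketch; the template rows, the random p-values and the names `mock_env`, `template` and `tabs` are all made up:

```r
set.seed(2021)
races     <- c("whites", "blacks", "hispanics", "asians", "others", "all")
quartiles <- c("Q1", "Q2", "Q3", "Q4", "allQ")
grid      <- expand.grid(race = races, q = quartiles, stringsAsFactors = FALSE)
mdl_names <- paste0("mdl_", grid$race, "_", grid$q)   # 30 names, race varies fastest

# Two-row template; the forborn p-value is filled in per table
template <- data.frame(Variables  = c("Intercept", "forborn"),
                       Estimate   = c(37.23, -0.61),
                       `Pr(>|t|)` = c(0, NA),
                       check.names = FALSE)

mock_env <- new.env()
for (nm in mdl_names) {
  tab <- template
  tab[["Pr(>|t|)"]][2] <- runif(1)   # random forborn p-value in (0, 1)
  assign(nm, tab, envir = mock_env)
}

tabs <- mget(mdl_names, envir = mock_env)   # list of 30, in mdl_names order
length(tabs)   # 30
```

The mock tables are plain data.frames; for the data.table-based answers, convert them first, e.g. with `tabs <- lapply(tabs, data.table::as.data.table)`.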
