How do I plot a bar graph using data from 3 columns in a dataframe read from a csv file? I tried doing it with the following code but had some difficulty getting my desired output:

setwd("\\path\\to\\csv")
df = read.csv("xxxx.csv")

# All hospitals in AL
AL = df[grep("AL", df$State),]

hos <-subset(AL,Hospital.Name=='COOPER GREEN MERCY HOSPITAL')


# Gives me "Error in -0.01 * height : non-numeric argument to binary operator"

hos <- data.frame (HeartAttack=hos$Heart.Attack.Mortality,HeartFailure=hos$Heart.Failure..Mortality,
                   Pneumonia=hos$Pneumonia.Mortality)

# Gives me the graph without displaying the x-axis values,
# but completely defeats the purpose of reading from a csv file since the values are hard-coded

#hos <- data.frame (HeartAttack=c(1),HeartFailure=c(5),Pneumonia=c(10))

barplot(t(as.matrix(hos)),main='Mortality Rate in Cooper Green Mercy Hospital',
        xlab='Illness',ylab='Mortality Rate',beside=TRUE)

The csv file has 10 headers (from left to right): Hospital.Name, City, State, County.Name, Heart.Attack.Mortality, Heart.Attack.Readmission, Heart.Failure..Mortality, Heart.Failure.Readmission, Pneumonia.Mortality and Pneumonia.Readmission. The columns I'm interested in are Heart.Attack.Mortality, Heart.Failure..Mortality and Pneumonia.Mortality.

Desired output

Note: I have already looked at these two SO questions, but they did not quite solve my problem.

  • We don't have access to your disk file, so the code that creates hos is not reproducible. Can you post sample data? Please edit the question with the output of dput(hos) or, if it is too big, with the output of dput(head(hos, 20)). Commented Nov 18, 2018 at 10:21
  • If you're not hell-bent on base plot, you could use ggplot2::geom_col. You can use tidyr::gather to reflow columns into a variable:value pair. Commented Nov 18, 2018 at 10:28
  • @RuiBarradas I've provided the csv file in question. Commented Nov 18, 2018 at 10:32

1 Answer

Your data has "Not Available" instead of NA in the numeric columns, so those columns are read as class "factor" (if stringsAsFactors = TRUE, the default before R 4.0.0) or as class "character" (if stringsAsFactors = FALSE). That is why barplot fails with "non-numeric argument to binary operator". I have therefore run the following right after reading in the data.

df[] <- lapply(df, function(x) {
  is.na(x) <- x == "Not Available"
  x})

i <- sapply(df, function(x) {
  y <- as.numeric(as.character(x))
  !all(is.na(y))
})

df[i] <- lapply(df[i], function(x) as.numeric(as.character(x)))

Another, better, possibility is to read the data in with

df = read.csv("xxxx.csv", stringsAsFactors = FALSE, na.strings = "Not Available")
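To see the effect of na.strings without the original file, here is a minimal sketch that reads a small inline sample. The three rows and their values are made up for illustration and only mimic the layout of the real csv.

```r
# Hypothetical three-row sample standing in for the real csv file
csv_text <- "Hospital.Name,State,Heart.Attack.Mortality
HOSPITAL A,AL,14.3
HOSPITAL B,AL,Not Available
HOSPITAL C,AL,18.5"

# Without na.strings, the "Not Available" entry forces the column to text
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$Heart.Attack.Mortality)    # "character"

# With na.strings, "Not Available" becomes NA and the column stays numeric
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = "Not Available")
class(clean$Heart.Attack.Mortality)  # "numeric"
```

Once the columns are numeric, the subsetting code from the question should work unchanged.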

Then comes your data preparation code.

Now the plot. The space argument is needed to make room for the middle bar label.

barplot(t(as.matrix(hos)),
            main = 'Mortality Rate in Cooper Green Mercy Hospital',
            xlab = 'Illness', ylab = 'Mortality Rate',
            names.arg = names(hos),
            beside = TRUE,
            space = c(0.05, 0))
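As a possible alternative when hos has a single row, the row can be turned into a named numeric vector, and barplot then takes the bar labels from the names. This is only a sketch: the values 1, 5 and 10 are the placeholder numbers from the commented-out line in the question, not real data.

```r
# Placeholder one-row data frame (values taken from the question's
# commented-out example, not from the csv file)
hos <- data.frame(HeartAttack = 1, HeartFailure = 5, Pneumonia = 10)

# unlist() on a one-row data frame gives a named numeric vector
vals <- unlist(hos[1, ])

# barplot() labels each bar with the vector's names;
# it returns the bar midpoints, one per bar
mids <- barplot(vals,
                main = 'Mortality Rate in Cooper Green Mercy Hospital',
                xlab = 'Illness', ylab = 'Mortality Rate')
```

If a label is dropped because it does not fit, reducing cex.names is one way to make room.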

