I have a data frame called "wheat_cities".

Its columns are as follows:

 [1] "Date"                                  "Wheat..Maximum.Price"                 
 [3] "Wheat..Minimum.Price"                  "Wheat..Modal.Price"                   
 [5] "Wheat..North.Zone..Agra"               "Wheat..North.Zone..Amritsar"          
 [7] "Wheat..North.Zone..Bhatinda"           "Wheat..North.Zone..Chandigarh"        
 [9] "Wheat..North.Zone..Dehradun"           "Wheat..North.Zone..Delhi"             
[11] "Wheat..North.Zone..Gurgaon"            "Wheat..North.Zone..Haldwani"          
[13] "Wheat..North.Zone..Hisar"              "Wheat..North.Zone..Jammu"             
[15] "Wheat..North.Zone..Kanpur"             "Wheat..North.Zone..Karnal"            
[17] "Wheat..North.Zone..Lucknow"            "Wheat..North.Zone..Ludhiana"          
[19] "Wheat..North.Zone..Mandi"              "Wheat..North.Zone..Panchkula"         
[21] "Wheat..North.Zone..Shimla"             "Wheat..North.Zone..Srinagar"          
[23] "Wheat..North.Zone..Varanasi"           "Wheat..West.Zone..Ahmedabad"          
[25] "Wheat..West.Zone..Bhopal"              "Wheat..West.Zone..Bhuj"               
[27] "Wheat..West.Zone..Gwalior"             "Wheat..West.Zone..Indore"             
[29] "Wheat..West.Zone..Jabalpur"            "Wheat..West.Zone..Jaipur"             
[31] "Wheat..West.Zone..Jodhpur"             "Wheat..West.Zone..Kota"               
[33] "Wheat..West.Zone..Mumbai"              "Wheat..West.Zone..Nagpur"             
[35] "Wheat..West.Zone..Panaji"              "Wheat..West.Zone..Raipur"             
[37] "Wheat..West.Zone..Rajkot"              "Wheat..West.Zone..Rewa"               
[39] "Wheat..West.Zone..Sagar"               "Wheat..West.Zone..Surat"              
[41] "Wheat..East.Zone..Bhagalpur"           "Wheat..East.Zone..Bhubaneshwar"       
[43] "Wheat..East.Zone..Cuttack"             "Wheat..East.Zone..Patna"              
[45] "Wheat..East.Zone..Purnia"              "Wheat..East.Zone..Ranchi"             
[47] "Wheat..East.Zone..Rourkela"            "Wheat..East.Zone..Sambalpur"          
[49] "Wheat..East.Zone..Siliguri"            "Wheat..North.East.Zone..Aizwal"       
[51] "Wheat..North.East.Zone..Dimapur"       "Wheat..North.East.Zone..Guwahati"     
[53] "Wheat..North.East.Zone..Itanagar"      "Wheat..North.East.Zone..Shillong"     
[55] "Wheat..South.Zone..Bengaluru"          "Wheat..South.Zone..Chennai"           
[57] "Wheat..South.Zone..Coimbatore"         "Wheat..South.Zone..Dharwad"           
[59] "Wheat..South.Zone..Dindigul"           "Wheat..South.Zone..Ernakulam"         
[61] "Wheat..South.Zone..Hyderabad"          "Wheat..South.Zone..Karimnagar"        
[63] "Wheat..South.Zone..Kozhikode"          "Wheat..South.Zone..Mangalore"         
[65] "Wheat..South.Zone..Mysore"             "Wheat..South.Zone..Palakkad"          
[67] "Wheat..South.Zone..Port.Blair"         "Wheat..South.Zone..Puducherry"        
[69] "Wheat..South.Zone..Thiruchirapalli"    "Wheat..South.Zone..Thiruvananthapuram"
[71] "Wheat..South.Zone..Thrissur"           "Wheat..South.Zone..Tirunelveli"       
[73] "Wheat..South.Zone..Vijaywada"          "Wheat..South.Zone..Visakhapatnam"     
[75] "Wheat..South.Zone..Warangal"           "Wheat..South.Zone..Wayanad"           

I want to change the column names so that for columns 5-76 I get just the name after the second "..", and for columns 2-4 I get the name after the first "..".

Since the names differ in length, I can't use a fixed-position substring call.

Please help. Thanks in advance!

3 Answers

3

We could do this with sub: match any characters (.*) followed by two dots (\\.{2}), capture the rest of the string up to the end ($) in a group ((.*)), and replace the whole match with the backreference (\\1) to that group. Because the leading .* is greedy, the match runs up to the last "..", so only the final segment is kept.

names(data) <- sub(".*\\.{2}(.*)$", "\\1", names(data))
names(data)
#[1] "Date"          "Maximum.Price" "Minimum.Price" "Agra"  

Sample data

data <- data.frame(Date = c("2013-01-01", "2013-01-02"), 
   Wheat..Maximum.Price = 5:6, Wheat..Minimum.Price = 1:2, 
     Wheat..North.Zone..Agra = 6:7, stringsAsFactors = FALSE)
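
As a quick check of the greedy behaviour, here is a minimal sketch that applies the same pattern to a few names copied from the question's column list. The vectors cols and short are illustrative names, not part of the original answer.

```r
# A handful of column names copied from the question (illustrative subset)
cols <- c("Date",
          "Wheat..Modal.Price",
          "Wheat..North.East.Zone..Aizwal",
          "Wheat..South.Zone..Port.Blair")

# Greedy ".*" consumes everything up to the LAST "..",
# so single dots inside the final segment are left alone
short <- sub(".*\\.{2}(.*)$", "\\1", cols)
print(short)
# [1] "Date"        "Modal.Price" "Aizwal"      "Port.Blair"
```

Names without ".." (such as "Date") do not match the pattern and pass through unchanged.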

3
names(data) <- gsub("Wheat..", "", names(data), fixed = TRUE)
names(data) <- gsub("North.Zone..", "", names(data), fixed = TRUE)
names(data)
# [1] "Date" "Maximum.Price" "Minimum.Price" "Modal.Price"   "Agra" "Amritsar"

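First we remove "Wheat.." from all column names, and then we remove "North.Zone..". The other zone prefixes ("West.Zone..", "East.Zone..", "North.East.Zone.." and "South.Zone..") each need the same gsub call.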
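
If repeating the call for every zone is too tedious, one possible variation (a sketch, not part of the original answer) is to strip any prefix ending in ".Zone.." with a single regular expression. The vector cols below is an illustrative subset of the question's names.

```r
# Illustrative subset of the question's column names
cols <- c("Date",
          "Wheat..Maximum.Price",
          "Wheat..West.Zone..Bhopal",
          "Wheat..North.East.Zone..Dimapur",
          "Wheat..South.Zone..Port.Blair")

# Step 1: drop the literal "Wheat.." prefix (fixed = TRUE disables regex)
cols <- sub("Wheat..", "", cols, fixed = TRUE)

# Step 2: drop everything up to and including ".Zone..", whatever the zone
cols <- sub("^.*\\.Zone\\.\\.", "", cols)
print(cols)
# [1] "Date"          "Maximum.Price" "Bhopal"        "Dimapur"       "Port.Blair"
```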

1 Comment

Thank you for your solution, it is very easy to follow.
0

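You could use strsplit, which splits a string on a given separator, here "..". Pass fixed = TRUE so that ".." is taken literally; the pattern '[..]' is a character class that matches any single dot, so it would also split "Maximum.Price" and "Port.Blair". For columns 2 to 4 you want the second piece (the text after the first ".."), and from column 5 onwards you want the last piece:

change_names <- strsplit(colnames(wheat_cities), "..", fixed = TRUE)

for (i in 2:ncol(wheat_cities)) {
  if (i %in% 2:4) {
    # "Wheat..Maximum.Price" -> c("Wheat", "Maximum.Price")
    colnames(wheat_cities)[i] <- change_names[[i]][2]
  } else {
    # "Wheat..North.Zone..Agra" -> c("Wheat", "North.Zone", "Agra")
    # tail(x, 1) is base R, so no extra package is needed
    colnames(wheat_cities)[i] <- tail(change_names[[i]], 1)
  }
}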
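
The loop is not strictly required. As a rough sketch, assuming the separator is always a literal "..", the last piece of every name can be taken in one vectorised step; cols, parts and short are illustrative names.

```r
# Illustrative subset of the question's column names
cols <- c("Date", "Wheat..Minimum.Price", "Wheat..East.Zone..Patna")

# Split on the literal two-dot separator (fixed = TRUE disables regex)
parts <- strsplit(cols, "..", fixed = TRUE)

# Keep the last piece of each name; vapply guarantees a character vector
short <- vapply(parts, function(p) tail(p, 1), character(1))
print(short)
# [1] "Date"          "Minimum.Price" "Patna"
```

Taking the last piece works for both cases because the names in columns 2 to 4 contain only one "..", so their second piece is also their last.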
