4

I am currently trying to build a small webscraper.

I am using the following code to scrape a website:

webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
content <- read_html(webpage)

However, when I run the second line with the read_html command, I get the following error message:

Error in open.connection(x, "rb") : SSL certificate problem: certificate has expired

Does anyone know where this is coming from? When I used it a few days ago, I did not have any trouble with it.

I am using Mac OS X 10.15.5 and RStudio (1.2.5033). I have also installed the "rvest" package.

Many thanks for your help in advance!

3 Answers

8

I was getting the same problem for another website, but the other answer did not solve it for me. I'm posting what worked for me in case it is useful to someone else.

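library(httr)   # provides GET() and config()
library(rvest)  # provides read_html() and the %>% pipe
webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
# Note: ssl_verifypeer = FALSE skips certificate verification for this request
content <- webpage %>%
  GET(config = config(ssl_verifypeer = FALSE)) %>%
  read_html()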

See here for a discussion about this solution.
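
Since switching off verification removes a safety check, one option is to use it only as a fallback. The following is a minimal sketch, not part of the original solution: the helper name read_html_fallback is hypothetical, and it assumes that httr and rvest are installed. It tries a normal, verified request first and retries without verification only if that request fails.

```r
library(httr)
library(rvest)

# Hypothetical helper (the name is an assumption, not part of httr or rvest).
# It first sends a normal, verified request. If that request throws an error
# (for example an expired certificate), it warns and retries once with
# certificate verification switched off.
# Note: any request error triggers the retry, not only SSL errors.
read_html_fallback <- function(url) {
  response <- tryCatch(
    GET(url),
    error = function(e) {
      warning(
        "Verified request failed (", conditionMessage(e),
        "); retrying without certificate verification.",
        call. = FALSE
      )
      GET(url, config = config(ssl_verifypeer = FALSE))
    }
  )
  stop_for_status(response)  # turn HTTP errors (4xx/5xx) into R errors
  read_html(response)        # parse the response body as HTML
}
```

Usage would then be content <- read_html_fallback(webpage). The warning makes it visible when the unverified path was taken.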


2 Comments

This is actually the solution that works!
Thanks for posting this, was the only solution that worked for me
2

Try using the GET function.

library(httr)   # provides GET()
library(rvest)  # provides read_html()
webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
content <- read_html(GET(webpage))

I should have mentioned the GET function is part of the httr R package. Make sure you use GET and not get.

4 Comments

Dear Joshua, thanks for your quick answer! Unfortunately, it doesn't work for me. I get an error: "Object '[the link]' not found". Could you explain how the GET function should work in this context? Many thanks!
Sorry, GET is part of the httr R package. Make sure to install the package with install.packages("httr") and then either call the library with library(httr) at the beginning of your code or try content <- read_html(httr::GET(webpage))
Pretty sure it's because of this issue, affecting many sites everywhere. Nothing you can do until it's fixed: twitter.com/hrbrmstr/status/1266837823111471104?s=20
Thank you, Joshua and David! JoshuaMire: now it worked! David: Thanks for the information, I think they already solved this. But very interesting to see!
1

I had the same problem. I fixed it by changing the SSL settings in R. Just add the following line at the beginning of your code (or at least before you call read_html()):

httr::set_config(httr::config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE))
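
Because set_config() changes the setting for every later httr request in the session, a narrower variant may be preferable. The sketch below is an illustration rather than the original fix: the helper name read_html_unverified is hypothetical, and it assumes the page is fetched through httr::GET() so that the httr configuration applies to the request. It uses httr::with_config() to disable verification only for one call.

```r
library(httr)
library(xml2)

# Hypothetical helper (the name is an assumption, not part of httr or xml2).
# with_config() applies the given configuration only while the expression
# passed to it is evaluated; afterwards the previous httr settings are back.
read_html_unverified <- function(url) {
  with_config(
    config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE),
    read_html(GET(url))
  )
}
```

Usage would be content <- read_html_unverified(webpage). Requests made elsewhere in the script keep verifying certificates as usual.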
