Webscraping, read_html() - Error in open.connection(x, "rb") : SSL certificate problem: certificate has expired

Question

I am currently trying to build a small web scraper.

I am using the following code to scrape a website:

library(rvest)

webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
content <- read_html(webpage)

However, when I run the line with read_html(), I get the following error message:

Error in open.connection(x, "rb") : SSL certificate problem: certificate has expired

Does anyone know where this error comes from? The same code worked without any problems a few days ago.

I am using macOS 10.15.5 and RStudio 1.2.5033, and I have installed the rvest package.

Many thanks for your help in advance!
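To see where the error comes from, it can help to check the validity dates of every certificate the server sends, since an expired certificate anywhere in the chain can produce this message. The following is a minimal sketch, assuming the openssl R package is installed (it is not used in the original question); `show_cert_validity()` is a hypothetical helper name, and the `subject` and `validity` fields follow my understanding of `as.list()` on openssl certificate objects.

```r
# Assumes the openssl package: install.packages("openssl")

# Hypothetical helper: print the subject and validity period of each
# certificate in the chain presented by `host`.
show_cert_validity <- function(host, port = 443) {
  chain <- openssl::download_ssl_cert(host, port = port)
  for (cert in chain) {
    info <- as.list(cert)  # certificate fields, including subject and validity
    cat(info$subject, "\n")
    cat("  valid from", info$validity[1], "to", info$validity[2], "\n")
  }
  invisible(chain)
}

# Example (needs network access):
# show_cert_validity("www.whisky.de")
```

A printed end date that lies in the past is a likely candidate for the certificate behind the error.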

Roland · Accepted Answer · 2020-08-08 22:04:42Z

8

I was getting the same problem with another website, but the other answer did not solve it for me. I'm posting what worked for me in case it is useful to someone else.

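library(tidyverse)  # provides the %>% pipe
library(rvest)
webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
# Download the page with httr, skipping SSL certificate verification,
# then parse the response with read_html()
content <- webpage %>% 
  httr::GET(config = httr::config(ssl_verifypeer = FALSE)) %>% 
  read_html()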

See here for a discussion about this solution.


2 Comments

nd091680 Over a year ago

This is actually the solution that works!

CoolGuyHasChillDay Over a year ago

Thanks for posting this; it was the only solution that worked for me.

Joshua Mire · Accepted Answer · 2020-05-30 18:02Z

2

Try downloading the page with the GET function and passing the result to read_html():

library(httr)   # provides GET()
library(rvest)  # provides read_html()
webpage <- "https://www.whisky.de/shop/Schottland/Single-Malt/Macallan-Triple-Cask-15-Jahre.html"
content <- read_html(GET(webpage))

Note that GET() comes from the httr package, so load it with library(httr) or call it as httr::GET(). Make sure you use GET (upper case), not get.
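Because GET() keeps certificate verification switched on by default, it can be combined with the ssl_verifypeer = FALSE workaround from the first answer as a fallback. A minimal sketch, assuming the httr and rvest packages; `fetch_html()` is a hypothetical helper, and matching on the text "SSL certificate" in the error message is an assumption based on the error shown in the question:

```r
# Assumes: install.packages(c("httr", "rvest"))

# Hypothetical helper: try a normal (verified) request first; only if it fails
# with an SSL certificate error, warn and retry without peer verification.
fetch_html <- function(url) {
  response <- tryCatch(
    httr::GET(url),
    error = function(e) {
      # Re-raise anything that is not an SSL certificate problem
      if (!grepl("SSL certificate", conditionMessage(e), fixed = TRUE)) stop(e)
      warning("SSL certificate check failed for ", url,
              "; retrying without verification.", call. = FALSE)
      httr::GET(url, config = httr::config(ssl_verifypeer = FALSE))
    }
  )
  httr::stop_for_status(response)  # turn HTTP errors such as 404 into R errors
  rvest::read_html(response)
}

# Example (needs network access):
# content <- fetch_html(webpage)
```

Keeping verification as the default means the unverified request is only made when the certificate check actually fails, and the warning makes that visible.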

answered May 30, 2020 at 18:02 by Joshua Mire; edited Jun 16, 2020 at 21:55 by Dharman

4 Comments

moellivm Over a year ago

Dear Joshua, thanks for your quick answer! Unfortunately, it doesn't work for me. I get the error "Object '[the link]' not found". Could you explain how the get function should work in this context? Many thanks!

Joshua Mire Over a year ago

Sorry, GET is part of the httr R package. Make sure to install the package with install.packages("httr"), and then either load it with library(httr) at the beginning of your code or use content <- read_html(httr::GET(webpage)).

David Ranzolin Over a year ago

Pretty sure it's because of this issue, affecting many sites everywhere. Nothing you can do until it's fixed: twitter.com/hrbrmstr/status/1266837823111471104?s=20

moellivm Over a year ago

Thank you, Joshua and David! Joshua: now it works! David: thanks for the information. I think it has already been fixed, but it is very interesting to see!

Peter · Accepted Answer · 2022-11-27 09:11:35Z

1

I had the same problem and fixed it by changing the SSL settings in R. Add the following line at the beginning of your code (at least before you call read_html()):

# Disable SSL certificate and host-name verification for subsequent httr requests
httr::set_config(httr::config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE))
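set_config() changes the default for every later httr request in the session. If the setting is only needed for a few pages, httr::with_config() applies a configuration temporarily and restores the previous one afterwards. A minimal sketch, assuming the httr and rvest packages; `read_pages_no_verify()` is a hypothetical helper, and the pages are fetched with httr::GET() because the option is an httr setting (I have not confirmed whether it also affects read_html() called on a plain URL string):

```r
# Assumes: install.packages(c("httr", "rvest"))

# Hypothetical helper: download and parse several pages with SSL certificate
# and host-name checks disabled only while with_config() evaluates the lapply().
read_pages_no_verify <- function(urls) {
  httr::with_config(
    httr::config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE),
    lapply(urls, function(url) {
      response <- httr::GET(url)       # uses the temporary global config
      httr::stop_for_status(response)  # stop on HTTP errors such as 404
      rvest::read_html(response)
    })
  )
}

# Example (needs network access):
# pages <- read_pages_no_verify(c(webpage))
```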

