Question:
Problem connecting to a specific site from RStudio
library(rvest)
url <- "https://www.jusbrasil.com.br/diarios/busca?q=%22licen%C3%A7a+sem+vencimentos%22&idtopico=T10001849&o=data"
links <- read_html(url) %>% html_nodes('.DocumentSnippet') %>% html_nodes('a')
This produces the following error:
Error in open.connection(x, "rb") :
Failed to connect to www.jusbrasil.com.br port 443: Connection refused
I believe the problem is the proxy configuration in RStudio (or in R itself). I tried the solutions below, but I always got the same error:
- "Configuring R to Use an HTTP or HTTPS Proxy"
- The code below, filled in with my proxy and my network login and password:
library(httr)
set_config(use_proxy(url = "meuproxy", port = "meuproxy", username = "user", password = "password"))
I know the code itself works: I ran it on my home computer and got the expected links.
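A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: as far as I know, `httr::set_config()` only affects requests made through httr, while `read_html()` called on a URL string opens the connection itself through the curl package (the `open.connection(x, "rb")` in the error points that way), so a proxy set with `set_config()` may never be used. The sketch shows two workarounds: exporting the `http_proxy`/`https_proxy` environment variables, which libcurl reads, or downloading the page with `httr::GET()` plus `use_proxy()` and parsing the response text. The helper names `build_proxy_url`, `set_proxy_env` and `fetch_links`, and the values `proxy.example.com`, `8080`, `user` and `password`, are hypothetical placeholders.

```r
# Hypothetical helper: builds "http://user:password@host:port".
# Credentials are percent-encoded so characters such as "@" or ":" in a
# password do not break the URL.
build_proxy_url <- function(host, port, user = NULL, password = NULL) {
  cred <- ""
  if (!is.null(user) && !is.null(password)) {
    cred <- paste0(utils::URLencode(user, reserved = TRUE), ":",
                   utils::URLencode(password, reserved = TRUE), "@")
  }
  paste0("http://", cred, host, ":", port)
}

# Workaround 1 (hypothetical helper): export the proxy as environment
# variables; libcurl, which read_html() relies on when given a URL,
# should pick them up when it opens a new connection.
set_proxy_env <- function(proxy_url) {
  Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)
  invisible(proxy_url)
}

# Workaround 2 (hypothetical helper): download with httr, which does apply
# use_proxy(), then parse the returned HTML text with xml2/rvest.
fetch_links <- function(page_url, host, port, user, password) {
  resp <- httr::GET(page_url,
                    httr::use_proxy(host, port = port,
                                    username = user, password = password))
  httr::stop_for_status(resp)
  page <- xml2::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  rvest::html_nodes(rvest::html_nodes(page, ".DocumentSnippet"), "a")
}
```

To try workaround 1, call `set_proxy_env(build_proxy_url("proxy.example.com", 8080, "user", "password"))` before the original `read_html(url)` line; for workaround 2, replace that line with `fetch_links(url, "proxy.example.com", 8080, "user", "password")`. Note that `use_proxy()` expects a numeric `port`.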
Answer:
Before scraping, check https://www.jusbrasil.com.br/robots.txt: it defines what automated clients may and may not access.
An explanation of what robots.txt is: https://rockcontent.com/br/blog/robots-txt/
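To check a path against those rules from R, here is a deliberately simplified sketch. `is_path_disallowed` is a hypothetical helper, not part of any package: it only collects the `Disallow` rules from groups that apply to `User-agent: *` and does plain prefix matching, ignoring `Allow` lines, `*`/`$` wildcards and bot-specific groups, so treat its result as a first check and still read the file itself. For full parsing, the rOpenSci `robotstxt` package provides `paths_allowed()`.

```r
# Hypothetical, simplified robots.txt check (not a full implementation of
# the standard): returns TRUE if `path` starts with any Disallow rule that
# applies to "User-agent: *". Allow lines and wildcards are ignored.
is_path_disallowed <- function(robots_lines, path) {
  lines <- trimws(sub("#.*$", "", robots_lines))  # strip comments
  in_star_group <- FALSE
  prev_was_agent <- FALSE
  disallowed <- character(0)
  for (line in lines) {
    if (line == "") next
    field <- tolower(trimws(sub(":.*$", "", line)))
    value <- trimws(sub("^[^:]*:", "", line))
    if (field == "user-agent") {
      # consecutive User-agent lines share one group of rules
      in_star_group <- (prev_was_agent && in_star_group) || value == "*"
      prev_was_agent <- TRUE
    } else {
      prev_was_agent <- FALSE
      # an empty Disallow value means "nothing is disallowed"
      if (field == "disallow" && in_star_group && value != "") {
        disallowed <- c(disallowed, value)
      }
    }
  }
  any(startsWith(path, disallowed))
}
```

For example, `is_path_disallowed(readLines("https://www.jusbrasil.com.br/robots.txt"), "/diarios/busca")` checks the search path used in the question; this requires network access.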
This answer may also help: https://stackoverflow.com/questions/35690914/web-scraping-the-iis-based-website