Problems accessing a website via RStudio

Question:

I am having trouble connecting to a particular site from RStudio:

library(rvest)  # provides read_html(), html_nodes() and the %>% pipe
url <- "https://www.jusbrasil.com.br/diarios/busca?q=%22licen%C3%A7a+sem+vencimentos%22&idtopico=T10001849&o=data"
links <- read_html(url) %>% html_nodes('.DocumentSnippet') %>% html_nodes('a')

This generates the following error:

Error in open.connection(x, "rb") : 
  Failed to connect to www.jusbrasil.com.br port 443: Connection refused

I believe the cause is the proxy configuration in RStudio (or in R itself). I tried the solutions below, but always got the same error:

  1. Configuring R to Use an HTTP or HTTPS Proxy

  2. The code set_config(use_proxy(url = "meuproxy", port = "meuproxy", username = "user", password = "password")), filled in with my proxy settings and my network login and password.

I know the code works because I tested it on my home computer and got the expected links.

Answer:

First check https://www.jusbrasil.com.br/robots.txt, which defines what you may and may not access on the site.

Robots.txt: https://rockcontent.com/br/blog/robots-txt/
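To make the robots.txt check concrete, here is a minimal sketch in base R. The helper path_allowed() is hypothetical (it is not part of any package), and it is deliberately simplified: it only looks at the "User-agent: *" group, uses plain prefix matching, and ignores Allow rules and wildcards. The example rules are made up for illustration and are not the site's actual rules.

```r
# Hypothetical helper: TRUE if `path` is not matched by any Disallow rule
# in the "User-agent: *" group of the given robots.txt lines.
# Simplified: prefix matching only, no Allow rules, no wildcards.
path_allowed <- function(robots_lines, path) {
  lines <- trimws(sub("#.*$", "", robots_lines))  # strip comments
  lines <- lines[nzchar(lines)]                   # drop empty lines
  in_star_group <- FALSE
  prev_was_agent <- FALSE
  disallowed <- character(0)
  for (line in lines) {
    parts <- strsplit(line, ":", fixed = TRUE)[[1]]
    field <- tolower(trimws(parts[1]))
    value <- trimws(paste(parts[-1], collapse = ":"))
    if (field == "user-agent") {
      # Consecutive User-agent lines belong to the same group
      if (!prev_was_agent) in_star_group <- FALSE
      if (value == "*") in_star_group <- TRUE
      prev_was_agent <- TRUE
    } else {
      prev_was_agent <- FALSE
      if (field == "disallow" && in_star_group && nzchar(value)) {
        disallowed <- c(disallowed, value)
      }
    }
  }
  !any(startsWith(path, disallowed))
}

# Made-up rules, for illustration only
example_rules <- c("User-agent: *", "Disallow: /private/")
path_allowed(example_rules, "/private/page")  # FALSE
path_allowed(example_rules, "/public/page")   # TRUE

# With network access, the real file could be read like this:
# robots <- readLines("https://www.jusbrasil.com.br/robots.txt")
# path_allowed(robots, "/diarios/busca")
```

Note that a robots.txt restriction would not by itself explain a "Connection refused" error that appears on only one network, so this check complements the proxy investigation rather than replacing it.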

This answer can help you: https://stackoverflow.com/questions/35690914/web-scraping-the-iis-based-website
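If the problem does turn out to be the proxy, one possible reason the set_config(use_proxy(...)) attempt had no effect is that set_config() only applies to requests made through httr, while read_html(url) opens its own connection (the open.connection() call in the error message suggests this). A minimal sketch, assuming the httr and rvest packages are installed: download the page with httr::GET() using the proxy settings, then hand the HTML text to read_html(). The helper name fetch_via_proxy() and all argument values are placeholders, not real settings.

```r
# Hypothetical helper: fetch `url` through an authenticated proxy with httr,
# then parse the downloaded HTML with rvest.
fetch_via_proxy <- function(url, proxy_host, proxy_port, user, password) {
  resp <- httr::GET(
    url,
    httr::use_proxy(url = proxy_host, port = proxy_port,
                    username = user, password = password)
  )
  httr::stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  rvest::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (placeholder values; replace with your own proxy settings):
# page  <- fetch_via_proxy(url, "meuproxy", 8080, "user", "password")
# links <- rvest::html_nodes(rvest::html_nodes(page, ".DocumentSnippet"), "a")
```

The port is passed as a number here; the value 8080 is only an example and should be replaced with the port your network actually uses.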
