Parsing a website in Java using jsoup


I decided to write a simple TV schedule app using jsoup. The task is rather dull, but it is new to me because I have not worked with parsers before. I am not asking for code, only for advice on how to work with the parser. I have read articles, but they only cover extracting headlines and the like. How would I, for example, parse the schedule of the first channel from the page? And if you think jsoup is a poor choice, please recommend something else.


There are two types of HTML/XML parsers:

  1. SAX parser – parses in streaming mode. The HTML/XML stream is fed to the input, and at certain points the parser calls so-called handlers, that is, callbacks that say "the parser has just encountered such-and-such an element." The programmer usually puts their own code in the handler, and that code does the actual work.
  2. DOM parser – the entire source is loaded at once, and the output is a tree, sometimes quite a complex one.
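To make the difference concrete, here is a minimal sketch that reads the same small document both ways, using the SAX and DOM parsers that ship with the JDK rather than jsoup. The XML sample and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVsDom {
    // Hypothetical, well-formed sample input.
    static final String XML =
        "<schedule><show time=\"09:00\">News</show><show time=\"10:00\">Weather</show></schedule>";

    // SAX: the parser streams through the input and calls our handler
    // each time it meets an opening tag.
    static List<String> timesWithSax() throws Exception {
        List<String> times = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("show".equals(qName)) {
                    times.add(attrs.getValue("time"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(XML)), handler);
        return times;
    }

    // DOM: the whole document is parsed into a tree first,
    // then we pick the nodes we need out of that tree.
    static List<String> titlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        NodeList shows = doc.getElementsByTagName("show");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < shows.getLength(); i++) {
            titles.add(shows.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesWithSax());
        System.out.println(titlesWithDom());
    }
}
```

Note that the JDK parsers expect well-formed XML, so they will usually choke on real-world HTML; that is one reason to reach for a library like jsoup for web pages.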

jsoup is a kind of DOM parser, so the whole job comes down to locating the right position in the parsed tree or, in DOM terms, the right nodes. The node classes are described in the jsoup API documentation for the org.jsoup.nodes package.
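As a rough illustration of "finding your place in the tree", here is a small sketch that parses an HTML string with jsoup and pulls out schedule entries with CSS selectors. The markup, the class names (schedule, time, title) and the ScheduleSketch class are all invented for the example; a real channel page will have its own structure, which you would inspect in the browser first.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ScheduleSketch {
    // Hypothetical markup; a real page will use different tags and class names.
    static final String HTML =
        "<html><body><ul class=\"schedule\">"
        + "<li><span class=\"time\">09:00</span><span class=\"title\">News</span></li>"
        + "<li><span class=\"time\">10:00</span><span class=\"title\">Weather</span></li>"
        + "</ul></body></html>";

    static List<String> extractSchedule(String html) {
        // Build the tree from the HTML source.
        Document doc = Jsoup.parse(html);
        List<String> entries = new ArrayList<>();
        // Select every list item directly inside the schedule list,
        // then read the time and title from inside each item.
        for (Element item : doc.select("ul.schedule > li")) {
            String time = item.select("span.time").text();
            String title = item.select("span.title").text();
            entries.add(time + " " + title);
        }
        return entries;
    }

    public static void main(String[] args) {
        extractSchedule(HTML).forEach(System.out::println);
    }
}
```

For a live page, the Document would come from Jsoup.connect(url).get() instead of Jsoup.parse(html); the selector part stays the same once you know the page's actual markup.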

It will also be useful to read about the DOM; that will immediately point your thinking in the right direction.

Good luck.
