I decided to write a simple TV schedule program using jsoup. The task is entirely mundane, but it is new to me because I have not worked with parsers before. I am not asking for help writing the code, but for advice on how to work with the parser. I have read articles, but they only cover extracting headlines and the like. How would I, for example, parse the schedule for the first channel from the page? And if you think jsoup is complete garbage, please recommend another parser.
There are two types of HTML/XML parsers:
- SAX parser – parses in streaming mode. The HTML/XML stream is fed to the input, and at certain points the parser calls so-called handlers, that is, callbacks that say "the parser has now reached such-and-such an element." The programmer usually puts their own code in the handler, and that code does the work.
- DOM parser – the entire source is loaded, and the output is a tree, sometimes quite a complex one.
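To make the difference concrete, here is a minimal sketch that reads the same small document both ways. It uses the XML parsers bundled with the JDK rather than jsoup, and the `<schedule>`/`<show>` markup and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParserStyles {
    // Hypothetical sample data, invented for this illustration.
    static final String XML =
        "<schedule>"
        + "<show time=\"09:00\">News</show>"
        + "<show time=\"10:00\">Weather</show>"
        + "</schedule>";

    // SAX style: the parser streams through the input and calls our
    // handler each time it reaches the start of an element.
    static List<String> timesWithSax() throws Exception {
        List<String> times = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("show".equals(qName)) {
                    times.add(attrs.getValue("time"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)),
            handler);
        return times;
    }

    // DOM style: the whole input is parsed into a tree first,
    // and only then do we look up the nodes we care about.
    static List<String> titlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        NodeList shows = doc.getElementsByTagName("show");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < shows.getLength(); i++) {
            titles.add(shows.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesWithSax());
        System.out.println(titlesWithDom());
    }
}
```

With SAX you never hold the whole document in memory, but you have to track where you are yourself; with DOM you pay for building the tree, and in return you can move around it freely.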
jsoup is a kind of DOM parser, so it all comes down to positioning yourself correctly in the parsed tree, or in DOM terms, at the right nodes. This is described in the jsoup API documentation.
It is also worth reading about the DOM at the same time; that will immediately point your thinking in the right direction.
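As a rough illustration of what positioning in the tree looks like with jsoup, the sketch below picks the first channel block in a page and walks its list items. The HTML structure, the class names (`channel`, `time`, `title`), and the `ScheduleSketch` class are assumptions made up for this example; a real TV guide page will have its own markup, which you would inspect in the browser first. It assumes jsoup 1.11.1 or later is on the classpath, since it uses `selectFirst`.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ScheduleSketch {
    // Hypothetical markup: two channel blocks, each with a list of shows.
    static final String HTML =
        "<html><body>"
        + "<div class=\"channel\"><h2>Channel A</h2><ul>"
        + "<li><span class=\"time\">09:00</span><span class=\"title\">News</span></li>"
        + "<li><span class=\"time\">10:00</span><span class=\"title\">Weather</span></li>"
        + "</ul></div>"
        + "<div class=\"channel\"><h2>Channel B</h2><ul>"
        + "<li><span class=\"time\">09:30</span><span class=\"title\">Cartoons</span></li>"
        + "</ul></div>"
        + "</body></html>";

    // Returns "time title" strings for the first channel block in the page.
    static List<String> firstChannelSchedule(String html) {
        Document doc = Jsoup.parse(html);
        List<String> entries = new ArrayList<>();

        // Step 1: position on the subtree for the first channel.
        Element channel = doc.selectFirst("div.channel");
        if (channel == null) {
            return entries; // the page does not match the assumed structure
        }

        // Step 2: from that node, walk down to each list item.
        for (Element item : channel.select("li")) {
            Element time = item.selectFirst("span.time");
            Element title = item.selectFirst("span.title");
            if (time != null && title != null) {
                entries.add(time.text() + " " + title.text());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        firstChannelSchedule(HTML).forEach(System.out::println);
    }
}
```

For a live page you would replace `Jsoup.parse(html)` with `Jsoup.connect(url).get()` and adjust the selectors to whatever elements actually wrap the schedule.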