In this post on Jsoup, I will explain, with an example, how to parse HTML data.

The HTML data can be present in a local file, in a String, or at a URL.

Jsoup provides overloaded methods to parse HTML data from each of these locations.
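The String case is not covered by the main example in this post, so here is a minimal sketch of it. It assumes the Jsoup library is on the classpath; the class name JsoupStringDemo and the helper titleOf are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupStringDemo {
    // Hypothetical helper: parses HTML held in a String and returns the <title> text.
    static String titleOf(String html) {
        Document document = Jsoup.parse(html);
        return document.title();
    }

    public static void main(String[] args) {
        // Same markup as the sample file used later in this post, held in memory.
        String html = "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>";
        System.out.println(titleOf(html)); // prints "Input1"
    }
}
```

No file or network access is involved here, which makes this overload handy for quick experiments.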

For our example, we will use a local file named “Input1.html” with the content below. The code opens it with a relative path, so the file must be in the directory from which the Java program is run.

Input1.html


<html>
    <head>
        <title>Input1</title>
    </head>
    <body>
        <p>Input1</p>
    </body>
</html>

In our example, we will parse the above file and also the HTML data at the URL “http://www.google.com”.

Below is the main code showing how to parse HTML documents.

Main Code


1  import java.io.File;
2  import java.io.IOException;
3  import java.net.URL;
4  
5  import org.jsoup.Jsoup;
6  import org.jsoup.nodes.Document;
7  
8  public class JsoupDemo1 {
9      public static void main(String[] args) throws IOException {
10         File file = new File("Input1.html");
11         Document document = Jsoup.parse(file, "UTF-8");
12         System.out.println(document.title());
13         
14         URL url = new URL("http://www.google.com");
15         document = Jsoup.parse(url, 10000);
16         System.out.println(document.title());
17     }
18 }

As you can see in the above code, at lines 11 and 15 we call different overloaded versions of the static “parse” method of the Jsoup class. The first takes a File and a charset name; the second takes a URL and a timeout in milliseconds.
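To try the File overload without creating Input1.html by hand, the sketch below writes the markup to a temporary file first and then parses it. The class name JsoupFileDemo and the helper titleViaTempFile are hypothetical, and Jsoup is again assumed to be on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupFileDemo {
    // Hypothetical helper: writes the HTML to a temp file, parses that file, returns its title.
    static String titleViaTempFile(String html) throws IOException {
        Path path = Files.createTempFile("Input1", ".html");
        try {
            Files.write(path, html.getBytes(StandardCharsets.UTF_8));
            // Same overload as line 11 of the main code: a File plus a charset name.
            Document document = Jsoup.parse(path.toFile(), "UTF-8");
            return document.title();
        } finally {
            Files.deleteIfExists(path); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>";
        System.out.println(titleViaTempFile(html)); // prints "Input1"
    }
}
```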

The “parse” method returns an instance of the Document class, which represents the parsed HTML document.

Once parsed, we print the document's title to the console.
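The title is only one of the things a Document exposes. As a rough sketch, the parsed tree can also be queried with CSS-style selectors. The class name JsoupDocumentDemo and the helper firstParagraphText are made up for this illustration, and the markup is the same as in Input1.html.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupDocumentDemo {
    // Hypothetical helper: returns the text of the first <p> element, or "" if there is none.
    static String firstParagraphText(Document document) {
        Element paragraph = document.select("p").first();
        return paragraph == null ? "" : paragraph.text();
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(
                "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>");
        System.out.println(document.title());             // text of <title>
        System.out.println(firstParagraphText(document)); // text of the first <p>
        System.out.println(document.body().text());       // all text inside <body>
    }
}
```

For this sample document all three lines print Input1, since the title and the only paragraph carry the same text.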

In this way, we can parse HTML content using the Jsoup library.

The output of the main code will be as shown below.

Output

Input1
Google
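For the URL case, Jsoup also offers Jsoup.connect, which gives more control over the request than the URL-and-timeout overload used in the main code. The following is a sketch only: the class name JsoupConnectDemo, the helpers prepare and fetchTitle, and the user agent string are all assumptions made for illustration, and running main needs network access.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupConnectDemo {
    // Hypothetical helper: configures a connection without sending any request yet.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent("JsoupDemo/1.0") // assumed value, pick one that suits your client
                .timeout(10000);            // same 10 second timeout as the main code
    }

    // Hypothetical helper: performs an HTTP GET and returns the title of the parsed page.
    static String fetchTitle(String url) throws IOException {
        Document document = prepare(url).get();
        return document.title();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchTitle("http://www.google.com"));
    }
}
```

Separating prepare from fetchTitle keeps the request settings in one place, so they can be inspected or changed without touching the code that performs the fetch.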

Code Recipes Blog

Published by sumanthprabhakar
