In one of my previous posts under Jsoup, I showed how to validate HTML documents against a whitelist of HTML tags. In this post I will show how to validate HTML documents against a whitelist of HTML attributes along with a whitelist of HTML tags. Below is the complete main code for your reference. Main class…… Continue reading Validating html documents against whitelist of html attributes
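The Main class of that post is cut off in this excerpt, so the following is only a minimal sketch of the idea and not the post's own code. It assumes a recent jsoup release, where the whitelist class is named `Safelist` (older releases call it `Whitelist`). The class name `AttributeCheckSketch`, the allowed tags, and the sample inputs are made up for illustration. Note that `Jsoup.isValid` treats its input as body HTML rather than as a full document.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class AttributeCheckSketch {

    // Returns true only if the body HTML uses nothing but the allowed
    // tags ("p", "span") and the allowed attribute ("class" on "p").
    static boolean isAllowed(String bodyHtml) {
        Safelist allowed = Safelist.none()          // start with nothing allowed
                .addTags("p", "span")               // whitelist of tags
                .addAttributes("p", "class");       // whitelist of attributes per tag
        return Jsoup.isValid(bodyHtml, allowed);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("<p class=\"note\">Hi</p>"));    // allowed tag and attribute
        System.out.println(isAllowed("<p onclick=\"x()\">Hi</p>"));   // attribute not in the whitelist
    }
}
```

An attribute is checked per tag, so `class` on a `span` would be rejected here because it was only allowed on `p`.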
Category: Jsoup
Cloning an html document
In this post under Jsoup, I will show with an example how to clone an entire HTML document. For our example we will use the HTML document below.

Input1.html
<html>
<head>
<title>Input1</title>
</head>
<body>
<p>Input1</p>
</body>
</html>

Below is the main code that contains the logic for cloning the entire document.

Main Class
import java.io.File;…… Continue reading Cloning an html document
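Since the Main class is truncated in this excerpt, here is a minimal sketch of cloning, under the assumption that `Document.clone()` is the call in question. It parses the HTML from a String instead of reading Input1.html, and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CloneSketch {

    // Clones a parsed document, edits only the clone, and returns the
    // paragraph text of the original and of the clone, in that order.
    static String[] cloneAndModify() {
        String html = "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>";
        Document original = Jsoup.parse(html);
        Document copy = original.clone();            // deep copy of the whole node tree
        copy.select("p").first().text("Changed");    // touches the clone only
        return new String[] {
            original.select("p").text(),
            copy.select("p").text()
        };
    }

    public static void main(String[] args) {
        String[] texts = cloneAndModify();
        System.out.println("original: " + texts[0]); // original: Input1
        System.out.println("clone: " + texts[1]);    // clone: Changed
    }
}
```

Because the copy is deep, later changes to the clone leave the original document untouched.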
Getting all html elements in a document
In this post under Jsoup, I will show with an example how to get a list of all HTML elements in a document. Below is the main class.

Main Class
import java.io.File;
import java.io.IOException;
import java.util.Iterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class…… Continue reading Getting all html elements in a document
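The class body is cut off in this excerpt. As a rough sketch of one way to list every element, assuming `getAllElements()` is the method being demonstrated, the hypothetical helper below collects the tag name of each element. It parses a String rather than a file to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AllElementsSketch {

    // Returns the tag name of every element in the parsed document,
    // in document order.
    static List<String> tagNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element element : doc.getAllElements()) {
            names.add(element.tagName());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>";
        for (String name : tagNames(html)) {
            System.out.println(name);
        }
    }
}
```

`getAllElements()` includes the element it is called on, so the `Document` root itself appears in the list ahead of `html`, `head`, and the rest.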
Clearing HTML content of invalid or blacklisted tags
In this post under Jsoup, I will show how to remove invalid or blacklisted tags from user-supplied HTML content. In our example we will have HTML content with one invalid tag, “script”, and a whitelist (i.e., a list of allowed tags) containing the “p” and “span” tags. When we run our code,…… Continue reading Clearing HTML content of invalid or blacklisted tags
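A minimal sketch of that setup, assuming `Jsoup.clean` with a recent jsoup release where the whitelist class is named `Safelist` (older releases call it `Whitelist`). The sample input and the class name are invented for illustration and are not taken from the post.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanSketch {

    // Keeps only "p" and "span" elements and drops everything else,
    // including the "script" element and its contents.
    static String clean(String dirtyHtml) {
        Safelist allowed = Safelist.none().addTags("p", "span");
        return Jsoup.clean(dirtyHtml, allowed);
    }

    public static void main(String[] args) {
        String dirty = "<p>Hello <span>world</span></p><script>alert(1)</script>";
        System.out.println(clean(dirty));
    }
}
```

`Safelist.none()` starts from an empty list, so only the tags added explicitly survive the cleaning pass.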
Validating html documents against whitelist of html tags
You may have a requirement where your application has to accept HTML data as input from the user, and you have to make sure that the input contains only those tags that are allowed by your application. In this post under Jsoup, I will show how to implement this requirement. First we have to…… Continue reading Validating html documents against whitelist of html tags
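The steps of the post are truncated here, so this is only a sketch of one possible check, not necessarily the approach the post takes. It assumes `Jsoup.isValid` and a recent jsoup release where the whitelist class is named `Safelist` (older releases call it `Whitelist`). The allowed tags and the class name are hypothetical, and the input is treated as body HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class TagCheckSketch {

    // Returns true only if every tag in the body HTML is on the whitelist.
    static boolean hasOnlyAllowedTags(String bodyHtml) {
        Safelist allowed = Safelist.none().addTags("p", "span");
        return Jsoup.isValid(bodyHtml, allowed);
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyAllowedTags("<p>Hello</p>"));                           // true
        System.out.println(hasOnlyAllowedTags("<p>Hello</p><script>alert(1)</script>"));  // false
    }
}
```

Unlike cleaning, validation leaves the input unchanged and only reports whether it would pass, which suits rejecting a form submission outright.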
Jsoup’s connect method
In my previous post under Jsoup, I showed with an example how to parse an HTML document located on the World Wide Web using the overloaded “parse” method. In this post, I will show with an example another approach to doing the same thing. The Jsoup Javadoc recommends this approach over the overloaded “parse” method. Below…… Continue reading Jsoup’s connect method
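As a rough illustration of the `connect` approach (the post's own code is cut off in this excerpt), the sketch below builds a request and then fetches a page. The URL, user agent string, timeout value, and class name are assumptions chosen for the example, and `fetchTitle` needs network access to run.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ConnectSketch {

    // Builds a configured request without sending it.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (sketch)")  // hypothetical user agent
                .timeout(5000);                     // timeout in milliseconds
    }

    // Sends a GET request and returns the page title (needs network access).
    static String fetchTitle(String url) throws IOException {
        Document doc = prepare(url).get();
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchTitle("https://example.com/"));
    }
}
```

The `Connection` object is what gives this approach more control than `parse`: settings such as the timeout and user agent are chained before the request is sent.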
Parsing an HTML fragment as a body of new html document
In this post under Jsoup, I will explain with an example how to parse an HTML fragment as the body of a new HTML document. Suppose you have an HTML fragment as shown below

<a href='wwww.google.com'/>

which you want as the body of a new HTML document, as shown below for your reference

<html>
<head></head>
<body>
<a…… Continue reading Parsing an HTML fragment as a body of new html document
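A minimal sketch of this, assuming `Jsoup.parseBodyFragment` is the method the post goes on to use. The fragment and the class name are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyFragmentSketch {

    // Wraps the fragment in a new document: jsoup creates the html, head
    // and body elements and places the fragment inside body.
    static Document wrap(String fragment) {
        return Jsoup.parseBodyFragment(fragment);
    }

    public static void main(String[] args) {
        Document doc = wrap("<a href=\"https://example.com/\">Example</a>");
        System.out.println(doc.outerHtml());  // the full generated document
    }
}
```

The returned `Document` always has a `head` and a `body`, even though the input contained neither.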
Parsing an HTML document
In this post under Jsoup, I will explain with an example how to parse HTML data. The HTML data can be present in a local file, in a String, or at a URL. Jsoup provides overloaded methods to parse HTML data at these locations. For our example we will have a local file named “Input1.html” in…… Continue reading Parsing an HTML document
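To make the overloads concrete, here is a small sketch covering the String and file cases. The location of Input1.html is cut off in the excerpt, so the sketch writes its own temporary file instead. The class name, method names, and sample HTML are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseSketch {

    // Parses HTML held in a String and returns the document title.
    static String titleFromString(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Parses HTML stored in a local file, read with the given charset.
    static String titleFromFile(File file) throws IOException {
        Document doc = Jsoup.parse(file, "UTF-8");
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Input1</title></head>"
                + "<body><p>Input1</p></body></html>";
        System.out.println(titleFromString(html));

        // Temporary stand-in for a local HTML file.
        File tmp = File.createTempFile("Input1", ".html");
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
        System.out.println(titleFromFile(tmp));
        tmp.delete();
    }
}
```

Both overloads return the same kind of `Document`, so the rest of the code does not need to know where the HTML came from.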