Validating html documents against whitelist of html attributes

In one of my previous posts under Jsoup, I showed how to verify HTML documents against a whitelist of HTML tags. In this post I will show how to verify HTML documents against a whitelist of HTML attributes as well as a whitelist of HTML tags. Below is the complete main code for your reference. Main class…
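As a rough illustration of the idea (not the code from the post itself), an attribute check might look like the sketch below. It assumes jsoup 1.14.2 or later, where the allow-list class is called `Safelist` (older releases named it `Whitelist`). The class name `AttributeSafelistDemo`, the helper `isAllowed` and the chosen tags and attributes are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class AttributeSafelistDemo {

    // Hypothetical allow-list: the <p> and <a> tags, and only "href" on <a>.
    static boolean isAllowed(String bodyHtml) {
        Safelist safelist = new Safelist()
                .addTags("p", "a")
                .addAttributes("a", "href");
        // isValid returns false when a tag or attribute is not on the list.
        return Jsoup.isValid(bodyHtml, safelist);
    }

    public static void main(String[] args) {
        // Only allowed tags and attributes.
        System.out.println(isAllowed("<p><a href=\"https://example.com\">ok</a></p>"));
        // The "onclick" attribute is not on the allow-list.
        System.out.println(isAllowed("<p><a href=\"https://example.com\" onclick=\"x()\">bad</a></p>"));
    }
}
```

Note that `Jsoup.isValid` expects body HTML (a fragment), not a full document with `<html>` and `<head>` tags.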

Getting all html elements in a document

In this post under Jsoup, I will show with an example how to get a list of all HTML elements in a document. Below is the main class.

Main Class

    import java.io.File;
    import java.io.IOException;
    import java.util.Iterator;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class…
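Since the excerpt cuts off before the class body, here is a minimal sketch of one way to list every element, using `Document.getAllElements()`. It parses an in-memory string rather than a file, which is an assumption made to keep the example self-contained. The class name `AllElementsDemo` and the helper `tagNames` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AllElementsDemo {

    // Collects the tag name of every element in the parsed document.
    static List<String> tagNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        // getAllElements() walks the whole tree, starting from the document itself.
        for (Element element : doc.getAllElements()) {
            names.add(element.tagName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : tagNames("<p>Hello <span>world</span></p>")) {
            System.out.println(name);
        }
    }
}
```

Because jsoup normalises the input into a full document, the list also includes elements such as `html`, `head` and `body` that were not in the input string.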

Clearing HTML content of invalid or blacklisted tags

In this post under Jsoup, I will show how to remove invalid or blacklisted tags from user-supplied HTML content. In our example, the HTML content contains one invalid tag, “script”, and the whitelist (i.e., the list of allowed tags) contains the “p” and “span” tags. When we run our code,…
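The scenario described above can be sketched with `Jsoup.clean`, again assuming jsoup 1.14.2 or later (`Safelist` rather than the older `Whitelist`). This is an illustration, not the post's own code. The class name `CleanDemo`, the helper `cleanHtml` and the sample input are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class CleanDemo {

    // Keeps only <p> and <span>; every other tag is removed from the output.
    static String cleanHtml(String dirtyHtml) {
        Safelist safelist = new Safelist().addTags("p", "span");
        return Jsoup.clean(dirtyHtml, safelist);
    }

    public static void main(String[] args) {
        String dirty = "<p>Hello <span>world</span></p><script>alert(1)</script>";
        // The <script> element and its contents do not survive cleaning.
        System.out.println(cleanHtml(dirty));
    }
}
```

Unlike validation, cleaning never rejects the input: it returns a sanitised copy that can be stored or displayed.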

Validating html documents against whitelist of html tags

You may have a requirement where your application has to accept HTML data as input from the user, and you have to make sure that the input contains only those tags that are allowed by your application. In this post under Jsoup, I will show how to implement this requirement. First we have to…
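One possible way to meet this requirement is `Jsoup.isValid` with a tag-only allow-list. The sketch below assumes jsoup 1.14.2 or later (`Safelist`, formerly `Whitelist`) and is not taken from the post. The class name `TagSafelistDemo`, the helper `hasOnlyAllowedTags` and the allowed tags are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class TagSafelistDemo {

    // Hypothetical rule: user input may contain only <p> and <span> tags.
    static boolean hasOnlyAllowedTags(String bodyHtml) {
        // Safelist.none() starts empty, so every permitted tag is listed explicitly.
        Safelist safelist = Safelist.none().addTags("p", "span");
        return Jsoup.isValid(bodyHtml, safelist);
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyAllowedTags("<p>Hello <span>world</span></p>"));
        System.out.println(hasOnlyAllowedTags("<p>Hello</p><script>alert(1)</script>"));
    }
}
```

No attributes are listed here, so any attribute on an allowed tag would also make the input invalid.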

Parsing an HTML fragment as a body of new html document

In this post under Jsoup, I will explain with an example how to parse an HTML fragment as the body of a new HTML document. Suppose you have an HTML fragment as shown below:

    <a href='wwww.google.com'/>

which you want as the body of a new HTML document, as shown below for your reference:

    <html> <head></head> <body> <a…
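A minimal sketch of this using `Jsoup.parseBodyFragment`, which wraps a fragment in a new document shell, is shown below. It is an illustration, not the post's own listing. The class name `BodyFragmentDemo`, the helper `wrap` and the sample link are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BodyFragmentDemo {

    // Parses the fragment and returns a full document whose body contains it.
    static Document wrap(String fragment) {
        return Jsoup.parseBodyFragment(fragment);
    }

    public static void main(String[] args) {
        Document doc = wrap("<a href='https://example.com'>Example</a>");
        // Prints the complete document, including the generated html, head and body.
        System.out.println(doc.html());
        // The fragment itself is available through the body element.
        System.out.println(doc.body().html());
    }
}
```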