Validating html documents against whitelist of html attributes

In one of my previous posts under Jsoup, I showed how to verify HTML documents against a whitelist of HTML tags. In this post I will show how to verify HTML documents against a whitelist of HTML attributes as well as a whitelist of HTML tags. Below is the complete main code for your reference. Main class…
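
As a rough sketch of the idea (not the code from the full post), the check could look like the following. It assumes jsoup 1.14.1 or later, where the allow-list class is called `Safelist` (older releases name it `Whitelist`), and it validates a body fragment with `Jsoup.isValid`. The class name `AttributeWhitelistCheck` and the allowed tags and attributes are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class AttributeWhitelistCheck {

    // Hypothetical allow-list: <p> and <span> tags, plus the "class" attribute on <p> only.
    static Safelist allowed() {
        return new Safelist()
                .addTags("p", "span")
                .addAttributes("p", "class");
    }

    // Returns true when the body HTML uses only the tags and attributes listed above.
    static boolean isAllowed(String bodyHtml) {
        return Jsoup.isValid(bodyHtml, allowed());
    }

    public static void main(String[] args) {
        // Uses only allowed tags and an allowed attribute.
        System.out.println(isAllowed("<p class=\"note\">Hello <span>world</span></p>"));
        // "style" is not in the allow-list for <p>, so this input is rejected.
        System.out.println(isAllowed("<p style=\"color:red\">Hello</p>"));
    }
}
```

Attributes are allowed per tag here, so `class` on a `span` would still be rejected unless it is added for that tag as well.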

Getting all html elements in a document

In this post under Jsoup, I will show with an example how to get a list of all HTML elements in a document. Below is the main class.

Main Class

    import java.io.File;
    import java.io.IOException;
    import java.util.Iterator;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class…
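
A minimal sketch of one way to do this, assuming the document is parsed from a string rather than from a file as the imports in the excerpt suggest. It relies on `Element.getAllElements()`, which returns the element it is called on together with all of its descendants. The class name `AllElementsLister` and the sample markup are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AllElementsLister {

    // Collects the tag name of every element found in the parsed document.
    static List<String> tagNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element element : doc.getAllElements()) {
            names.add(element.tagName());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello <span>world</span></p></body></html>";
        // Print one tag name per line, in document order.
        for (String name : tagNames(html)) {
            System.out.println(name);
        }
    }
}
```

Calling `doc.select("*")` should give an equivalent list if you prefer the selector syntax.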

Clearing HTML content of invalid or blacklisted tags

In this post under Jsoup, I will show how to remove invalid or blacklisted tags from user-supplied HTML content. In our example, the HTML content contains one invalid tag, “script”, and the whitelist (i.e., the list of allowed tags) contains the “p” and “span” tags. When we run our code,…
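
A hedged sketch of the cleaning step, assuming jsoup 1.14.1 or later (`Safelist`; older releases call it `Whitelist`) and using `Jsoup.clean` with an allow-list of “p” and “span” as in the example described above. The class name `BlacklistedTagCleaner` and the sample input are invented for illustration and may differ from the full post.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class BlacklistedTagCleaner {

    // Keeps only <p> and <span>; every other tag (for example <script>) is dropped.
    static String clean(String userHtml) {
        Safelist allowed = new Safelist().addTags("p", "span");
        return Jsoup.clean(userHtml, allowed);
    }

    public static void main(String[] args) {
        String userHtml = "<p>Hello</p><script>alert(1)</script><span>world</span>";
        // Prints the cleaned HTML, which no longer contains the script tag.
        System.out.println(clean(userHtml));
    }
}
```

Because no attributes are added to the allow-list, any attributes on the surviving tags are removed as well.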

Parsing an HTML fragment as a body of new html document

In this post under Jsoup, I will explain with an example how to parse an HTML fragment as the body of a new HTML document. Suppose you have an HTML fragment as shown below <a href='wwww.google.com'/> which you want as the body of a new HTML document, as shown below for your reference <html> <head></head> <body> <a…
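
One way this could be done is with `Jsoup.parseBodyFragment`, which builds a full document shell (html, head and body) and places the fragment inside the body. The sketch below is only an assumption about the approach; the class name `BodyFragmentParser` is hypothetical, and the fragment is a variant of the one above with link text and an explicit closing tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BodyFragmentParser {

    // Wraps the fragment in a new document: <html><head></head><body>fragment</body></html>.
    static Document wrap(String fragment) {
        return Jsoup.parseBodyFragment(fragment);
    }

    public static void main(String[] args) {
        Document doc = wrap("<a href='wwww.google.com'>Google</a>");
        // Prints the whole generated document, including the html, head and body tags.
        System.out.println(doc.outerHtml());
        // Prints only what ended up inside the body.
        System.out.println(doc.body().html());
    }
}
```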