Validating HTML documents against a whitelist of HTML attributes

In one of my previous posts under Jsoup, I showed how to verify HTML documents against a whitelist of HTML tags.

In this post I will show how to verify HTML documents against a whitelist of HTML attributes along with a whitelist of HTML tags.

Below is the complete main class for your reference.

Main class


1  import java.io.IOException;
2  
3  import org.jsoup.Jsoup;
4  import org.jsoup.safety.Safelist;
5  
6  public class JsoupDemo8 {
7  	public static void main(String[] args) throws IOException {
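8  		String validHtml1 = "<span style='hello'>Welcome To Valid HTML</span>\r\n"; // "style" is an allowed attribute
9  		
10 		String inValidHtml = "<span script='hello'>Welcome To Valid HTML</span>\r\n"; // "script" is not an allowed attribute
11 		
12 		String validHtml2 = "<span>Welcome To Valid HTML</span>\r\n"; // no attributes at all
13 		
14 		Safelist safelist = new Safelist(); // starts empty: no tags or attributes allowed yet
15 		safelist.addAttributes("span", "style"); // allow <span>, optionally with "style"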
16 		
17 		System.out.println("Is 'validHtml1' Valid: " + Jsoup.isValid(validHtml1, safelist));
18 		System.out.println("Is 'inValidHTML' Valid: " + Jsoup.isValid(inValidHtml, safelist));
19 		System.out.println("Is 'validHtml2' Valid: " + Jsoup.isValid(validHtml2, safelist));
20 	}
21 }

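In the above code, we create three HTML snippets and assign them to the String variables “validHtml1”, “validHtml2” and “inValidHtml”.

“validHtml1” and “validHtml2” contain HTML that passes the whitelist, whereas “inValidHtml” does not, because its “span” tag carries a “script” attribute, which is not on the whitelist.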

At line 14, we create an instance of the “Safelist” class. A new instance starts out empty, so at this point no tags or attributes are allowed.

At line 15, we register the allowed HTML attributes on the “Safelist” instance by calling the “addAttributes” method. Its first argument is the tag name, and the subsequent arguments are the names of the attributes allowed on that tag.

So line 15 says: accept only HTML documents whose only tag is “span”, and if a “span” tag has any attributes, “style” is the only one permitted. Calling “addAttributes” also adds the tag itself to the whitelist if it is not already there, which is why no separate call is needed to allow “span”.

In this way we validate HTML documents against a whitelist of HTML attributes.
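To make the rule from line 15 easier to picture, here is a minimal sketch in plain Java, without Jsoup. It models the whitelist as a map from tag name to the set of attribute names allowed on that tag. The class name “AllowlistModel” and the method “isAllowed” are hypothetical names made up for this illustration. This is not how Jsoup implements “Safelist” internally, only a simplified model of the rule described above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the rule "only <span>, optionally with style".
// This is an illustration only and is NOT Jsoup's internal implementation.
public class AllowlistModel {
	// Tag name -> attribute names permitted on that tag.
	private static final Map<String, Set<String>> ALLOWED =
			Map.of("span", Set.of("style"));

	// A tag passes when it is listed and every attribute it carries is permitted.
	public static boolean isAllowed(String tag, Set<String> attributes) {
		Set<String> permitted = ALLOWED.get(tag);
		return permitted != null && permitted.containsAll(attributes);
	}

	public static void main(String[] args) {
		System.out.println(isAllowed("span", Set.of("style")));  // listed tag, allowed attribute
		System.out.println(isAllowed("span", Set.of("script"))); // listed tag, attribute not allowed
		System.out.println(isAllowed("span", Set.of()));         // listed tag, no attributes
		System.out.println(isAllowed("div", Set.of()));          // tag not listed at all
	}
}
```

Running this sketch prints true, false, true and false. The first three cases correspond to “validHtml1”, “inValidHtml” and “validHtml2” in the main class, and the last one shows that a tag missing from the map is rejected even when it has no attributes.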

Below is the output

Output

Is 'validHtml1' Valid: true
Is 'inValidHTML' Valid: false
Is 'validHtml2' Valid: true
