
javascript-xhtml-purifier


Commits

List of commits on branch master.
e517dc8ab723a9ca738b6e993abf4323a9414d35

Fixed problem with br's

bbiilmann committed 14 years ago
d63493970f499b4f581a5dcc2127a23ce7e84e19

No longer outputting empty text nodes

bbiilmann committed 14 years ago
c54da82b9a3c17f3a1da0bad5ce32d354f39184f

All tests passing - no browser dom used

bbiilmann committed 14 years ago
4030e325506cc9f1abd530b9fa41d852cd286a98

No more pretty printer and no storing nodes in browser dom when building tree - 1 test to go

bbiilmann committed 14 years ago
e8bb4ed605c59002a0c85014723328ccce7ffe60

Now converting <i> to <em>

ddomestika committed 16 years ago
f59798c03b7696cafbb0ad7220a9416c05b023d6

Changed the configuration of allowed attributes for tags, and added an all_elements setting for attributes that should always be permitted (at the moment class is permitted by default)

ddomestika committed 16 years ago

README


h2. JavaScript XHTML Purifier

This script provides a method to clean up dirty HTML. It takes a string of dirty, badly formatted HTML and returns a pretty-printed, valid XHTML string.

h2. Usage

bc. XHTMLPurifier.purify(html_string);

h2. About the Implementation

Purification is based on section 8.2 of the "HTML5 specification":http://www.whatwg.org/specs/web-apps/current-work/#parsing and implements a subset of the parsing algorithm described there.
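To illustrate the central idea behind that subset, here is a minimal sketch (not code from this repository) of balancing a token stream with a stack of open elements, the data structure the HTML5 tree-construction stage is built around. The token shape { type, tag, text }, the function names balance and escapeText, and the short void-element list are assumptions for illustration. The sketch closes misnested and unclosed elements and drops stray end tags, but it does not implement the specification's adoption agency algorithm for reopening formatting elements.

```javascript
// Minimal sketch, not the library's implementation: balance a token stream
// using a stack of open elements, as in HTML5 tree construction.
// Assumed token shape: { type: "start" | "end" | "text", tag?, text? }.
const VOID_ELEMENTS = new Set(["br", "img", "col"]); // emitted as <tag />

function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function balance(tokens) {
  const stack = []; // open elements, innermost last
  let out = "";

  for (const token of tokens) {
    if (token.type === "text") {
      out += escapeText(token.text);
    } else if (token.type === "start") {
      if (VOID_ELEMENTS.has(token.tag)) {
        out += `<${token.tag} />`; // XHTML self-closing form
      } else {
        stack.push(token.tag);
        out += `<${token.tag}>`;
      }
    } else if (token.type === "end") {
      const index = stack.lastIndexOf(token.tag);
      if (index === -1) continue; // stray end tag: ignore it
      // Close the matching element and everything still open inside it.
      while (stack.length > index) out += `</${stack.pop()}>`;
    }
  }

  // Implicitly close whatever is still open at the end of the input.
  while (stack.length > 0) out += `</${stack.pop()}>`;
  return out;
}

// Example: the tokens for "<p><strong>a</p>", where strong is never closed.
console.log(balance([
  { type: "start", tag: "p" },
  { type: "start", tag: "strong" },
  { type: "text", text: "a" },
  { type: "end", tag: "p" },
])); // prints <p><strong>a</strong></p>
```

Closing everything above the matched element, rather than rejecting the input, is what lets this style of parser accept tag soup without failing.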

Only a limited subset of HTML5 elements and attributes is permitted; all other tags and attributes are removed from the resulting XHTML.
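Attribute filtering can be pictured as a per-tag whitelist plus a list of attributes allowed everywhere; the commit log mentions an all_elements setting of this kind, with class permitted by default. The sketch below is hypothetical: the per-tag entries for a and img, the { name, value } attribute objects, and the names ALLOWED_ATTRIBUTES and filterAttributes are assumptions, not the library's actual configuration.

```javascript
// Hypothetical attribute whitelist: the per-tag entries are illustrative
// assumptions. "all_elements" lists attributes allowed on every tag
// (the commit log says class is currently permitted by default).
const ALLOWED_ATTRIBUTES = {
  all_elements: ["class"],
  a: ["href", "title"],
  img: ["src", "alt", "title"],
};

// Return only the whitelisted attributes for a tag; all others are dropped.
// Attributes are assumed to be objects of the form { name, value }.
function filterAttributes(tag, attrs) {
  // hasOwnProperty guards against tags like "constructor" hitting the prototype.
  const perTag = Object.prototype.hasOwnProperty.call(ALLOWED_ATTRIBUTES, tag)
    ? ALLOWED_ATTRIBUTES[tag]
    : [];
  const allowed = new Set([...ALLOWED_ATTRIBUTES.all_elements, ...perTag]);
  return attrs.filter((attr) => allowed.has(attr.name.toLowerCase()));
}

// Example: onclick is removed, href and class survive.
console.log(filterAttributes("a", [
  { name: "href", value: "/about" },
  { name: "onclick", value: "steal()" },
  { name: "class", value: "nav" },
]).map((attr) => attr.name)); // [ 'href', 'class' ]
```

A whitelist rather than a blacklist means that event handlers, inline styles and any attribute not explicitly listed are dropped by default.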

h2. Allowed elements

  • strong (b and all heading elements are currently transformed to strong)
  • em (i is converted to em)
  • blockquote
  • ol
  • ul
  • li
  • p
  • pre
  • a
  • img
  • br
  • table
  • caption
  • col
  • colgroup
  • tbody
  • td
  • tfoot
  • th
  • thead
  • tr

All other elements will be stripped from the resulting XHTML, although the inner text will be left intact.
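Together, the element whitelist and the renaming rules amount to a filter over the token stream. This is a minimal sketch, not the library's own code, assuming tokens of the form { type: "start" | "end" | "text", tag, text }. The allowed set copies the list above, the b and heading renames follow the README, and the i to em rename follows the commit log. The names ALLOWED_ELEMENTS, RENAMES and filterElements are hypothetical.

```javascript
// Sketch of element filtering on an assumed token stream of
// { type: "start" | "end" | "text", tag?, text? } objects.
const ALLOWED_ELEMENTS = new Set([
  "strong", "em", "blockquote", "ol", "ul", "li", "p", "pre", "a", "img", "br",
  "table", "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr",
]);

// b and headings become strong (README); i becomes em (commit log).
const RENAMES = {
  b: "strong", i: "em",
  h1: "strong", h2: "strong", h3: "strong", h4: "strong", h5: "strong", h6: "strong",
};

function filterElements(tokens) {
  const result = [];
  for (const token of tokens) {
    if (token.type === "text") {
      result.push(token); // inner text always survives
      continue;
    }
    const name = token.tag.toLowerCase();
    const tag = Object.prototype.hasOwnProperty.call(RENAMES, name) ? RENAMES[name] : name;
    if (ALLOWED_ELEMENTS.has(tag)) {
      result.push({ ...token, tag });
    }
    // Start and end tags of disallowed elements are simply dropped; the text
    // they wrapped is kept because text tokens are handled above.
  }
  return result;
}

// Example: <B><span>hi</span></b> keeps "hi", drops span and renames b.
console.log(filterElements([
  { type: "start", tag: "B" },
  { type: "start", tag: "span" },
  { type: "text", text: "hi" },
  { type: "end", tag: "span" },
  { type: "end", tag: "b" },
]).map((t) => t.tag || t.text)); // [ 'strong', 'hi', 'strong' ]
```

Renaming before the whitelist check is what lets b, i and the headings survive as strong and em even though they are not themselves on the allowed list.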

The script was originally created for a rich text editor in a CMS, and deliberately places very firm limits on what can be included in the resulting XHTML. Because it is based on the HTML5 parsing specification, it is very robust at cleaning up tag soup.

h2. License

Copyright © 2008 "Mathias Biilmann Christensen":http://mathias-biilmann.net / "Domestika INTERNET S.L.":http://domestika.com, released under the MIT license (see MIT-LICENSE)

Includes John Resig's and Erik Arvidsson's HTML Parser, which is used as a tokenizer.

HTML Parser by John Resig (ejohn.org). Original code by Erik Arvidsson, Mozilla Public License, http://erik.eae.net/simplehtmlparser/simplehtmlparser.js
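Because the parser is used only as a tokenizer, it helps to picture its callback style: it walks the markup and reports start tags, end tags and text to a handler object instead of building a tree. The sketch below is a deliberately simplified regex-based stand-in, not John Resig's parser. Its handler methods start(tag, attrs, unary), end(tag) and chars(text) are modelled on that parser's interface, while tokenize and parseAttributes are hypothetical names. It does not recognise comments, doctypes or CDATA, and attribute values containing > will confuse it.

```javascript
// Simplified stand-in tokenizer (NOT Resig's parser): reports tags and text
// to a handler with start(tag, attrs, unary), end(tag) and chars(text).
function tokenize(html, handler) {
  const TAG_RE = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/g;
  let last = 0;
  let match;
  while ((match = TAG_RE.exec(html)) !== null) {
    // Anything between the previous tag and this one is text.
    if (match.index > last) handler.chars(html.slice(last, match.index));
    const [, slash, rawName, rest] = match;
    const tag = rawName.toLowerCase();
    if (slash) {
      handler.end(tag);
    } else {
      const unary = /\/\s*$/.test(rest); // e.g. <br/> or <img ... />
      handler.start(tag, parseAttributes(rest), unary);
    }
    last = TAG_RE.lastIndex;
  }
  if (last < html.length) handler.chars(html.slice(last));
}

// Parse name="value", name='value', name=value and bare names.
// A bare attribute gets its own name as value (checked -> checked="checked").
function parseAttributes(source) {
  const ATTR_RE = /([^\s"'>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  const attrs = [];
  let m;
  while ((m = ATTR_RE.exec(source)) !== null) {
    const value = m[2] ?? m[3] ?? m[4] ?? m[1];
    attrs.push({ name: m[1].toLowerCase(), value });
  }
  return attrs;
}

// Usage: log each event reported for a small fragment of tag soup.
tokenize('<P class="x">Hi<br/>there</p>', {
  start: (tag, attrs, unary) => console.log("start", tag, attrs, unary),
  end: (tag) => console.log("end", tag),
  chars: (text) => console.log("chars", JSON.stringify(text)),
});
```

Keeping tokenizing separate from tree building means the purifier's own logic only has to react to a flat sequence of events.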