
javascript-xhtml-purifier


Commits

List of commits on branch master.
e8578f525fde31e55596f165398abca4b48fcefa

Moved some variable initiation outside of inner loops

ddomestika committed 16 years ago
2586d0e13e2a3d67b1f0f6d4415c5cb547dc8ee6

Now allowing rowspan and colspan for tr, td and th tags

ddomestika committed 16 years ago
1c79345d4479a992db7ab02f9d4950ff63602d12

Renamed reconstruct_active_formatting_elements to reconstruct_the_active_formatting_elements to match name in html 5 draft

ddomestika committed 16 years ago
82074e639eab8f3f51fb5dec5ed0416709cbfe5b

Fixed bug that caused elements after a table to disappear due to a missing assignment in the reset_insertion_mode function.

ddomestika committed 16 years ago
c783e24286aac5384db63c30af1e1839c7f51ac5

Added test for two tbodys and a tfoot and ths within tbody

ddomestika committed 16 years ago
5641d312e8f849839bf795e5b7e49dfa9645b37b

Removed console.log calls

ddomestika committed 16 years ago

README


h2. Javascript XHTML Purifier

This script provides a method to clean up dirty HTML. It takes a string of dirty, badly formatted HTML and returns a pretty-printed, valid XHTML string.
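To make "valid XHTML" concrete, here is a minimal sketch of an output step: text and attribute values are escaped, and empty elements such as br, img and col are written in self-closing form. The node shape (plain strings for text, `{ tag, attrs, children }` for elements) and the names `toXHTML` and `escapeXML` are hypothetical, the pretty-printing is omitted, and this is not the library's actual serializer.

```javascript
// Hypothetical node shape: a string is a text node; an object is
// { tag: "p", attrs: { href: "..." }, children: [...] }.
// Empty elements from the allowed list that XHTML requires to be self-closed.
const VOID_ELEMENTS = new Set(["br", "img", "col"]);

// Escape characters that are special in XHTML text and attribute values.
function escapeXML(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a node (and its descendants) to an XHTML string.
function toXHTML(node) {
  if (typeof node === "string") return escapeXML(node);
  const attrs = Object.entries(node.attrs || {})
    .map(([name, value]) => ` ${name}="${escapeXML(value)}"`)
    .join("");
  if (VOID_ELEMENTS.has(node.tag)) {
    // XHTML requires empty elements to be explicitly closed.
    return `<${node.tag}${attrs} />`;
  }
  const inner = (node.children || []).map(toXHTML).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

console.log(toXHTML({
  tag: "p",
  children: [
    "Fish & chips",
    { tag: "br" },
    { tag: "a", attrs: { href: "/menu?a=1&b=2" }, children: ["menu"] },
  ],
}));
// <p>Fish &amp; chips<br /><a href="/menu?a=1&amp;b=2">menu</a></p>
```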

h2. Usage

XHTMLPurifier.purify(html_string);

h2. About the Implementation

The purifier is based on section 8.2 of the "HTML5 specification":http://www.whatwg.org/specs/web-apps/current-work/#parsing and implements a subset of the parsing algorithm described there.
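The core idea behind tolerating tag soup can be illustrated with the spec's "stack of open elements": unclosed elements are closed automatically, and end tags with no matching open element are ignored. The sketch below is a heavily simplified illustration of that single idea, not the HTML5 algorithm and not this library's code; it leaves out insertion modes and the list of active formatting elements entirely. The token format and the name `balance` are hypothetical.

```javascript
// Simplified illustration only: real HTML5 tree construction also uses
// insertion modes and the list of active formatting elements.
// Hypothetical tokens: { type: "start" | "end", tag } or { type: "text", value }.
const VOID = new Set(["br", "img", "col"]);

function balance(tokens) {
  const open = []; // the stack of open elements
  let out = "";
  for (const tok of tokens) {
    if (tok.type === "text") {
      out += tok.value; // passed through unescaped for brevity
    } else if (tok.type === "start") {
      if (VOID.has(tok.tag)) {
        out += `<${tok.tag} />`; // empty elements never go on the stack
      } else {
        out += `<${tok.tag}>`;
        open.push(tok.tag);
      }
    } else if (tok.type === "end") {
      // Ignore end tags for elements that are not open (stray end tags).
      if (!open.includes(tok.tag)) continue;
      // Close every element above the matching one, then the match itself.
      let top;
      do {
        top = open.pop();
        out += `</${top}>`;
      } while (top !== tok.tag);
    }
  }
  // Close anything still open at the end of the input.
  while (open.length) out += `</${open.pop()}>`;
  return out;
}

// Roughly "<p><em>hi</p></b>": em is closed implicitly, the stray </b> is dropped.
console.log(balance([
  { type: "start", tag: "p" },
  { type: "start", tag: "em" },
  { type: "text", value: "hi" },
  { type: "end", tag: "p" },
  { type: "end", tag: "b" },
]));
// <p><em>hi</em></p>
```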

Only a limited subset of HTML5 elements and attributes is allowed; all other tags and attributes are removed from the resulting XHTML.
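Attribute filtering of this kind is usually a per-element whitelist. This README does not list the allowed attributes; only the commit history mentions allowing rowspan and colspan on tr, td and th. The sketch below is therefore hypothetical: the href, title, src and alt entries and the name `filterAttributes` are assumptions, not the library's actual configuration.

```javascript
// Hypothetical whitelist. Only rowspan/colspan on tr, td and th is
// suggested by the commit history; the other entries are assumptions.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const ALLOWED_ATTRIBUTES = new Map([
  ["a", ["href", "title"]],
  ["img", ["src", "alt"]],
  ["tr", ["rowspan", "colspan"]],
  ["td", ["rowspan", "colspan"]],
  ["th", ["rowspan", "colspan"]],
]);

// Return a new attribute object containing only whitelisted attributes.
function filterAttributes(tag, attrs) {
  const allowed = ALLOWED_ATTRIBUTES.get(tag) || [];
  const result = {};
  for (const [name, value] of Object.entries(attrs)) {
    const lower = name.toLowerCase();
    if (allowed.includes(lower)) result[lower] = value;
  }
  return result;
}

console.log(filterAttributes("td", { colspan: "2", style: "color:red", onclick: "evil()" }));
// { colspan: '2' }
```

Elements without an entry (such as p) keep no attributes at all, which matches the README's "very firm limits" approach.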

h2. Allowed elements

  • strong (b and all heading elements, h1 to h6, are currently converted to strong)
  • em
  • blockquote
  • ol
  • ul
  • li
  • p
  • pre
  • a
  • img
  • br
  • table
  • caption
  • col
  • colgroup
  • tbody
  • td
  • tfoot
  • th
  • thead
  • tr

All other elements are stripped from the resulting XHTML, but their inner text is kept.
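The element rules above (keep the listed elements, turn b and headings into strong, unwrap everything else while keeping its text) can be sketched as a recursive tree transform. The tree shape and the name `clean` are hypothetical, attributes are ignored for brevity, and this is an illustration rather than the library's implementation.

```javascript
// Elements from the README's allowed list.
const ALLOWED = new Set([
  "strong", "em", "blockquote", "ol", "ul", "li", "p", "pre", "a", "img", "br",
  "table", "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr",
]);

// Per the README, b and all headings are currently converted to strong.
const RENAMED = new Map([
  ["b", "strong"], ["h1", "strong"], ["h2", "strong"], ["h3", "strong"],
  ["h4", "strong"], ["h5", "strong"], ["h6", "strong"],
]);

// Hypothetical tree shape: strings are text; objects are { tag, children }.
// Returns an array of nodes, because a stripped element is replaced by its
// (already cleaned) children.
function clean(node) {
  if (typeof node === "string") return [node];
  const children = (node.children || []).flatMap(clean);
  const tag = RENAMED.get(node.tag) || node.tag;
  if (ALLOWED.has(tag)) return [{ tag, children }];
  return children; // element dropped, inner text kept
}

const result = clean({
  tag: "div",
  children: [
    { tag: "h1", children: ["Title"] },
    { tag: "span", children: ["plain text"] },
  ],
});
console.log(JSON.stringify(result));
// [{"tag":"strong","children":["Title"]},"plain text"]
```

Taken literally, "inner text is kept" also applies to elements such as script; whether the library treats such elements specially is not stated here.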

The script was originally written for a rich text editor in a CMS and deliberately places firm limits on what the resulting XHTML may contain. Because it is based on the HTML5 parsing algorithm, it is robust at cleaning up tag soup.

h2. License

Copyright © 2008 "Mathias Biilmann Christensen":http://mathias-biilmann.net / "Domestika INTERNET S.L.":http://domestika.com, released under the MIT license (see MIT-LICENSE)

Includes John Resig's and Erik Arvidsson's HTML Parser, which is used as a tokenizer.

HTML Parser by John Resig (ejohn.org). Original code by Erik Arvidsson, Mozilla Public License: http://erik.eae.net/simplehtmlparser/simplehtmlparser.js