sandblast

public
11 stars
2 forks
0 issues

Commits

List of commits on branch master.
43f8fb998d06140736b7a019d227bdda434d9487

added ability to keep link destinations in cleaned text.

aaarzilli committed 9 years ago
0d6505f31854fa1b138807eaca39b21a0ff49f47

qa: two messages

aaarzilli committed 9 years ago
57bd1d88e5842758542db37c392bcd5a70e8b424

finished qa rig

aaarzilli committed 9 years ago
58cfdc71eb8f22647aaa46bcae897cd5dab98c40

go fmt

aaarzilli committed 9 years ago
5d74f849627dbeb911a7df114fc5f07456e0e504

bugfix: cleanAsciiArt: uppercase letters should not be removed

aaarzilli committed 9 years ago
34db7349a133b2670267fcb5bf16eccbcd6cf07c

qa rig

aaarzilli committed 9 years ago

README


Library that uses Readability-like heuristics to extract text from an HTML document.
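The README does not say which heuristics sandblast applies. As a rough illustration of what "Readability-like" usually means, here is a minimal Go sketch in the spirit of arc90's Readability scoring: each candidate block earns points for length and commas, and the total is scaled down by its link density, so navigation bars and share widgets lose to real prose. The Block type and the linkDensity, score and bestBlock functions are hypothetical and not part of sandblast's API; the weights are illustrative assumptions.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// Block is a hypothetical, pre-flattened view of one candidate HTML block
// (for example a <div> or <p>). It is NOT a sandblast type.
type Block struct {
	Text     string // all visible text of the block
	LinkText string // the part of Text that sits inside <a> elements
}

// linkDensity returns the fraction of the block's characters that are link text.
func linkDensity(b Block) float64 {
	total := utf8.RuneCountInString(b.Text)
	if total == 0 {
		return 1 // an empty block is treated like pure navigation
	}
	return float64(utf8.RuneCountInString(b.LinkText)) / float64(total)
}

// score gives 1 base point, 1 point per comma and 1 point per 100 characters
// (capped at 3), then scales the result by the share of non-link text.
func score(b Block) float64 {
	s := 1.0 + float64(strings.Count(b.Text, ","))
	bonus := utf8.RuneCountInString(b.Text) / 100
	if bonus > 3 {
		bonus = 3
	}
	s += float64(bonus)
	return s * (1 - linkDensity(b))
}

// bestBlock returns the index of the highest-scoring block, or -1 if there are none.
func bestBlock(blocks []Block) int {
	best, bestScore := -1, 0.0
	for i, b := range blocks {
		if sc := score(b); best == -1 || sc > bestScore {
			best, bestScore = i, sc
		}
	}
	return best
}
```

For instance, given a navigation bar whose text is entirely links and a paragraph of ordinary prose, bestBlock picks the paragraph, because a link density of 1 zeroes the navigation bar's score.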

Example:

import (
	"bytes"
	"fmt"
	"log"

	"golang.org/x/net/html"
)

// raw_html is a []byte holding the HTML document to extract from
node, err := html.Parse(bytes.NewReader(raw_html))
if err != nil {
	log.Fatal("Parsing error: ", err)
}
title, text := sandblast.Extract(node)
fmt.Printf("Title: %s\n%s", title, text)
…

See also example/extract.go, a command line utility to extract text from a URL.
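example/extract.go is not reproduced here. As a sketch under stated assumptions, the raw_html bytes used in the example could be obtained from a URL with the standard library's net/http package. fetchHTML is a hypothetical helper, not part of sandblast, and it does not necessarily match what example/extract.go does.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchHTML is a hypothetical helper (not part of sandblast) that downloads
// url and returns the response body, ready to be wrapped in bytes.NewReader
// and passed to html.Parse.
func fetchHTML(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// Treat anything other than 200 OK as a failure rather than parsing an error page.
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: unexpected status %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```

Rejecting non-200 responses keeps error pages from being fed to the extractor as if they were article text.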