
urlnorm


Commits

List of commits on branch master.
4f51c0b1aa69392382558769ed4741b128a1a2d8

Simple pre plugin hook example (fix myspace redirect links)

kurtmckee committed 15 years ago
88890d852e7d2263d19eb335d3e271da11c22d5d

Add _register_pre_plugin and standardize plugin hook names

kurtmckee committed 15 years ago
55a9d677db9350c51483098e02cd322de4132688

Add tests for empty query and fragment stripping

kurtmckee committed 15 years ago
19c8862f0d2a0f23d5f99eab0f39c744979b8e6b

Plugin example: remove `www.` from hostname

kurtmckee committed 15 years ago
6ae4a8dbb9da8a2227e41ad84b3884801ce10595

Basic plugin example: remove default files in URL

kurtmckee committed 15 years ago
79818070a3bbfa5ee625e10f2e449210d080da53

Forgot to remove the parse_qsl import

kurtmckee committed 15 years ago

README


Purpose

The primary goal of urlnorm.py is to normalize HTTP and HTTPS URLs in a similar fashion to browser address bars so that the resource the URL is pointing at can be retrieved. For instance, all of the following URLs will be normalized to http://domain.example/:

The secondary goal of urlnorm.py is to provide a basic way to "fix" URLs with additional or unnecessary cruft attached. This is accomplished through a very simple plugin system.

Usage

urlnorm.py is a single file, so it can be copied anywhere you can import it from. Here's a simple example:

>>> from urlnorm import urlnorm
>>> urlnorm('domain.example')
http://domain.example/

It is also possible to specify a base URL:

>>> urlnorm('/path', 'http://domain.example/')
http://domain.example/path

The full call pattern is urlnorm(url, base=None), where url and base are both strings, and the return value is also a string.
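The base-URL example above resembles ordinary relative-reference resolution. The sketch below uses the standard library's `urllib.parse.urljoin` (Python 3; `urlparse.urljoin` on Python 2) purely as an analogy. Whether urlnorm.py resolves relative URLs this way internally is an assumption, not something this README states.

```python
from urllib.parse import urljoin

# Standard-library analogue of resolving a relative URL against a base.
# Note the argument order: urljoin takes the base first, whereas
# urlnorm(url, base=None) takes the URL first and the base second.
resolved = urljoin('http://domain.example/', '/path')
print(resolved)
```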

Plugins

Two functions are provided for registering plugins: register_pre_plugin and register_post_plugin. "Pre" plugins must accept a single string argument (an unparsed URL) and return a URL string. "Post" plugins must accept a single dictionary argument (a parsed URL) and return a single dictionary. The dictionary contains keys representing the different parts of the URL, including hostname and query, to name two.
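As a rough illustration of the "post" plugin contract, here is a minimal sketch of a plugin that drops a leading `www.` from the hostname. The function name `strip_www` is hypothetical, and the sketch assumes the parsed URL is a plain dictionary with a `hostname` key; the sample plugins shipped in plugins/ may be written differently.

```python
def strip_www(parts):
    """Hypothetical "post" plugin: drop a leading "www." from the hostname."""
    # Assumption: `parts` is a plain dict with a 'hostname' key.
    hostname = parts.get('hostname') or ''
    if hostname.lower().startswith('www.'):
        # Return a modified copy so the caller's dictionary is left untouched.
        parts = dict(parts, hostname=hostname[4:])
    return parts


# Exercise the plugin on a hand-built dictionary, without urlnorm itself.
example = {'hostname': 'www.domain.example', 'query': ''}
print(strip_www(example))
```

Because the plugin only takes and returns a dictionary, it can be tested on its own like this before being passed to register_post_plugin.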

To register a plugin, call the appropriate register function with the plugin function as an argument. Here's an example of a no-op "pre" plugin:

>>> import urlnorm
>>> plugfn = lambda u: u
>>> urlnorm.register_pre_plugin(plugfn)

Several sample plugins are included in the plugins/ directory of the source code to demonstrate both types of plugins.
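For a "pre" plugin that does a little more than the no-op, the sketch below trims whitespace and a pair of surrounding angle brackets from the raw URL string. The name `strip_wrapping` and the cleanup it performs are hypothetical and are not taken from the bundled plugins. Registration is guarded so the snippet also runs where urlnorm.py is not importable.

```python
def strip_wrapping(url):
    """Hypothetical "pre" plugin: trim whitespace and enclosing <...>."""
    url = url.strip()
    if url.startswith('<') and url.endswith('>'):
        url = url[1:-1].strip()
    return url


print(strip_wrapping(' <http://domain.example/> '))

# Register the plugin only if urlnorm.py is importable and exposes the
# registration function described in this README.
try:
    import urlnorm
except ImportError:
    urlnorm = None

if urlnorm is not None and hasattr(urlnorm, 'register_pre_plugin'):
    urlnorm.register_pre_plugin(strip_wrapping)
```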

License

urlnorm.py is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

urlnorm.py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with urlnorm.py. If not, see http://www.gnu.org/licenses/.


Copyright (C) 2010 Kurt McKee <contactme@kurtmckee.org>

This README is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.