USPTO-PatFT-Web-Crawler

public
30 stars
16 forks
4 issues

Commits

List of commits on branch master.
Verified
01edd477a02437f0ae7daa20da44ac5527929612

Merge pull request #3 from nobodyzxc/master

mattwang44 committed 5 years ago
Unverified
d3020a5502412ec169c8cdbbf7156947b56a7b17

provide path portability on linux

nobodyzxc committed 5 years ago
Verified
51fe057c36c1005775ebb2619f98a47231b031a5

Update README.md

mattwang44 committed 7 years ago
Verified
053a5ea89f39598519928ffec2a49770a15dbf82

Update README.md

mattwang44 committed 7 years ago
Unverified
7848ba03627ff30333d4c44d79b56487c86c16db

Update README.md

mattwang44 committed 7 years ago
Unverified
f5e6225fab0ccf24d5ec0ba1f3d41a96f386de63

Update README.md

mattwang44 committed 7 years ago

README

The README file for this repository.

Web Crawler of USPTO PatFT Database

A crawler for fetching information on US patents and batch-downloading their PDFs.

Motivation

I have been participating in a patent analysis project since April 2017. Our team needs to search PatFT with a given query, examine whether each resulting patent suits our topic, and then analyze the suitable ones. I found that USPTO's Open Data Portal (Download patent data and PAIR Bulk Data) only offers bulk patent data by searching for certain words, names, or regions, which isn't very useful for us, and the suitable tools I could find on the Internet all charge a fee. So I started writing Python scripts containing the basic functions, which sped up the project. To make the program more user-friendly, I revised the code and built a UI with PyQt5.

Download Execution File

The source code has been packaged with PyInstaller for Windows:
1. Normal package
2. Single executable file

Instruction

You can follow the instructions below or watch this video. It should be easy to learn :).

Patent Fetcher

(1) Insert PN (2) Filtering conditions (3) Information to be fetched (4) PDF type to be downloaded (5) Table

  1. Insert the patent numbers (PNs) to be processed in one of the following ways:
    (a) Choose a CSV file with the PNs in the first column (example).
    (b) Search with a query (the query should be tested on PatFT first). The resulting PNs are shown in the table.

  2. (Optional) Filter the listed PNs by patent type, application date range, and issue date range.
    PNs that are filtered out stay visible in the table during this step but are deleted at the end of it.

  3. Fetch the information of the patents shown in the table by web crawling.

  4. Download the PDF of the full text or the drawing section (or both at once) for the patents shown in the table.

  5. The table can be saved as a CSV file at any time.
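
The CSV steps above (load PNs from the first column, filter by issue date, save the table) can be sketched in plain Python. This is only an illustration and not the program's actual code: the function names, the `pn` and `issue_date` field names, and the sample values are all made up for the example, and it assumes the input CSV has no header row.

```python
import csv
import io
from datetime import date


def read_patent_numbers(csv_file):
    """Return the non-empty values found in the first column of a CSV file object."""
    numbers = []
    for row in csv.reader(csv_file):
        # Blank lines come back as empty lists; skip them and empty first cells.
        if row and row[0].strip():
            numbers.append(row[0].strip())
    return numbers


def filter_by_issue_date(records, start, end):
    """Keep only records whose 'issue_date' lies within [start, end] inclusive."""
    return [r for r in records if start <= r["issue_date"] <= end]


def write_table(records, csv_file, fieldnames):
    """Write the records to a CSV file object, header row first."""
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Usage with in-memory files; the numbers and dates are invented sample data.
source = io.StringIO("9000001,note\n\n9000002,note\n")
pns = read_patent_numbers(source)

records = [
    {"pn": pns[0], "issue_date": date(2015, 4, 7)},
    {"pn": pns[1], "issue_date": date(1999, 1, 5)},
]
kept = filter_by_issue_date(records, date(2010, 1, 1), date(2020, 12, 31))

output = io.StringIO()
write_table(kept, output, fieldnames=["pn", "issue_date"])
```

With a real file, open it with `open(path, newline="")` as the `csv` module documentation recommends, and pass the file object in place of the `io.StringIO` buffers.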

Browser

On the second page, you can enter a PN to view that patent's PatFT web page or open its PDF in your default browser.
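
Opening a page in the default browser can be done with Python's standard `webbrowser` module. The sketch below is an assumption about how such a feature could look, not the program's own code: `URL_TEMPLATE` is a hypothetical placeholder address (the real PatFT URL format is not given here), and the helper names are invented for the example.

```python
import webbrowser
from urllib.parse import quote

# Hypothetical placeholder; substitute the real address of the patent page.
URL_TEMPLATE = "https://example.com/patent?pn={pn}"


def normalize_pn(raw):
    """Remove spaces and thousands separators, e.g. '9,000,001' -> '9000001'."""
    return raw.replace(",", "").replace(" ", "").upper()


def build_patent_url(pn, template=URL_TEMPLATE):
    """Fill the URL template with the cleaned, URL-quoted patent number."""
    return template.format(pn=quote(normalize_pn(pn)))


def open_patent(pn, template=URL_TEMPLATE):
    """Open the patent page in the system's default browser."""
    return webbrowser.open(build_patent_url(pn, template))
```

Calling `open_patent("9,000,001")` would hand the built URL to whichever browser the operating system has registered as the default.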

Caution

  1. The program has some problems fetching information for patents issued before 1976. I am still working on it.
  2. Searching with a long query takes a long time, just as it does on PatFT (example). I tried threading in the program, but it made things slower, and multiprocessing led to connection problems. If you have a long query with fewer than 500 results, copying the patent numbers into a CSV file yourself and loading that file should be faster.
  3. If you encounter any problems or have any suggestions (such as adding other functions), feel free to contact me!