This simple Python script uses jieba to segment a text file into Chinese words, count them, and display the most frequent ones. It is currently very basic and therefore not very flexible.
To install the dependency, run `pip install jieba`.
```
usage: seg.py [-h] [-a] file

positional arguments:
  file        the text file to process

optional arguments:
  -h, --help  show this help message and exit
  -a, --all   find all possible words
```
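Under the hood, the approach is roughly the following. This is a minimal sketch, not the actual seg.py, and it assumes that `-a`/`--all` maps to jieba's full mode (`cut_all=True`):

```python
# Minimal sketch of the core idea; the real seg.py may differ in details.
import argparse
from collections import Counter

import jieba

parser = argparse.ArgumentParser()
parser.add_argument("file", help="the text file to process")
parser.add_argument("-a", "--all", action="store_true",
                    help="find all possible words (assumed to be jieba full mode)")
args = parser.parse_args()

with open(args.file, encoding="utf-8") as f:
    text = f.read()

# cut_all=True is jieba's "full mode": it emits every word it can find,
# including overlapping ones; the default accurate mode picks one segmentation.
words = jieba.cut(text, cut_all=args.all)

# Drop whitespace-only tokens, then count and print the ten most common words.
counts = Counter(w for w in words if w.strip())
for word, count in counts.most_common(10):
    print(word, count)
```

In this sketch, `Counter.most_common(10)` is where the hard-coded limit of ten lives, which is why seeing more words currently means editing the code.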
Sample output (from 1984 by George Orwell):

```
温斯顿 587
可能 325
没有 323
知道 258
想 242
党 233
奥 218
布兰 210
已经 206
这种 187
...
```
Currently, if you don't want to see common words like these, or you want to see more than the top ten, you have to edit the code. In the future there will be a default stop list, and you will be able to specify additional stop lists of your own. See Current issues and goals for the corresponding issues; a sketch of one possible implementation follows the list below.
- [x] missing argument just errs out
- [ ] add option for (multiple) stop lists
- [ ] add an option to set the delimiter
- [ ] add a minimum frequency option
- [ ] handle standard input
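A hypothetical sketch of how the stop-list filtering could look. The flag name, helper names, and file format below are illustrative assumptions, not the current interface:

```python
# Hypothetical sketch of stop-list support; flag and helper names are made up.
from collections import Counter

import jieba


def load_stop_words(paths):
    """Merge one or more stop-list files (UTF-8, one word per line) into a set."""
    stop = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            stop.update(line.strip() for line in f if line.strip())
    return stop


def count_words(text, stop_words=frozenset(), top=10):
    """Segment text with jieba and count words that are not stop words."""
    words = (w for w in jieba.cut(text) if w.strip() and w not in stop_words)
    return Counter(words).most_common(top)


# A repeatable argparse flag, e.g.
#   parser.add_argument("-s", "--stop-list", action="append", default=[])
# would let users pass several stop lists on one command line.
```

The `top` parameter would also cover the "more than ten" case, and a minimum frequency cutoff could be applied to the same Counter.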
Have fun Chinese word frequencying!
- Chinese word list copy-pasted (with modification) from stopwords-zh