Hanzi decomposition (Chinese character decomposition) | 汉字拆字

Character decomposition (拆字) means breaking a single character down into several component characters, using basic units such as strokes and glyph components.

Hanzi decomposition gives characters with similar glyph shapes similar decomposition results.

Deep learning models can therefore use the decomposition as one of a character's features: a glyph-shape (structural) feature.
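One way to see why similar shapes give similar results is to compare the component sets of two characters. The sketch below uses Jaccard similarity (shared components divided by all distinct components). It assumes a small hand-written toy table (`TOY_DECOMPOSITIONS`) rather than the package's data, and `component_jaccard` is a hypothetical helper, not part of hanzi_chaizi.

```python
from typing import Dict, List, Sequence

# Hand-written toy decompositions for illustration only; these are NOT
# loaded from the hanzi_chaizi package.
TOY_DECOMPOSITIONS: Dict[str, List[str]] = {
    "名": ["夕", "口"],
    "和": ["禾", "口"],
    "知": ["矢", "口"],
    "明": ["日", "月"],
}


def component_jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between the component sets of two characters.

    Repeated components (e.g. 月 twice) count once, because sets are used.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    d = TOY_DECOMPOSITIONS
    print(component_jaccard(d["名"], d["和"]))  # 1/3: only 口 is shared
    print(component_jaccard(d["名"], d["明"]))  # 0.0: nothing is shared
```

Characters that share a component, such as 名 and 和, score above zero, while structurally unrelated ones score zero.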

Installation

pip install hanzi_chaizi

Usage

from hanzi_chaizi import HanziChaizi

hc = HanziChaizi()
result = hc.query('名')

print(result)

Output:

['夕', '口']
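To feed the decomposition to a model as a structural feature, one option is a multi-hot vector over a vocabulary of components. This is a minimal sketch. It assumes `decompose` is a callable such as `hc.query` that returns a list of components. It also assumes that a character missing from the data yields `None` or an empty list, which this README does not document. `build_component_vocab` and `multi_hot` are hypothetical helpers, not part of the package.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Decompose = Callable[[str], Optional[Sequence[str]]]


def build_component_vocab(chars: Iterable[str], decompose: Decompose) -> Dict[str, int]:
    """Assign a stable index to every component seen across `chars`."""
    vocab: Dict[str, int] = {}
    for ch in chars:
        # Assumption: unknown characters yield None or [], treated as "no components".
        for comp in decompose(ch) or []:
            vocab.setdefault(comp, len(vocab))
    return vocab


def multi_hot(ch: str, decompose: Decompose, vocab: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector marking which vocabulary components `ch` contains."""
    vec = [0] * len(vocab)
    for comp in decompose(ch) or []:
        idx = vocab.get(comp)
        if idx is not None:  # components outside the vocabulary are ignored
            vec[idx] = 1
    return vec


if __name__ == "__main__":
    # Stand-in for hc.query: a plain dict lookup with hand-written entries.
    stub = {"名": ["夕", "口"], "和": ["禾", "口"]}
    vocab = build_component_vocab(stub, stub.get)
    print(vocab)                             # {'夕': 0, '口': 1, '禾': 2}
    print(multi_hot("名", stub.get, vocab))  # [1, 1, 0]
```

With the package installed, `hc.query` can be passed as `decompose`. A dict's `.get` works as a stand-in for testing without the data file.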

Development

Data source

The data comes from the 漢語拆字字典 (Chinese Character Decomposition Dictionary) project.

Parsing and converting the data format

python dev_scripts/parse.py
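The contents of `dev_scripts/parse.py` are not reproduced in this README. The following is only a hypothetical sketch of what such a conversion step could look like. It assumes the source data is a UTF-8 text file with one character per line, followed by one or more tab-separated decompositions whose components are space-separated. It also assumes the target is a pickled Python dict. Both the input layout and the output format are assumptions, and `parse_lines` and `save_table` are illustrative names.

```python
import pickle
import sys
from typing import Dict, Iterable, List


def parse_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse ASSUMED 'char<TAB>comp comp ...[<TAB>alt ...]' lines into {char: [comp, ...]}.

    Only the first decomposition listed for a character is kept; blank lines,
    '#' comment lines and lines without a decomposition are skipped.
    """
    table: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        char, first_decomposition = fields[0], fields[1]
        table.setdefault(char, first_decomposition.split())
    return table


def save_table(table: Dict[str, List[str]], path: str) -> None:
    """Serialize the lookup table with pickle (assumed output format)."""
    with open(path, "wb") as fd:
        pickle.dump(table, fd)


if __name__ == "__main__":
    # Usage: python parse_sketch.py INPUT.txt OUTPUT.pkl  (hypothetical script name)
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, encoding="utf-8") as fd:
        save_table(parse_lines(fd), dst)
```

Keeping only the first decomposition per character matches the single list returned in the usage example above, but the real script may handle alternatives differently.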

Credits

Data from the 漢語拆字字典 (Chinese Character Decomposition Dictionary) project.

Citation

@misc{kong2018hanzichaizi,
  title={Hanzi Chaizi},
  author={Xiaoquan Kong},
  howpublished={https://github.com/howl-anderson/hanzi_chaizi},
  year={2018}
}

If you cite this package in books, seminars, or academic research papers, or use it in company products, you are welcome (but not required) to email me about it. I'm glad to see the package being used and to know it is useful.