howl-anderson
Graduate student in AI @Duke University; Developer Expert in Machine Learning @Google; Mentor of TensorFlow @Google Summer of Code; SuperHero @Rasa. Book Author
Repositories
unlocking-the-power-of-llms
Use Prompts and Chains to turn ChatGPT into an amazing productivity tool! Unlocking the power of LLMs.
Chinese_models_for_SpaCy
Models for SpaCy that support Chinese
hanzi_chaizi
Hanzi Decomposition Library: breaks Chinese characters down into radicals and components, which can be used as character shape features in machine learning.
hanzi_char_featurizer
A Chinese character feature extractor (featurizer) that extracts pronunciation and glyph features of Chinese characters for use as deep learning features.
tools_for_corpus_of_people_daily
A toolkit for processing the People's Daily corpus
WeatherBot
A Rasa-based Chinese weather inquiry chatbot, with a Web UI
ATIS_dataset
The ATIS (Airline Travel Information System) Dataset
MicroTokenizer
MicroTokenizer: A lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
rasa_chinese
rasa_chinese is a Rasa component extension package built specifically for the Chinese language, providing many Chinese-specific components
seq2annotation
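A general-purpose sequence labeling library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) for sequence labeling tasks such as Chinese word segmentation (tokenization), part-of-speech (POS) tagging, and named entity recognition (NER).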
MITIE_Chinese_Wikipedia_corpus
Pre-trained Wikipedia corpus by MITIE
chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia