This project implements a prefix trie that suggests predictive-text completions. The trie is populated from a subset of the Enron email dataset (100k emails).
You can run the example like this (assuming you have a recent version of node/yarn installed):
yarn tsc
node index.js
Next steps would be:
- Use promises/await everywhere instead of callbacks
- Serialize the populated tree and save it to disk (so it loads faster)
- Wrap that into a library, with an interface to import data and "query" it
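As a rough illustration of the serialization step above, here is a minimal sketch that assumes the trie is built from plain objects, so it can round-trip through JSON. The names `PlainTrieNode`, `addWord`, `lookupCount`, `serializeTrie` and `deserializeTrie` are hypothetical and not taken from the existing code.

```typescript
// Hypothetical node shape (an assumption, not the project's actual type):
// plain objects only, so JSON.stringify / JSON.parse can round-trip the trie.
interface PlainTrieNode {
  count: number; // times a word ending at this node was seen (0 = not a word)
  children: { [ch: string]: PlainTrieNode };
}

function emptyNode(): PlainTrieNode {
  return { count: 0, children: {} };
}

// Insert one occurrence of `word`, creating nodes as needed.
function addWord(root: PlainTrieNode, word: string): void {
  let node = root;
  for (const ch of word) {
    let next = node.children[ch];
    if (next === undefined) {
      next = emptyNode();
      node.children[ch] = next;
    }
    node = next;
  }
  node.count += 1;
}

// Return how often `word` was inserted (0 if it is absent or only a prefix).
function lookupCount(root: PlainTrieNode, word: string): number {
  let node = root;
  for (const ch of word) {
    const next = node.children[ch];
    if (next === undefined) return 0;
    node = next;
  }
  return node.count;
}

// Turn the whole trie into a string that can be written to disk.
function serializeTrie(root: PlainTrieNode): string {
  return JSON.stringify(root);
}

// Rebuild the trie from a previously serialized string.
function deserializeTrie(text: string): PlainTrieNode {
  return JSON.parse(text) as PlainTrieNode;
}
```

The string returned by `serializeTrie` could be written with Node's `fs.writeFileSync` and read back with `fs.readFileSync` at startup. Whether parsing a large JSON file is actually faster than re-reading the emails would need to be measured.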
Options considered:
- Elasticsearch
- train an ML model
- from scratch → seems more fun - more actual coding
actually this one →
(first chunk only)
→ trie with weight (frequency) at each leaf to order predictions
→ are there any other factors we should take into account when ordering?
→ how can we extend that for near matches?
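The weighted-trie idea above could look roughly like the sketch below. It is an illustration under assumptions, not the existing implementation: the names `TrieNode`, `PrefixTrie`, `insert` and `suggest` are hypothetical, and the frequency is stored on the node where a word ends (which may be an interior node, since "the" is a prefix of "there").

```typescript
// Hypothetical node type: `frequency` counts how often a word ending here was seen.
interface TrieNode {
  children: { [ch: string]: TrieNode };
  frequency: number; // 0 means no word ends at this node
}

class PrefixTrie {
  private root: TrieNode = { children: {}, frequency: 0 };

  // Record one occurrence of `word`.
  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      let next = node.children[ch];
      if (next === undefined) {
        next = { children: {}, frequency: 0 };
        node.children[ch] = next;
      }
      node = next;
    }
    node.frequency += 1;
  }

  // Return up to `limit` words starting with `prefix`, most frequent first.
  suggest(prefix: string, limit: number): string[] {
    // Walk down to the node that represents the prefix.
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children[ch];
      if (next === undefined) return [];
      node = next;
    }

    // Collect every word below that node with an explicit stack (no recursion).
    const found: { word: string; frequency: number }[] = [];
    const stack: { node: TrieNode; word: string }[] = [{ node, word: prefix }];
    while (stack.length > 0) {
      const current = stack.pop();
      if (current === undefined) break;
      if (current.node.frequency > 0) {
        found.push({ word: current.word, frequency: current.node.frequency });
      }
      for (const ch of Object.keys(current.node.children)) {
        const child = current.node.children[ch];
        if (child !== undefined) {
          stack.push({ node: child, word: current.word + ch });
        }
      }
    }

    // Order by descending frequency, breaking ties alphabetically.
    found.sort(
      (a, b) =>
        b.frequency - a.frequency ||
        (a.word < b.word ? -1 : a.word > b.word ? 1 : 0)
    );
    return found.slice(0, limit).map((entry) => entry.word);
  }
}
```

For example, after inserting "the" three times, "there" twice and "they" once, `suggest("th", 2)` returns `["the", "there"]`. This version collects and sorts the whole subtree on every query, which is simple but may be slow for short prefixes on a large trie.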
will it fit in memory:
Could use C++ for efficiency or Java for its nice standard library, but building an interface would take a lot of effort
Could use Go → ruled out: not enough experience, and I don't want to be checking syntax all the time
Python/JS easiest → go with JS since it's easier to find nice libraries for the interface
autocomplete cli libraries:
if there's time could make a frontend and host it somewhere
- Implement trie, populate it, save that somewhere
- Add cli interface → load trie into mem, print highest x suggestions after each char typed
- Add fuzzy matching (near matches)
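One possible way to approach the fuzzy step is a bounded edit-distance search over the trie: while walking the trie, keep one row of the Levenshtein table per node and stop descending once every entry in the row exceeds the allowed distance. The sketch below is only an illustration under that assumption. It matches whole words rather than prefixes, and the names `FuzzyNode`, `insertWord` and `fuzzySearch` are hypothetical.

```typescript
// Hypothetical node type for this sketch.
interface FuzzyNode {
  children: { [ch: string]: FuzzyNode };
  isWord: boolean;
}

function newFuzzyNode(): FuzzyNode {
  return { children: {}, isWord: false };
}

function insertWord(root: FuzzyNode, word: string): void {
  let node = root;
  for (let i = 0; i < word.length; i++) {
    const ch = word.charAt(i);
    let next = node.children[ch];
    if (next === undefined) {
      next = newFuzzyNode();
      node.children[ch] = next;
    }
    node = next;
  }
  node.isWord = true;
}

// Find all stored words within `maxDistance` edits (Levenshtein) of `query`.
function fuzzySearch(
  root: FuzzyNode,
  query: string,
  maxDistance: number
): { word: string; distance: number }[] {
  const results: { word: string; distance: number }[] = [];

  // Row for the empty string: turning "" into query[0..i] costs i insertions.
  const firstRow: number[] = [];
  for (let i = 0; i <= query.length; i++) firstRow.push(i);

  const visit = (node: FuzzyNode, word: string, prevRow: number[]): void => {
    for (const ch of Object.keys(node.children)) {
      const child = node.children[ch];
      if (child === undefined) continue;

      // Compute the next Levenshtein row for the word extended by `ch`.
      const row: number[] = [prevRow[0] + 1];
      for (let col = 1; col <= query.length; col++) {
        const insertCost = row[col - 1] + 1;
        const deleteCost = prevRow[col] + 1;
        const replaceCost =
          prevRow[col - 1] + (query.charAt(col - 1) === ch ? 0 : 1);
        row.push(Math.min(insertCost, deleteCost, replaceCost));
      }

      const candidate = word + ch;
      const distance = row[query.length];
      if (child.isWord && distance <= maxDistance) {
        results.push({ word: candidate, distance });
      }
      // Prune: descend only if some extension can still be within range.
      if (Math.min(...row) <= maxDistance) {
        visit(child, candidate, row);
      }
    }
  };

  visit(root, "", firstRow);
  // Closest matches first, ties broken alphabetically.
  results.sort(
    (a, b) =>
      a.distance - b.distance ||
      (a.word < b.word ? -1 : a.word > b.word ? 1 : 0)
  );
  return results;
}
```

With "hello", "help", "held" and "world" stored, searching for "helo" with a maximum distance of 1 returns "held", "hello" and "help", each at distance 1. Combining this with frequency weights for ranking is left open here.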