daac-tools / python-daachorse
A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)
★15 · Updated last year
Related projects
Alternatives and complementary repositories for python-daachorse
- Rust implementation of SIF and uSIF: Simple and fast sentence embedding ★19 · Updated 11 months ago
- AllenNLP integration for Shiba: Japanese CANINE model ★12 · Updated 3 years ago
- Funer is a rule-based Named Entity Recognition tool. ★22 · Updated 2 years ago
- A library for semantic similarity search ★23 · Updated 2 months ago
- Use custom tokenizers in spacy-transformers ★16 · Updated 2 years ago
- Code for COLING 2020 Paper ★13 · Updated 2 weeks ago
- Utility scripts for preprocessing Wikipedia texts for NLP ★76 · Updated 7 months ago
- Finding all pairs of similar documents time- and memory-efficiently ★58 · Updated 2 years ago
- Rust library of natural language dictionaries using character-wise double-array tries. ★28 · Updated last year
- Edit and create Kubernetes job from cronjob template using your EDITOR ★15 · Updated 4 months ago
- The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark) ★33 · Updated 2 weeks ago
- ★18 · Updated last month
- Repository of ACL2023 paper: Unbalanced Optimal Transport for Unbalanced Word Alignment ★36 · Updated last year
- ★11 · Updated 2 months ago
- Rust library providing fast language model queries in compressed space ★23 · Updated 2 years ago
- DIRECT: Direct and Indirect REsponses in Conversational Text Corpus ★16 · Updated 3 years ago
- Japanese data from the Google UDT 2.0. ★28 · Updated last year
- Yet another sentence-level tokenizer for Japanese text ★22 · Updated 2 years ago
- ★18 · Updated 5 months ago
- ★7 · Updated 3 years ago
- ★24 · Updated 2 weeks ago
- A Japanese name-matching (entity resolution) dataset created from Wikipedia ★34 · Updated 4 years ago
- ★25 · Updated 5 months ago
- Yada is yet another double-array trie library aiming for fast search and compact data representation. ★31 · Updated 8 months ago
- Code to pre-train Japanese T5 models ★40 · Updated 3 years ago
- Implementations of data augmentation for Japanese NLP. ★64 · Updated last year
- Python Implementation of EmbedRank ★49 · Updated 5 years ago
- Annotated Fuman Kaitori Center Corpus ★17 · Updated 11 months ago
- A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus ★10 · Updated 4 months ago
- Software for wikification of Japanese text ★15 · Updated 7 years ago