A library for semantic similarity search
☆26 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for semsis
Users that are interested in semsis are comparing it to the libraries listed below
- script to evaluate pre-trained Japanese word2vec model on Japanese similarity dataset ☆12 · Nov 4, 2024 · Updated last year
- ☆29 · Apr 10, 2025 · Updated 10 months ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical … ☆14 · Jan 25, 2026 · Updated last month
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- AJIMEE-Bench (Advanced Japanese IME Evaluation Benchmark) ☆18 · Jan 13, 2025 · Updated last year
- An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024) ☆10 · May 31, 2024 · Updated last year
- To be readable without enhancing english power. ☆10 · Jul 22, 2020 · Updated 5 years ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆28 · Sep 20, 2025 · Updated 5 months ago
- JAX implementation of Large Language Models. You can train GPT-2-like model with 青空文庫 (aozora bunko-clean dataset) or any other text dat… ☆13 · Aug 5, 2024 · Updated last year
- Yet another Python binding for Juman++/KNP/KWJA ☆38 · Updated this week
- ☆33 · Jul 31, 2024 · Updated last year
- ☆19 · Dec 6, 2024 · Updated last year
- Visualize, share, and keep your favorite music artists on the Web ☆14 · May 23, 2023 · Updated 2 years ago
- 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer ☆252 · Feb 7, 2026 · Updated 3 weeks ago
- ☆17 · Aug 30, 2018 · Updated 7 years ago
- ☆15 · Mar 15, 2022 · Updated 3 years ago
- JQaRA: Japanese Question Answering with Retrieval Augmentation - a Japanese Q&A dataset for evaluating retrieval-augmented generation (RAG) ☆43 · Sep 9, 2025 · Updated 5 months ago
- DefSent: Sentence Embeddings using Definition Sentences ☆22 · Aug 5, 2021 · Updated 4 years ago
- Word acquisition in neural language models (TACL 2022). ☆20 · Jan 30, 2025 · Updated last year
- YAST - Yet Another SPLADE or Sparse Trainer ☆21 · Jun 16, 2025 · Updated 8 months ago
- The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark) ☆84 · Jan 6, 2026 · Updated 2 months ago
- Scripts for creating a Japanese-English parallel corpus and training NMT models ☆18 · Nov 9, 2021 · Updated 4 years ago
- [ICLR 2026] Evaluating the performance of LLMs on Japanese challenging financial tasks. ☆30 · Jul 28, 2025 · Updated 7 months ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ☆125 · Nov 13, 2025 · Updated 3 months ago
- Darts-clone python binding ☆20 · Apr 23, 2022 · Updated 3 years ago
- Word Rotator's Distance ☆19 · Sep 5, 2021 · Updated 4 years ago
- ☆17 · May 31, 2023 · Updated 2 years ago
- A Japanese dependency parser based on BERT ☆23 · Oct 26, 2022 · Updated 3 years ago
- Efficient, Extensible kNN-MT Framework ☆19 · Sep 7, 2024 · Updated last year
- Give your dependencies stars on GitHub! 🌟 ☆18 · May 1, 2021 · Updated 4 years ago
- Codes for <Kernelized Bayesian Softmax for Text Generation> in NeurIPS 2019 ☆16 · Nov 20, 2019 · Updated 6 years ago
- Easily turn large English text datasets into Japanese text datasets using open LLMs. ☆26 · Jan 20, 2025 · Updated last year
- Easy-to-use scripts to fine-tune GPT-2-JA with your own texts, to generate sentences, and to tweet them automatically. ☆19 · Aug 26, 2025 · Updated 6 months ago
- ☆43 · Feb 2, 2024 · Updated 2 years ago
- ☆149 · Updated this week
- ☆19 · May 23, 2024 · Updated last year
- DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings ☆19 · Nov 24, 2021 · Updated 4 years ago
- Japanese fake news dataset ☆20 · May 2, 2021 · Updated 4 years ago
- AASC: ACL Anthology Sentence Corpus ☆20 · Oct 28, 2020 · Updated 5 years ago