superlinear-ai / wtpsplit-lite
✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
☆18Updated this week
Alternatives and similar repositories for wtpsplit-lite
Users who are interested in wtpsplit-lite are comparing it to the libraries listed below.
- German Language Understanding Evaluation Benchmark @NAACL24☆15Updated last month
- Temporarily remove unused tokens during training to save RAM and speed up training.☆24Updated 2 months ago
- Code for SaGe subword tokenizer (EACL 2023)☆26Updated 9 months ago
- Library for fast text representation and classification.☆31Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆21Updated 2 months ago
- State-of-the-art paired encoder and decoder models (17M-1B params)☆44Updated last month
- INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.☆24Updated last year
- ☆10Updated 11 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"☆20Updated last year
- My NER Experiments with ModernBERT and Ettin☆22Updated last month
- Truly flash implementation of the DeBERTa disentangled attention mechanism.☆63Updated 3 weeks ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆58Updated last year
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks☆23Updated 5 months ago
- One-stop shop for running and fine-tuning transformer-based language models for retrieval☆59Updated this week
- A Python library aimed at dissecting and augmenting NER training data.☆58Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"☆33Updated 2 months ago
- Simple-to-use scoring function for arbitrarily tokenized texts.☆46Updated 6 months ago
- ☆43Updated 2 years ago
- ☆13Updated 8 months ago
- ☆82Updated 3 months ago
- Official implementation of "GPT or BERT: why not both?"☆57Updated last month
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23)☆17Updated last year
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset.☆47Updated 2 years ago
- Multilingual entity linking with the BELA model☆12Updated 2 years ago
- NLP with Rust for Python 🦀🐍☆64Updated 3 months ago
- ☆21Updated 3 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆27Updated last year
- ☆27Updated 6 months ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition☆15Updated 2 years ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆27Updated 5 months ago