Contextualized per-token embeddings
☆34May 11, 2025Updated 9 months ago
Alternatives and similar repositories for miniCOIL
Users who are interested in miniCOIL are comparing it to the libraries listed below
- Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse.☆26Jun 9, 2025Updated 8 months ago
- User-friendly viewer for Parquet files☆10Jan 10, 2026Updated last month
- ☆37Nov 21, 2024Updated last year
- Label shift estimation for transfer difficulty with Familiarity.☆10Feb 4, 2025Updated last year
- Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs☆30Jan 20, 2025Updated last year
- *high-load* benchmarking tool☆16Feb 20, 2026Updated last week
- PathPiece tokenizer☆13Nov 10, 2024Updated last year
- Tool to migrate data into Qdrant☆70Updated this week
- BPE modification that removes intermediate tokens during tokenizer training.☆26Nov 25, 2024Updated last year
- Data for the HIPE 2022 shared task.☆21Nov 29, 2023Updated 2 years ago
- My NER Experiments with ModernBERT and Ettin☆26Jul 17, 2025Updated 7 months ago
- ☆44Feb 11, 2026Updated 2 weeks ago
- Code for SaGe subword tokenizer (EACL 2023)☆27Nov 30, 2024Updated last year
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs☆43Feb 11, 2026Updated 2 weeks ago
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends.☆38May 26, 2025Updated 9 months ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆28Oct 3, 2021Updated 4 years ago
- 🚀🤗 A collection of templates for Hugging Face Spaces☆35Oct 9, 2023Updated 2 years ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆33Sep 20, 2025Updated 5 months ago
- State-of-the-art paired encoder and decoder models (17M-1B params)☆58Aug 6, 2025Updated 6 months ago
- EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling☆34Nov 21, 2021Updated 4 years ago
- Plugin for the Cheshire Cat AI framework☆11Sep 19, 2023Updated 2 years ago
- DOS Program Development☆13Nov 9, 2022Updated 3 years ago
- Mathematical foundations of data analysis, Winter semester 22-23☆13Jan 31, 2023Updated 3 years ago
- fine-tuning tutorial☆18Feb 20, 2026Updated last week
- A library for probing Stockfish's NNUEs. The code for reading parameters and forward propagation is taken from Stockfish☆12Nov 18, 2025Updated 3 months ago
- Training code for Sparse Autoencoders on Embedding models☆39Feb 27, 2025Updated last year
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- ☆15Oct 24, 2023Updated 2 years ago
- ☆14Dec 12, 2022Updated 3 years ago
- LightGBM for handling label-imbalanced data with focal and weighted loss functions in binary and multiclass classification☆21Jan 29, 2026Updated last month
- A CLI tool for crypto functions☆13Updated this week
- ☆11Dec 6, 2023Updated 2 years ago
- Redis distributed lock implementation for Python based on Pub/Sub messaging☆11Feb 14, 2026Updated 2 weeks ago
- Linear Attention for Efficient Bidirectional Sequence Modeling☆15May 13, 2025Updated 9 months ago
- ☆10Jan 9, 2024Updated 2 years ago
- ☆10Oct 2, 2024Updated last year
- FlexiTokens☆18Dec 27, 2025Updated 2 months ago
- ☆10Aug 4, 2024Updated last year
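- Push notification service for changes in COVID-19 case status and new announcements (using data from the Korea Centers for Disease Control and Prevention COVID-19 website)☆12Jan 5, 2023Updated 3 years ago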