huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆526 Updated last year
Alternatives and similar repositories for text-clustering:
Users who are interested in text-clustering are comparing it to the libraries listed below.
- ☆515 Updated 5 months ago
- awesome synthetic (text) datasets ☆272 Updated 5 months ago
- Generative Representational Instruction Tuning ☆620 Updated last month
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆436 Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆197 Updated 6 months ago
- Official repository for ORPO ☆448 Updated 10 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,120 Updated last week
- Late Interaction Models Training & Retrieval ☆276 Updated last week
- Best practices for distilling large language models. ☆523 Updated last year
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆274 Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,438 Updated last week
- Automated Evaluation of RAG Systems ☆579 Updated 3 weeks ago
- ☆360 Updated last year
- Generate textbook-quality synthetic LLM pretraining data ☆498 Updated last year
- Automatically evaluate your LLMs in Google Colab ☆615 Updated 11 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆911 Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters ☆254 Updated 9 months ago
- Guideline following Large Language Model for Information Extraction ☆365 Updated 5 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆422 Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆692 Updated last year
- All-in-one text de-duplication ☆669 Updated 11 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆681 Updated last month
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆480 Updated 6 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆509 Updated 9 months ago
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ☆267 Updated last month
- Let's build better datasets, together! ☆259 Updated 4 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆833 Updated last week
- A bagel, with everything. ☆319 Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆494 Updated last year
- Notebooks for training universal 0-shot classifiers on many different tasks ☆124 Updated 3 months ago