huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆534 Updated last year
Alternatives and similar repositories for text-clustering
Users who are interested in text-clustering are comparing it to the libraries listed below.
- ☆515 Updated 5 months ago
- awesome synthetic (text) datasets ☆281 Updated 6 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆445 Updated last week
- Generative Representational Instruction Tuning ☆628 Updated 2 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆204 Updated last week
- Guideline following Large Language Model for Information Extraction ☆371 Updated 6 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆699 Updated last month
- Official repository for ORPO ☆452 Updated 11 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,153 Updated 3 weeks ago
- A bagel, with everything. ☆320 Updated last year
- Late Interaction Models Training & Retrieval ☆328 Updated this week
- ☆360 Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆421 Updated last year
- Automatically evaluate your LLMs in Google Colab ☆622 Updated last year
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ☆536 Updated 2 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆497 Updated last year
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆773 Updated 2 months ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ☆301 Updated last year
- Let's build better datasets, together! ☆259 Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆239 Updated 6 months ago
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ☆282 Updated last week
- All-in-one text de-duplication ☆674 Updated 11 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆522 Updated 10 months ago
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆484 Updated 7 months ago
- Automated Evaluation of RAG Systems ☆590 Updated last month
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆1,015 Updated 3 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆256 Updated 10 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,516 Updated last week
- data cleaning and curation for unstructured text ☆328 Updated 9 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,411 Updated last week