huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆552 Updated last year
Alternatives and similar repositories for text-clustering
Users who are interested in text-clustering are comparing it to the libraries listed below.
- ☆520 Updated 7 months ago
- awesome synthetic (text) datasets ☆282 Updated 7 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆472 Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆207 Updated last month
- Late Interaction Models Training & Retrieval ☆452 Updated 2 weeks ago
- Generative Representational Instruction Tuning ☆654 Updated 3 months ago
- Let's build better datasets, together! ☆259 Updated 6 months ago
- Guideline following Large Language Model for Information Extraction ☆380 Updated 7 months ago
- Official repository for ORPO ☆455 Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆952 Updated 2 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,213 Updated 3 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,459 Updated 3 weeks ago
- Notebooks for training universal 0-shot classifiers on many different tasks ☆130 Updated 5 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,641 Updated this week
- ☆365 Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆698 Updated last year
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆280 Updated 3 months ago
- Train Models Contrastively in Pytorch ☆721 Updated 3 months ago
- SpanMarker for Named Entity Recognition ☆434 Updated 5 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆425 Updated last year
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆778 Updated 3 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,411 Updated 3 weeks ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆403 Updated 6 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆314 Updated 3 weeks ago
- Automated Evaluation of RAG Systems ☆613 Updated 2 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆501 Updated last year
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆717 Updated 3 months ago
- ☆455 Updated last year
- Neural Search ☆358 Updated 3 months ago
- data cleaning and curation for unstructured text ☆327 Updated 10 months ago