latam-gpt / llm-data-eval
LLM-aided data filtering
☆12 · Updated 9 months ago
Alternatives and similar repositories for llm-data-eval
Users who are interested in llm-data-eval are comparing it to the libraries listed below.
- ☆41 · Updated 5 months ago
- Benchmarks for Evaluating Spanish Language Models ☆11 · Updated 2 years ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆128 · Updated 2 months ago
- Generalist and Lightweight Model for Text Classification ☆161 · Updated 3 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 · Updated last year
- General repository for Data Science bootcamps at Coding Dojo ☆11 · Updated 2 weeks ago
- A CLI for generating synthetic data ☆42 · Updated 4 months ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆116 · Updated 5 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆63 · Updated 3 weeks ago
- Plug-and-play, zero-shot document processing pipelines. ☆101 · Updated last week
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆67 · Updated 7 months ago
- ☆12 · Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆66 · Updated last year
- Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ☆56 · Updated last week
- Pre-train Static Word Embeddings ☆85 · Updated 2 weeks ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated 2 years ago
- Data models for Hugging Face tokenizers ☆76 · Updated this week
- Official Implementation of the 'When XGBoost Outperforms GPT-4 on Text Classification: A Case Study' NAACL-W 2024 paper ☆16 · Updated 9 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆33 · Updated last week
- Robust and fast topic models with sentence-transformers. ☆80 · Updated last week
- ☆78 · Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 · Updated 2 years ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆212 · Updated last week
- ☆49 · Updated 7 months ago
- Simple UI for debugging correlations of text embeddings ☆291 · Updated 4 months ago
- German dataset for DPR model training ☆19 · Updated last year
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 · Updated last year
- The robust European language model benchmark. ☆125 · Updated this week
- Synthetic Text Dataset Generation for LLM projects ☆41 · Updated last week