telekom / wikipedia-22-12-de-dpr
German dataset for DPR model training
☆18 · Updated 5 months ago
Alternatives and similar repositories for wikipedia-22-12-de-dpr:
Users who are interested in wikipedia-22-12-de-dpr are comparing it to the libraries listed below.
- Using short models to classify long texts ☆21 · Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆62 · Updated 2 months ago
- Generalist and Lightweight Model for Text Classification ☆58 · Updated 2 weeks ago
- Chunk your text using gpt4o-mini more accurately ☆43 · Updated 5 months ago
- ☆34 · Updated 4 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆24 · Updated 9 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆26 · Updated 3 weeks ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆110 · Updated this week
- NLP with Rust for Python 🦀🐍 ☆60 · Updated 7 months ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆171 · Updated last week
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆58 · Updated 2 years ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆90 · Updated last month
- Pre-train Static Word Embeddings ☆34 · Updated this week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 · Updated last month
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ☆13 · Updated 4 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆49 · Updated 10 months ago
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated last week
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆52 · Updated 3 weeks ago
- GLiNER model in a FastAPI microservice. ☆34 · Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 6 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 · Updated last year
- Efficient few-shot learning with cross-encoders. ☆42 · Updated 11 months ago
- a unified framework for leveraging LLMs ☆63 · Updated this week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated 10 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 · Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆56 · Updated 5 months ago
- Embedding Recycling for Language models ☆38 · Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆57 · Updated last year
- PyTorch implementation for MRL ☆18 · Updated 10 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated 9 months ago