swiss-ai / pretrain-data
Pretraining data reconstruction scripts for Apertus
☆93Updated last week
Alternatives and similar repositories for pretrain-data
Users who are interested in pretrain-data are comparing it to the libraries listed below
- ☆67Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆145Updated 8 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆67Updated 11 months ago
- Small python package to measure OCR quality and other related metrics.☆25Updated last year
- Python library to use Pleias-RAG models☆63Updated 5 months ago
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…☆169Updated last year
- Datamodels for hugging face tokenizers☆77Updated 3 weeks ago
- Low-Rank adapter extraction for fine-tuned transformers models☆177Updated last year
- ☆136Updated 2 months ago
- ☆49Updated 8 months ago
- ☆136Updated last year
- Lightweight tools for quick and easy LLM demos☆28Updated last year
- Dataset Viber is your chill repo for data collection, annotation and vibe checks.☆46Updated last year
- ☆63Updated last year
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆42Updated last week
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks.☆41Updated last year
- GPTQ and efficient search for GGUF☆51Updated last month
- lossily compress representation vectors using product quantization☆59Updated 6 months ago
- Clue inspired puzzles for testing LLM deduction abilities☆44Updated 6 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆66Updated last year
- ☆57Updated 2 weeks ago
- Chat Markup Language conversation library☆55Updated last year
- ☆62Updated 3 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models☆111Updated 6 months ago
- ☆55Updated 11 months ago
- Alice in Wonderland code base for experiments and raw experiments data☆131Updated last month
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API☆45Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR.☆102Updated 6 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆96Updated last week
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.☆87Updated last month