CoderPat / croissant-llm-training
Repository containing the code for training CroissantLLM
☆21 · Updated 2 years ago
Alternatives and similar repositories for croissant-llm-training
Users who are interested in croissant-llm-training are comparing it to the libraries listed below.
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Updated last year
- The robust European language model benchmark. ☆159 · Updated this week
- Repository for the EM German Model ☆112 · Updated 2 years ago
- Pre-configuration page for the OpenLLM-France community ☆49 · Updated 2 years ago
- French instruction-following and chat models ☆506 · Updated last year
- Blindly query two conversational language models on tasks expressed in French and compare the results. ☆59 · Updated this week
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆294 · Updated 11 months ago
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆199 · Updated 8 months ago
- Let's build better datasets, together! ☆269 · Updated last year
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆64 · Updated last year
- My personal site ☆80 · Updated last month
- A library for working with prompt templates locally or on the Hugging Face Hub. ☆55 · Updated 11 months ago
- A Scandinavian Benchmark for sentence embeddings ☆45 · Updated 2 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆426 · Updated last month
- A repository containing the code for translating popular LLM benchmarks to German. ☆31 · Updated 2 years ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- Easily embed, cluster and semantically label text datasets ☆592 · Updated last year
- ☆141 · Updated 5 months ago
- Late Interaction Models Training & Retrieval ☆701 · Updated this week
- SpanMarker for Named Entity Recognition ☆465 · Updated last year
- awesome synthetic (text) datasets ☆321 · Updated last month
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆505 · Updated last year
- A CLI for generating synthetic data ☆43 · Updated 8 months ago
- The website for Danish Foundation Models, a project for training foundational Danish language models. ☆81 · Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters ☆280 · Updated last year
- German Alpaca Dataset (Cleaned + Translated) ☆26 · Updated 2 years ago
- ☆39 · Updated 2 years ago
- Code for training & evaluating Contextual Document Embedding models ☆202 · Updated 8 months ago
- A list of awesome open-source projects in the machine learning field whose developers are mainly based in Germany ☆52 · Updated last year
- FastFit ⚡ When LLMs are Unfit, Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆213 · Updated 4 months ago