CoderPat / croissant-llm-training
Repository containing the code for training CroissantLLM
☆21Updated last year
Alternatives and similar repositories for croissant-llm-training:
Users who are interested in croissant-llm-training compare it to the libraries listed below:
- Preconfiguration page for the OpenLLM-France community☆46Updated last year
- The robust European language model benchmark.☆93Updated this week
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆124Updated 4 months ago
- Late Interaction Models Training & Retrieval☆263Updated last week
- Backend resources for Albert. Albert is a conversational agent that uses official French data sources to answer administrative agents qu…☆120Updated 2 months ago
- Robust and fast topic models with sentence-transformers.☆48Updated last week
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️☆166Updated 9 months ago
- Let's build better datasets, together!☆257Updated 3 months ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.☆56Updated 8 months ago
- High-level library for batched embeddings generation, blazingly fast web-based RAG, and quantized index processing ⚡☆67Updated 4 months ago
- A framework for few-shot evaluation of autoregressive language models.☆13Updated last year
- code for training & evaluating Contextual Document Embedding models☆176Updated 2 months ago
- A repository containing the code for translating popular LLM benchmarks to German.☆25Updated last year
- Guideline following Large Language Model for Information Extraction☆358Updated 5 months ago
- French instruction-following and chat models☆504Updated 3 months ago
- A repository of instructions in French to fine-tune LLMs☆17Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters☆253Updated 8 months ago
- ☆42Updated 2 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes☆190Updated 5 months ago
- Repository for the EM German Model☆108Updated last year
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆176Updated 2 months ago
- ☆85Updated 3 months ago
- Benchmarking library for RAG☆181Updated last week
- 🌱 EcoLogits tracks the energy consumption and environmental footprint of using generative AI models through APIs.☆141Updated 3 weeks ago
- Official implementation of "GPT or BERT: why not both?"☆49Updated 2 weeks ago
- Notebooks for training universal 0-shot classifiers on many different tasks☆120Updated 3 months ago
- Efficiently find the best-suited language model (LM) for your NLP task☆120Updated last week
- A list of awesome open source projects in the machine learning field whose developers are mainly based in Germany☆42Updated 6 months ago
- A Scandinavian Benchmark for sentence embeddings☆36Updated last month
- ☆112Updated 6 months ago