OpenLLM-France / Lit-Claire

Continual pretraining of foundation LLMs using ⚡ Lightning Fabric

☆36

Alternatives and similar repositories for Lit-Claire

Users who are interested in Lit-Claire compare it to the libraries listed below.

betagouv / ComparIA
Blind-query two conversational language models on tasks expressed in French and compare the results.
☆37Updated this week
german-asr / kaldi-german
Scripts for training Kaldi for German speech recognition (ASR).
☆24Updated 4 years ago
asappresearch / sew
☆76Updated 3 years ago
nateraw / hf-hub-lightning
A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️
☆36Updated 3 years ago
Softcatala / nmt-models
Softcatalà neural translation models
☆18Updated 5 months ago
coqui-ai / inference-engine
Coqui Inference Engine
☆40Updated 3 years ago
huggingface / AIEnergyScore
AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models.
☆31Updated 2 months ago
commoncrawl / web-languages
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ …
☆45Updated this week
stephantul / unitoken
Tokenization across languages. Useful as preprocessing for subword tokenization.
☆22Updated 2 years ago
opening-up-chatgpt / opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track…
☆118Updated 3 months ago
nateraw / huggingface-datasets-converter
Scripts to convert datasets from various sources to Hugging Face Datasets.
☆57Updated 2 years ago
linhd-postdata / rantanplan
Scansion tool for Spanish texts
☆12Updated last year
huggingface / hf-endpoints-emulator
Local emulator for Hugging Face Inference Endpoints custom handlers
☆26Updated last year
huggingface / hffs
**ARCHIVED** Filesystem interface to 🤗 Hub
☆58Updated 2 years ago
kyutai-labs / yomikomi
A small Rust-based data loader
☆29Updated 2 weeks ago
mariosasko / datasets_sql
Execute arbitrary SQL queries on 🤗 Datasets
☆32Updated last year
tarekziade / mwcat
MediaWiki Categories Model
☆13Updated last year
qdrant / quaterion-models
A collection of building blocks for building fine-tunable metric learning models
☆32Updated 2 months ago
jqueguiner / wav2vec2-sprint
Docker for HF wav2vec2-sprint
☆13Updated 4 years ago
jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆34Updated 4 years ago
besacier / ASR2022
☆56Updated 2 years ago
kpu / fasterText
Library for fast text representation and classification.
☆30Updated last year
openlanguagedata / seed
Seed Machine Translation Data
☆32Updated 7 months ago
bitextor / warc2text
Extracts plain text, language identification and more metadata from WARC records
☆22Updated 3 months ago
mozilla / distilvit
image-to-text model for PDF.js
☆41Updated 3 months ago
jumon / zac
Zero-shot Audio Classification using Whisper
☆79Updated 2 years ago
LMU-Seminar-LLMs / AutoTestGen
Automatic Test Generator
☆12Updated 3 months ago
Open-Speech-EkStep / audio-to-speech-pipeline
Data pipeline that converts raw audio data to speech, which serves as the input dataset for a speech-to-text pipeline
☆32Updated 2 years ago
ParadigmAI / paradigm
Hassle-free ML Pipelines on Kubernetes
☆39Updated 2 years ago
loretoparisi / hf-experiments
Experiments with Hugging Face 🔬 🤗
☆44Updated 10 months ago