Thytu / SMIT
SMIT: A Simple Modality Integration Tool
⭐ 15 · Updated last year
Alternatives and similar repositories for SMIT
Users interested in SMIT are comparing it to the libraries listed below.
- NLP with Rust for Python ⭐ 66 · Updated 6 months ago
- Coloring terminal text with intensities (used for visualizing per-token probability and entropy) ⭐ 12 · Updated last year
- ⭐ 22 · Updated 2 years ago
- Train vision models using JAX and 🤗 transformers ⭐ 100 · Updated 2 weeks ago
- ⭐ 124 · Updated last year
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ⭐ 34 · Updated 8 months ago
- HomebrewNLP in JAX flavour for maintainable TPU training ⭐ 51 · Updated last year
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ⭐ 195 · Updated last year
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch ⭐ 134 · Updated last month
- ⭐ 86 · Updated 4 months ago
- Cost-aware hyperparameter tuning algorithm ⭐ 173 · Updated last year
- DiffusionWithAutoscaler ⭐ 29 · Updated last year
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tuning ⭐ 82 · Updated 2 years ago
- Chat Markup Language conversation library ⭐ 55 · Updated last year
- ⭐ 100 · Updated 4 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ⭐ 18 · Updated 3 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ⭐ 159 · Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ⭐ 14 · Updated 2 years ago
- A JAX-based library for building transformers; includes implementations of GPT, Gemma, LLaMA, Mixtral, Whisper, Swin, ViT, and more. ⭐ 297 · Updated last year
- ⭐ 50 · Updated last year
- ⭐ 10 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ⭐ 103 · Updated 11 months ago
- Collection of autoregressive model implementations ⭐ 86 · Updated 6 months ago
- ⭐ 40 · Updated last year
- ⭐ 63 · Updated last year
- Automatically take good care of your preemptible TPUs ⭐ 37 · Updated 2 years ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ⭐ 66 · Updated last month
- Code for the paper "Don't Pay Attention" ⭐ 50 · Updated last month
- QLoRA with Enhanced Multi-GPU Support ⭐ 37 · Updated 2 years ago
- ⭐ 51 · Updated 9 months ago