A place to store reusable transformer components of my own creation or found on the interwebs
☆73 · Feb 28, 2026 · Updated this week
Alternatives and similar repositories for transformer_nuggets
Users who are interested in transformer_nuggets are comparing it to the libraries listed below.
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- (unofficial) - customized fork of DETR, optimized for intelligent obj detection on 'real world' custom datasets ☆12 · Aug 22, 2020 · Updated 5 years ago
- Mixtral finetuning ☆19 · Feb 2, 2024 · Updated 2 years ago
- Full finetuning of large language models without large memory requirements ☆94 · Sep 22, 2025 · Updated 5 months ago
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- LiteGPT: A 124M Small Language Model (SLM) pre-trained on FineWeb and fine-tuned on Alpaca. ☆34 · Dec 16, 2025 · Updated 2 months ago
- An unofficial jax/haiku implementation of Crystal Graph Convolutional Neural Networks (CGCNN) ☆10 · Dec 17, 2022 · Updated 3 years ago
- ☆176 · Feb 3, 2024 · Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆226 · Aug 1, 2024 · Updated last year
- ☆28 · Jan 17, 2025 · Updated last year
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ☆197 · May 6, 2024 · Updated last year
- extensible collectives library in triton ☆96 · Mar 31, 2025 · Updated 11 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Mar 31, 2024 · Updated last year
- ☆93 · Jul 5, 2024 · Updated last year
- The largest VQA dataset for Vietnamese. Related to the text content in the image. ☆19 · Apr 9, 2025 · Updated 10 months ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding ☆27 · Dec 18, 2025 · Updated 2 months ago
- This is a short introduction of Julia language. This is in English and Japanese. ☆14 · Nov 6, 2024 · Updated last year
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆184 · Apr 16, 2024 · Updated last year
- GPTQ inference Triton kernel ☆321 · May 18, 2023 · Updated 2 years ago
- Monitor parameter and gradient statistics during neural network training with Chainer ☆13 · Jan 24, 2017 · Updated 9 years ago
- Manage ML configuration with pydantic ☆16 · Updated this week
- Markov Decision Processes in Python ☆15 · Jan 3, 2019 · Updated 7 years ago
- CLI for Recursive Language Models ☆52 · Jan 28, 2026 · Updated last month
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆161 · Sep 26, 2023 · Updated 2 years ago
- Hugging Face Jobs ☆19 · Jul 11, 2025 · Updated 7 months ago
- ☆17 · Jul 28, 2023 · Updated 2 years ago
- Various transformers for FSDP research ☆38 · Nov 11, 2022 · Updated 3 years ago
- Embedding Recycling for Language models ☆38 · Jul 11, 2023 · Updated 2 years ago
- Use Actions to acquire those precious lambda GPUs ☆19 · Sep 7, 2023 · Updated 2 years ago
- [ICLR 2026 🔥] Dr.LLM: Dynamic Layer Routing in LLMs ☆41 · Oct 15, 2025 · Updated 4 months ago
- Research Paper: "Graph Contrastive Learning as a Versatile Foundation for Advanced scRNA-seq Data Analysis" ☆10 · Nov 20, 2024 · Updated last year
- ☆46 · Apr 13, 2022 · Updated 3 years ago
- QLoRA with Enhanced Multi GPU Support ☆38 · Aug 8, 2023 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Jun 17, 2024 · Updated last year
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al. ☆20 · May 29, 2023 · Updated 2 years ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆83 · Sep 10, 2023 · Updated 2 years ago
- ☆45 · Oct 13, 2023 · Updated 2 years ago
- What would you do with 1000 H100s... ☆1,154 · Jan 10, 2024 · Updated 2 years ago
- ☆21 · Oct 6, 2023 · Updated 2 years ago