naba89 / custom_hf_trainer
A custom Hugging Face trainer that supports logging auxiliary losses returned by your model
☆13Updated 2 months ago
Alternatives and similar repositories for custom_hf_trainer
Users who are interested in custom_hf_trainer are comparing it to the libraries listed below.
- PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (NeurIPS 2024 Spotlight)☆355Updated last week
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective☆59Updated 3 months ago
- ☆201Updated 8 months ago
- Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"☆11Updated last year
- A Sober Look at Language Model Reasoning☆74Updated last week
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆55Updated 5 months ago
- Direct Preference Optimization from scratch in PyTorch☆98Updated 2 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey☆184Updated this week
- ☆55Updated 6 months ago
- A collection of papers on discrete diffusion models☆145Updated 2 weeks ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models☆44Updated 7 months ago
- ☆44Updated last year
- An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT☆102Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆222Updated last month
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond☆252Updated 2 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆141Updated 4 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆88Updated last month
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"☆123Updated last year
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…☆79Updated 4 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆112Updated last year
- ☆46Updated last year
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…☆41Updated 11 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning☆31Updated last year
- Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation☆22Updated 4 months ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models☆213Updated 3 weeks ago
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning☆214Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆88Updated 8 months ago
- [SIGIR'24] The official implementation code of MOELoRA.☆168Updated 11 months ago
- Explorations into some recent techniques surrounding speculative decoding☆269Updated 6 months ago
- ☆65Updated 2 months ago