joey00072 / ohara
Collection of autoregressive model implementations
☆85 · Feb 10, 2026 · Updated last week
Alternatives and similar repositories for ohara
Users interested in ohara are comparing it to the libraries listed below.
Sorting:
- An alternative way to calculate self-attention · ☆18 · May 25, 2024 · Updated last year
- Code for the paper "Function-Space Learning Rates" · ☆25 · Jun 3, 2025 · Updated 8 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆35 · Mar 7, 2025 · Updated 11 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · Mar 15, 2024 · Updated last year
- 5X faster, 60% less memory QLoRA finetuning · ☆21 · May 28, 2024 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆36 · Jun 7, 2024 · Updated last year
- Slowly building a set of infinite riddle generators for data-hungry methods · ☆14 · Nov 15, 2022 · Updated 3 years ago
- Curriculum training of instruction-following LLMs with Unsloth · ☆14 · Dec 15, 2025 · Updated 2 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers · ☆73 · Apr 22, 2025 · Updated 9 months ago
- A repository for research on medium-sized language models · ☆77 · May 23, 2024 · Updated last year
- Experiments with BitNet inference on CPU · ☆55 · Apr 1, 2024 · Updated last year
- ☆82 · Apr 16, 2024 · Updated last year
- Linear Attention Sequence Parallelism (LASP) · ☆88 · Jun 4, 2024 · Updated last year
- Our own implementation of 'Layer Selective Rank Reduction' · ☆240 · May 26, 2024 · Updated last year
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) · ☆44 · Feb 13, 2024 · Updated 2 years ago
- Fast approximate inference on a single GPU with sparsity-aware offloading · ☆39 · Jan 4, 2024 · Updated 2 years ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… · ☆355 · Jul 29, 2024 · Updated last year
- ☆28 · Oct 7, 2025 · Updated 4 months ago
- Low-rank adapter extraction for fine-tuned transformer models · ☆180 · May 2, 2024 · Updated last year
- Collection of resources for RL and Reasoning · ☆27 · Feb 3, 2025 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free · ☆233 · Oct 31, 2024 · Updated last year
- Triton version of GQA flash attention, based on the tutorial · ☆12 · Aug 4, 2024 · Updated last year
- Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.… · ☆11 · May 16, 2024 · Updated last year
- [ICML-2025] We introduce Lie group Relative position Encodings (LieRE), which go beyond RoPE in supporting n-dimensional inputs. · ☆14 · Aug 8, 2025 · Updated 6 months ago
- Working implementation of DeepSeek MLA · ☆45 · Jan 8, 2025 · Updated last year
- Piper based VoiceDock TTS implementation · ☆11 · Aug 12, 2023 · Updated 2 years ago
- This repo is for CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering · ☆14 · Mar 6, 2024 · Updated last year
- ☆16 · Jul 29, 2025 · Updated 6 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… · ☆190 · Jan 11, 2026 · Updated last month
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆27 · Apr 17, 2024 · Updated last year
- ☆34 · Aug 23, 2023 · Updated 2 years ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages · ☆53 · Aug 10, 2025 · Updated 6 months ago
- My Gen AI research · ☆11 · Jun 3, 2024 · Updated last year
- ☆10 · Feb 3, 2025 · Updated last year
- Python tools · ☆14 · Oct 22, 2023 · Updated 2 years ago
- Applies ROME and MEMIT on Mamba-S4 models · ☆14 · Apr 5, 2024 · Updated last year
- ☆48 · Aug 29, 2024 · Updated last year
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better · ☆16 · Feb 15, 2025 · Updated last year
- Experimentation on Google's Gemma model · ☆16 · Mar 6, 2024 · Updated last year
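One primitive that recurs in the list above (e.g. the "patchnization" utilities) is splitting an image into fixed-size patches so each patch becomes one token for a Transformer encoder, as in ViT. A minimal NumPy sketch of that step, not taken from any listed repo; the function name and shapes are illustrative:

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    the token layout typically fed to a Transformer encoder.
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    # (H, W, C) -> (H/p, p, W/p, p, C): split each spatial axis into
    # (grid index, within-patch offset)
    patches = image.reshape(H // patch_size, patch_size,
                            W // patch_size, patch_size, C)
    # Bring the two grid axes together, then flatten each patch to a vector.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * C)

# Example: a 32x32 RGB image with 16x16 patches -> 4 tokens of length 768
img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
tokens = patchify(img, 16)
print(tokens.shape)  # (4, 768)
```

The reshape/transpose pair avoids any Python-level loop over patches; the same layout is what `einops.rearrange(img, '(h p1) (w p2) c -> (h w) (p1 p2 c)', ...)` would produce.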