antimatter15 / reverse-engineering-gemma-3n
Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model
☆127 · Updated 3 weeks ago
Alternatives and similar repositories for reverse-engineering-gemma-3n
Users interested in reverse-engineering-gemma-3n are comparing it to the repositories listed below.
- KV cache compression for high-throughput LLM inference ☆131 · Updated 4 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates ☆125 · Updated this week
- Load compute kernels from the Hub ☆191 · Updated this week
- Layer-Condensed KV cache with 10× larger batch size, fewer parameters, and less computation. Dramatic speed-up with better task performance… ☆149 · Updated 2 months ago
- Simple extension on vLLM to help you speed up reasoning models without training ☆161 · Updated 3 weeks ago
- Experiments on speculative sampling with Llama models ☆128 · Updated 2 years ago
- Nano repo for RL training of LLMs ☆61 · Updated 2 weeks ago
- ☆130 · Updated 4 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆106 · Updated 3 months ago
- Parallel Scaling Law for Language Models — Beyond Parameter and Inference-Time Scaling ☆395 · Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆102 · Updated 2 months ago
- prime-rl: a codebase for decentralized async RL training at scale ☆341 · Updated this week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆163 · Updated last year
- The HELMET Benchmark ☆154 · Updated 2 months ago
- Patches for Hugging Face Transformers to save memory ☆23 · Updated 3 weeks ago
- A collection of tricks and tools to speed up transformer models ☆167 · Updated 3 weeks ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆80 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆189 · Updated 3 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆141 · Updated 2 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆203 · Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆176 · Updated this week
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago
- Tree Attention: Topology-Aware Decoding for Long-Context Attention on GPU Clusters ☆126 · Updated 6 months ago
- PyTorch implementation of models from the Zamba2 series ☆182 · Updated 5 months ago
- A simple unified framework for evaluating LLMs ☆219 · Updated 2 months ago
- ☆47 · Updated 2 weeks ago
- Async pipelined version of Verl ☆100 · Updated 2 months ago
- ☆198 · Updated 6 months ago
- PyTorch building blocks for the OLMo ecosystem ☆238 · Updated this week
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆76 · Updated last year