antimatter15 / reverse-engineering-gemma-3n

Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model

☆244

Alternatives and similar repositories for reverse-engineering-gemma-3n

Users who are interested in reverse-engineering-gemma-3n are comparing it to the repositories listed below.

Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 Updated 8 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆296 Updated last month
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆247 Updated 8 months ago
apple / ml-recurrent-drafter
☆218 Updated 8 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆342 Updated 10 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆90 Updated 4 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆109 Updated 5 months ago
Infini-AI-Lab / Sequoia
scalable and robust tree-based speculative decoding algorithm
☆359 Updated 8 months ago
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆403 Updated 10 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆201 Updated last year
microsoft / LongRoPE
LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.
☆260 Updated last year
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆224 Updated this week
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆194 Updated last month
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆341 Updated 5 months ago
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆125 Updated 2 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆126 Updated 2 years ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆305 Updated this week
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated 11 months ago
FasterDecoding / BitDelta
☆201 Updated 10 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆251 Updated this week
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155 Updated 6 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 10 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆196 Updated 7 months ago
Pints-AI / 1.5-Pints
A compact LLM pretrained in 9 days using high-quality data
☆330 Updated 6 months ago
changjonathanc / flex-nano-vllm
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆293 Updated 2 months ago
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆220 Updated 3 weeks ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 Updated 8 months ago
huggingface / fineweb-2
☆195 Updated 3 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆161 Updated 5 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆97 Updated 10 months ago