Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This repository contains the code for the experiments in the paper.
☆60Oct 31, 2024Updated last year
Alternatives and similar repositories for neuzip
Users who are interested in neuzip are comparing it to the libraries listed below
- The source code for running LLMs on the AAAR-1.0 benchmark.☆18Apr 5, 2025Updated 10 months ago
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference☆35Mar 6, 2025Updated 11 months ago
- Lecture notes for the "Data Science" master's program at MIPT☆24Dec 7, 2024Updated last year
- ☆20Jun 11, 2025Updated 8 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆51Aug 24, 2025Updated 6 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training☆36Jun 20, 2025Updated 8 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs"☆15Jul 24, 2024Updated last year
- Pokedex for LLMs☆14Apr 14, 2025Updated 10 months ago
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.☆150Nov 3, 2025Updated 4 months ago
- Promptsage is an LLM prompt builder, linter and sanitizer with built-in guardrails☆21Mar 25, 2024Updated last year
- Work in progress.☆79Nov 25, 2025Updated 3 months ago
- Some tools for neuronal image analysis☆41Feb 6, 2026Updated 3 weeks ago
- ☆21Nov 21, 2024Updated last year
- MEXMA: Token-level objectives improve sentence representations☆43Jan 6, 2025Updated last year
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning"☆29Jun 3, 2025Updated 9 months ago
- ☆63Jul 10, 2025Updated 7 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders☆42Jun 10, 2025Updated 8 months ago
- ☆43Jul 10, 2024Updated last year
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- PyTorch implementation of models from the Zamba2 series.☆187Jan 23, 2025Updated last year
- ☆20Aug 19, 2024Updated last year
- Implementation of BitNet-1.58 instruct tuning☆27Apr 14, 2024Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆360Feb 5, 2026Updated 3 weeks ago
- GRadient-INformed MoE☆264Sep 25, 2024Updated last year
- ☆19Jan 3, 2025Updated last year
- ☆57Aug 16, 2025Updated 6 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆372Dec 12, 2024Updated last year
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine☆26Dec 20, 2024Updated last year
- ☆29Apr 10, 2025Updated 10 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆156Apr 7, 2025Updated 10 months ago
- ☆122Feb 4, 2026Updated last month
- Nearly Inference Free Embeddings: make your RAG queries 500x faster☆70Feb 20, 2026Updated last week
- ☆102Oct 2, 2024Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads☆524Feb 10, 2025Updated last year
- Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)☆307Sep 12, 2025Updated 5 months ago
- [CVPR 2025] Official implementation of the paper "Generative Inbetweening through Frame-wise Conditions-Driven Video Generation"☆115Feb 27, 2025Updated last year
- Checkpointable dataset utilities for foundation model training☆32Jan 29, 2024Updated 2 years ago
- This repository is about implementing The Personality Cores Conversation System originally developed by Aperture Science, Inc. in the Por…☆24May 5, 2024Updated last year
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆25Nov 6, 2023Updated 2 years ago