zxytim / arithmetic-encoding-compression
☆11 · Updated last year
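For orientation on the headline repository's topic, below is a minimal toy sketch of the arithmetic-coding idea its name refers to. This is an illustration only, not code taken from zxytim/arithmetic-encoding-compression: the function names and the symbol probabilities are made up for the example, and a real coder would use integer renormalization rather than plain floating point.

```python
# Toy arithmetic coder: illustrative only, not from the repository above.
# Floating point is fine for short messages; production coders renormalize
# with integer arithmetic to avoid running out of precision.

def _cumulative(probs):
    """Map each symbol to its slice [lo, hi) of the unit interval."""
    cum, start = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = (start, start + p)
        start += p
    return cum

def encode(message, probs):
    """Narrow [low, high) once per symbol; any number inside the final
    interval identifies the whole message."""
    cum = _cumulative(probs)
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        lo, hi = cum[sym]
        low, high = low + span * lo, low + span * hi
    return (low + high) / 2

def decode(code, length, probs):
    """Invert encode(): find which slice the code falls in, emit that
    symbol, rescale the code into that slice, and repeat."""
    cum = _cumulative(probs)
    out = []
    for _ in range(length):
        for sym, (lo, hi) in cum.items():
            if lo <= code < hi:
                out.append(sym)
                code = (code - lo) / (hi - lo)
                break
    return "".join(out)

if __name__ == "__main__":
    probs = {"a": 0.6, "b": 0.3, "c": 0.1}   # made-up model for the demo
    msg = "aababc"
    code = encode(msg, probs)
    assert decode(code, len(msg), probs) == msg
    print(f"{msg!r} -> {code:.12f}")
```

Running it prints the single fraction that encodes the whole message; high-probability symbols shrink the interval less, which is where the compression comes from.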
Alternatives and similar repositories for arithmetic-encoding-compression:
Users interested in arithmetic-encoding-compression are comparing it to the libraries listed below.
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but … ☆13 · Updated 2 years ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 4 months ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … ☆12 · Updated last month
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆46 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆82 · Updated 2 years ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆14 · Updated last month
- Code associated with the paper "Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees". ☆28 · Updated last year
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2… ☆12 · Updated 3 months ago
- Code for the paper "Deformable Butterfly: A Highly Structured and Sparse Linear Transform". ☆12 · Updated 3 years ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆42 · Updated 3 weeks ago
- The repository for our paper: Neighboring Perturbations of Knowledge Editing on Large Language Models ☆16 · Updated 10 months ago
- Official implementation of the paper "A deeper look at depth pruning of LLMs" ☆14 · Updated 8 months ago
- [ICLR 2024] The official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆26 · Updated last year
- Low-Rank Llama Custom Training ☆22 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆25 · Updated 9 months ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆45 · Updated 4 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 · Updated 9 months ago
- Implementation of the model "Hedgehog" from the paper "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆13 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆51 · Updated 9 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆16 · Updated 8 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 · Updated this week