catid / cuda_float_compress
Python package for compressing floating-point PyTorch tensors
☆12 Updated last year
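To illustrate the general idea behind a package like this (lossy compression of floating-point tensors), here is a minimal NumPy sketch: cast float32 data down to float16 and deflate the bytes. This is a hypothetical illustration of the concept only, not the cuda_float_compress API, which may use a different algorithm entirely.

```python
import zlib
import numpy as np

# Hypothetical sketch (NOT the cuda_float_compress API):
# lossy-compress float32 data by narrowing to float16, then deflating.
def compress(arr: np.ndarray) -> bytes:
    return zlib.compress(arr.astype(np.float16).tobytes())

def decompress(blob: bytes, shape: tuple) -> np.ndarray:
    data = np.frombuffer(zlib.decompress(blob), dtype=np.float16)
    return data.reshape(shape).astype(np.float32)

x = np.random.randn(1024).astype(np.float32)
blob = decompressed = None
blob = compress(x)
y = decompress(blob, x.shape)

print(len(blob) < x.nbytes)          # narrowed payload is smaller than the original
print(np.allclose(x, y, atol=1e-2))  # float16 keeps roughly 3 significant digits
```

The same round-trip shape applies to GPU tensors: move the tensor to a contiguous buffer, compress, and reverse the steps to decompress, accepting a bounded precision loss.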
Alternatives and similar repositories for cuda_float_compress
Users interested in cuda_float_compress are comparing it to the libraries listed below.
- ☆52 Updated last year
- Latent Large Language Models ☆19 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 11 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 Updated last week
- A collection of lightweight interpretability scripts to understand how LLMs think ☆66 Updated last week
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆145 Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆98 Updated 6 months ago
- ☆47 Updated last year
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search" ☆65 Updated 2 years ago
- Train, tune, and infer the Bamba model ☆136 Updated 5 months ago
- ☆58 Updated this week
- Port of Facebook's LLaMA model in C/C++ ☆21 Updated 2 years ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations ☆72 Updated 9 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆138 Updated 2 months ago
- Compression for foundation models ☆35 Updated 3 months ago
- ☆13 Updated 2 years ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 Updated 5 months ago
- Token Omission Via Attention ☆127 Updated last year
- ☆62 Updated last year
- ☆28 Updated last year
- New optimizer ☆20 Updated last year
- Demonstration that fine-tuning a RoPE model on longer sequences than the pre-trained model's extends the model's context limit ☆62 Updated 2 years ago
- ☆18 Updated last year
- Fork of the Flame repo for training some new work in development ☆19 Updated this week
- Utilities for training very large models ☆58 Updated last year
- Implementation of "Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration" ☆55 Updated last year
- Training hybrid models for dummies ☆29 Updated 2 weeks ago
- ☆50 Updated last year
- Implementation of Mind Evolution ("Evolving Deeper LLM Thinking", from DeepMind) ☆57 Updated 5 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆73 Updated last year