huggingface / hf_transfer
☆515 · Updated 3 weeks ago
Alternatives and similar repositories for hf_transfer
Users interested in hf_transfer are comparing it to the libraries listed below:
- Official implementation of Half-Quadratic Quantization (HQQ) ☆884 · Updated last week
- ☆573 · Updated last year
- Inference code for Mistral and Mixtral hacked up into the original Llama implementation ☆368 · Updated last year
- Implementation of DoRA ☆304 · Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆311 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆202 · Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆329 · Updated last year
- Gemma 2 optimized for your local machine ☆377 · Updated last year
- Batched LoRAs ☆347 · Updated 2 years ago
- A repository for research on medium-sized language models ☆515 · Updated 4 months ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆540 · Updated 9 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆272 · Updated 2 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆216 · Updated last year
- PyTorch building blocks for the OLMo ecosystem ☆311 · Updated this week
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU ☆679 · Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆750 · Updated last year
- A benchmark for emotional intelligence in large language models ☆370 · Updated last year
- Official inference library for pre-processing of Mistral models ☆804 · Updated this week
- A PyTorch quantization backend for Optimum ☆999 · Updated last week
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆554 · Updated 2 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆361 · Updated 9 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆249 · Updated last year
- Inference server benchmarking tool ☆121 · Updated 3 weeks ago
- ☆258 · Updated last week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
- Reference implementation of the Megalodon 7B model ☆522 · Updated 5 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆347 · Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆317 · Updated last month
- ☆546 · Updated 11 months ago