huggingface / hf_transfer
☆509 · Updated 5 months ago
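For orientation, hf_transfer is a Rust-based client that accelerates Hugging Face Hub downloads. A minimal usage sketch, assuming the `hf_transfer` and `huggingface_hub` packages are installed and that your huggingface_hub version honors the `HF_HUB_ENABLE_HF_TRANSFER` environment variable:

```python
# Minimal sketch: enable the hf_transfer download backend for huggingface_hub.
# Assumes `pip install hf_transfer huggingface_hub`. The env var must be set
# before huggingface_hub is imported, since it is read at import time.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

# Downloads for this repo now go through the hf_transfer client.
snapshot_download(repo_id="gpt2")
```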
Alternatives and similar repositories for hf_transfer
Users interested in hf_transfer are comparing it to the libraries listed below.
- ☆572 · Updated last year
- Module, Model, and Tensor Serialization/Deserialization ☆267 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) ☆879 · Updated last month
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆311 · Updated last year
- Inference code for Mistral and Mixtral hacked up into the original Llama implementation ☆370 · Updated last year
- batched loras ☆346 · Updated 2 years ago
- OpenAI-compatible API for the TensorRT LLM Triton backend ☆215 · Updated last year
- Implementation of DoRA ☆301 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆201 · Updated last year
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆746 · Updated last year
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆537 · Updated 8 months ago
- A benchmark for emotional intelligence in large language models ☆365 · Updated last year
- PyTorch building blocks for the OLMo ecosystem ☆305 · Updated this week
- Inference code for Persimmon-8B ☆414 · Updated 2 years ago
- Gemma 2 optimized for your local machine ☆376 · Updated last year
- Comparison of Language Model Inference Engines ☆229 · Updated 9 months ago
- xet client tech, used in huggingface_hub ☆292 · Updated this week
- A PyTorch quantization backend for Optimum ☆989 · Updated last month
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆545 · Updated last month
- A repository for research on medium-sized language models ☆511 · Updated 4 months ago
- A bagel, with everything ☆324 · Updated last year
- ☆560 · Updated 11 months ago
- scalable and robust tree-based speculative decoding algorithm ☆359 · Updated 8 months ago
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU ☆651 · Updated this week
- Inference server benchmarking tool ☆113 · Updated last week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆274 · Updated last year
- Serving multiple LoRA finetuned LLMs as one ☆1,093 · Updated last year
- Merge Transformers language models using gradient parameters ☆208 · Updated last year
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆158 · Updated last year
- FRP Fork ☆175 · Updated 6 months ago