huggingface / hf_transfer
☆524 · Updated last month
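For context, hf_transfer is a Rust-based download accelerator that is used indirectly through huggingface_hub rather than called directly. A minimal sketch of opting in, assuming both `hf_transfer` and `huggingface_hub` are installed (the `gpt2` repo in the commented call is only an illustration):

```python
import os

# huggingface_hub picks up the hf_transfer fast path when this documented
# opt-in flag is set before the first download (requires `pip install hf_transfer`).
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# With the flag set, an ordinary download call uses the accelerated backend, e.g.:
#   from huggingface_hub import hf_hub_download
#   hf_hub_download(repo_id="gpt2", filename="config.json")
```

The flag-based design means no call sites change: the same `hf_hub_download` code runs with or without the accelerator present.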
Alternatives and similar repositories for hf_transfer
Users interested in hf_transfer are comparing it to the libraries listed below:
- Module, Model, and Tensor Serialization/Deserialization ☆273 · Updated 3 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆314 · Updated last year
- ☆577 · Updated last year
- A repository for research on medium-sized language models. ☆520 · Updated 5 months ago
- Batched LoRAs ☆348 · Updated 2 years ago
- Beyond Language Models: Byte Models are Digital World Simulators ☆330 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆202 · Updated last year
- Official library for pre-processing of Mistral models ☆815 · Updated last week
- FRP Fork ☆176 · Updated 7 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆893 · Updated last month
- Inference code for Mistral and Mixtral hacked up into the original Llama implementation ☆370 · Updated last year
- PyTorch building blocks for the OLMo ecosystem ☆400 · Updated this week
- Xet client tech, used in huggingface_hub ☆322 · Updated last week
- Gemma 2 optimized for your local machine ☆377 · Updated last year
- Implementation of DoRA ☆307 · Updated last year
- A benchmark for emotional intelligence in large language models ☆382 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆356 · Updated 11 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- Inference code for Persimmon-8B ☆412 · Updated 2 years ago
- A bagel, with everything. ☆324 · Updated last year
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆750 · Updated last year
- Python bindings for ggml ☆146 · Updated last year
- Comparison of Language Model Inference Engines ☆235 · Updated 11 months ago
- Merge Transformers language models using gradient parameters ☆209 · Updated last year
- Load compute kernels from the Hub ☆327 · Updated last week
- A scalable and robust tree-based speculative decoding algorithm ☆362 · Updated 9 months ago
- Experiments on speculative sampling with Llama models ☆126 · Updated 2 years ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆547 · Updated 10 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆293 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆277 · Updated last year