huggingface / hf_transfer
☆504 Updated 4 months ago
Alternatives and similar repositories for hf_transfer
Users who are interested in hf_transfer are comparing it to the libraries listed below.
- ☆561 Updated last year
- Inference code for Mistral and Mixtral hacked up into original Llama implementation ☆371 Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆328 Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆874 Updated 2 weeks ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆309 Updated last year
- A repository for research on medium sized language models. ☆510 Updated 2 months ago
- A bagel, with everything. ☆324 Updated last year
- Implementation of DoRA ☆301 Updated last year
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆744 Updated 11 months ago
- batched loras ☆345 Updated last year
- Official inference library for pre-processing of Mistral models ☆784 Updated this week
- Reference implementation of Megalodon 7B model ☆524 Updated 3 months ago
- A benchmark for emotional intelligence in large language models ☆348 Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆200 Updated last year
- Inference code for Persimmon-8B ☆415 Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆346 Updated 8 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆260 Updated 2 weeks ago
- ☆536 Updated 9 months ago
- PyTorch building blocks for the OLMo ecosystem ☆277 Updated last week
- OpenAI compatible API for TensorRT LLM triton backend ☆214 Updated last year
- Gemma 2 optimized for your local machine. ☆376 Updated last year
- Comparison of Language Model Inference Engines ☆229 Updated 8 months ago
- [ACL 2024] Progressive LLaMA with Block Expansion. ☆509 Updated last year
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆528 Updated 7 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆291 Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆270 Updated last year
- Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget ☆161 Updated 3 weeks ago
- FRP Fork ☆177 Updated 4 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆240 Updated last year
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆223 Updated 2 months ago