huggingface / hf_transfer
☆494 · Updated 3 months ago
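hf_transfer is a Rust-backed helper that `huggingface_hub` can delegate large file transfers to. As a quick orientation before the list of alternatives, here is a minimal sketch of the way it is typically enabled (assuming `huggingface_hub` and `hf_transfer` are installed; the `gpt2` repo and `model.safetensors` filename are placeholder examples, not anything prescribed by this page):

```python
import os

# hf_transfer is opt-in: the flag must be set before huggingface_hub reads
# its configuration, so set it before the import below.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

# Download a single file from the Hub; with the flag set, huggingface_hub
# routes the transfer through hf_transfer's multi-threaded downloader.
path = hf_hub_download(
    repo_id="gpt2",                # placeholder public model repo
    filename="model.safetensors",  # placeholder file name
)
print(path)
```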
Alternatives and similar repositories for hf_transfer
Users interested in hf_transfer are comparing it to the libraries listed below.
- ☆556 · Updated 11 months ago
- A benchmark for emotional intelligence in large language models ☆336 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆856 · Updated this week
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆306 · Updated last year
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆504 · Updated this week
- Gemma 2 optimized for your local machine. ☆377 · Updated last year
- A repository for research on medium sized language models. ☆510 · Updated 2 months ago
- batched loras ☆344 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆343 · Updated 7 months ago
- Comparison of Language Model Inference Engines ☆225 · Updated 7 months ago
- A bagel, with everything. ☆324 · Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆326 · Updated last year
- Implementation of DoRA ☆301 · Updated last year
- Inference code for Mistral and Mixtral hacked up into original Llama implementation ☆371 · Updated last year
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆574 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆250 · Updated this week
- OpenAI compatible API for TensorRT LLM triton backend ☆209 · Updated last year
- Merge Transformers language models by use of gradient parameters. ☆206 · Updated last year
- PyTorch building blocks for the OLMo ecosystem ☆270 · Updated this week
- Inference server benchmarking tool ☆90 · Updated 3 months ago
- Official inference library for pre-processing of Mistral models ☆777 · Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆739 · Updated 10 months ago
- Python bindings for ggml ☆143 · Updated 11 months ago
- A pytorch quantization backend for optimum ☆979 · Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated last year
- Reference implementation of Megalodon 7B model ☆524 · Updated 2 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆397 · Updated 8 months ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆523 · Updated 6 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆268 · Updated last year
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆221 · Updated 2 months ago