huggingface / hf_transfer

☆515

Alternatives and similar repositories for hf_transfer

Users who are interested in hf_transfer are comparing it to the libraries listed below.

dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆884 Updated last week
apoorvumang / prompt-lookup-decoding
☆573 Updated last year
dzhulgakov / llama-mistral
Inference code for Mistral and Mixtral hacked up into original Llama implementation
☆368 Updated last year
catid / dora
Implementation of DoRA
☆304 Updated last year
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆311 Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
sanderwood / bgpt
Beyond Language Models: Byte Models are Digital World Simulators
☆329 Updated last year
huggingface / local-gemma
Gemma 2 optimized for your local machine.
☆377 Updated last year
sabetAI / BLoRA
batched loras
☆347 Updated 2 years ago
mlfoundations / open_lm
A repository for research on medium sized language models.
☆515 Updated 4 months ago
PrimeIntellect-ai / OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
☆540 Updated 9 months ago
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆272 Updated 2 months ago
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆216 Updated last year
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆311 Updated this week
intel / auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
☆679 Updated this week
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆750 Updated last year
EQ-bench / EQ-Bench
A benchmark for emotional intelligence in large language models
☆370 Updated last year
mistralai / mistral-common
Official inference library for pre-processing of Mistral models
☆804 Updated this week
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆999 Updated last week
LeanModels / DFloat11
DFloat11: Lossless LLM Compression for Efficient GPU Inference
☆554 Updated 2 months ago
Infini-AI-Lab / Sequoia
scalable and robust tree-based speculative decoding algorithm
☆361 Updated 9 months ago
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆249 Updated last year
huggingface / inference-benchmarker
Inference server benchmarking tool
☆121 Updated 3 weeks ago
run-ai / runai-model-streamer
☆258 Updated last week
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated last year
XuezheMax / megalodon
Reference implementation of Megalodon 7B model
☆522 Updated 5 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆347 Updated 10 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆317 Updated last month
huggingface / cosmopedia
☆546 Updated 11 months ago