henloitsjoyce / psychic-garbanzo
☆10 Updated 3 months ago
Alternatives and similar repositories for psychic-garbanzo:
Users who are interested in psychic-garbanzo are comparing it to the libraries listed below.
- ☆10 Updated 3 months ago
- ☆10 Updated 3 months ago
- ☆10 Updated 3 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆791 Updated this week
- Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ☆2,036 Updated this week
- Large-scale LLM inference engine ☆1,395 Updated this week
- FlashAttention (Metal Port) ☆479 Updated 7 months ago
- Inference code for Mistral and Mixtral hacked up into original Llama implementation ☆371 Updated last year
- An OAI compatible exllamav2 API that's both lightweight and fast ☆915 Updated this week
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆651 Updated 10 months ago
- Port of Facebook's LLaMA model in C/C++ ☆11 Updated this week
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆1,005 Updated 2 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,542 Updated 5 months ago
- MobiLlama: Small Language Model tailored for edge devices ☆632 Updated last year
- An Open Source Toolkit For LLM Distillation ☆579 Updated 3 months ago
- A library for making RepE control vectors ☆579 Updated 3 months ago
- ☆529 Updated 8 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆598 Updated last year
- On-device AI across mobile, embedded and edge for PyTorch ☆2,747 Updated this week
- The repository for the code of the UltraFastBERT paper ☆518 Updated last year
- Website for hosting the Open Foundation Models Cheat Sheet. ☆267 Updated last week
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆628 Updated 3 weeks ago
- flow-pilot is an openpilot-based driver assistance system that runs on Linux, Windows, and Android-powered machines. ☆1,735 Updated 7 months ago
- Training LLMs with QLoRA + FSDP ☆1,472 Updated 5 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆800 Updated 5 months ago
- ☆543 Updated 4 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆235 Updated 11 months ago
- ☆713 Updated last month
- OLMoE: Open Mixture-of-Experts Language Models ☆716 Updated last month
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆263 Updated 6 months ago