henloitsjoyce / psychic-garbanzo
☆10 · Updated 4 months ago
Alternatives and similar repositories for psychic-garbanzo
Users interested in psychic-garbanzo are comparing it to the libraries listed below.
- Inference code for Mistral and Mixtral hacked up into original Llama implementation ☆371 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆810 · Updated this week
- Pybind11 bindings for Whisper.cpp ☆331 · Updated 5 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆363 · Updated last year
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs ☆805 · Updated 7 months ago
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" ☆793 · Updated 8 months ago
- MobiLlama: Small Language Model tailored for edge devices ☆638 · Updated last week
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆698 · Updated last year
- A simple and effective LLM pruning approach ☆746 · Updated 9 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers" ☆2,104 · Updated last year
- Plug-and-play implementation of Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" ☆704 · Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆653 · Updated 11 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,248 · Updated 2 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference with approximate, dynamic sparse computation of the attention… ☆1,026 · Updated this week
- Rust full node implementation of the Fuel v2 protocol ☆57,651 · Updated this week
- A bagel, with everything ☆320 · Updated last year
- TinyChatEngine: On-Device LLM Inference Library ☆848 · Updated 10 months ago
- Codebase for Merging Language Models (ICML 2024) ☆822 · Updated last year
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆711 · Updated last year
- Fuel Network Rust SDK ☆43,770 · Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆725 · Updated 7 months ago
- Calculate tokens/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,304 · Updated 5 months ago