wjytt / bug-free-pancake
☆10 · Updated 3 months ago
Alternatives and similar repositories for bug-free-pancake:
Users who are interested in bug-free-pancake are comparing it to the libraries listed below.
- Accelerate and optimize performance with streamlined training and serving options in JAX. ☆263 · Updated this week
- Python bindings for whisper.cpp ☆242 · Updated last week
- Inference Llama 2 in one file of pure Python ☆415 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆11 · Updated this week
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆348 · Updated 10 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B for free ☆231 · Updated 5 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆278 · Updated this week
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆331 · Updated 10 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆786 · Updated last week
- ☆531 · Updated 5 months ago
- C++ implementation for 💫StarCoder ☆453 · Updated last year
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 · Updated 8 months ago
- whisper.cpp bindings for Python ☆94 · Updated last year
- llama-cpp-python-exploit ☆15 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆235 · Updated 10 months ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆922 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 6 months ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆555 · Updated 9 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆709 · Updated last year
- CI scripts designed to build a Pascal-compatible version of vLLM. ☆12 · Updated 8 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆232 · Updated last year
- Full finetuning of large language models without large memory requirements ☆94 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆422 · Updated last year
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot". ☆785 · Updated 8 months ago
- Experimental BitNet Implementation ☆64 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆693 · Updated last year
- Effort to open-source NLLB checkpoints. ☆444 · Updated 10 months ago