hscspring / llama.np
Inference Llama/Llama2 Models in NumPy
☆18Updated 9 months ago
Related projects:
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs.☆38Updated 3 months ago
- ☆50Updated 3 months ago
- Google TPU optimizations for transformers models☆62Updated this week
- A pipeline for LLM knowledge distillation☆68Updated last month
- Low-Rank adapter extraction for fine-tuned transformers model☆154Updated 4 months ago
- Data preparation code for Amber 7B LLM☆76Updated 4 months ago
- Small and Efficient Mathematical Reasoning LLMs☆69Updated 7 months ago
- Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget☆115Updated 5 months ago
- Set of scripts to finetune LLMs☆36Updated 5 months ago
- 1.58-bit LLaMa model☆77Updated 5 months ago
- experiments with inference on llama☆106Updated 3 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆127Updated 2 weeks ago
- Data preparation code for CrystalCoder 7B LLM☆42Updated 4 months ago
- ☆75Updated 3 weeks ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆72Updated 8 months ago
- Sakura-SOLAR-DPO: Merge, SFT, and DPO☆114Updated 8 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆118Updated last week
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆177Updated 4 months ago
- ☆73Updated 8 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)☆118Updated 2 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆158Updated 2 months ago
- ☆42Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆94Updated 2 weeks ago
- QLoRA with Enhanced Multi GPU Support☆36Updated last year
- ☆59Updated last week
- QuIP quantization☆41Updated 6 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention☆117Updated 8 months ago
- Self-hosted LLM chatbot arena, with yourself as the only judge☆36Updated 7 months ago
- Pre-training code for CrystalCoder 7B LLM☆52Updated 4 months ago
- A quick and optimized solution to manage llama based gguf quantized models, download gguf files, retrieve message formatting, add more mo…☆11Updated 8 months ago