kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆149 · Updated last year
Alternatives and similar repositories for token-hawk:
Users interested in token-hawk are comparing it to the libraries listed below.
- LLaVA server (llama.cpp). ☆178 · Updated last year
- An implementation of bucketMul LLM inference ☆215 · Updated 8 months ago
- LLM-based code completion engine ☆181 · Updated 2 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- Full finetuning of large language models without large memory requirements ☆93 · Updated last year
- Extend the original llama.cpp repo to support redpajama model. ☆117 · Updated 6 months ago
- Command-line script for inferencing from models such as MPT-7B-Chat ☆101 · Updated last year
- TypeScript generator for llama.cpp Grammar directly from TypeScript interfaces ☆136 · Updated 8 months ago
- tinygrad port of the RWKV large language model. ☆44 · Updated 2 weeks ago
- Tiny inference-only implementation of LLaMA ☆92 · Updated 11 months ago
- Python bindings for ggml ☆140 · Updated 6 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts ☆111 · Updated last year
- SoTA Transformers with C-backend for fast inference on your CPU. ☆311 · Updated last year
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆28 · Updated last year
- Command-line script for inferencing from models such as falcon-7b-instruct ☆76 · Updated last year
- inference code for mixtral-8x7b-32kseqlen ☆99 · Updated last year
- Mistral7B playing DOOM ☆130 · Updated 8 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated last year
- Run GGML models with Kubernetes. ☆174 · Updated last year
- Tensor library for machine learning ☆278 · Updated last year
- ☆153 · Updated 2 years ago
- ☆40 · Updated last year
- Local ML voice chat using high-end models. ☆161 · Updated last week
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 9 months ago
- Simple embedding -> text model trained on a small subset of Wikipedia sentences. ☆153 · Updated last year
- Run inference on replit-3B code instruct model using CPU ☆154 · Updated last year
- GPT-2 small trained on phi-like data ☆65 · Updated last year
- GRDN.AI app for garden optimization ☆70 · Updated last year
- Modified Stanford-Alpaca Trainer for Training Replit's Code Model ☆40 · Updated last year