kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆148 · Updated last year
Alternatives and similar repositories for token-hawk:
Users interested in token-hawk are comparing it to the libraries listed below.
- Extends the original llama.cpp repo to support the RedPajama model ☆117 · Updated 5 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers (QLoRA) ☆123 · Updated last year
- Full fine-tuning of large language models without large memory requirements ☆93 · Updated last year
- Command-line script for running inference with models such as MPT-7B-Chat ☆101 · Updated last year
- Generates llama.cpp grammars directly from TypeScript interfaces ☆134 · Updated 7 months ago
- Command-line script for running inference with models such as falcon-7b-instruct ☆76 · Updated last year
- LLaVA server (llama.cpp) ☆177 · Updated last year
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆28 · Updated last year
- Python bindings for ggml ☆137 · Updated 5 months ago
- An implementation of bucketMul LLM inference ☆215 · Updated 7 months ago
- SoTA Transformers with a C backend for fast inference on your CPU ☆311 · Updated last year
- LLM-based code completion engine ☆179 · Updated 3 weeks ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- The GeoV model is a large language model designed by Georges Harik that uses Rotary Positional Embeddings with Relative distances (RoPER).… ☆121 · Updated last year
- ☆40 · Updated last year
- Add local LLMs to your Web or Electron apps! Powered by Rust + WebGPU ☆102 · Updated last year
- Mistral7B playing DOOM ☆127 · Updated 7 months ago
- Run inference on the replit-3B code-instruct model using the CPU ☆154 · Updated last year
- This repository explains and provides examples of "concept anchoring" in GPT-4 ☆72 · Updated last year
- ☆112 · Updated last year
- ☆152 · Updated 7 months ago
- Drop-in replacement for OpenAI, but with open models ☆153 · Updated last year
- GPT-2 small trained on phi-like data ☆65 · Updated last year
- ☆136 · Updated last year
- Embeddings-focused small version of the Llama NLP model ☆103 · Updated last year
- Fast parallel LLM inference for MLX ☆163 · Updated 7 months ago
- Our own implementation of "Layer-Selective Rank Reduction" ☆233 · Updated 8 months ago
- Modified Stanford-Alpaca trainer for training Replit's code model ☆40 · Updated last year
- The code we currently use to fine-tune models ☆113 · Updated 9 months ago