kayvr / token-hawk

WebGPU LLM inference tuned by hand

☆151

Alternatives and similar repositories for token-hawk

Users who are interested in token-hawk are comparing it to the libraries listed below.

eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123 Updated 2 years ago
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support redpajama model.
☆118 Updated 11 months ago
ggml-org / p1
LLM-based code completion engine
☆194 Updated 6 months ago
Birch-san / mpt-play
Command-line script for inferencing from models such as MPT-7B-Chat
☆100 Updated 2 years ago
closedai-project / closedai
Drop in replacement for OpenAI, but with Open models.
☆152 Updated 2 years ago
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆94 Updated last year
IntrinsicLabsAI / gbnfgen
TypeScript generator for llama.cpp Grammar directly from TypeScript interfaces
☆139 Updated last year
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆246 Updated last year
kolinko / effort
An implementation of bucketMul LLM inference
☆221 Updated last year
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆309 Updated last year
trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆181 Updated last year
4dh / GRDN
GRDN.AI app for garden optimization
☆70 Updated last year
lachlansneff / sparsellama
☆40 Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆110 Updated 2 years ago
lxe / wasm-gpt
Tensor library for machine learning
☆275 Updated 2 years ago
FL33TW00D / laserbeak
Add local LLMs to your Web or Electron apps! Powered by Rust + WebGPU
☆102 Updated 2 years ago
IntrinsicLabsAI / grammar-builder
Generates grammar files from TypeScript for LLM generation
☆38 Updated last year
geov-ai / geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆121 Updated 2 years ago
AlpinDale / sparsegpt-for-LLaMA
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
☆71 Updated 2 years ago
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆456 Updated last year
abacaj / replit-3B-inference
Run inference on replit-3B code instruct model using CPU
☆157 Updated 2 years ago
lastmile-ai / llama-retrieval-plugin
LLaMa retrieval plugin script using OpenAI's retrieval plugin
☆323 Updated 2 years ago
EGjoni / DRUGS
Stop messing around with finicky sampling parameters and just use DRµGS!
☆351 Updated last year
skeskinen / llama-lite
Embeddings focused small version of Llama NLP model
☆103 Updated 2 years ago
abetlen / ggml-python
Python bindings for ggml
☆143 Updated 11 months ago
Birch-san / falcon-play
Command-line script for inferencing from models such as falcon-7b-instruct
☆75 Updated 2 years ago
danielgross / ggml-k8s
Run GGML models with Kubernetes.
☆173 Updated last year
tairov / QStarLearning.mojo
☆111 Updated last year
FL33TW00D / embd
GPU accelerated client-side embeddings for vector search, RAG etc.
☆66 Updated last year
umuthopeyildirim / DOOM-Mistral
Mistral7B playing DOOM
☆133 Updated last year