iangitonga / tinyllama.cpp

A C++ implementation of TinyLlama inference on CPU.

☆11

Alternatives and similar repositories for tinyllama.cpp

Users who are interested in tinyllama.cpp are comparing it to the repositories listed below.

jiamingkong / rwkv_reward
Training a reward model for RLHF using RWKV.
☆14 Updated last year
cwhy / rwkv-decon
Trying to deconstruct RWKV in understandable terms
☆14 Updated 2 years ago
RobinQu / instinct.cpp
instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,…
☆48 Updated 10 months ago
FishiaT / yawullm
Yet Another (LLM) Web UI, made with Gemini
☆12 Updated 4 months ago
menloresearch / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a…
☆40 Updated this week
kyegomez / OpenStrawberry
An open-source replication of the Strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆29 Updated this week
berwynhoyt / lua-server-benchmark
Compare openresty vs nginx + PUC_lua
☆16 Updated last year
RWKV / rwkv-onnx
A converter and basic tester for RWKV ONNX
☆42 Updated last year
taylorchu / kokoro-onnx
☆19 Updated 3 months ago
hscspring / llama.np
Inference of Llama/Llama2/Llama3 models in NumPy
☆20 Updated last year
the-crypt-keeper / ggml-downloader
Simple, fast, parallel Hugging Face GGML model downloader written in Python
☆24 Updated last year
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆13 Updated last week
thansen0 / fastllm.cpp
A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.
☆10 Updated last month
Verdagon / Anima
33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU
☆13 Updated last year
nkypy / candle-rwkv
RWKV models and examples powered by candle.
☆18 Updated 2 months ago
astramind-ai / Pulsar
The heart of the Pulsar App: fast, secure, and shared inference with a modern UI
☆56 Updated 5 months ago
dmatora / LLM-inference-speed-benchmarks
☆18 Updated 7 months ago
FFengIll / embedding.cpp
ggml implementation of BERT Embedding
☆25 Updated last year
playaswd / rwkv-by-hand-excel
This project demonstrates the computation process of the RWKV (Receptance Weighted Key Value) model through Excel spreadsheets.
☆14 Updated 2 weeks ago
samuel-vitorino / lm.rs-webui
Light WebUI for lm.rs
☆23 Updated 7 months ago
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆17 Updated last year
deepgrove-ai / Bonsai
☆19 Updated last month
tensorpro / tpu_rwkv
JAX implementations of RWKV
☆19 Updated last year
philpax / ggml
Tensor library for machine learning
☆21 Updated last year
jllllll / GPTQ-for-LLaMa-CUDA
A combination of Oobabooga's fork and the main CUDA branch of GPTQ-for-LLaMa, in package format.
☆22 Updated last year
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆19 Updated 5 months ago
AXKuhta / rwkv-onnx-dml
Run ONNX RWKV-v4 models with GPU acceleration using DirectML [Windows], or just on CPU [Windows AND Linux]; Limited to 430M model at this…
☆20 Updated 2 years ago
Codys12 / airllm
AirLLM 70B inference with single 4GB GPU
☆12 Updated 9 months ago
pranavjad / tinyllama-bitnet
Train your own small BitNet model
☆70 Updated 6 months ago
sammcj / ollama-artefacts
Build HTML artefacts with Ollama
☆11 Updated 5 months ago