skeskinen / bert.cpp
ggml implementation of BERT
★ 460, updated 6 months ago

Related projects:
- C++ implementation for 💫StarCoder (★ 443, updated last year)
- Falcon LLM ggml framework with CPU and GPU support (★ 245, updated 7 months ago)
- C++ implementation for BLOOM (★ 813, updated last year)
- LLM-based code completion engine (★ 172, updated last year)
- Tune any FALCON in 4-bit (★ 469, updated last year)
- SoTA Transformers with C-backend for fast inference on your CPU (★ 311, updated 9 months ago)
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions (★ 810, updated last year)
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model (★ 1,403, updated last month)
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript (★ 545, updated 2 months ago)
- CLIP inference in plain C/C++ with no extra dependencies (★ 433, updated last month)
- Python bindings for llama.cpp (★ 199, updated last year)
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependenci… (★ 304, updated 7 months ago)
- A bagel, with everything (★ 306, updated 5 months ago)
- Finetuning Large Language Models on One Consumer GPU in Under 4 Bits (★ 697, updated 3 months ago)
- Customizable implementation of the self-instruct paper (★ 1,004, updated 6 months ago)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (★ 405, updated 8 months ago)
- LLaMA retrieval plugin script using OpenAI's retrieval plugin (★ 326, updated last year)
- ggml implementation of embedding models including SentenceTransformer and BGE (★ 50, updated 8 months ago)
- OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA (★ 301, updated last year)
- Extends the original llama.cpp repo to support the RedPajama model (★ 117, updated 2 weeks ago)
- Python bindings for ggml (★ 125, updated 2 weeks ago)
- fastLLaMa: An experimental high-performance framework for running decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe… (★ 408, updated last year)
- Extend existing LLMs well beyond their original training length with constant memory usage, without retraining (★ 657, updated 5 months ago)
- Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, and 16-bit CPU inference with GGML) (★ 555, updated last year)
- Official repository for LongChat and LongEval (★ 505, updated 3 months ago)
- YaRN: Efficient Context Window Extension of Large Language Models (★ 1,308, updated 5 months ago)
- Python bindings for the Transformer models implemented in C/C++ using the GGML library (★ 1,792, updated 7 months ago)