skeskinen / bert.cpp
ggml implementation of BERT
★ 460, updated 6 months ago

Related projects:
- C++ implementation for 💫StarCoder (★ 443, updated last year)
- Falcon LLM ggml framework with CPU and GPU support (★ 245, updated 7 months ago)
- C++ implementation for BLOOM (★ 813, updated last year)
- LLM-based code completion engine (★ 172, updated last year)
- Tune any FALCON in 4-bit (★ 469, updated last year)
- SoTA Transformers with C-backend for fast inference on your CPU (★ 311, updated 9 months ago)
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions (★ 810, updated last year)
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model (★ 1,403, updated last month)
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript (★ 545, updated 2 months ago)
- CLIP inference in plain C/C++ with no extra dependencies (★ 433, updated last month)
- Python bindings for llama.cpp (★ 199, updated last year)
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependenci… (★ 304, updated 7 months ago)
- A bagel, with everything (★ 306, updated 5 months ago)
- Finetuning Large Language Models on One Consumer GPU in Under 4 Bits (★ 697, updated 3 months ago)
- Customizable implementation of the self-instruct paper (★ 1,004, updated 6 months ago)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (★ 405, updated 8 months ago)
- LLaMA retrieval plugin script using OpenAI's retrieval plugin (★ 326, updated last year)
- ggml implementation of embedding models including SentenceTransformer and BGE (★ 50, updated 8 months ago)
- OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA (★ 301, updated last year)
- Extends the original llama.cpp repo to support the RedPajama model (★ 117, updated 2 weeks ago)
- Python bindings for ggml (★ 125, updated 2 weeks ago)
- fastLLaMa: An experimental high-performance framework for running decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe… (★ 408, updated last year)
- Extend existing LLMs well beyond their original training length with constant memory usage, without retraining (★ 657, updated 5 months ago)
- Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, and 16-bit CPU inference with GGML) (★ 555, updated last year)
- Official repository for LongChat and LongEval (★ 505, updated 3 months ago)
- YaRN: Efficient Context Window Extension of Large Language Models (★ 1,308, updated 5 months ago)
- Python bindings for the Transformer models implemented in C/C++ using the GGML library (★ 1,792, updated 7 months ago)