abhisheknair10 / Llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆36Updated this week
Alternatives and similar repositories for Llama3.cu:
Users who are interested in Llama3.cu are comparing it to the libraries listed below:
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem.☆16Updated 2 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs☆35Updated this week
- Experiments with BitNet inference on CPU☆52Updated 9 months ago
- ☆44Updated 5 months ago
- Training hybrid models for dummies.☆16Updated 3 weeks ago
- A minimalistic C++ Jinja templating engine for LLM chat templates☆43Updated last week
- AirLLM 70B inference with single 4GB GPU☆12Updated 5 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆26Updated this week
- Fast approximate inference on a single GPU with sparsity aware offloading☆38Updated last year
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform☆37Updated 11 months ago
- One Line To Build Zero-Data Classifiers in Minutes☆33Updated 3 months ago
- Testing LLM reasoning abilities with family relationship quizzes.☆54Updated last week
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices.☆11Updated this week
- A simple library for working with Hugging Face models.☆14Updated last week
- RWKV-7: Surpassing GPT☆68Updated last month
- GGML implementation of BERT model with Python bindings and quantization.☆52Updated 10 months ago
- Explore training for quantized models☆11Updated this week
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆46Updated 7 months ago
- entropix style sampling + GUI☆25Updated 2 months ago
- Implementation of Spectral State Space Models☆18Updated 10 months ago
- ☆25Updated 4 months ago
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of…☆48Updated 3 months ago
- Course Project for COMP4471 on RWKV☆16Updated 11 months ago
- Ensure AI-generated output follows predefined schemas without compromising creativity, speed, or context.☆16Updated last week
- Make triton easier☆42Updated 6 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.☆22Updated 6 months ago
- ☆21Updated 7 months ago
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning☆28Updated 11 months ago
- Latent Large Language Models☆17Updated 4 months ago
- Text generation in Python, as easy as possible☆47Updated this week