RahulSChand / llama2.c-for-dummies
Step-by-step explanation/tutorial of llama2.c
☆223 · Updated last year
Alternatives and similar repositories for llama2.c-for-dummies
Users interested in llama2.c-for-dummies are comparing it to the libraries listed below
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.☆339 · Updated 3 months ago
- Easy and Efficient Quantization for Transformers☆198 · Updated last month
- Sakura-SOLAR-DPO: Merge, SFT, and DPO☆116 · Updated last year
- Efficient fine-tuning for ko-llm models☆182 · Updated last year
- Newsletter bot for 🤗 Daily Papers☆126 · Updated last week
- OSLO: Open Source for Large-scale Optimization☆175 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆266 · Updated 9 months ago
- Manage histories of LLM-based applications☆91 · Updated last year
- Inference of Mamba models in pure C☆189 · Updated last year
- 1-Click is all you need.☆62 · Updated last year
- Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.☆46 · Updated 3 weeks ago
- Inference Llama 2 in one file of pure C++☆83 · Updated 2 years ago
- Extension of Langchain for RAG. Easy benchmarking, multiple retrievals, reranker, time-aware RAG, and so on...☆281 · Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆396 · Updated 2 months ago
- Code to train a GPT-2 model on the TinyStories dataset, following the TinyStories paper☆39 · Updated last year
- llama3.np is a pure NumPy implementation of the Llama 3 model.☆987 · Updated 3 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆277 · Updated last year
- Evolve LLM training instructions from English into any language.☆118 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"☆376 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆87 · Updated last week
- An innovative library for efficient LLM inference via low-bit quantization☆349 · Updated 11 months ago
- Inference of Llama/Llama2/Llama3 models in NumPy☆21 · Updated last year
- SGLang is a fast serving framework for large language models and vision language models.☆24 · Updated last week
- A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation.☆78 · Updated last month
- Simple implementation of Speculative Sampling in NumPy for GPT-2.☆95 · Updated last year
- ONNX Runtime Server: provides TCP and HTTP/HTTPS REST APIs for ONNX inference.☆166 · Updated 2 months ago
- A performance library for machine learning applications.☆184 · Updated last year