tairov / llama2.mojo
Inference Llama 2 in one file of pure 🔥
⭐ 2,118 · Updated last month
Alternatives and similar repositories for llama2.mojo
Users interested in llama2.mojo are comparing it to the libraries listed below.
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ⭐ 1,875 · Updated last year
- Inference Llama 2 in one file of pure Python ⭐ 422 · Updated last year
- A Machine Learning framework from scratch in Pure Mojo 🔥 ⭐ 438 · Updated 9 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐ 2,165 · Updated last year
- ⭐ 1,028 · Updated last year
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ⭐ 6,146 · Updated 2 months ago
- A curated list of awesome Mojo 🔥 frameworks, libraries, software and resources ⭐ 1,039 · Updated this week
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ⭐ 1,552 · Updated 7 months ago
- Port of Andrej Karpathy's llm.c to Mojo ⭐ 359 · Updated 3 months ago
- Run Mixtral-8x7B models in Colab or on consumer desktops ⭐ 2,325 · Updated last year
- Training LLMs with QLoRA + FSDP ⭐ 1,529 · Updated last year
- LLM-powered development for VSCode ⭐ 1,305 · Updated last year
- Mamba-Chat: A chat LLM based on the state-space model architecture ⭐ 934 · Updated last year
- Large language models (LLMs) made easy; EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ⭐ 2,499 · Updated last year
- ⭐ 864 · Updated last year
- Llama 2 Everywhere (L2E) ⭐ 1,520 · Updated 2 months ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch. ⭐ 685 · Updated last year
- A fast inference library for running LLMs locally on modern consumer-class GPUs ⭐ 4,361 · Updated 2 months ago
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct ⭐ 2,054 · Updated last year
- llama3.np is a pure NumPy implementation of the Llama 3 model. ⭐ 991 · Updated 6 months ago
- An Extensible Deep Learning Library ⭐ 2,285 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ⭐ 7,113 · Updated last year
- A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights. ⭐ 2,905 · Updated 2 years ago
- AICI: Prompts as (Wasm) Programs ⭐ 2,052 · Updated 9 months ago
- RayLLM - LLMs on Ray (archived; see the README for more info). ⭐ 1,262 · Updated 8 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐ 4,988 · Updated 7 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,074 · Updated 4 months ago
- ggml implementation of BERT ⭐ 496 · Updated last year
- Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than… ⭐ 1,191 · Updated 2 weeks ago
- ⭐ 1,009 · Updated 9 months ago