vedaldi / micro_llama
A tiny, didactical implementation of LLAMA 3
☆35Updated 3 months ago
Alternatives and similar repositories for micro_llama:
Users who are interested in micro_llama are comparing it to the libraries listed below.
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆36Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models☆31Updated 9 months ago
- WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.☆13Updated 11 months ago
- ☆16Updated 2 months ago
- A lightweight and highly efficient training framework for accelerating diffusion tasks.☆46Updated 6 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters"☆17Updated 2 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆41Updated 9 months ago
- ☆20Updated 3 weeks ago
- ☆31Updated 2 months ago
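- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all you need. Experiments support the claim that, for strong reasoning ability, the thinking process is the core of AGI/ASI.☆40Updated last month
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation.☆35Updated 9 months ago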
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆36Updated 6 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- Minimal RLHF implementation built on top of minGPT.☆29Updated 8 months ago
- ☆30Updated 10 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆16Updated 9 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆75Updated 5 months ago
- Exploration of the multimodal fuyu-8b model from Adept. 🤓 🔍☆28Updated last year
- An object detection codebase based on MegEngine.☆28Updated 2 years ago
- ☆25Updated last month
- DPO, but faster 🚀☆40Updated 3 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆30Updated 10 months ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 4 months ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719☆22Updated 9 months ago
- Multimodal RewardBench☆32Updated last month
- ☆15Updated last year
- differentiable top-k operator☆21Updated 3 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆130Updated 9 months ago