[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆127Jan 14, 2025Updated last year
Alternatives and similar repositories for RethinkTinyLM
Users that are interested in RethinkTinyLM are comparing it to the libraries listed below.
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…☆103Jun 14, 2024Updated last year
- ☆38Feb 8, 2024Updated 2 years ago
- PyTorch code for the paper: Full-Stack Filters to Build Minimum Viable CNNs☆16Sep 10, 2019Updated 6 years ago
- A repository for DenseSSMs☆90Apr 11, 2024Updated 2 years ago
- InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencie…☆419Aug 21, 2025Updated 8 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning☆643Mar 4, 2024Updated 2 years ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆43Jun 28, 2024Updated last year
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"☆237Sep 30, 2024Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆147Sep 20, 2024Updated last year
- Evaluating language model responses using a reward model☆29Feb 23, 2024Updated 2 years ago
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".☆48Jul 12, 2024Updated last year
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning☆34Aug 9, 2023Updated 2 years ago
- ☆59Jul 9, 2024Updated last year
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023☆12Dec 13, 2023Updated 2 years ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆119Sep 26, 2024Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 7 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆139Jun 12, 2024Updated last year
- Spartan is an algorithm for training sparse neural network models. This repository accompanies the paper "Spartan Differentiable Sparsity…☆25Oct 31, 2022Updated 3 years ago
- ☆33May 26, 2024Updated last year
- ☆10Jun 19, 2023Updated 2 years ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices☆668May 10, 2025Updated 11 months ago
- A family of highly capable yet efficient large multimodal models☆193Aug 23, 2024Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset☆113May 22, 2025Updated 11 months ago
- MPI Code Generation through Domain-Specific Language Models☆15Nov 19, 2024Updated last year
- An open-source LLM based on an MoE structure.☆58Jul 2, 2024Updated last year
- Reaching LLaMA2 Performance with 0.1M Dollars☆987Jul 23, 2024Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆282Jun 25, 2024Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM☆183Jul 12, 2024Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆450Oct 16, 2024Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding☆1,333Mar 6, 2025Updated last year
- ☆403Dec 12, 2024Updated last year
- Binary neural networks developed by Huawei Noah's Ark Lab☆29Feb 19, 2021Updated 5 years ago
- Reference implementation of Megalodon 7B model☆525May 17, 2025Updated 11 months ago
- ☆75May 30, 2025Updated 11 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- Open source intent recognition framework powered by LLMs.☆27Dec 17, 2024Updated last year
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models☆25Aug 24, 2024Updated last year
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing☆21Mar 18, 2025Updated last year
- hllama is a library which aims to provide a set of utility tools for large language models.☆10Apr 16, 2024Updated 2 years ago