Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"
☆102Jul 9, 2024Updated last year
Alternatives and similar repositories for MiniMA
Users that are interested in MiniMA are comparing it to the libraries listed below.
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models"☆29Jul 14, 2023Updated 2 years ago
- **ASCM4ABSA** - Our code and proposed data for NLPCC 2022 paper titled "Aspect-specific Context Modeling for Aspect-based Sentiment Analy…☆12Mar 26, 2023Updated 3 years ago
- Code and data for COLING 2022 paper titled "Structural Bias For Aspect Sentiment Triplet Extraction"☆26May 28, 2023Updated 2 years ago
- The code and preprocessed data for ACL 2021 paper titled "Exploiting Position Bias for Robust Aspect Sentiment Classification"☆27Aug 5, 2021Updated 4 years ago
- Code for SIGIR 2019 paper titled "Syntax-Aware Aspect-Level Sentiment Classification with Proximity-Weighted Convolution Network"☆25Nov 21, 2023Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers"☆19Aug 17, 2022Updated 3 years ago
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆60May 28, 2024Updated last year
- Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.☆29Mar 11, 2025Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆37Aug 14, 2024Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];☆40Jan 4, 2024Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆38Jan 3, 2024Updated 2 years ago
- Unofficial implementation of AlpaGasus☆94Sep 23, 2023Updated 2 years ago
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 9 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"☆33Nov 29, 2023Updated 2 years ago
- Simple Model Similarities Analysis☆21Feb 3, 2024Updated 2 years ago
- distill large scale web page text☆12Jul 29, 2023Updated 2 years ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Mar 20, 2024Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism☆78Jun 17, 2024Updated last year
- A trainable user simulator☆34Jun 30, 2025Updated 10 months ago
- ☆37Oct 10, 2024Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆392Jul 9, 2024Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆196Mar 25, 2024Updated 2 years ago
- Code for the paper "Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification"☆19Dec 10, 2021Updated 4 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆127May 7, 2024Updated 2 years ago
- ☆99Jun 27, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- OpenBA-V2: 3B LLM (Large Language Model) with T5 architecture, utilizing model pruning technique and continuing pretraining from OpenBA-1…☆25May 10, 2024Updated 2 years ago
- Count Tokens of Code (forked from gocloc)☆45Aug 19, 2024Updated last year
- A bagel, with everything.☆326Apr 11, 2024Updated 2 years ago
- MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos.☆33Apr 18, 2026Updated last month
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆450Oct 16, 2024Updated last year
- This repository lists papers, codes, and datasets in Biomedical Text Summarisation based on PLM☆23Oct 4, 2022Updated 3 years ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- 🐜🔧 A minimalistic tool to fine-tune your LLMs☆18Aug 17, 2023Updated 2 years ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Feb 29, 2024Updated 2 years ago