AlignInc / aligner-replication
A reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction"
☆22Updated 10 months ago
Alternatives and similar repositories for aligner-replication:
Users who are interested in aligner-replication are comparing it to the repositories listed below:
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs☆45Updated 9 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆42Updated 5 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning?☆25Updated last month
- ☆57Updated last month
- ☆33Updated 10 months ago
- FuseAI Project☆85Updated 2 months ago
- Exploration of automated dataset selection approaches at large scales.☆38Updated last month
- o1 Chain of Thought Examples☆33Updated 6 months ago
- A repository for research on medium sized language models.☆76Updated 11 months ago
- Lottery Ticket Adaptation☆39Updated 5 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆31Updated 8 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆47Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆53Updated last year
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335)☆29Updated last year
- Reformatted Alignment☆115Updated 7 months ago
- Accompanying material for the sleep-time compute paper☆17Updated this week
- An Experiment on Dynamic NTK Scaling RoPE☆63Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆68Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore☆26Updated 7 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆76Updated last year
- The official code repo and data hub of top_nsigma sampling strategy for LLMs.☆24Updated 2 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆79Updated 2 weeks ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆38Updated 3 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆92Updated last week
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆26Updated last year
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆50Updated 4 months ago
- Toy implementation of Strawberry☆31Updated 7 months ago
- ☆89Updated 6 months ago
- Replicating O1 inference-time scaling laws☆83Updated 4 months ago