Feeling confused about super alignment? Here is a reading list
☆44Jan 9, 2024Updated 2 years ago
Alternatives and similar repositories for about-super-alignment
Users who are interested in about-super-alignment are comparing it to the repositories listed below
- Explore what LLMs are really learning over SFT☆28Mar 30, 2024Updated last year
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach☆32Nov 6, 2023Updated 2 years ago
- Crafting Adversarial Examples for Neural Machine Translation☆10Apr 7, 2023Updated 2 years ago
- Amazon Chess with MCTS algorithms, including UCT MC-RAVE, etc.☆10Aug 5, 2018Updated 7 years ago
- Tools for formatting WMT hypothesis and test sets in XML☆27Apr 18, 2025Updated 10 months ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua…☆13Nov 11, 2024Updated last year
- SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D☆15Nov 9, 2023Updated 2 years ago
- 信分 infrastructure 🚧 academic database☆12Mar 22, 2023Updated 2 years ago
- ☆13Oct 18, 2023Updated 2 years ago
- ☆14Jun 20, 2022Updated 3 years ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang☆14Jan 4, 2024Updated 2 years ago
- Let ChatGPT help you learn English in an innovative way☆14Feb 9, 2023Updated 3 years ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆73Jan 16, 2026Updated last month
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”☆17Feb 26, 2024Updated 2 years ago
- ☆18Jun 13, 2023Updated 2 years ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆120Dec 10, 2024Updated last year
- Teaching Models to Express Their Uncertainty in Words☆39May 26, 2022Updated 3 years ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch☆16Sep 15, 2023Updated 2 years ago
- LaTeX Drawing☆18Dec 22, 2025Updated 2 months ago
- WorldSense benchmark for grounded reasoning in language models☆24Nov 28, 2023Updated 2 years ago
- A survey of large language model training and serving☆37Aug 4, 2023Updated 2 years ago
- GAU-alpha-pytorch☆20May 11, 2022Updated 3 years ago
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie…☆184Jun 8, 2025Updated 8 months ago
- OpenmindClub IA005 project: Reading from a cognitive perspective☆19Jul 10, 2019Updated 6 years ago
- Template for Openmind Argument Analysis☆18Jul 3, 2019Updated 6 years ago
- Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning☆47Jan 22, 2026Updated last month
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆63Mar 26, 2024Updated last year
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- MurisPro: professional laboratory mouse management software, for the benefit of everyone who needs to run animal experiments☆22Dec 28, 2025Updated 2 months ago
- ☆22Sep 19, 2023Updated 2 years ago
- [NeurIPS 2022] Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings☆22Jan 30, 2023Updated 3 years ago
- Instructions and demonstrations for building a GLM capable of formal logical reasoning☆54Sep 3, 2024Updated last year
- [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"☆41May 22, 2025Updated 9 months ago
- ☆98Sep 25, 2025Updated 5 months ago
- ☆32Jan 26, 2026Updated last month
- This project aggregates the blogs of fellow readers at 「安人书院」.☆44Feb 7, 2019Updated 7 years ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆147Sep 20, 2024Updated last year
- CMU Vision-Language-Autonomy Challenge - Matterport Setup☆28Mar 30, 2025Updated 11 months ago