fzp0424 / MT-Ladder
[EMNLP'24] Code and data for paper "Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level"
☆14 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for MT-Ladder
- TEaR framework for paper "TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement" ☆37 · Updated 4 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆51 · Updated 3 months ago
- [EMNLP 2023] ALCUNA: Large Language Models Meet New Knowledge ☆25 · Updated last year
- Multilingual safety benchmark for Large Language Models ☆24 · Updated 2 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆43 · Updated last year
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆58 · Updated 8 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆42 · Updated 2 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆102 · Updated 2 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 · Updated 2 years ago
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models ☆59 · Updated 9 months ago
- Analyzing LLM Alignment via Token distribution shift ☆13 · Updated 9 months ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆60 · Updated 8 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆20 · Updated 8 months ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts ☆34 · Updated last month
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆47 · Updated 4 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 · Updated 10 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆55 · Updated last year
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆22 · Updated 3 months ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ☆22 · Updated 3 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆44 · Updated 6 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆50 · Updated 6 months ago