☆29Nov 14, 2025Updated 6 months ago
Alternatives and similar repositories for tower-eval
Users that are interested in tower-eval are comparing it to the libraries listed below.
- ☆13Aug 23, 2024Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆60Jun 3, 2024Updated 2 years ago
- Find informative examples to efficiently (human)-evaluate NLG models.☆17Apr 22, 2026Updated last month
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.☆132Apr 23, 2026Updated last month
- Code and data for the paper "Disentangling Uncertainty in Machine Translation Evaluation", accepted at EMNLP 2022.☆23Jun 23, 2023Updated 2 years ago
- Evaluation results for Machine Translation within the BigScience project☆11May 15, 2023Updated 3 years ago
- Repository for "Uncertainty-Aware Machine Translation Evaluation", accepted to Findings of EMNLP 2021.☆33Sep 22, 2021Updated 4 years ago
- ☆15Apr 12, 2021Updated 5 years ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆21Feb 14, 2025Updated last year
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation☆16Oct 14, 2022Updated 3 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆22May 24, 2023Updated 3 years ago
- A library for minimum Bayes risk (MBR) decoding☆52Nov 2, 2025Updated 7 months ago
- GEMBA — GPT Estimation Metric Based Assessment☆151Dec 15, 2025Updated 5 months ago
- load word embeddings to Torch.Tensor☆14May 12, 2016Updated 10 years ago
- We systematically studied the factors that influence LLM-generated benchmarks. By using our code, you can generate high-quality QA datas…☆20May 20, 2025Updated last year
- This repository provides open-source code for sparse continuous distributions and corresponding Fenchel-Young losses.☆15May 10, 2023Updated 3 years ago
- SocialDial: A Benchmark for Socially-Aware Dialogue Systems (SIGIR'23)☆16Aug 4, 2023Updated 2 years ago
- ☆143Apr 8, 2026Updated 2 months ago
- ☆10Aug 31, 2023Updated 2 years ago
- ☆11Aug 2, 2022Updated 3 years ago
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering☆11Oct 10, 2021Updated 4 years ago
- Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"☆16May 31, 2019Updated 7 years ago
- ☆10May 31, 2021Updated 5 years ago
- Library for pruning experts per language pair in NLLB-200☆34Jul 7, 2023Updated 2 years ago
- State-of-the-art LLM-based translation models.☆585Apr 9, 2025Updated last year
- Author implementation of the paper "Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing"☆20Oct 5, 2020Updated 5 years ago
- Code for "End-to-End Learning of Flowchart Grounded Task-Oriented Dialogs"☆14Oct 10, 2022Updated 3 years ago
- API documentation for www.xt.com, www.xt.pub, etc.☆10Jun 17, 2022Updated 3 years ago
- Torchreid-Pip: Packaged version of Torchreid☆13Oct 16, 2022Updated 3 years ago
- TAUS Dynamic Quality Framework API☆11Sep 17, 2020Updated 5 years ago
- 🧠 Workshop Notebook and assets for the Anthropic Hackathon☆12Nov 4, 2023Updated 2 years ago
- ☆20Mar 12, 2025Updated last year
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning☆15May 13, 2025Updated last year
- A large Chinese sentiment lexicon consisting of 8000 words☆24Oct 31, 2012Updated 13 years ago
- BERT score for text generation☆12Jan 15, 2025Updated last year
- 🤖 Code for our EMNLP 2022 paper: "BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Dataset…☆16Oct 7, 2024Updated last year
- https://wavelandspeech.github.io/☆10Jan 12, 2024Updated 2 years ago
- [ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr…☆24Jun 1, 2024Updated 2 years ago
- ☆15Dec 1, 2023Updated 2 years ago