BatsResearch / LexC-Gen
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
☆15 Updated 8 months ago
Alternatives and similar repositories for LexC-Gen
Users who are interested in LexC-Gen are comparing it to the repositories listed below.
- ☆36 Updated last year
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆49 Updated 3 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆58 Updated last year
- A curated list of research papers and resources on Cultural LLM. ☆44 Updated 9 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆101 Updated last year
- A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, … ☆85 Updated last year
- A python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆27 Updated 9 months ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆69 Updated last year
- Crosslingual Reasoning through Test-Time Scaling ☆18 Updated last month
- ☆66 Updated last year
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆23 Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆130 Updated last year
- A Multilingual Replicable Instruction-Following Model ☆93 Updated 2 years ago
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆21 Updated 2 months ago
- ☆98 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆48 Updated last year
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆44 Updated 10 months ago
- ☆76 Updated 3 years ago
- ParaNames: A multilingual resource for parallel names ☆34 Updated last year
- ☆58 Updated 3 years ago
- Semantically Structured Sentence Embeddings ☆66 Updated 8 months ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆72 Updated last year
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆56 Updated 2 years ago
- ☆14 Updated last year
- Replication code for "With Little Power Comes Great Responsibility" ☆39 Updated 4 years ago
- The LM Contamination Index is a manually created database of contamination evidences for LMs. ☆78 Updated last year
- Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures. ☆32 Updated last year
- ☆27 Updated 3 weeks ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 Updated 2 years ago
- A repository with several curated datasets of counter-narratives to fight online hate speech. ☆89 Updated 2 years ago