Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
☆96 · Aug 18, 2023 · Updated 2 years ago
Alternatives and similar repositories for Okapi
Users who are interested in Okapi are comparing it to the libraries listed below.
- Multilingual Large Language Models Evaluation Benchmark · ☆132 · Aug 21, 2024 · Updated last year
- ⚡ LLaMA-2 model experiment · ☆12 · Nov 22, 2023 · Updated 2 years ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction · ☆24 · Jun 6, 2022 · Updated 3 years ago
- VNHSGE: Vietnamese High School Graduation Examination Dataset for Large Language Models · ☆29 · Jul 24, 2023 · Updated 2 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… · ☆27 · Aug 8, 2025 · Updated 6 months ago
- Pre-training script for BART in JAX/Flax · ☆38 · Aug 4, 2022 · Updated 3 years ago
- Goldfish: Monolingual language models for 350 languages · ☆23 · Aug 25, 2024 · Updated last year
- ACL 2023: Dual-Alignment Pre-training for Cross-lingual Sentence Embedding · ☆24 · Aug 21, 2024 · Updated last year
- The official repository for the Vista dataset, a Vietnamese multimodal dataset containing more than 700,000 samples of conversations a… · ☆26 · May 14, 2024 · Updated last year
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" · ☆26 · Jun 3, 2025 · Updated 8 months ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset · ☆340 · Dec 18, 2024 · Updated last year
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… · ☆13 · Jan 16, 2024 · Updated 2 years ago
- Label shift estimation for transfer difficulty with Familiarity · ☆10 · Feb 4, 2025 · Updated last year
- Benchmarks for evaluating MT models · ☆11 · Jun 26, 2024 · Updated last year
- Decontamination · ☆26 · Dec 3, 2025 · Updated 2 months ago
- Source code for the ACL 2022 paper "Efficient Cluster-based k-Nearest-Neighbor Machine Translation" · ☆26 · Sep 30, 2022 · Updated 3 years ago
- BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese (INTERSPEECH 2022) · ☆103 · Jul 22, 2024 · Updated last year
- Showcases training on various downstream NLP tasks with pre-trained language models using PyTorch Lightning · ☆13 · Aug 7, 2022 · Updated 3 years ago
- My implementation of the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks" using Tensor… · ☆12 · Mar 18, 2022 · Updated 3 years ago
- AI-based "Happiness Optimizer" · ☆12 · Oct 20, 2024 · Updated last year
- Multilingual entity linking with the BELA model · ☆12 · Jul 20, 2023 · Updated 2 years ago
- Evaluation results for machine translation within the BigScience project · ☆11 · May 15, 2023 · Updated 2 years ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs · ☆82 · Apr 11, 2024 · Updated last year
- Multicultural Proverbs and Sayings · ☆12 · Jan 11, 2025 · Updated last year
- Use spaCy for NLP and output to the FoLiA XML format · ☆12 · Feb 27, 2024 · Updated 2 years ago
- Restore tone marks in sentences with missing tones · ☆13 · Jul 29, 2019 · Updated 6 years ago
- Classifying Relations by Ranking with Convolutional Neural Networks · ☆12 · May 22, 2019 · Updated 6 years ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization · ☆17 · Aug 30, 2024 · Updated last year
- Python source code for the EMNLP 2021 Findings paper "Subword Mapping and Anchoring Across Languages" · ☆13 · Sep 17, 2021 · Updated 4 years ago
- Knowledge Graph-augmented NMT · ☆11 · Sep 20, 2021 · Updated 4 years ago
- We fine-tune Bloomz-7b1-mt using LoRA with the chatdoctor-200k dataset; see https://huggingface.co/LinhDuong/doctorwithbloomz-7b1-mt an… · ☆30 · Apr 4, 2023 · Updated 2 years ago