nlp-uoregon / Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
☆91 · Updated last year
Related projects
Alternatives and complementary repositories for Okapi
- Multilingual Large Language Models Evaluation Benchmark ☆107 · Updated 3 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆97 · Updated 7 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- Code for Zero-Shot Tokenizer Transfer ☆117 · Updated last month
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆66 · Updated 8 months ago
- A Multilingual Replicable Instruction-Following Model ☆94 · Updated last year
- [TMLR'23] Contrastive Search Is What You Need For Neural Text Generation ☆118 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆92 · Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 · Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆86 · Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆160 · Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 · Updated 3 months ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆81 · Updated 3 weeks ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 8 months ago
- Scalable training for dense retrieval models. ☆271 · Updated last year
- Finetune mistral-7b-instruct for sentence embeddings ☆72 · Updated 6 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆193 · Updated last week
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆96 · Updated last year
- Train Llama 2 & 3 on the SQuAD v2 task as an example of how to specialize a generalized (foundation) model. ☆47 · Updated 5 months ago
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆70 · Updated 3 months ago
- VNHSGE: Vietnamese High School Graduation Examination Dataset for Large Language Models ☆25 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 · Updated last month
- Machine Reading Comprehension specialized for the Vietnamese language ☆38 · Updated 2 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 · Updated 5 months ago
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx… ☆136 · Updated last year