apple / ml-lucid-datagen
☆31Updated last year
Alternatives and similar repositories for ml-lucid-datagen
Users who are interested in ml-lucid-datagen are comparing it to the libraries listed below
- Reformatted Alignment☆113Updated last year
- Unofficial implementation of AlpaGasus☆93Updated 2 years ago
- 🚢 Data Toolkit for Sailor Language Models☆94Updated 9 months ago
- FuseAI Project☆87Updated 10 months ago
- ☆320Updated last year
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs☆51Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Updated last year
- ☆23Updated 2 years ago
- Scripts for generating synthetic finetuning data for reducing sycophancy.☆117Updated 2 years ago
- ☆313Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.☆60Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆191Updated last year
- Data preparation code for Amber 7B LLM☆93Updated last year
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…☆111Updated 2 years ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆230Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…☆170Updated last year
- Official repo for "Make Your LLM Fully Utilize the Context"☆261Updated last year
- Benchmark baseline for retrieval qa applications☆118Updated last year
- [EMNLP-2024] ⚓️ Sailor: Open Language Models for South-East Asia☆138Updated 11 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.☆159Updated 6 months ago
- ☆78Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆100Updated 2 years ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆149Updated last year
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"☆102Updated last year
- ☆58Updated last year
- Experiments on speculative sampling with Llama models☆127Updated 2 years ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆152Updated last year
- Evaluating tool-augmented LLMs in conversation settings☆88Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆78Updated last year
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆53Updated 11 months ago