apple / ml-lucid-datagen
☆27Updated 8 months ago
Related projects
Alternatives and complementary repositories for ml-lucid-datagen
- Unofficial implementation of AlpaGasus☆84Updated last year
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- Expert Specialized Fine-Tuning☆144Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆124Updated 2 weeks ago
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)☆62Updated 11 months ago
- Reformatted Alignment☆112Updated last month
- ☆56Updated 8 months ago
- 🚢 Data Toolkit for Sailor Language Models☆81Updated 4 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆73Updated 9 months ago
- Data preparation code for CrystalCoder 7B LLM☆42Updated 6 months ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆58Updated 7 months ago
- FuseAI Project☆76Updated 2 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users☆194Updated last week
- ☆44Updated last month
- This is the official repository for Inheritune.☆105Updated last month
- ⏳ ChatLog: Recording and Analysing ChatGPT Across Time☆94Updated 5 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆75Updated 9 months ago
- ☆26Updated 4 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆114Updated this week
- Data preparation code for Amber 7B LLM☆82Updated 6 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆190Updated 3 weeks ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark☆105Updated 3 weeks ago
- Self-Alignment with Principle-Following Reward Models☆148Updated 8 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.☆46Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- Reasoning by Communicating with Agents☆21Updated last month
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆51Updated this week
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆34Updated 10 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆56Updated 8 months ago
- A collection of instruction data and scripts for machine translation.☆20Updated last year