LLM360 / Analysis360
Open Implementations of LLM Analyses
☆102 · Updated 5 months ago
Alternatives and similar repositories for Analysis360:
Users who are interested in Analysis360 are comparing it to the libraries listed below.
- Pre-training code for CrystalCoder 7B LLM ☆54 · Updated 10 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 10 months ago
- Data preparation code for Amber 7B LLM ☆86 · Updated 10 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆104 · Updated 6 months ago
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. ☆117 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- Evaluating LLMs with fewer examples ☆147 · Updated 11 months ago
- ☆119 · Updated 5 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 · Updated 5 months ago
- ☆74 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆75 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆136 · Updated 4 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆106 · Updated 4 months ago
- evol augment any dataset online ☆59 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆108 · Updated 3 weeks ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 · Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆218 · Updated 4 months ago
- 🚢 Data Toolkit for Sailor Language Models ☆87 · Updated last month
- CodeUltraFeedback: aligning large language models to coding preferences ☆70 · Updated 9 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆84 · Updated last year
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 9 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆107 · Updated 9 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆75 · Updated 6 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆131 · Updated 4 months ago
- ☆60 · Updated 10 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆52 · Updated 4 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆298 · Updated last year
- Pre-training code for Amber 7B LLM ☆165 · Updated 10 months ago