LLM360/amber-data-prep

Data preparation code for Amber 7B LLM

☆93

Alternatives and similar repositories for amber-data-prep

Users who are interested in amber-data-prep compare it to the repositories listed below.

LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆172 · May 10, 2024 · Updated last year
LLM360 / crystalcoder-train
Pre-training code for CrystalCoder 7B LLM
☆57 · May 10, 2024 · Updated last year
LLM360 / Analysis360
Open Implementations of LLM Analyses
☆107 · Oct 8, 2024 · Updated last year
ctlllll / reward_collapse
☆26 · May 30, 2023 · Updated 2 years ago
EhsanMashhadi / ISSRE2023-BugSeverityPrediction
Code for the paper "Method-Level Bug Severity Prediction using Source Code Metrics and LLMs", accepted to ISSRE 2023.
☆10 · Nov 12, 2023 · Updated 2 years ago
premAI-io / serverless-examples
🚀 End-to-end examples and analysis of deploying LLMs serverless using Modal, Runpod, and Beam
☆28 · Mar 25, 2024 · Updated last year
hadasah / btm
☆77 · Apr 29, 2024 · Updated last year
russss / iv
Terminal Image Viewer for iTerm2
☆12 · Jul 6, 2019 · Updated 6 years ago
dashends / CodeSyntax
Code and dataset for EMNLP 2022 Findings paper "Benchmarking Language Models for Code Syntax Understanding"
☆16 · Oct 24, 2022 · Updated 3 years ago
google-research-datasets / QuoteSum
QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …
☆13 · Mar 25, 2024 · Updated last year
ENOT-AutoDL / gpt-j-6B-tensorrt-int8
GPT-J 6B inference on TensorRT with INT-8 precision
☆11 · Apr 5, 2023 · Updated 2 years ago
superdesk / superdesk-planning
Planning feature for Superdesk
☆12 · Updated this week
miguelCalado / prompt-to-prompt-tensorflow
TensorFlow implementation of the "Prompt-to-Prompt Image Editing with Cross Attention Control" for Stable Diffusion
☆16 · Mar 25, 2023 · Updated 2 years ago
LLM360 / k2-train
☆52 · Jun 6, 2024 · Updated last year
facebookresearch / MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,410 · Apr 21, 2025 · Updated 10 months ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35 · Aug 15, 2023 · Updated 2 years ago
ruyimarone / data-portraits
Documenting large text datasets 🖼️ 📚
☆14 · Dec 17, 2024 · Updated last year
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆988 · Jul 23, 2024 · Updated last year
mireshghallah / ft-memorization
☆13 · Oct 20, 2022 · Updated 3 years ago
IBM / ensemble-instruct
codebase release for EMNLP2023 paper publication
☆19 · Sep 18, 2025 · Updated 5 months ago
facebookresearch / ToolVerifier
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
☆22 · Mar 11, 2024 · Updated last year
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆29 · Oct 9, 2025 · Updated 4 months ago
haon-chen / MoCa
☆67 · Aug 14, 2025 · Updated 6 months ago
WENGSYX / LMTuner
LMTuner: Make the LLM Better for Everyone
☆38 · Sep 21, 2023 · Updated 2 years ago
yikangshen / megablocks
☆20 · May 30, 2024 · Updated last year
anyscale / long-context-fine-tuning-blogpost
☆17 · Feb 19, 2024 · Updated 2 years ago
zhihanyang2022 / alpha-zero
Minimal AlphaZero in PyTorch, trained on Connect4 on a 6x6 board.
☆21 · Aug 12, 2022 · Updated 3 years ago
euirim / goodwiki
Package and scripts used to build a dataset of Wikipedia articles in Markdown.
☆20 · Sep 11, 2023 · Updated 2 years ago
huggingface / gaia
Hugging Face and Pyserini interoperability
☆19 · May 18, 2023 · Updated 2 years ago
Jiahao004 / DeepTheorem
☆25 · Jun 10, 2025 · Updated 8 months ago
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆81 · Dec 25, 2025 · Updated 2 months ago
ttw1018 / MoPE-DST
The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"
☆19 · Jan 25, 2025 · Updated last year
qiuzh20 / EMoE
Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
☆39 · May 28, 2024 · Updated last year
adihaviv / nopos
☆22 · Jul 27, 2023 · Updated 2 years ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆18 · May 15, 2023 · Updated 2 years ago
googleinterns / localizing-paragraph-memorization
☆15 · Feb 21, 2024 · Updated 2 years ago
hetailang / SqueezeAttention
☆37 · Oct 10, 2024 · Updated last year
EffiVLM-Bench / EffiVLM-Bench
☆33 · Jun 3, 2025 · Updated 8 months ago
YiteWang / NTK-SAP
[ICLR2023] NTK-SAP: Improving neural network pruning by aligning training dynamics
☆20 · May 1, 2023 · Updated 2 years ago