The RedStone repository includes code for preparing extensive datasets used in training large language models.
☆161Apr 21, 2026Updated last month
Alternatives and similar repositories for RedStone
Users who are interested in RedStone are comparing it to the libraries listed below.
- Heuristic filtering framework for RefineCode☆85Mar 13, 2025Updated last year
- ☆229Oct 27, 2025Updated 6 months ago
- ☆171May 2, 2024Updated 2 years ago
- DataComp for Language Models☆1,445Sep 9, 2025Updated 8 months ago
- ☆63Jun 12, 2025Updated 11 months ago
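- WanJuan3.0 ("万卷·丝路") is a comprehensive plain-text corpus that collects publicly available web content, literature, patents, and other materials from multiple countries and regions. Its total data size exceeds 1.2 TB and its token count exceeds 300B, which it describes as internationally leading. The first open-source release mainly consists of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese); the data of each subset…☆46Feb 13, 2025Updated last year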
- PEACE: Empowering Geologic Map Holistic Understanding with MLLMs [Official, CVPR 2025]☆86Apr 13, 2026Updated last month
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- ☆110Jul 15, 2025Updated 10 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆37Aug 14, 2024Updated last year
- ☆569Nov 20, 2024Updated last year
- Ongoing research project for code&math LLMs☆31Jul 4, 2025Updated 10 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continual pre-training to improve …☆40May 31, 2025Updated 11 months ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing☆21Mar 18, 2025Updated last year
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context☆496Mar 19, 2024Updated 2 years ago
- ☆101Feb 11, 2026Updated 3 months ago
- Official Repo for Open-Reasoner-Zero☆2,091Jun 2, 2025Updated 11 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 10 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆42Mar 23, 2023Updated 3 years ago
- Triton version of GQA flash attention, based on the tutorial☆12Aug 4, 2024Updated last year
- ☆52May 19, 2025Updated last year
- [EMNLP 2025] TongSearch-QR☆44Dec 4, 2025Updated 5 months ago
- ☆64Apr 9, 2024Updated 2 years ago
- ☆52May 11, 2025Updated last year
- ☆43Nov 1, 2024Updated last year
- Advancing LLM with Diverse Coding Capabilities☆79Jul 25, 2024Updated last year
- Muon is Scalable for LLM Training☆1,480Aug 3, 2025Updated 9 months ago
- A robust web archive analytics toolkit☆140Updated this week
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 6 months ago
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".☆124Feb 7, 2026Updated 3 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆122Dec 10, 2024Updated last year
- Use the tokenizer in parallel to achieve superior acceleration☆20Mar 21, 2024Updated 2 years ago
- [COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.☆109Apr 4, 2025Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)☆193Feb 17, 2025Updated last year
- ☆48Dec 30, 2024Updated last year
- LCA-on-the-line (ICML 2024 Oral)☆14Feb 13, 2025Updated last year
- Muon fsdp 2☆58Aug 8, 2025Updated 9 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆12Mar 27, 2025Updated last year
- A platform to display the carbon neutralization information for researchers, decision-makers, and other participants in the community.☆18Aug 16, 2022Updated 3 years ago