swiss-ai/pretrain-data

Pretraining data reconstruction scripts for Apertus

☆125

Alternatives and similar repositories for pretrain-data

Users who are interested in pretrain-data are comparing it to the repositories listed below.

fra31 / rlhf-trojan-competition-submission
☆19 · Feb 25, 2024 · Updated 2 years ago
stringandstickytape / MaxsAiStudio
A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.
☆36 · May 11, 2026 · Updated 2 weeks ago
arnav-gudibande / koala-test-set
The test set for Koala
☆45 · Mar 31, 2023 · Updated 3 years ago
MGEdata / SteelScientist
☆27 · Apr 15, 2025 · Updated last year
QingyangDong-qd220 / BandgapDatabase1
Codes to generate a bandgap database using ChemDataExtractor.
☆10 · Jun 4, 2025 · Updated 11 months ago
KrakenCode / MusicGeneration-PianoMusic
AI Music Generation group project
☆12 · May 16, 2018 · Updated 8 years ago
zzbuzzard / stable-diffusion-infinite-scroll
An app which uses inpainting to create an infinitely scrolling image
☆11 · Jun 11, 2024 · Updated last year
motherboardgithub / archive_tweet
A Twitter bot that archives tweets on demand.
☆27 · Jun 24, 2018 · Updated 7 years ago
Sangwon91 / MOF-NET
☆10 · Mar 25, 2023 · Updated 3 years ago
whittlem / trading-ai-lstm
Predict stock prices using an AI LSTM model
☆10 · Oct 4, 2023 · Updated 2 years ago
gpu-pdl-nudt / GeRelion
GPU-enhanced parallel implementation of single particle cryo-EM image processing
☆12 · Oct 2, 2017 · Updated 8 years ago
pcxod / olex2
☆21 · Updated this week
maxdotio / mighty-batch
Highly concurrent and fast content processing for Mighty Inference Server
☆10 · Feb 6, 2023 · Updated 3 years ago
davmacario / MDI-LLM
Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
☆14 · Aug 26, 2025 · Updated 9 months ago
Tongzhou-Yu / DigitalhumanProject
Based on Unity and Daz3D using iPhone X
☆12 · Feb 1, 2021 · Updated 5 years ago
YonekuraLab / yoneoLocr
Real-time object locator / evaluator for cryo-EM data collection
☆12 · Dec 7, 2024 · Updated last year
ArturTanona / grpo_unsloth_docker
☆57 · Feb 10, 2025 · Updated last year
anpaure / cp_eval
Tiny evaluation of leading LLMs on competitive programming problems
☆14 · Apr 10, 2026 · Updated last month
rwgk / sginfo
SgInfo - Space Group Info
☆18 · Feb 13, 2022 · Updated 4 years ago
ra101 / Essentials-Unpackd
`unpackd` is a tool for Pokémon Essentials, to extract data binaries to readable .rb and .yaml files and to combine them back, Thus makin…
☆17 · Feb 20, 2023 · Updated 3 years ago
mist475 / MCPatcherForge
MCPatcher as a 1.7.10 forge mod, using mixins
☆12 · Jan 25, 2024 · Updated 2 years ago
hrtan / MoSo
☆10 · Updated this week
stavrostheocharis / auto-streamlit-studio
AutoStreamlit Studio is an intelligent assistant designed to streamline the creation of Streamlit applications. Whether you're a seasoned…
☆18 · Jul 1, 2024 · Updated last year
croal99 / walrus-share
☆11 · Nov 2, 2024 · Updated last year
EvanZhuang / knowledge_flow
Official Implementation of Knowledge Flow Prompting
☆35 · Oct 20, 2025 · Updated 7 months ago
HendrikStrobelt / LMdiff
A diff tool for language models
☆44 · Dec 28, 2023 · Updated 2 years ago
rodionovd / NeverGonnaGiveYouUp
An OS X kernel module that protects a userland process from being terminated in any way
☆14 · Dec 7, 2015 · Updated 10 years ago
eth-lre / LLM_ICL
ACL24
☆11 · Jun 7, 2024 · Updated last year
tfjgeorge / nngeometry-examples
Example code for the NNGeometry PyTorch library
☆10 · Aug 20, 2025 · Updated 9 months ago
LawyerlyOrg / lawyerly
Developing a legal research tool leveraging ChatGPT / GPT-4
☆14 · Mar 10, 2024 · Updated 2 years ago
musyoku / chainer-nn
☆10 · Oct 16, 2017 · Updated 8 years ago
TAR-ALEX / llm-html
☆20 · Jul 4, 2025 · Updated 10 months ago
soft-matter / trackpy-examples
sample images, examples, and speed tests for trackpy
☆23 · Jul 10, 2025 · Updated 10 months ago
xszheng2020 / memorization
An Empirical Study of Memorization in NLP (ACL 2022)
☆13 · Jun 22, 2022 · Updated 3 years ago
CharlyRien / wakfu-autobuilder
A codebase that utilizes a genetic algorithm to identify the optimal equipment combination, given a set of characteristics as input to th…
☆15 · May 18, 2026 · Updated last week
niashwin / geometry-of-consolidation
NeurIPS 2026 paper: The Geometry of Consolidation — follow-up to HIDE and No-Escape.
☆108 · May 5, 2026 · Updated 3 weeks ago
kayaayberk / generative-ui-github-assistant
An AI-powered GitHub search tool utilising Generative UI
☆14 · Jul 20, 2024 · Updated last year
agno-agi / personalized-agentic-rag
☆13 · Jun 5, 2024 · Updated last year
huggingface / hf-rocm-kernels
☆24 · Apr 7, 2026 · Updated last month