Laz4rz / RL
☆16Updated 3 months ago
Alternatives and similar repositories for RL:
Users who are interested in RL are comparing it to the repositories listed below
- ☆45Updated 3 weeks ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆62Updated this week
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆64Updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆39Updated 2 months ago
- So, I trained a 130M Llama architecture I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset…☆14Updated last month
- An introduction to LLM Sampling☆77Updated 4 months ago
- One click away from a locally downloaded, fine-tuned model, hosted on Hugging Face, with inference built in. In two hours.☆21Updated last month
- look how they massacred my boy☆63Updated 6 months ago
- Tiny evaluation of leading LLMs on competitive programming problems☆14Updated 4 months ago
- Testing PaliGemma 2 finetuning on a reasoning dataset☆18Updated 3 months ago
- Simple GRPO scripts and configurations.☆58Updated 2 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆63Updated last month
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆96Updated last month
- ☆27Updated 9 months ago
- ☆38Updated 9 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.☆28Updated this week
- AI eXplainable Inference & Search. Open Sourcing on-premise, ultra-fast latency intelligence to all.☆32Updated last month
- Hub for researchers exploring VLMs and Multimodal Learning:)☆25Updated last week
- ☆66Updated 11 months ago
- Mention any three favourite things and get recommendations in the form of a flow chart by Claude Haiku.☆12Updated last year
- Smart commit messages☆18Updated 6 months ago
- 🦾💻🌐 distributed training & serverless inference at scale on RunPod☆17Updated 11 months ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens)☆12Updated 6 months ago
- Perf monitoring CLI tool for Apple Silicon☆16Updated last year
- Transformers from scratch using PyTorch & NumPy.☆22Updated 2 months ago
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search.☆32Updated last month
- A repository containing general tutorials I'd like to share with the world.☆38Updated this week
- A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks☆34Updated 10 months ago
- PTX-Tutorial Written Purely By AIs (OpenAI's Deep Research and Claude 3.7)☆65Updated last month
- Set of scripts to finetune LLMs☆37Updated last year