rkinas / reasoning_models_how_to
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.
★106 · Updated last month
Alternatives and similar repositories for reasoning_models_how_to
Users that are interested in reasoning_models_how_to are comparing it to the libraries listed below
Sorting:
- One-click templates for running inference with Language Models ★213 · Updated 3 weeks ago
- Benchmark Large Language Models Reliably On Your Data ★389 · Updated last week
- Build datasets using natural language ★522 · Updated 3 months ago
- Generate large synthetic data ★440 · Updated this week
- ★262 · Updated 2 months ago
- Inference, fine-tuning, and many more recipes with the Gemma family of models ★267 · Updated last month
- So, I trained a Llama: a 130M architecture I coded from the ground up to build a small instruct model from scratch. Trained on the FineWeb dataset… ★15 · Updated 5 months ago
- ★154 · Updated 4 months ago
- GRadient-INformed MoE ★264 · Updated 11 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ★276 · Updated last year
- Simple examples using Argilla tools to build AI ★54 · Updated 9 months ago
- ★134 · Updated last year
- Toolkit for attaching, training, saving, and loading new heads for transformer models ★285 · Updated 5 months ago
- List of resources, libraries, and more for developers who want to build with off-the-shelf open-source machine learning ★200 · Updated last year
- All credits go to HuggingFace's Daily AI papers (https://huggingface.co/papers) and the research community. Audio summaries here (https… ★194 · Updated this week
- ★86 · Updated 11 months ago
- Verifiers for LLM Reinforcement Learning ★75 · Updated 3 weeks ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ★410 · Updated last week
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ★85 · Updated last week
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ★314 · Updated last month
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ★238 · Updated 10 months ago
- ★134 · Updated last week
- ★51 · Updated last month
- An automated tool for discovering insights from research paper corpora ★138 · Updated last year
- ★157 · Updated 2 months ago
- Automatically evaluate your LLMs in Google Colab ★655 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ★232 · Updated 10 months ago
- ★102 · Updated last year
- Solving data for LLMs - Create quality synthetic datasets! ★151 · Updated 7 months ago
- Complete implementation of Llama2 with/without KV cache & inference ★48 · Updated last year