Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26Sep 10, 2024Updated last year
Alternatives and similar repositories for LoL-RL
Users that are interested in LoL-RL are comparing it to the libraries listed below
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models☆15Mar 8, 2023Updated 2 years ago
- Official implementation of the algorithmic approach presented in the research paper entitled "Risk-Sensitive Policy with Distributional R…☆15Dec 19, 2022Updated 3 years ago
- A python library to find differences between audio and transcriptions☆19Nov 14, 2023Updated 2 years ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners☆116Jun 28, 2025Updated 8 months ago
- ☆26May 30, 2023Updated 2 years ago
- Python code to perform risk-sensitive Reinforcement Learning with dynamic convex risk measures☆23Feb 21, 2024Updated 2 years ago
- Heavyweight Python dynamic analysis framework☆17Apr 17, 2024Updated last year
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper https://arxiv.org/abs/2212.01349)☆158Jan 6, 2023Updated 3 years ago
- ☆29Nov 30, 2021Updated 4 years ago
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022).☆27Feb 14, 2023Updated 3 years ago
- Few-shot Learning with Auxiliary Data☆31Dec 8, 2023Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Aug 25, 2023Updated 2 years ago
- Fire lane occupancy detection system based on YOLOv5 with GSConv+SlimNeck☆10Nov 24, 2023Updated 2 years ago
- Verilog code for a low power RFID chip that will communicate with I2C sensors.☆13Apr 18, 2014Updated 11 years ago
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas…☆10Dec 19, 2024Updated last year
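- This work proposes a 3D object recognition algorithm based on multi-view convolutional neural networks to recognize 3D objects accurately. It first implements a standard convolutional neural network architecture, trained to recognize rendered views of a shape independently, so that a 3D shape can be recognized even from a single view. It then takes the network's recognition results for 2D views of the 3D object from multiple angles and…☆11May 16, 2022Updated 3 years ago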
- Physical Downlink Shared Channel (PDSCH) in 5G New Radio.☆12Jan 29, 2024Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆35Aug 15, 2023Updated 2 years ago
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks☆36Oct 31, 2024Updated last year
- ☆10Apr 20, 2023Updated 2 years ago
- ☆13Nov 21, 2025Updated 3 months ago
- Some implementations from the paper robust risk aware reinforcement learning☆36Dec 15, 2021Updated 4 years ago
- EOSIO-Taurus - The Most Powerful Infrastructure for Decentralized Applications☆13Mar 29, 2024Updated last year
- Dataset2024☆11Jun 12, 2025Updated 8 months ago
- [EACL 2023] CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification☆42Apr 29, 2023Updated 2 years ago
- A JAX library for building lattice-based speech transducer models☆46Updated this week
- Robot graph navigation via carrot planning☆13Jan 28, 2026Updated last month
- Provides a fully configured Visual Studio Solution for ORTools☆10Aug 30, 2019Updated 6 years ago
- Firefox and Chrome compatible extension that acts as annotation tool for websites (Named Entity Recognition)☆10Feb 17, 2019Updated 7 years ago
- This project demonstrates how Low Density Parity Check (LDPC) Code and Multiple Input Multiple Output (MIMO) can be employed in Vehicular…☆14Jan 24, 2022Updated 4 years ago
- Master Thesis☆10Jan 28, 2023Updated 3 years ago
- Code for our paper "Performance Study on a CSMA/CA-Based MAC Protocol for Multi-User MIMO Wireless LANs"☆12Aug 31, 2019Updated 6 years ago
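- Function evaluation using the CORDIC algorithm, designed to run on resource-constrained devices (such as small FPGAs and embedded MCUs); it avoids floating-point arithmetic, multiplication, and division, computing with shifts and additions only.☆11Mar 22, 2024Updated last year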
- A Federated Learning Method for Real-time Emotion State Classification from Multi-modal Streaming☆11Sep 15, 2022Updated 3 years ago
- Not just a PDE toolbox. Adapt your ideas from a clean, modular code base with Femeko.☆15Feb 22, 2026Updated last week
- AI-based Resource Provisioning of IoE Services in 6G: A Deep Reinforcement Learning Approach☆12Mar 31, 2021Updated 4 years ago
- wifi☆12Jun 13, 2017Updated 8 years ago
- World Model for Natural Gas Trade☆10Feb 8, 2018Updated 8 years ago
- Prototype of Python / GeoGebra interoperability☆17Feb 7, 2026Updated 3 weeks ago