HITsz-TMG / YiZhao
YiZhao: A 2TB Open Financial Corpus. Data and tools for generating and inspecting YiZhao, a safe, high-quality, open-source bilingual financial corpus (Chinese and English).
☆19 · Updated 3 months ago
Alternatives and similar repositories for YiZhao:
Users interested in YiZhao are comparing it to the libraries listed below.
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 accepted paper. ☆30 · Updated 10 months ago
- The demo, code and data of FollowRAG ☆70 · Updated 3 months ago
- Open-source code for the paper "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain" ☆54 · Updated 3 months ago
- A Toolkit for Table-based Question Answering ☆110 · Updated last year
- Official implementation of "Training on the Benchmark Is Not All You Need". ☆30 · Updated 2 months ago
- ☆45 · Updated 9 months ago
- ☆94 · Updated 3 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆32 · Updated 3 months ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆44 · Updated 11 months ago
- ☆124 · Updated 3 weeks ago
- The newest version of Llama 3, with the source code explained line by line in Chinese. ☆22 · Updated 11 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆42 · Updated 9 months ago
- Knowledge-Reasoning Synergy Reinforcement Learning. ☆31 · Updated 3 weeks ago
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction". ☆62 · Updated 8 months ago
- [ACL 2024 Findings] Learning Fine-Grained Grounded Citations for Attributed Large Language Models ☆17 · Updated 5 months ago
- ☆51 · Updated 6 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆67 · Updated last week
- The official GitHub repository for the paper "R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation" (EMNLP 2024 Fin… ☆30 · Updated 3 months ago
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆229 · Updated last month
- ☆47 · Updated last month
- Official repository for the paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆41 · Updated 5 months ago
- Informative Conversational Query Rewriting ☆27 · Updated last year
- A personal reimplementation of Google's Infini-transformer using a small 2B model. The project includes both model and train… ☆56 · Updated 11 months ago
- ☆142 · Updated 8 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT samples, 200,000 English multi-turn SFT samples, and … ☆18 · Updated 11 months ago
- Code for the arXiv paper "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis" ☆23 · Updated 2 months ago
- ☆36 · Updated 6 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆60 · Updated 5 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆46 · Updated 9 months ago