pandalla / DataTager
Fine-Tune LLM Synthetic-Data application and "From Data to AGI: Unlocking the Secrets of Large Language Model"
☆16 Updated 8 months ago
Alternatives and similar repositories for DataTager:
Users who are interested in DataTager are comparing it to the libraries listed below.
- The newest version of Llama 3, with source code explained line by line in Chinese ☆22 Updated 11 months ago
- TianGong-AI-Unstructure ☆62 Updated last week
- Evaluation for AI apps and agent ☆36 Updated last year
- Imitate OpenAI with Local Models ☆88 Updated 7 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆69 Updated last year
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction". ☆64 Updated 8 months ago
- ☆122 Updated last year
- ☆36 Updated 6 months ago
- As the name suggests: a hand-rolled RAG ☆121 Updated last year
- A Toolkit for Table-based Question Answering ☆110 Updated last year
- ☆81 Updated last year
- ☆142 Updated 9 months ago
- LLM+RAG for QA ☆21 Updated last year
- ☆94 Updated 3 months ago
- ☆37 Updated 10 months ago
- ☆19 Updated 3 months ago
- Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models" ☆74 Updated 7 months ago
- To try TextIn document parsing, visit https://cc.co/16YSIy ☆22 Updated 8 months ago
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆59 Updated 5 months ago
- Recursive Abstractive Processing for Tree-Organized Retrieval ☆11 Updated 10 months ago
- ☆26 Updated 5 months ago
- Open replication of DeepSeek R1 for text-to-graph extraction. ☆88 Updated 2 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆30 Updated 10 months ago
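- Fast pdf translate is a PDF translation tool: it uses MinerU to convert the PDF to Markdown, splits the Markdown into segments, sends them to a large language model for translation, then assembles the translated results and generates the output PDF with pypandoc. ☆12 Updated last week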
- ☆54 Updated 5 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts". ☆28 Updated 2 weeks ago
- Data processing for code LLMs (pre-training, fine-tuning, and DPO): an industry-grade, SOTA processing pipeline ☆35 Updated 8 months ago
- TechGPT 2.0: Technology-Oriented Generative Pretrained Transformer 2.0 ☆110 Updated 7 months ago
- ☆53 Updated 5 months ago
- A native Chinese benchmark for evaluating retrieval-augmented generation ☆113 Updated 11 months ago