ChristopheZhao/SFT_data_generation

Generates instruction-tuning data with an LLM for a specific scenario.

☆22

Alternatives and similar repositories for SFT_data_generation

Users that are interested in SFT_data_generation are comparing it to the libraries listed below.

limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago
VITA-Group / DP-OPT
[ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
☆48 · May 30, 2024 · Updated 2 years ago
kevinj22 / CNRec
CNRec Data Associated with Content based News Recommendation via Shortest Entity Distance over Knowledge Graph
☆10 · Feb 26, 2019 · Updated 7 years ago
Text2TCS / Term-Extraction-With-Language-Models
Extracting terms from text using XLM-R for token and sequence classification
☆15 · Apr 18, 2022 · Updated 4 years ago
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16 · Oct 12, 2024 · Updated last year
passing2961 / Stark
Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…
☆19 · Dec 27, 2024 · Updated last year
morning-hao / domain-self-instruct
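Inspired by self-instruct: beyond general-purpose LLMs, small domain-specific LLMs can also be built for customized results, using GPT to generate questions and answers as training data.
☆18 · May 12, 2023 · Updated 3 years ago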
csxrzhang / NLPDataSet
Chinese NLP dataset
☆18 · Nov 6, 2020 · Updated 5 years ago
SteveTsui / IDa-Det
☆12 · Apr 3, 2023 · Updated 3 years ago
ytyz1307zzh / Auto-Instruct
Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models"
☆23 · Nov 13, 2023 · Updated 2 years ago
MorningForest / BertGCN
☆13 · Feb 16, 2023 · Updated 3 years ago
pillowsofwind / Course-Correction
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
☆20 · Oct 2, 2024 · Updated last year
yoonseok312 / GRAM
Official PyTorch implementation of GRAM [NAACL 2022 Main, Oral]
☆12 · Jul 24, 2023 · Updated 3 years ago
Blue-Raincoat / SelectIT
☆24 · Oct 14, 2024 · Updated last year
RockyHHH / Safety-Evaluating
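This paper proposes a safety evaluation benchmark for Chinese LLMs based on ERNIE Bot (文心一言), covering 8 typical safety scenarios and 6 types of instruction attacks. It also proposes a safety evaluation framework and process that uses test prompts written by hand and collected from open-source data, and combines human intervention with the strong evaluation ability of LLMs acting as "co-evaluators".
☆35 · Sep 1, 2023 · Updated 2 years ago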
HiLab-git / DCA-Net
☆12 · May 19, 2024 · Updated 2 years ago
ptonlix / LangChain-Emoji
AI Emoji Argue Agent 🚀 An open-source, LangChain-based agent for meme-sticker battles
☆29 · May 30, 2024 · Updated 2 years ago
bohanzhuang / Group-Net-semantic-segmentation
Structured Binary Neural Networks for Image Recognition
☆16 · Oct 12, 2022 · Updated 3 years ago
lucywang720 / model-surgery
☆32 · Feb 23, 2025 · Updated last year
RobvanGastel / removing-pos-vit-bias
Using RASA post-training to remove positional bias from pretrained encoders like DINOv3
☆16 · Feb 8, 2026 · Updated 5 months ago
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆28 · Jun 27, 2024 · Updated 2 years ago
bingqiang2021 / AIGC-Search
☆11 · Aug 26, 2024 · Updated last year
ZhangYiqun018 / Multimodel-Dialog
A collection of multimodal dialogue system papers I have read (with some notes)
☆22 · Jan 5, 2023 · Updated 3 years ago
tcmyxc / FocalLoss
Focal Loss for classification tasks, implemented in PyTorch
☆10 · Jun 13, 2023 · Updated 3 years ago
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 · Jul 9, 2024 · Updated 2 years ago
fenixsoft / monolithic_arch_golang
Fenix's Bookstore implemented in Go
☆14 · Jan 21, 2021 · Updated 5 years ago
HuCaoFighting / Event-based-Vision-for-Robotics
[IEEE SPM 2020] A collection of papers on event-based autonomous driving and event-based robotic grasping.
☆18 · May 9, 2025 · Updated last year
andyzoujm / breaking-llama-guard
Code to break Llama Guard
☆32 · Dec 7, 2023 · Updated 2 years ago
uw-nsl / safechain
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆30 · Apr 2, 2025 · Updated last year
alenai97 / PEFT-MLLM
Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆25 · Nov 10, 2024 · Updated last year
BeyonderXX / ShadowAlignment
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models
☆35 · Oct 19, 2023 · Updated 2 years ago
AylaRT / ACTER
ACTER is a manually annotated dataset for term extraction, covering 3 languages (English, French, and Dutch), and 4 domains (corruption, …
☆25 · Apr 8, 2022 · Updated 4 years ago
EasonWong0327 / QA-Systems-Hub
Includes various sub-projects on question-answering technology
☆25 · Aug 23, 2025 · Updated 11 months ago
1996Paul-Wen / KBERT-editedbywxx
Medical named entity recognition for Chinese electronic medical records, based on BERT and a knowledge graph
☆18 · Jun 4, 2021 · Updated 5 years ago
xinliangnote / vvbot
vvbot: an intelligent WeChat assistant and WeChat bot, your partner for efficient operations.
☆14 · Jul 23, 2024 · Updated 2 years ago
ValueCompass / Alignment-Goal-Survey
☆30 · Feb 16, 2024 · Updated 2 years ago
Bauhinia-AI / evol-character
Based on the Evol-character framework and OpenAI API, enabling fine-grained role-playing data generation 🎭🧩.
☆29 · Feb 1, 2024 · Updated 2 years ago
Zehui-Lin / PerceptGuide
[MedIA'25] This repository is the official implementation for "An Orchestration Learning Framework for Ultrasound Imaging: Prompt-Guided …
☆16 · Nov 10, 2025 · Updated 8 months ago
jiahe7ay / MiniCharacterLLM
A project for getting small-parameter LLMs to role-play with one click; everything from data construction to training is included.
☆27 · Mar 31, 2024 · Updated 2 years ago