Chinese natural language processing datasets: material for everyday experiments. Additions and pull requests are welcome.
☆37 · Dec 3, 2021 · Updated 4 years ago
Alternatives and similar repositories for ChineseNLPCorpus
Users who are interested in ChineseNLPCorpus are comparing it to the repositories listed below.
- A large corpus of Chinese fixed phrases and idioms scraped from a reputable educational website (30310 instances). (A large corpus of Chinese idioms and common sayings, containing 3031…) ☆14 · Oct 29, 2021 · Updated 4 years ago
- Annotations and code for the EMNLP 2018 paper 'Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations' ☆10 · Feb 20, 2023 · Updated 3 years ago
- Data simulation scripts for the paper "Target Sound Extraction with Variable Cross-modality Clues" ☆17 · May 19, 2023 · Updated 3 years ago
- ☆10 · Jun 5, 2021 · Updated 5 years ago
- Early Modern Chinese corpus dataset (tags: natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics) ☆171 · Mar 4, 2025 · Updated last year
- ☆13 · Jun 10, 2023 · Updated 3 years ago
- A collection of tools for reading/processing the multilingual Bible corpus ☆16 · Oct 10, 2022 · Updated 3 years ago
- A collection of beautiful plots, and other data visualization stuff. ☆16 · Jan 8, 2022 · Updated 4 years ago
- Submission archive for the MS MARCO passage ranking leaderboard ☆13 · Apr 21, 2023 · Updated 3 years ago
- A Python library for the Qieyun phonological system ☆12 · Apr 1, 2025 · Updated last year
- A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval ☆15 · Jan 2, 2024 · Updated 2 years ago
- Text similarity based on Roformer ☆12 · Aug 2, 2021 · Updated 4 years ago
- Query-conditioned target sound extraction model ☆30 · Mar 25, 2025 · Updated last year
- NLP code for study ☆18 · Mar 30, 2023 · Updated 3 years ago
- abstract syntax tree ☆15 · Jun 13, 2015 · Updated 10 years ago
- An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners ☆10 · Feb 23, 2024 · Updated 2 years ago
- The official implementation of the paper "Does Federated Learning Really Need Backpropagation?" ☆23 · Feb 9, 2023 · Updated 3 years ago
- Dataset of classical Chinese texts ☆22 · Jun 29, 2023 · Updated 2 years ago
- CCL2019 "Xiaoniu Cup" (小牛杯) Chinese humor computation task: dataset and baseline ☆25 · Aug 27, 2024 · Updated last year
- A tool/script for batch speech data enhancement with speed/volume/RIRS/MUSAN ☆25 · Jun 28, 2020 · Updated 5 years ago
- Soar Agent (and SML code) that learns through situated interactive instruction in a robotic environment ☆36 · Sep 15, 2023 · Updated 2 years ago
- A helper package to get information on scholarly articles from DBLP using its public API ☆16 · May 13, 2025 · Updated last year
- This is a repo consisting of papers about LLMs' perception of their knowledge boundaries; Uncertainty Quantification; Honesty Alignment; … ☆25 · Nov 25, 2025 · Updated 6 months ago
- ☆24 · Aug 24, 2023 · Updated 2 years ago
- TextHide: Tackling Data Privacy in Language Understanding Tasks ☆30 · Apr 19, 2021 · Updated 5 years ago
- DocBankLoader is a dataset loader for DocBank and can convert DocBank to the format used by object detection models. ☆24 · Mar 17, 2021 · Updated 5 years ago
- Next Crawler is a web data collector built with mainstream technologies such as Playwright, Next.js, and Prisma. After configuration through a visual UI, it periodically drives a browser via Playwright to crawl web data. ☆48 · Mar 25, 2026 · Updated 2 months ago
- Code for the paper "Factorising Meaning and Form for Intent-Preserving Paraphrasing", Tom Hosking & Mirella Lapata (ACL 2021) ☆27 · Nov 8, 2023 · Updated 2 years ago
- An unofficial Fudan slide theme for Typst. ☆16 · Mar 19, 2024 · Updated 2 years ago
- [NIPS 2023] AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation ☆12 · May 19, 2023 · Updated 3 years ago
- Machine Learning & Security Seminar @Purdue University ☆25 · May 9, 2023 · Updated 3 years ago
- Douban movie crawler that scrapes reviews, analyzes them, and visualizes the results with echart ☆13 · Apr 19, 2020 · Updated 6 years ago
- The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapte… ☆17 · Jan 15, 2024 · Updated 2 years ago
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE) ☆42 · Oct 13, 2023 · Updated 2 years ago
- Collection of papers, benchmarks and newest trends in the domain of End-to-end ToDs ☆14 · Nov 18, 2023 · Updated 2 years ago
- The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generat… ☆31 · Sep 21, 2021 · Updated 4 years ago
- LaTeX Proposal Template for the University of Chinese Academy of Sciences ☆20 · Oct 14, 2023 · Updated 2 years ago
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆37 · Updated this week
- MENYO-20k Corpus in "The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation" in MT Summit 2021 ☆14 · Jan 16, 2023 · Updated 3 years ago