SkydustZ / AEC-domain-corpora
The code and dataset for the paper "Pretrained Domain-Specific Language Model for General Information Retrieval Tasks in the AEC Domain"
☆22 · Updated 2 years ago
Alternatives and similar repositories for AEC-domain-corpora
Users who are interested in AEC-domain-corpora are comparing it to the repositories listed below.
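- YAYI information extraction large model: instruction-tuned on millions of manually constructed, high-quality information extraction samples, developed by the Zhongke Wenge (中科闻歌) algorithm team. (Repo for YAYI Unified Information Extraction Model) ☆306 · Updated 11 months ago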
- VLE: Vision-Language Encoder (a vision-language multimodal pretrained model) ☆194 · Updated 2 years ago
- PyTorch implementation of JointBERT: "BERT for Joint Intent Classification and Slot Filling" ☆40 · Updated last year
- TechGPT: Technology-Oriented Generative Pretrained Transformer ☆225 · Updated 2 years ago
- The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set default ope… ☆823 · Updated last year
- Document orientation classification ☆220 · Updated 8 months ago
- ☆132 · Updated 2 years ago
- Generate table images by rendering them in a browser ☆231 · Updated last year
- Generate dialogue datasets from documents using LLMs such as ChatGLM2 or ChatGPT ☆158 · Updated last year
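- Crawls Chinese Wikipedia using a scrapy-based hierarchical priority queue method and automatically extracts structured and semi-structured data ☆153 · Updated 2 years ago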
- AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models ☆446 · Updated last year
- CDLA: A Chinese document layout analysis (CDLA) dataset ☆273 · Updated 3 years ago
- Cleaning and quality assessment of Chinese corpora for large model pre-training ☆67 · Updated last year
- Unified Structure Generation for Universal Information Extraction ☆932 · Updated 2 years ago
- Universal information extraction with instruction learning ☆388 · Updated 4 months ago
- PyTorch implementation of the PaddleNLP UIE model ☆641 · Updated last year
- ☆253 · Updated 2 years ago
- ☆29 · Updated 3 weeks ago
- Integrating ONgDB database into langchain ecosystem ☆77 · Updated 2 years ago
- This project uses large language models (LLMs) for open-domain triple extraction. ☆27 · Updated last year
- We released BERT-wwm, a Chinese pre-training model based on Whole Word Masking technology, and models closely related to this technology.… ☆61 · Updated 2 years ago
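- This project trains and runs inference with layoutlmv3 on Chinese images. It addresses three main problems: 1. standardizing data into a usable training dataset format; 2. modifying the layoutlmv3-base-chinese tokenization; 3. splitting text longer than 512 and applying a sliding window. ☆53 · Updated 10 months ago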
- Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr… ☆550 · Updated 2 years ago
- Train a Chinese vocabulary with BPE in sentencepiece and use it in transformers. ☆118 · Updated 2 years ago
- KgCLUE: large-scale open-source Chinese knowledge graph question answering ☆445 · Updated 3 years ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute ☆292 · Updated 10 months ago
- An open-source NLP auto-annotation tool for the Chinese-speaking world; leave the simple samples to LabelFast. ☆74 · Updated 6 months ago
- ☆102 · Updated 2 years ago
- LLM for NER ☆77 · Updated 11 months ago
- An Open-sourced Knowledgeable Large Language Model Framework. ☆1,330 · Updated 6 months ago