alanshi / charset_mnbvc
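This project aims to perform fast encoding detection and conversion on large numbers of text files, to support the data-cleaning work of the mnbvc corpus project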
☆70Oct 17, 2025Updated 3 months ago
Alternatives and similar repositories for charset_mnbvc
Users that are interested in charset_mnbvc are comparing it to the libraries listed below
- Text deduplication☆78May 23, 2024Updated last year
- Text quality classification for mnbvc using fastText☆16Oct 2, 2023Updated 2 years ago
- ☆19May 11, 2024Updated last year
- An API backend compatible with A1111WebuiAPI (with a simple Gradio WebUI) that supports calling various online AI image-generation sites, plus more AI features☆20Dec 12, 2024Updated last year
- Python implementation of AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, w…☆49Mar 22, 2025Updated 10 months ago
- Fine-tuned BERT on SQuAd 2.0 Dataset. Applied Knowledge Distillation (KD) and fine-tuned DistilBERT (student) using BERT as the teacher m…☆26Feb 13, 2021Updated 5 years ago
- The newest version of llama3, with source code explained line by line in Chinese☆22Apr 19, 2024Updated last year
- ☆10Oct 16, 2025Updated 4 months ago
- Reinforcement learning training for LLMs such as gpt2, llama, and bloom☆27Sep 19, 2023Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆69Jul 20, 2023Updated 2 years ago
- [COLING 2024] CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News☆44Jan 26, 2026Updated 3 weeks ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning☆33Jan 9, 2025Updated last year
- ☆13Mar 14, 2023Updated 2 years ago
- Code used for sourcing and cleaning the BigScience ROOTS corpus☆318Mar 20, 2023Updated 2 years ago
- Man machine verification module based on ABP vNext☆15Aug 11, 2025Updated 6 months ago
- Directed masked autoencoders☆14Feb 5, 2026Updated last week
- ☆15Oct 24, 2023Updated 2 years ago
- This repo shows how to generate PPT files using Python☆43Aug 10, 2024Updated last year
- This is some summary code and model☆40Dec 14, 2021Updated 4 years ago
- 👀 A quill-based Vue rich text editor component that works out of the box.☆11Jan 5, 2023Updated 3 years ago
- Elevator is an open source, on-disk key-value store. Provides high-performance bulk read-write operations over very large datasets while …☆70May 14, 2014Updated 11 years ago
- Vue TypeScript Chrome Extension Template Project☆11Mar 2, 2023Updated 2 years ago
- A ES6 enabled Webpack 4 starter template with demo for making Phaser 3 plugins.☆11May 24, 2018Updated 7 years ago
- PyTorch - Albert Large V2, Bert Base Uncased, Bert Large Uncased WWM Finetuned Squad, Distil Roberta Base, Roberta Base Squad2, Roberta l…☆11Jul 10, 2020Updated 5 years ago
- ☆10Oct 17, 2021Updated 4 years ago
- Python bindings for NVIDIA CUDA APIs.☆13Mar 2, 2024Updated last year
- DF Extract Lib☆14Nov 3, 2025Updated 3 months ago
- ☆12Mar 13, 2023Updated 2 years ago
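- This project provides XLNet pre-trained models for Chinese, aiming to enrich Chinese natural language processing resources and offer a more diverse selection of Chinese pre-trained models. Experts and scholars are welcome to download and use them, and to help promote and develop Chinese-language resources together.☆11May 30, 2023Updated 2 years ago
- N8N community node to manage your documents with Paperless-ngx.☆25Dec 23, 2025Updated last month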
- ☆15Feb 23, 2025Updated 11 months ago
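- Claude Code Proxy SSY is a proxy service that converts Claude API calls into the Shengsuanyun (胜算云) format. It lets you use the Shengsuanyun global model API in applications that support Claude.☆12Jul 18, 2025Updated 6 months ago
- A Markdown document tool designed for RAG systems, offering two main features: automatic heading-structure extraction and document splitting. It fully preserves the document hierarchy, solving the problems of traditional splitters losing heading levels and breaking table integrity. A hierarchy-preserving Markdown document splitter for RAG…☆12Jan 2, 2025Updated last year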
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- Converts OCR and other models from paddle to onnx format, then loads these onnx models with djl, a Java deep learning framework, to try out inference.☆13Nov 15, 2022Updated 3 years ago
- AI_Powered_Dev_Search_Engine☆12Mar 10, 2024Updated last year
- FMS Model Optimizer is a framework for developing reduced precision neural network models.☆20Updated this week
- A server set up by a veteran player on Github, running the original 老飞飞 version, with online play supported - 天马座 (Pegasus)☆11May 14, 2019Updated 6 years ago
- C# wrapper for msnhnet.☆10Aug 14, 2020Updated 5 years ago