xverse-ai / XVERSE-MoE-A36B

XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.

☆38

Alternatives and similar repositories for XVERSE-MoE-A36B

Users who are interested in XVERSE-MoE-A36B are comparing it to the repositories listed below.


MetaStone-AI / MetaStone-S1
The open-source code of MetaStone-S1.
☆107 · Updated 2 months ago
du-nlp-lab / MLR-Copilot
☆67 · Updated 6 months ago
SkyworkAI / MindLink
☆97 · Updated 2 months ago
saxenarohit / MovieSum
☆15 · Updated last year
Tencent / Hunyuan-TurboS
☆87 · Updated 5 months ago
TIGER-AI-Lab / One-Shot-CFT
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]
☆32 · Updated last month
LLM360 / k2-data-prep
☆21 · Updated last year
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 · Updated last year
StarRing2022 / R1-Nature
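The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all you need. It uses experiments to show that, for strong reasoning ability, the thinking-process ("think") content is the core of AGI/ASI.
☆45 · Updated 8 months ago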
FreedomIntelligence / FastLLM
Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CudaFusionKernel + Compiler]
☆41 · Updated last year
thunlp / APB
Official Implementation of APB (ACL 2025 main Oral)
☆31 · Updated 8 months ago
zhaochenyang20 / Prompt2Model-Self-Guide
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper
☆33 · Updated last year
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆27 · Updated last year
vaew / SkyScript-100M
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2
☆127 · Updated 11 months ago
woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆197 · Updated this week
18907305772 / FuseAI
FuseAI Project
☆87 · Updated 9 months ago
zai-org / GLM-Edge
GLM Series Edge Models
☆149 · Updated 4 months ago
princeton-nlp / ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆22 · Updated 8 months ago
Tongyi-Zhiwen / QwenLong-L1
☆298 · Updated 4 months ago
Tencent-Hunyuan / Hunyuan-7B
Tencent Hunyuan 7B (Hunyuan-7B for short) is one of Tencent Hunyuan's dense large language models.
☆66 · Updated 2 months ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 · Updated last year
bigai-nlco / TokenSwift
[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆115 · Updated 5 months ago
RUC-NLPIR / HiRA
Code for the paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
☆59 · Updated 3 months ago
thunlp / Delta-CoMe
Delta-CoMe achieves near-lossless 1-bit compression; accepted at NeurIPS 2024.
☆57 · Updated 11 months ago
xverse-ai / XVERSE-MoE-A4.2B
XVERSE-MoE-A4.2B: A multilingual large language model developed by XVERSE Technology Inc.
☆39 · Updated last year
neulab / MultiUI
Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding
☆52 · Updated 10 months ago
krystalan / DRT
Deep Reasoning Translation (DRT) Project
☆233 · Updated last month
SLIT-AI / FuseChat-3.0
☆18 · Updated 6 months ago
zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆34 · Updated last year
fabienfrfr / tptt
😊 TPTT: Transforming Pretrained Transformers into Titans
☆29 · Updated last week