Code implementation of Dynamic NTK-ALiBi for Baichuan: inference on longer texts without fine-tuning
☆49Aug 27, 2023Updated 2 years ago
Alternatives and similar repositories for baichuan-Dynamic-NTK-ALiBi
Users that are interested in baichuan-Dynamic-NTK-ALiBi are comparing it to the libraries listed below.
- NTK scaled version of ALiBi position encoding in Transformer.☆69Aug 16, 2023Updated 2 years ago
- This tool (enhance_long) aims to enhance the long-context extrapolation capability of LlaMa2 at the lowest cost, preferably without …☆45Nov 30, 2023Updated 2 years ago
- LongQLoRA: Extend Context Length of LLMs Efficiently☆168Nov 12, 2023Updated 2 years ago
- NLP models and codes for BAAI-JD joint project.☆10May 27, 2020Updated 5 years ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆15Mar 11, 2024Updated 2 years ago
- Testing DeepSpeed integration in 🤗 Accelerate☆11Jun 28, 2022Updated 3 years ago
- Role-playing based on the RWKV model; in effect a fork of RWKV_Role_Playing modified beyond recognition☆17Aug 17, 2023Updated 2 years ago
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 3 years ago
- An Experiment on Dynamic NTK Scaling RoPE☆64Nov 26, 2023Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"☆21Jul 31, 2023Updated 2 years ago
- A tool for manually annotating and ranking response data in the RLHF stage of large language models.☆256Aug 1, 2023Updated 2 years ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];☆40Jan 4, 2024Updated 2 years ago
- English or Chinese GPT2Dialog model from GPT2-chitchat☆12Feb 23, 2020Updated 6 years ago
- ☆62Jun 17, 2024Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆20May 27, 2025Updated 11 months ago
- ☆84Sep 9, 2023Updated 2 years ago
- ☆152Apr 16, 2024Updated 2 years ago
- A Generative Dialogue State Tracking Model☆23Jun 24, 2021Updated 4 years ago
- ICLR 2022☆18Apr 15, 2022Updated 4 years ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation.☆39Jun 20, 2024Updated last year
- The 1st place solution for SIGIR 2020 E-Commerce Workshop Multimodal Product Classification Challenge☆21Aug 3, 2020Updated 5 years ago
- [EMNLP 2019] Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation☆30May 22, 2023Updated 2 years ago
- ☆12Sep 30, 2022Updated 3 years ago
- YaRN: Efficient Context Window Extension of Large Language Models☆1,708Apr 17, 2024Updated 2 years ago
- Fast and low-memory attention layer written in CUDA☆20Jul 14, 2023Updated 2 years ago
- ☆28Updated this week
- ☆30Aug 21, 2025Updated 8 months ago
- ☆30Aug 8, 2024Updated last year
- Data preprocessing: fill missing values by interpolation and mark the filled positions☆10Apr 19, 2019Updated 7 years ago
- Offline Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits☆11Oct 21, 2024Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆260Dec 16, 2024Updated last year
- Evaluation for AI apps and agents☆45Jan 18, 2024Updated 2 years ago
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler☆32Dec 15, 2025Updated 4 months ago
- Open-source multimodal large language model based on baichuan-7b☆72Dec 7, 2023Updated 2 years ago
- Extractive machine reading comprehension based on PyTorch + BERT☆21Dec 8, 2022Updated 3 years ago
- Chinese keyword extraction method based on pre-trained models (Chinese-language code for the paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model")☆12May 17, 2020Updated 5 years ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"☆35Jan 9, 2024Updated 2 years ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"☆16Nov 11, 2024Updated last year
- ☆43Dec 15, 2023Updated 2 years ago