Code implementation of Dynamic NTK-ALiBi for Baichuan: inference on longer texts without fine-tuning
☆49Aug 27, 2023Updated 2 years ago
Alternatives and similar repositories for baichuan-Dynamic-NTK-ALiBi
Users that are interested in baichuan-Dynamic-NTK-ALiBi are comparing it to the libraries listed below.
- NTK scaled version of ALiBi position encoding in Transformer.☆69Aug 16, 2023Updated 2 years ago
- This tool (enhance_long) aims to enhance LLaMA 2's long-context extrapolation capability at the lowest cost, preferably without …☆45Nov 30, 2023Updated 2 years ago
- LongQLoRA: Extend Context Length of LLMs Efficiently☆169Nov 12, 2023Updated 2 years ago
- Testing DeepSpeed integration in 🤗 Accelerate☆11Jun 28, 2022Updated 4 years ago
- An Experiment on Dynamic NTK Scaling RoPE☆65Nov 26, 2023Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"☆21Jul 31, 2023Updated 2 years ago
- To assess long-text capabilities more comprehensively, we propose Needle-in-a-Haystack PLUS, which shifts the focus from simple fact r…☆13Mar 4, 2024Updated 2 years ago
- ☆27Nov 17, 2022Updated 3 years ago
- MeloTTS demo on Axera☆13Nov 18, 2025Updated 7 months ago
- Papers on summarization published in recent years☆15Nov 8, 2019Updated 6 years ago
- A tool for manually annotating and ranking response data in the RLHF stage of large-model training.☆253Aug 1, 2023Updated 2 years ago
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆41Jan 4, 2024Updated 2 years ago
- English or Chinese GPT2Dialog model from GPT2-chitchat☆12Feb 23, 2020Updated 6 years ago
- ☆62Jun 17, 2024Updated 2 years ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆16Nov 11, 2024Updated last year
- ☆84Sep 9, 2023Updated 2 years ago
- ☆153Apr 16, 2024Updated 2 years ago
- A Generative Dialogue State Tracking Model☆23Jun 24, 2021Updated 5 years ago
- ICLR 2022☆18Apr 15, 2022Updated 4 years ago
- Realtime segmentation with ENet, the fast and accurate segmentation net.☆14Dec 7, 2018Updated 7 years ago
- The 1st place solution for SIGIR 2020 E-Commerce Workshop Multimodal Product Classification Challenge☆21Aug 3, 2020Updated 5 years ago
- A comparison of pretraining frameworks for LLMs☆22Feb 6, 2025Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models☆1,729Apr 17, 2024Updated 2 years ago
- Fast and low-memory attention layer written in CUDA☆20Jul 14, 2023Updated 2 years ago
- ☆28Apr 24, 2026Updated 2 months ago
- ☆30Aug 21, 2025Updated 10 months ago
- Code for "Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization" (ICASSP 2021)☆22Apr 20, 2022Updated 4 years ago
- ☆30Aug 8, 2024Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆261Dec 16, 2024Updated last year
- Evaluation for AI apps and agents☆46Jan 18, 2024Updated 2 years ago
- Extractive machine reading comprehension based on PyTorch + BERT☆21Dec 8, 2022Updated 3 years ago
- ☆115Jan 8, 2025Updated last year
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"☆35Jan 9, 2024Updated 2 years ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"☆16Nov 11, 2024Updated last year
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining and instruction fine-tuning datasets☆3,049Apr 14, 2024Updated 2 years ago
- SMP2018 user profiling technical evaluation☆21Jul 17, 2018Updated 7 years ago
- ☆43Dec 15, 2023Updated 2 years ago
- Counting-Stars (★)☆83Nov 24, 2025Updated 7 months ago
- Self-Controlled Memory System for LLMs☆50Apr 26, 2024Updated 2 years ago