From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens ...

Authors: Wu, Tong; Shen, Junzhe; Jia, Zixia; Wang, Yuxuan; Zheng, Zilong
Subject:
Computation and Language cs.CL; FOS: Computer and information sciences
Record type:
article in journal/newspaper
report
Language:
unknown

Additional information
- Publication data:
  arXiv
- Publication year:
  2025
- Collection:
  DataCite Metadata Store (German National Library of Science and Technology)
- Abstract:
  Generating ultra-long sequences with large language models (LLMs) has become increasingly crucial but remains a highly time-intensive task, particularly for sequences up to 100K tokens. While traditional speculative decoding methods exist, simply extending their generation limits fails to accelerate the process and can be detrimental. Through an in-depth analysis, we identify three major challenges hindering efficient generation: frequent model reloading, dynamic key-value (KV) management and repetitive generation. To address these issues, we introduce TOKENSWIFT, a novel framework designed to substantially accelerate the generation process of ultra-long sequences while maintaining the target model's inherent quality. Experimental results demonstrate that TOKENSWIFT achieves over 3 times speedup across models of varying scales (1.5B, 7B, 8B, 14B) and architectures (MHA, GQA). This acceleration translates to hours of time savings for ultra-long sequence generation, establishing TOKENSWIFT as a scalable and ...
- Identifier:
  10.48550/arxiv.2502.18890
- Online access:
  https://dx.doi.org/10.48550/arxiv.2502.18890
  https://arxiv.org/abs/2502.18890
- Rights:
  Creative Commons Attribution Share Alike 4.0 International ; https://creativecommons.org/licenses/by-sa/4.0/legalcode ; cc-by-sa-4.0
- Identifier:
  edsbas.86222A0B
