DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference

PyTorch
