Offline Inference
The offline inference examples demonstrate how to use vLLM in an offline setting, querying a model for predictions in batches. We recommend starting with Basic.
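For orientation, here is a minimal sketch of batched offline inference in the spirit of the Basic example. The model name `facebook/opt-125m` and the sampling settings are illustrative assumptions; substitute any model supported by vLLM.

```python
from vllm import LLM, SamplingParams

# Prompts processed together as one offline batch.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Sampling settings; adjust temperature/top_p/max_tokens as needed.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model once, then generate completions for the whole batch.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```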
Examples
- Audio Language
- Basic
- Chat With Tools
- CPU Offload Lmcache
- Data Parallel
- Disaggregated Prefill
- Disaggregated Prefill Lmcache
- Distributed
- Eagle
- Encoder Decoder
- Encoder Decoder Multimodal
- LLM Engine Example
- LoRA With Quantization Inference
- Mistral-Small
- MLPSpeculator
- MultiLoRA Inference
- Neuron
- Neuron INT8 Quantization
- Offline Inference with the OpenAI Batch file format
- Prefix Caching
- Prithvi Geospatial Mae
- Profiling
- vLLM TPU Profiling
- Reproducibility
- RLHF
- RLHF Colocate
- RLHF Utils
- Save Sharded State
- Simple Profiling
- Structured Outputs
- Torchrun Example
- TPU
- Vision Language
- Vision Language Embedding
- Vision Language Multi Image