在线服务
在线示例展示了如何在在线环境中使用 vLLM,在线环境要求实时预测。
示例
- API Client
- Helm Charts
- Cohere Rerank Client
- Disaggregated Prefill
- Gradio OpenAI Chatbot Webserver
- Gradio Webserver
- Jinaai Rerank Client
- Multi-Node-Serving
- OpenAI Chat Completion Client
- OpenAI Chat Completion Client For Multimodal
- OpenAI Chat Completion Client With Tools
- OpenAI Chat Completion Structured Outputs
- OpenAI Chat Completion Structured Outputs With Reasoning
- OpenAI Chat Completion Tool Calls With Reasoning
- OpenAI Chat Completion With Reasoning
- OpenAI Chat Completion With Reasoning Streaming
- OpenAI Chat Embedding Client For Multimodal
- OpenAI Completion Client
- OpenAI Cross Encoder Score
- OpenAI Embedding Client
- OpenAI Pooling Client
- OpenAI Transcription Client
- Setup OpenTelemetry POC
- Prometheus and Grafana
- Run Cluster
- Sagemaker-Entrypoint