Introduction
This post uses the Gateway API Inference Extension as the ext_proc server for Envoy.
Starting the ext_proc server
The server is based on the Gateway API Inference Extension:
https://github.com/kubernetes-sigs/gateway-api-inference-extension.git
First, clone the code locally:
git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension.git
Roll back to an earlier commit, which contains a minimal, basic implementation:
git reset --hard 90c2b645c1515e760e511d00ed9f6e5324084acc
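Change to the examples/poc/ext-proc directory.
In main.go, comment out the Kubernetes-related code.
Then start the service by passing the Pod IP and Pod name directly. The Pod name is arbitrary; the address below is simply a DeepSeek service started with a vLLM container.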
go run main.go --podIPs 128.128.0.14:8000 --pods pod1
The output should show the server periodically fetching metrics from the specified IP.
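To make the periodic metrics fetch more concrete, here is a minimal Python sketch of what such a poll could look like. It is not the code used by the ext_proc server. It assumes that the model server exposes Prometheus text-format metrics at /metrics (as vLLM's OpenAI-compatible server does), and the function names parse_metrics and fetch_metrics are hypothetical.

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {metric_name: value}.

    Labels are dropped; if a name appears more than once, the last sample wins.
    """
    values: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        try:
            if "{" in line:
                # Sample with labels: name{label="x"} value [timestamp]
                name = line[: line.index("{")]
                rest = line[line.rindex("}") + 1 :]
            else:
                # Sample without labels: name value [timestamp]
                name, rest = line.split(None, 1)
            values[name] = float(rest.split()[0])
        except (ValueError, IndexError):
            continue  # ignore lines that are not well-formed samples
    return values


def fetch_metrics(address: str, timeout: float = 2.0) -> dict[str, float]:
    """Fetch and parse http://<address>/metrics (assumed path; address is "ip:port")."""
    with urlopen(f"http://{address}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the address from the command above, fetch_metrics("128.128.0.14:8000") would return a dictionary of metric values that a scheduler could compare across Pods.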
Configuring Envoy
Add the ext_proc configuration
1. Add an HTTP filter of type ext_proc, placed before the router filter.
2. Add a cluster that points to the service started above.
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: original_destination_cluster
          http_filters:
          - name: envoy.filters.http.ext_proc
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
              grpc_service:
                envoy_grpc:
                  cluster_name: ext_proc
              failure_mode_allow: true
              processing_mode:
                request_header_mode: SEND
                response_header_mode: SEND
                request_body_mode: BUFFERED_PARTIAL
                response_body_mode: BUFFERED_PARTIAL
                request_trailer_mode: SEND
                response_trailer_mode: SEND
              message_timeout: 1000s
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: ext_proc
    type: STATIC
    connect_timeout: 86400s
    http2_protocol_options: {}
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ext_proc
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.42.16.26
                port_value: 9002
          load_balancing_weight: 1
  - name: original_destination_cluster
    type: ORIGINAL_DST
    connect_timeout: 1000s
    lb_policy: CLUSTER_PROVIDED
    circuit_breakers:
      thresholds:
      - max_connections: 40000
        max_pending_requests: 40000
        max_requests: 40000
    original_dst_lb_config:
      use_http_header: true
      #http_header_name: x-gateway-destination-endpoint
      http_header_name: target-pod
Restart the Envoy service.
Testing
There is no need to set the routing header in the request yourself; Envoy adds it automatically.
curl http://128.128.0.13:10000/v1/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","prompt": "San Francisco is a","max_tokens": 7,"temperature": 0}' -v
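For scripted testing, the same request can be built with the Python standard library. This is a minimal sketch that mirrors the curl command above; the function names build_completion_request and send are hypothetical, and the response is assumed to be a JSON body.

```python
import json
from urllib.request import Request, urlopen


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 7, temperature: float = 0) -> Request:
    """Build a POST to <base_url>/v1/completions with a JSON body."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return Request(
        f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: Request, timeout: float = 30.0) -> dict:
    """Send the request through Envoy and decode the JSON response."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling send(build_completion_request("http://128.128.0.13:10000", "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "San Francisco is a")) sends the same payload as the curl command, again without any routing header set by the client.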
References
An In-Depth Look at the Envoy External Processing Filter (ext_proc) - Jimmy Song
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/test/testdata/envoy.yaml
External Processing Filter (proto) — envoy 1.36.0-dev-0c7818 documentation