Deploy a TensorFlow Serving Inference Service
Last updated: 2022-12-01
This article describes how to deploy a TensorFlow Serving inference service and specify a queue and GPU resources for it.
Prerequisites
- You have installed the CCE GPU Manager and CCE AI Job Scheduler components; without them, the cloud-native AI features are unavailable.
Example Procedure
This example uses TensorFlow Serving to demonstrate how to deploy an inference service with a Deployment.
- Deploy the TensorFlow Serving inference service:
  - Specify the default queue: scheduling.volcano.sh/queue-name: default
  - Request 50% of one GPU's compute and 10Gi of GPU memory
  - Set the scheduler to volcano (required)
A reference YAML manifest follows:
YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-demo
  template:
    metadata:
      annotations:
        scheduling.volcano.sh/queue-name: default
      labels:
        app: gpu-demo
    spec:
      containers:
      - image: registry.baidubce.com/cce-public/tensorflow-serving:demo-gpu
        imagePullPolicy: Always
        name: gpu-demo
        env:
        - name: MODEL_NAME
          value: half_plus_two
        ports:
        - containerPort: 8501
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
            baidu.com/v100_32g_cgpu: "1"
            baidu.com/v100_32g_cgpu_core: "50"
            baidu.com/v100_32g_cgpu_memory: "10"
          requests:
            cpu: "2"
            memory: 2Gi
            baidu.com/v100_32g_cgpu: "1"
            baidu.com/v100_32g_cgpu_core: "50"
            baidu.com/v100_32g_cgpu_memory: "10"
        # If GPU core isolation is enabled, set the following preStop hook for graceful shutdown.
        # `tf_serving_entrypoint.sh` needs to be replaced with the name of your GPU process.
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "kill -10 `ps -ef | grep tf_serving_entrypoint.sh | grep -v grep | awk '{print $2}'` && sleep 1"]
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: volcano
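Save the manifest to a file and create the Deployment with kubectl; the filename gpu-demo.yaml below is illustrative:
Shell
# Creates (or updates) the Deployment defined in the manifest above.
kubectl apply -f gpu-demo.yaml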
- Run the following commands to check the workload status:
Shell
kubectl get deployments
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
gpu-demo   1/1     1            1           30s

kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
gpu-demo-65767d67cc-xhdgg   1/1     Running   0          63s   172.23.1.86   192.168.48.8   <none>           <none>
- Verify that the TensorFlow inference service is available:
Shell
# Replace <172.23.1.86> with the actual pod IP.
curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://172.23.1.86:8501/v1/models/half_plus_two:predict

# The output should look similar to:
{
    "predictions": [2.5, 3.0, 4.5]
}
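Pod IPs change whenever a pod is rescheduled, so for anything beyond a quick test you would normally expose the Deployment through a Service. A minimal sketch (the Service name gpu-demo is an assumption, not part of the original manifest; the selector and port match the Deployment above):
YAML
apiVersion: v1
kind: Service
metadata:
  name: gpu-demo          # assumed name for this sketch
  namespace: default
spec:
  selector:
    app: gpu-demo         # matches the Deployment's pod label
  ports:
  - port: 8501            # TensorFlow Serving REST port from the manifest
    targetPort: 8501
Requests can then target the stable cluster DNS name instead of the pod IP:
Shell
curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://gpu-demo.default.svc.cluster.local:8501/v1/models/half_plus_two:predict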
Queue Usage
You can specify a queue through annotations:
YAML
annotations:
  scheduling.volcano.sh/queue-name: <queue name>
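If your cluster manages queues with the open-source Volcano CRD, a custom queue can be defined as sketched below. This follows the upstream Volcano API (scheduling.volcano.sh/v1beta1) and the queue name my-queue is hypothetical; the CCE AI Job Scheduler may instead provision queues through its own console, so treat this as a reference, not the product's required workflow:
YAML
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: my-queue          # hypothetical queue name
spec:
  weight: 1               # relative share of cluster resources among queues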
Resource Request Notes
Example: exclusive use of a single GPU
YAML
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1    # 1 GPU
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1    # limits must match requests
    cpu: "4"
    memory: 6Gi
Example: exclusive use of multiple GPUs:
YAML
resources:
  requests:
    baidu.com/v100_32g_cgpu: 2    # 2 GPUs
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 2    # limits must match requests
    cpu: "4"
    memory: 6Gi
Example: shared single GPU (GPU memory isolation only, no compute isolation):
YAML
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1           # 1 GPU
    baidu.com/v100_32g_cgpu_memory: 10   # 10GB of GPU memory
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1           # limits must match requests
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
Example: shared single GPU (both GPU memory isolation and compute isolation):
YAML
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1           # 1 GPU
    baidu.com/v100_32g_cgpu_core: 50     # 50%, i.e. half of one GPU's compute
    baidu.com/v100_32g_cgpu_memory: 10   # 10GB of GPU memory
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1           # limits must match requests
    baidu.com/v100_32g_cgpu_core: 50
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
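To sanity-check that memory isolation is in effect, you can inspect the GPU from inside a running pod. The pod name below is taken from the earlier kubectl get pod output, and the exact figures reported depend on the isolation implementation, so this is a diagnostic sketch rather than guaranteed output:
Shell
# If memory isolation is active, the total GPU memory reported inside the
# pod should reflect the 10GB cap rather than the card's full 32GB.
kubectl exec -it gpu-demo-65767d67cc-xhdgg -- nvidia-smi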
Mapping Between GPU Models and Resource Names
The following GPU models currently support sharing and isolation of GPU memory and compute:
| GPU Model | Resource Name |
|---|---|
| Tesla V100-SXM2-16GB | baidu.com/v100_16g_cgpu |
| Tesla V100-SXM2-32GB | baidu.com/v100_32g_cgpu |
| Tesla T4 | baidu.com/t4_16g_cgpu |
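You can confirm which of these resource names a node actually exposes by inspecting its capacity and allocatable resources; the node name below is the one from the earlier kubectl get pod -o wide output:
Shell
# Lists the cgpu-related resources the node advertises to the scheduler.
kubectl describe node 192.168.48.8 | grep cgpu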