Deploying an object detection model with NVIDIA Triton Inference Server
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
The NVIDIA NGC catalog is the hub for GPU-optimized software for deep learning (DL), machine learning (ML), and high-performance computing (HPC) that accelerates deployment to development workflows so data scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.
In this blog post we will demonstrate how to deploy NVIDIA’s Triton Inference Server to run an object detection service. Object detection is a computer vision technique that allows you to identify objects in images or videos. It can be used to count, locate, and label objects in a scene for complex image classification. In this walkthrough, we show how to set up, run, and test the object detection service with a sample image of a “cloud guru mug”.
The NGC catalog in AWS Marketplace includes listings such as NVIDIA Triton Inference Server, which can be launched directly on various AWS services.
NGC page: https://www.nvidia.com/en-us/gpu-cloud/
Prerequisites
- An active AWS account
- IAM roles and policies with AmazonEC2ContainerRegistryFullAccess to access AWS services. For more information, see Adding and removing IAM identity permissions in the AWS Identity and Access Management User Guide.
- A launched Amazon EC2 instance that uses the NVIDIA Deep Learning AMI. The instance must be powered by NVIDIA GPUs, which means a P4d, P3, or G4dn instance type. To launch the instance, follow these steps:
- Follow the instructions in the NGC on AWS Virtual Machines documentation.
- We recommend using a p3.8xlarge instance, which gives you access to 4 V100 GPUs.
- You also have the option to use a lower-cost p3.2xlarge instance, which gives you access to a single V100 GPU.
- In a terminal window, connect to the EC2 instance via SSH.
- Install the AWS Command Line Interface (AWS CLI) with the following commands:
sudo apt-get install python-pip
sudo pip install awscli
aws configure
- Keep your SSH terminal window open for Step 1.
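Before moving on, it can help to confirm that the command-line tools this walkthrough relies on are available on the instance. The following is a minimal sketch, not part of the original setup: require_commands is a hypothetical helper name, and the tool list in the usage comment (docker, aws, wget, nvidia-smi) is our assumption about what the NVIDIA Deep Learning AMI and the steps above provide.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (run on the EC2 instance; the tool list is an assumption):
#   require_commands docker aws wget nvidia-smi && echo "all tools found"
```

If the helper reports a missing tool, revisit the corresponding prerequisite before continuing to Step 1.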
Deploying an object detection model with NVIDIA Triton Inference Server
Step 1: Pull the Triton Inference Server container from the NVIDIA NGC catalog in AWS Marketplace.
To pull the Triton Inference Server container, do the following:
A. Subscribe to the software
- Navigate to the NVIDIA NGC catalog in AWS Marketplace.
- Choose Triton Inference Server.
- On the product page upper right, choose Continue to Subscribe, and then Continue to Configuration.
- On the configuration page, for Delivery Method, choose Triton Inference Server and, for Software Version, choose the most recent version.
- Choose Continue to Launch. This takes you to the launch screen.
B. Pull the container into your launched EC2 instance
- On the launch screen from step 1.A.5, choose View Container Image Details. This opens a popup with pull command instructions. In this popup, step 1 is a command to authenticate your Docker client to your Amazon Elastic Container Registry (Amazon ECR). Step 2 lists the Docker container URI. You use both of these in the next steps. Refer to the above screenshot.
- If you closed your EC2 terminal window from the Prerequisites section, open a new one.
- Authenticate your Docker client to the Amazon ECR registry. To do this, in the EC2 instance terminal window you opened in step 1.B.2, copy and run the first command from the View Container Image Details popup from step 1.B.1.
- From the popup in step 1.B.1, copy the Docker container URI.
- To pull the container into your EC2 instance, run the following command in the terminal window, replacing [container URI from Step 1.B.4] with the Docker container URI you copied in step 1.B.4:
docker pull [container URI from Step 1.B.4]
If successful, it returns a message similar to this one:
20.11-py3: Pulling from nvidia/containers/nvidia/tritonserver
Digest: sha256:2e7e43190b375031ce804228fd1a1544aa8c48a1db8ffb82b21fa33051cdfdbe
Status: Downloaded newer image for 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tritonserver:20.11-py3
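The authentication command to run is the one shown in the View Container Image Details popup. As a rough illustration of how the pieces fit together, the sketch below derives the registry host and AWS Region from a container URI; ecr_registry_host and ecr_region are hypothetical helper names, and the commented usage assumes an AWS CLI version that supports aws ecr get-login-password and credentials already set with aws configure.

```shell
# Print the registry host of a container URI (everything before the first "/").
ecr_registry_host() {
    printf '%s\n' "${1%%/*}"
}

# Print the AWS Region embedded in an ECR registry host of the form
# <account>.dkr.ecr.<region>.amazonaws.com (the fourth dot-separated field).
ecr_region() {
    ecr_registry_host "$1" | cut -d. -f4
}

# Example (assumption: URI holds the container URI copied from the popup):
#   URI="<container URI>"
#   aws ecr get-login-password --region "$(ecr_region "$URI")" |
#       docker login --username AWS --password-stdin "$(ecr_registry_host "$URI")"
#   docker pull "$URI"
```

If the popup shows a different login command, use the popup's version; this sketch only shows where the registry host and Region in such a command come from.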
Step 2: Download a pretrained model and create an object detection service
To create the object detection inference service, you need a pretrained model for object detection. This walkthrough uses the Dense Convolutional Network (DenseNet) model in ONNX format, which Triton serves through its ONNX Runtime backend. ONNX Runtime also has the capability to train existing models through its optimized backend.
To set up your object detection service, do the following:
A. Create a repository structure compatible with the Triton container you subscribed to in Step 1. To do this, in the EC2 instance’s terminal window, run the following command:
mkdir -p model_repository/densenet_onnx/1
B. To download the DenseNet model, run the following command:
wget -O model_repository/densenet_onnx/1/model.onnx https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx
If successful, it returns a message similar to this one:
2020-12-18 20:25:46 (14.1 MB/s) - ‘model_repository/densenet_onnx/1/model.onnx’ saved [32719461/32719461]
C. To download the associated Triton configuration files for this particular model, run the following command:
wget -O model_repository/densenet_onnx/config.pbtxt https://raw.githubusercontent.com/triton-inference-server/server/master/docs/examples/model_repository/densenet_onnx/config.pbtxt
If successful, it returns a message similar to this one:
2020-12-18 20:30:49 (35.5 MB/s) - ‘model_repository/densenet_onnx/config.pbtxt’ saved [387/387]
D. To download a list of over 1,000 labels that the DenseNet model is trained to classify objects with, run the following command:
wget -O model_repository/densenet_onnx/densenet_labels.txt https://raw.githubusercontent.com/triton-inference-server/server/master/docs/examples/model_repository/densenet_onnx/densenet_labels.txt
If successful, it returns a message similar to this one:
2020-12-18 20:33:10 (78.0 MB/s) - ‘model_repository/densenet_onnx/densenet_labels.txt’ saved [10311/10311]
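At this point the repository should contain the model file in a numbered version directory, with the configuration and label files one level above it. The following sketch checks that layout before you start the server; check_model_repo is a hypothetical helper, and the file list simply mirrors the three downloads above.

```shell
# Expected layout after steps A through D:
#   model_repository/
#     densenet_onnx/
#       config.pbtxt
#       densenet_labels.txt
#       1/
#         model.onnx

# Check that a model directory contains the downloaded files and none is empty.
# Usage: check_model_repo <repository root> <model name>
check_model_repo() {
    model_dir="$1/$2"
    status=0
    for f in "config.pbtxt" "densenet_labels.txt" "1/model.onnx"; do
        if [ ! -s "$model_dir/$f" ]; then
            echo "missing or empty: $model_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_model_repo model_repository densenet_onnx && echo "layout looks complete"
```

A failed check usually means one of the wget commands wrote to a different path or the download was interrupted.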
E. To deploy the DenseNet model to serve object detection requests using the Triton Inference Server container, run the following command:
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/model_repository:/models <container URI from Step 1.B.4 above> tritonserver --model-repository=/models
If successful, it returns a message similar to this one:
I1218 20:14:59.358545 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I1218 20:14:59.361457 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I1218 20:14:59.403923 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
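Congratulations! Your object detection service is set up. The Triton log now shows the three services it started: the gRPC inference service on port 8001, the HTTP/REST inference service on port 8000, and the metrics service on port 8002. Clients can send inference requests over either the HTTP/REST or the gRPC protocol.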
Keep the Triton container running in this terminal window. While you will take no further action in this window, do not close it. In the next step, you will send real-time object detection inference requests to the Triton Inference service running in this terminal window.
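If you would like to confirm from a second terminal that the server is accepting requests, one option is to poll Triton's HTTP readiness endpoint. The sketch below is a generic retry loop; retry is a hypothetical helper, and the commented usage assumes curl is installed on the instance and that port 8000 is mapped as in the docker run command above.

```shell
# Retry a command up to <attempts> times, pausing one second between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# Usage: retry <attempts> <command> [args...]
retry() {
    attempts="$1"
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -lt "$attempts" ]; then
            sleep 1
        fi
        n=$((n + 1))
    done
    return 1
}

# Example (assumes curl is available on the EC2 instance):
#   retry 30 curl -sf localhost:8000/v2/health/ready && echo "Triton is ready"
#   curl -s localhost:8002/metrics | head    # peek at the metrics service
```

Model loading can take a few seconds, which is why the example allows several attempts rather than checking once.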
Step 3: See your model in action
To send inference requests to the object detection model, you need the Triton Inference Server — Client SDK container. Triton Inference Server provides a data center inference solution optimized for NVIDIA GPUs. It maximizes inference utilization and performance on GPUs via an HTTP or gRPC endpoint, allowing remote clients to request inference for any model that is being managed by the server, as well as providing real-time metrics on latency and requests. The Triton Inference Server — Client SDK can be used to build end-user client applications that can make inference requests to the server.
To get this container, follow these steps.
A. Launch Triton Inference Server
- Navigate to the NVIDIA NGC catalog in AWS Marketplace.
- At the center of the page in the Search bar, enter Triton Inference Server. Choose the Triton Inference Server card and select Triton Inference Server.
- On the product page upper right, choose Continue to Subscribe, and then Continue to Configuration.
- On the configuration page, for Delivery Method, choose Triton Inference Server — Client SDK variant of the product and for Software Version, choose the most recent version.
- Choose Continue to Launch.
- On the launch screen, in the middle of the page, choose View Container Image Details. Copy the container URI from Step 2 in the pop-up.
B. Pull the Triton Inference Server — Client SDK into your launched EC2 instance
- With the terminal window from step 2 still open, open an additional terminal window. All your actions in this step take place in this new window, but the step 2 terminal window must remain open to keep the Triton server application running.
- In this new terminal window, connect via SSH to the same EC2 instance that you launched in the Prerequisites section.
- In your new terminal window, pull the Triton Inference Server — Client SDK by running the following command.
docker pull <container URI from Step 3.A.6>
If successful, it returns a message similar to this one:
20.11-py3-clientsdk: Pulling from nvidia/containers/nvidia/tritonserver
Digest: sha256:2e7e43190b375031ce804228fd1a1544aa8c48a1db8ffb82b21fa33051cdfdbe
Status: Downloaded newer image for 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tritonserver:20.11-py3-clientsdk
- In the same terminal window, run the Triton Inference Server — Client SDK container by running the following command:
docker run -it --rm --net=host <container URI from Step 3.A.6>
If successful, you see the # prompt.
- To test your object detection service, send an inference request to the object detection service by running the following command:
./install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
The Triton Inference Server — Client SDK container has the below example image of a coffee mug preloaded to test inference. When you successfully run the command in step 3.B.5, the object detection service accurately tags the image with COFFEE MUG, CUP, and COFFEEPOT labels. Refer to the following image of a black coffee mug with the NVIDIA logo on it.
If successful, it returns a message similar to this one:
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
15.346228 (504) = COFFEE MUG
13.224319 (968) = CUP
10.422960 (505) = COFFEEPOT
You have now confirmed that your newly installed object detection service is working properly.
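If you want to use the result in a script rather than read it by eye, the top prediction can be extracted from the client output. This is a minimal sketch that assumes the output format shown above, with result lines of the form score (index) = LABEL listed from highest to lowest score; top_label is a hypothetical helper name.

```shell
# Print the label from the first result line of image_client output.
# Result lines look like: "    15.346228 (504) = COFFEE MUG"
top_label() {
    sed -n 's/^[[:space:]]*[0-9.][0-9.]*[[:space:]]*([0-9][0-9]*)[[:space:]]*=[[:space:]]*\(.*\)$/\1/p' |
        head -n 1
}

# Example (run inside the Client SDK container):
#   ./install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg | top_label
```

Header lines such as the request and image lines do not match the pattern, so only the prediction lines are considered.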
Cleanup
After completing this walkthrough, to avoid additional usage charges, stop any EC2 instances you have started.
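You can stop instances from the Amazon EC2 console or from the AWS CLI. The sketch below wraps the aws ec2 stop-instances command with a dry-run switch so you can review the command first; stop_instances and the DRY_RUN variable are hypothetical names introduced here, and the instance ID in the example is a placeholder.

```shell
# Stop one or more EC2 instances by ID.
# With DRY_RUN=1 the command is printed instead of executed.
stop_instances() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "aws ec2 stop-instances --instance-ids $*"
    else
        aws ec2 stop-instances --instance-ids "$@"
    fi
}

# Example (i-0123456789abcdef0 is a placeholder, not a real instance):
#   DRY_RUN=1 stop_instances i-0123456789abcdef0
#   stop_instances i-0123456789abcdef0
```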
Conclusion
In this walkthrough, we showed how to set up, run, and test an object detection service with a sample image. We demonstrated how to build and deploy an AI-powered solution with the NVIDIA NGC catalog in AWS Marketplace. Deploying an object detection service with Triton Inference Server is just one example, and you can follow similar steps to discover, access, and deploy other NVIDIA AI software.