SageMaker deployment config for sub-second real-time inference

by vibranium - opened Dec 11, 2023

I am trying to deploy this model on an A100 (p4d) SageMaker endpoint instance. Which TGI configuration settings can be used to get sub-second real-time inference? Thanks!
