SageMaker deployment config for sub-second real-time inference

#9
by vibranium - opened

I am trying to deploy this model to a SageMaker endpoint on an A100 (p4d) instance. Which TGI configuration settings should I use to get sub-second real-time inference? Thanks!
