Application fails to start up on NVIDIA T4 and L4 GPUs

#30
by asr143r - opened

I am testing a cloned copy of this Space for study purposes and want to avoid the "GPU not available" issues that occur frequently on ZeroGPU. I therefore changed the hardware type of the cloned Space from ZeroGPU to an NVIDIA T4 (and also tried an L4). However, after the build completes, the application stays stuck at the startup stage for a long time and eventually fails with a runtime error. The log shows messages like the ones below (these may only be warnings, but the startup appears to hit a timeout):

===== Application Startup at 2025-01-09 07:06:37 =====

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
The config attributes {'decay': 0.9999, 'inv_gamma': 1.0, 'min_decay': 0.0, 'optimization_step': 37000, 'power': 0.6666666666666666, 'update_after_step': 0, 'use_ema_warmup': False} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(

Has anyone had this issue and found a resolution?

I am also planning to switch to T4/L4 to avoid GPU not available issues. Could you let me know if you were able to resolve it? Or did you find an alternative approach to avoid these runtime errors? Any insights would be really helpful!
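
On the question of whether the logged messages are errors or only warnings: the `FutureWarning` lines in the log are emitted through Python's `warnings.warn`, which by default reports the message and lets execution continue. Below is a minimal sketch using only the standard library. The function `legacy_download` is a hypothetical stand-in, not part of `huggingface_hub`, and this does not diagnose the startup failure itself; it only illustrates that such a warning does not stop the program, and how it could be filtered so that real errors are easier to spot.

```python
import warnings


def legacy_download():
    # Hypothetical stand-in for a library call that emits a deprecation
    # notice, similar in kind to the resume_download message in the log.
    warnings.warn("resume_download is deprecated", FutureWarning)
    return "downloaded"


# Record the warning instead of printing it; the call still returns normally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_download()

print(result, [w.category.__name__ for w in caught])

# Optionally silence FutureWarning inside a block to reduce log noise.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    quiet_result = legacy_download()
```

If the function returns in both cases, the warning was not what halted the program, so the cause of a startup timeout would have to be looked for elsewhere in the log.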
