diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,76 +1,108 @@ -[2025-02-27 20:31:39,118][00031] Saving configuration to /kaggle/working/train_dir/default_experiment/config.json... -[2025-02-27 20:31:39,120][00031] Rollout worker 0 uses device cpu -[2025-02-27 20:31:39,120][00031] Rollout worker 1 uses device cpu -[2025-02-27 20:31:39,121][00031] Rollout worker 2 uses device cpu -[2025-02-27 20:31:39,122][00031] Rollout worker 3 uses device cpu -[2025-02-27 20:31:39,123][00031] Rollout worker 4 uses device cpu -[2025-02-27 20:31:39,123][00031] Rollout worker 5 uses device cpu -[2025-02-27 20:31:39,124][00031] Rollout worker 6 uses device cpu -[2025-02-27 20:31:39,125][00031] Rollout worker 7 uses device cpu -[2025-02-27 20:31:39,127][00031] Rollout worker 8 uses device cpu -[2025-02-27 20:31:39,128][00031] Rollout worker 9 uses device cpu -[2025-02-27 20:31:39,129][00031] Rollout worker 10 uses device cpu -[2025-02-27 20:31:39,130][00031] Rollout worker 11 uses device cpu -[2025-02-27 20:31:39,130][00031] Rollout worker 12 uses device cpu -[2025-02-27 20:31:39,131][00031] Rollout worker 13 uses device cpu -[2025-02-27 20:31:39,132][00031] Rollout worker 14 uses device cpu -[2025-02-27 20:31:39,133][00031] Rollout worker 15 uses device cpu -[2025-02-27 20:31:39,134][00031] Rollout worker 16 uses device cpu -[2025-02-27 20:31:39,134][00031] Rollout worker 17 uses device cpu -[2025-02-27 20:31:39,135][00031] Rollout worker 18 uses device cpu -[2025-02-27 20:31:39,137][00031] Rollout worker 19 uses device cpu -[2025-02-27 20:31:39,722][00031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-02-27 20:31:39,723][00031] InferenceWorker_p0-w0: min num requests: 6 -[2025-02-27 20:31:39,829][00031] Starting all processes... -[2025-02-27 20:31:39,830][00031] Starting process learner_proc0 -[2025-02-27 20:31:39,921][00031] Starting all processes... 
-[2025-02-27 20:31:39,930][00031] Starting process inference_proc0-0
-[2025-02-27 20:31:39,931][00031] Starting process rollout_proc0
-[2025-02-27 20:31:39,931][00031] Starting process rollout_proc1
-[2025-02-27 20:31:39,933][00031] Starting process rollout_proc2
-[2025-02-27 20:31:39,936][00031] Starting process rollout_proc3
-[2025-02-27 20:31:39,958][00031] Starting process rollout_proc4
-[2025-02-27 20:31:39,965][00031] Starting process rollout_proc5
-[2025-02-27 20:31:39,967][00031] Starting process rollout_proc6
-[2025-02-27 20:31:39,971][00031] Starting process rollout_proc7
-[2025-02-27 20:31:39,979][00031] Starting process rollout_proc8
-[2025-02-27 20:31:39,983][00031] Starting process rollout_proc9
-[2025-02-27 20:31:39,995][00031] Starting process rollout_proc10
-[2025-02-27 20:31:39,999][00031] Starting process rollout_proc11
-[2025-02-27 20:31:40,007][00031] Starting process rollout_proc12
-[2025-02-27 20:31:40,015][00031] Starting process rollout_proc13
-[2025-02-27 20:31:40,019][00031] Starting process rollout_proc14
-[2025-02-27 20:31:40,218][00031] Starting process rollout_proc16
-[2025-02-27 20:31:40,160][00031] Starting process rollout_proc15
-[2025-02-27 20:31:40,368][00031] Starting process rollout_proc17
-[2025-02-27 20:31:40,508][00031] Starting process rollout_proc18
-[2025-02-27 20:31:40,549][00031] Starting process rollout_proc19
-[2025-02-27 20:31:54,595][00222] Worker 4 uses CPU cores [0]
-[2025-02-27 20:31:54,668][00031] Heartbeat connected on RolloutWorker_w4
-[2025-02-27 20:31:57,462][00217] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-02-27 20:31:57,462][00217] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2025-02-27 20:31:57,568][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-02-27 20:31:57,570][00196] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2025-02-27 20:31:57,585][00217] Num visible devices: 1
-[2025-02-27 20:31:57,629][00031] Heartbeat connected on InferenceWorker_p0-w0
-[2025-02-27 20:31:57,650][00196] Num visible devices: 1
-[2025-02-27 20:31:57,683][00196] Starting seed is not provided
-[2025-02-27 20:31:57,683][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-02-27 20:31:57,684][00196] Initializing actor-critic model on device cuda:0
-[2025-02-27 20:31:57,683][00031] Heartbeat connected on Batcher_0
-[2025-02-27 20:31:57,684][00196] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:31:57,698][00196] RunningMeanStd input shape: (1,)
-[2025-02-27 20:31:57,818][00196] ConvEncoder: input_channels=3
-[2025-02-27 20:31:58,270][00216] Worker 0 uses CPU cores [0]
-[2025-02-27 20:31:58,277][00226] Worker 9 uses CPU cores [1]
-[2025-02-27 20:31:58,512][00031] Heartbeat connected on RolloutWorker_w9
-[2025-02-27 20:31:58,520][00031] Heartbeat connected on RolloutWorker_w0
-[2025-02-27 20:31:58,684][00218] Worker 1 uses CPU cores [1]
-[2025-02-27 20:31:58,883][00031] Heartbeat connected on RolloutWorker_w1
-[2025-02-27 20:31:58,972][00196] Conv encoder output size: 512
-[2025-02-27 20:31:58,973][00196] Policy head output size: 512
-[2025-02-27 20:31:59,158][00196] Created Actor Critic model with architecture:
-[2025-02-27 20:31:59,159][00196] ActorCriticSharedWeights(
+[2025-02-27 20:55:11,700][00031] Saving configuration to /kaggle/working/train_dir/default_experiment/config.json...
+[2025-02-27 20:55:11,702][00031] Rollout worker 0 uses device cpu
+[2025-02-27 20:55:11,703][00031] Rollout worker 1 uses device cpu
+[2025-02-27 20:55:11,704][00031] Rollout worker 2 uses device cpu
+[2025-02-27 20:55:11,704][00031] Rollout worker 3 uses device cpu
+[2025-02-27 20:55:11,705][00031] Rollout worker 4 uses device cpu
+[2025-02-27 20:55:11,707][00031] Rollout worker 5 uses device cpu
+[2025-02-27 20:55:11,708][00031] Rollout worker 6 uses device cpu
+[2025-02-27 20:55:11,708][00031] Rollout worker 7 uses device cpu
+[2025-02-27 20:55:11,709][00031] Rollout worker 8 uses device cpu
+[2025-02-27 20:55:11,710][00031] Rollout worker 9 uses device cpu
+[2025-02-27 20:55:11,711][00031] Rollout worker 10 uses device cpu
+[2025-02-27 20:55:11,712][00031] Rollout worker 11 uses device cpu
+[2025-02-27 20:55:11,712][00031] Rollout worker 12 uses device cpu
+[2025-02-27 20:55:11,713][00031] Rollout worker 13 uses device cpu
+[2025-02-27 20:55:11,716][00031] Rollout worker 14 uses device cpu
+[2025-02-27 20:55:11,717][00031] Rollout worker 15 uses device cpu
+[2025-02-27 20:55:11,717][00031] Rollout worker 16 uses device cpu
+[2025-02-27 20:55:11,718][00031] Rollout worker 17 uses device cpu
+[2025-02-27 20:55:11,719][00031] Rollout worker 18 uses device cpu
+[2025-02-27 20:55:11,720][00031] Rollout worker 19 uses device cpu
+[2025-02-27 20:55:12,275][00031] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-02-27 20:55:12,276][00031] InferenceWorker_p0-w0: min num requests: 6
+[2025-02-27 20:55:12,383][00031] Starting all processes...
+[2025-02-27 20:55:12,384][00031] Starting process learner_proc0
+[2025-02-27 20:55:12,474][00031] Starting all processes...
+[2025-02-27 20:55:12,483][00031] Starting process inference_proc0-0
+[2025-02-27 20:55:12,484][00031] Starting process rollout_proc0
+[2025-02-27 20:55:12,485][00031] Starting process rollout_proc1
+[2025-02-27 20:55:12,485][00031] Starting process rollout_proc2
+[2025-02-27 20:55:12,485][00031] Starting process rollout_proc3
+[2025-02-27 20:55:12,485][00031] Starting process rollout_proc4
+[2025-02-27 20:55:12,486][00031] Starting process rollout_proc5
+[2025-02-27 20:55:12,492][00031] Starting process rollout_proc6
+[2025-02-27 20:55:12,492][00031] Starting process rollout_proc7
+[2025-02-27 20:55:12,505][00031] Starting process rollout_proc8
+[2025-02-27 20:55:12,505][00031] Starting process rollout_proc9
+[2025-02-27 20:55:12,507][00031] Starting process rollout_proc10
+[2025-02-27 20:55:12,507][00031] Starting process rollout_proc11
+[2025-02-27 20:55:12,507][00031] Starting process rollout_proc12
+[2025-02-27 20:55:12,508][00031] Starting process rollout_proc13
+[2025-02-27 20:55:12,508][00031] Starting process rollout_proc14
+[2025-02-27 20:55:12,850][00031] Starting process rollout_proc15
+[2025-02-27 20:55:12,954][00031] Starting process rollout_proc16
+[2025-02-27 20:55:12,979][00031] Starting process rollout_proc17
+[2025-02-27 20:55:13,007][00031] Starting process rollout_proc18
+[2025-02-27 20:55:13,057][00031] Starting process rollout_proc19
+[2025-02-27 20:55:30,625][00219] Worker 2 uses CPU cores [2]
+[2025-02-27 20:55:30,771][00217] Worker 0 uses CPU cores [0]
+[2025-02-27 20:55:30,822][00228] Worker 11 uses CPU cores [3]
+[2025-02-27 20:55:31,031][00031] Heartbeat connected on RolloutWorker_w11
+[2025-02-27 20:55:31,081][00222] Worker 5 uses CPU cores [1]
+[2025-02-27 20:55:31,093][00031] Heartbeat connected on RolloutWorker_w5
+[2025-02-27 20:55:31,191][00031] Heartbeat connected on RolloutWorker_w0
+[2025-02-27 20:55:31,239][00031] Heartbeat connected on RolloutWorker_w2
+[2025-02-27 20:55:31,507][00236] Worker 19 uses CPU cores [3]
+[2025-02-27 20:55:31,519][00031] Heartbeat connected on RolloutWorker_w19
+[2025-02-27 20:55:31,755][00221] Worker 3 uses CPU cores [3]
+[2025-02-27 20:55:31,759][00218] Worker 1 uses CPU cores [1]
+[2025-02-27 20:55:31,766][00031] Heartbeat connected on RolloutWorker_w3
+[2025-02-27 20:55:31,783][00031] Heartbeat connected on RolloutWorker_w1
+[2025-02-27 20:55:32,275][00223] Worker 6 uses CPU cores [2]
+[2025-02-27 20:55:32,333][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-02-27 20:55:32,333][00196] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2025-02-27 20:55:32,409][00196] Num visible devices: 1
+[2025-02-27 20:55:32,432][00196] Starting seed is not provided
+[2025-02-27 20:55:32,433][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-02-27 20:55:32,433][00196] Initializing actor-critic model on device cuda:0
+[2025-02-27 20:55:32,433][00196] RunningMeanStd input shape: (3, 72, 128)
+[2025-02-27 20:55:32,434][00031] Heartbeat connected on RolloutWorker_w6
+[2025-02-27 20:55:32,436][00031] Heartbeat connected on Batcher_0
+[2025-02-27 20:55:32,443][00196] RunningMeanStd input shape: (1,)
+[2025-02-27 20:55:32,453][00231] Worker 14 uses CPU cores [2]
+[2025-02-27 20:55:32,528][00196] ConvEncoder: input_channels=3
+[2025-02-27 20:55:32,590][00031] Heartbeat connected on RolloutWorker_w14
+[2025-02-27 20:55:32,628][00230] Worker 13 uses CPU cores [1]
+[2025-02-27 20:55:32,701][00225] Worker 8 uses CPU cores [0]
+[2025-02-27 20:55:32,710][00224] Worker 7 uses CPU cores [3]
+[2025-02-27 20:55:32,738][00031] Heartbeat connected on RolloutWorker_w13
+[2025-02-27 20:55:32,744][00220] Worker 4 uses CPU cores [0]
+[2025-02-27 20:55:32,748][00216] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-02-27 20:55:32,748][00216] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2025-02-27 20:55:32,758][00227] Worker 9 uses CPU cores [1]
+[2025-02-27 20:55:32,791][00226] Worker 10 uses CPU cores [2]
+[2025-02-27 20:55:32,789][00031] Heartbeat connected on RolloutWorker_w8
+[2025-02-27 20:55:32,792][00031] Heartbeat connected on RolloutWorker_w4
+[2025-02-27 20:55:32,820][00031] Heartbeat connected on RolloutWorker_w7
+[2025-02-27 20:55:32,824][00216] Num visible devices: 1
+[2025-02-27 20:55:32,839][00031] Heartbeat connected on RolloutWorker_w10
+[2025-02-27 20:55:32,845][00031] Heartbeat connected on InferenceWorker_p0-w0
+[2025-02-27 20:55:32,851][00229] Worker 12 uses CPU cores [0]
+[2025-02-27 20:55:32,868][00031] Heartbeat connected on RolloutWorker_w9
+[2025-02-27 20:55:32,888][00031] Heartbeat connected on RolloutWorker_w12
+[2025-02-27 20:55:32,901][00232] Worker 15 uses CPU cores [3]
+[2025-02-27 20:55:32,935][00031] Heartbeat connected on RolloutWorker_w15
+[2025-02-27 20:55:32,947][00235] Worker 18 uses CPU cores [2]
+[2025-02-27 20:55:32,959][00031] Heartbeat connected on RolloutWorker_w18
+[2025-02-27 20:55:32,964][00234] Worker 17 uses CPU cores [1]
+[2025-02-27 20:55:32,977][00031] Heartbeat connected on RolloutWorker_w17
+[2025-02-27 20:55:32,982][00233] Worker 16 uses CPU cores [0]
+[2025-02-27 20:55:32,994][00031] Heartbeat connected on RolloutWorker_w16
+[2025-02-27 20:55:33,014][00196] Conv encoder output size: 512
+[2025-02-27 20:55:33,015][00196] Policy head output size: 512
+[2025-02-27 20:55:33,026][00196] Created Actor Critic model with architecture:
+[2025-02-27 20:55:33,026][00196] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -101,2135 +133,2190 @@
     )
   )
   (core): ModelCoreRNN(
-    (core): LSTM(512, 512)
+    (core): GRU(512, 256, num_layers=2)
   )
   (decoder): MlpDecoder(
     (mlp): Identity()
   )
-  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (critic_linear): Linear(in_features=256, out_features=1, bias=True)
   (action_parameterization): ActionParameterizationDefault(
-    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+    (distribution_linear): Linear(in_features=256, out_features=5, bias=True)
   )
 )
-[2025-02-27 20:31:59,298][00235] Worker 18 uses CPU cores [2]
-[2025-02-27 20:31:59,303][00229] Worker 13 uses CPU cores [1]
-[2025-02-27 20:31:59,445][00219] Worker 2 uses CPU cores [2]
-[2025-02-27 20:31:59,469][00031] Heartbeat connected on RolloutWorker_w18
-[2025-02-27 20:31:59,479][00233] Worker 16 uses CPU cores [0]
-[2025-02-27 20:31:59,474][00031] Heartbeat connected on RolloutWorker_w13
-[2025-02-27 20:31:59,486][00225] Worker 8 uses CPU cores [0]
-[2025-02-27 20:31:59,494][00220] Worker 3 uses CPU cores [3]
-[2025-02-27 20:31:59,505][00223] Worker 6 uses CPU cores [2]
-[2025-02-27 20:31:59,508][00031] Heartbeat connected on RolloutWorker_w3
-[2025-02-27 20:31:59,554][00227] Worker 11 uses CPU cores [3]
-[2025-02-27 20:31:59,569][00031] Heartbeat connected on RolloutWorker_w2
-[2025-02-27 20:31:59,587][00224] Worker 7 uses CPU cores [3]
-[2025-02-27 20:31:59,589][00031] Heartbeat connected on RolloutWorker_w6
-[2025-02-27 20:31:59,602][00031] Heartbeat connected on RolloutWorker_w16
-[2025-02-27 20:31:59,609][00031] Heartbeat connected on RolloutWorker_w8
-[2025-02-27 20:31:59,622][00031] Heartbeat connected on RolloutWorker_w11
-[2025-02-27 20:31:59,623][00031] Heartbeat connected on RolloutWorker_w7
-[2025-02-27 20:31:59,633][00228] Worker 12 uses CPU cores [0]
-[2025-02-27 20:31:59,640][00231] Worker 14 uses CPU cores [2]
-[2025-02-27 20:31:59,678][00031] Heartbeat connected on RolloutWorker_w12
-[2025-02-27 20:31:59,682][00236] Worker 19 uses CPU cores [3]
-[2025-02-27 20:31:59,698][00031] Heartbeat connected on RolloutWorker_w19
-[2025-02-27 20:31:59,702][00031] Heartbeat connected on RolloutWorker_w14
-[2025-02-27 20:31:59,729][00232] Worker 15 uses CPU cores [3]
-[2025-02-27 20:31:59,744][00031] Heartbeat connected on RolloutWorker_w15
-[2025-02-27 20:31:59,779][00196] Using optimizer
-[2025-02-27 20:31:59,779][00230] Worker 10 uses CPU cores [2]
-[2025-02-27 20:31:59,793][00221] Worker 5 uses CPU cores [1]
-[2025-02-27 20:31:59,795][00031] Heartbeat connected on RolloutWorker_w10
-[2025-02-27 20:31:59,796][00234] Worker 17 uses CPU cores [1]
-[2025-02-27 20:31:59,818][00031] Heartbeat connected on RolloutWorker_w5
-[2025-02-27 20:31:59,819][00031] Heartbeat connected on RolloutWorker_w17
-[2025-02-27 20:32:01,573][00196] No checkpoints found
-[2025-02-27 20:32:01,573][00196] Did not load from checkpoint, starting from scratch!
-[2025-02-27 20:32:01,574][00196] Initialized policy 0 weights for model version 0
-[2025-02-27 20:32:01,577][00196] LearnerWorker_p0 finished initialization!
-[2025-02-27 20:32:01,578][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-02-27 20:32:01,579][00031] Heartbeat connected on LearnerWorker_p0
-[2025-02-27 20:32:01,661][00217] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:32:01,663][00217] RunningMeanStd input shape: (1,)
-[2025-02-27 20:32:01,675][00217] ConvEncoder: input_channels=3
-[2025-02-27 20:32:01,796][00217] Conv encoder output size: 512
-[2025-02-27 20:32:01,796][00217] Policy head output size: 512
-[2025-02-27 20:32:01,881][00031] Inference worker 0-0 is ready!
-[2025-02-27 20:32:01,883][00031] All inference workers are ready! Signal rollout workers to start!
-[2025-02-27 20:32:02,176][00228] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,182][00225] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,192][00231] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,197][00216] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,195][00223] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,201][00233] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,203][00222] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,205][00235] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,206][00219] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,211][00234] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,211][00230] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,219][00226] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,233][00218] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,232][00221] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,239][00229] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,254][00224] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,257][00220] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,258][00232] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,263][00236] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:02,265][00227] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:32:04,115][00230] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,115][00234] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,478][00031] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:04,569][00226] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,895][00225] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,904][00216] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,910][00228] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,914][00233] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:04,990][00236] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:05,022][00224] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:05,047][00232] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:05,272][00219] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:05,398][00228] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:05,481][00227] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:05,780][00235] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:06,019][00236] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:06,055][00219] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:06,057][00232] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:06,496][00219] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:06,805][00221] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:06,803][00229] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:06,834][00225] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:06,836][00226] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:06,843][00228] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:06,978][00227] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:07,031][00236] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:07,168][00218] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:07,221][00234] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:07,645][00226] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:07,651][00230] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:07,725][00223] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:07,881][00221] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:08,126][00222] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:08,188][00225] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:08,228][00230] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:08,609][00228] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:08,710][00235] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:09,039][00218] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:09,045][00234] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:09,080][00227] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:09,157][00226] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:09,292][00229] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:09,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:09,559][00223] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:09,603][00222] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:09,780][00225] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:10,089][00216] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:10,172][00220] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:10,231][00228] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:10,435][00221] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:10,477][00227] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:10,766][00236] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:10,984][00234] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:11,064][00219] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:11,094][00225] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:11,174][00230] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:11,227][00218] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:11,430][00226] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:11,623][00220] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:11,809][00222] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:11,924][00218] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:12,235][00235] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:12,274][00219] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:12,388][00218] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:12,787][00216] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:12,936][00230] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:12,966][00229] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:13,063][00233] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:13,241][00225] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:13,329][00228] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:13,575][00218] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:13,652][00224] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:13,873][00220] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:13,897][00223] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:14,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:14,554][00216] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:14,592][00226] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:14,699][00234] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:14,746][00225] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:14,881][00236] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:14,900][00220] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:14,993][00230] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:15,384][00229] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:15,498][00224] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:15,850][00228] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:16,002][00224] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:16,024][00216] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:16,137][00234] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:16,170][00235] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:16,515][00224] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:16,550][00233] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:16,643][00219] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:17,060][00224] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:17,212][00230] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:17,532][00225] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:17,625][00229] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:17,683][00233] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:17,752][00226] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:17,774][00218] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:17,988][00231] Decorrelating experience for 0 frames...
-[2025-02-27 20:32:18,706][00234] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:18,796][00230] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:18,860][00216] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:18,869][00229] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:18,939][00231] Decorrelating experience for 32 frames...
-[2025-02-27 20:32:19,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:19,642][00228] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:19,652][00230] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:19,770][00218] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:19,796][00222] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:19,977][00226] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:20,223][00225] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:20,530][00224] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:21,109][00223] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:21,213][00222] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:21,214][00221] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:21,218][00229] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:21,361][00219] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:21,450][00224] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:21,861][00222] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:22,062][00232] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:22,247][00230] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:22,448][00235] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:22,516][00234] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:22,628][00221] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:22,666][00236] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:23,036][00231] Decorrelating experience for 64 frames...
-[2025-02-27 20:32:23,444][00222] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:23,495][00226] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:23,774][00221] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:23,916][00219] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:23,942][00233] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:24,256][00228] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:24,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:24,513][00229] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:24,536][00223] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:24,714][00231] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:25,120][00227] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:25,356][00216] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:25,702][00230] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:25,736][00236] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:25,798][00229] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:26,049][00222] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:26,356][00233] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:26,627][00235] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:26,882][00220] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:27,417][00223] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:27,430][00234] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:27,434][00224] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:27,449][00227] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:27,815][00228] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:28,215][00216] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:28,496][00229] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:28,634][00230] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:28,811][00226] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:28,897][00220] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:29,115][00222] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:29,323][00219] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:29,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:29,664][00224] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:29,785][00235] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:30,052][00236] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:30,192][00225] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:30,205][00216] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:30,206][00227] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:30,634][00223] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:30,847][00226] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:31,251][00235] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:31,451][00222] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:31,767][00221] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:31,835][00228] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:31,915][00229] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:31,971][00220] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:32,005][00223] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:32,358][00234] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:32,671][00216] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:32,728][00236] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:33,035][00230] Worker 10, sleep for 0.500 sec to decorrelate experience collection
-[2025-02-27 20:32:33,061][00235] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:33,365][00225] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:33,537][00230] Worker 10 awakens!
-[2025-02-27 20:32:33,597][00221] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:33,743][00224] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:33,858][00228] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:34,284][00227] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:34,287][00234] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:34,476][00220] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:34,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.0. Samples: 120. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:34,480][00031] Avg episode reward: [(0, '1.520')]
-[2025-02-27 20:32:34,575][00235] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:34,603][00226] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:34,978][00232] Decorrelating experience for 96 frames...
-[2025-02-27 20:32:35,192][00216] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:35,506][00225] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:35,862][00219] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:36,027][00229] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:36,026][00224] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:36,117][00223] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:36,273][00220] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:36,873][00218] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:37,125][00221] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:37,257][00228] Worker 12, sleep for 0.600 sec to decorrelate experience collection
-[2025-02-27 20:32:37,348][00236] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:37,648][00223] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:37,689][00233] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:37,864][00228] Worker 12 awakens!
-[2025-02-27 20:32:38,314][00216] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:38,366][00234] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:38,492][00226] Worker 9, sleep for 0.450 sec to decorrelate experience collection
-[2025-02-27 20:32:38,715][00220] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:38,777][00235] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:38,949][00226] Worker 9 awakens!
-[2025-02-27 20:32:39,057][00232] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:39,119][00224] Worker 7, sleep for 0.350 sec to decorrelate experience collection
-[2025-02-27 20:32:39,241][00225] Worker 8, sleep for 0.400 sec to decorrelate experience collection
-[2025-02-27 20:32:39,473][00224] Worker 7 awakens!
-[2025-02-27 20:32:39,478][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 70.5. Samples: 2466. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-02-27 20:32:39,483][00031] Avg episode reward: [(0, '1.946')]
-[2025-02-27 20:32:39,649][00225] Worker 8 awakens!
-[2025-02-27 20:32:39,741][00231] Decorrelating experience for 128 frames...
-[2025-02-27 20:32:40,274][00227] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:40,412][00219] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:40,476][00218] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:40,645][00229] Worker 13, sleep for 0.650 sec to decorrelate experience collection
-[2025-02-27 20:32:40,724][00221] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:40,968][00222] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:41,301][00229] Worker 13 awakens!
-[2025-02-27 20:32:41,409][00196] Signal inference workers to stop experience collection...
-[2025-02-27 20:32:41,454][00217] InferenceWorker_p0-w0: stopping experience collection
-[2025-02-27 20:32:41,624][00235] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:42,050][00233] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:42,342][00218] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:42,542][00236] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:42,601][00220] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:42,768][00232] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:42,788][00219] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:43,100][00233] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:43,398][00223] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:43,577][00196] Signal inference workers to resume experience collection...
-[2025-02-27 20:32:43,578][00217] InferenceWorker_p0-w0: resuming experience collection
-[2025-02-27 20:32:43,868][00221] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:44,009][00222] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:44,133][00231] Decorrelating experience for 160 frames...
-[2025-02-27 20:32:44,478][00031] Fps is (10 sec: 1638.4, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 16384. Throughput: 0: 146.0. Samples: 5838. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-02-27 20:32:44,479][00031] Avg episode reward: [(0, '2.003')]
-[2025-02-27 20:32:44,834][00218] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:45,210][00227] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:45,522][00220] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:45,632][00234] Worker 17, sleep for 0.850 sec to decorrelate experience collection
-[2025-02-27 20:32:46,297][00233] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:46,397][00232] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:46,496][00234] Worker 17 awakens!
-[2025-02-27 20:32:46,592][00235] Worker 18, sleep for 0.900 sec to decorrelate experience collection
-[2025-02-27 20:32:46,632][00219] Worker 2, sleep for 0.100 sec to decorrelate experience collection
-[2025-02-27 20:32:46,680][00223] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:46,734][00219] Worker 2 awakens!
-[2025-02-27 20:32:47,501][00235] Worker 18 awakens!
-[2025-02-27 20:32:47,515][00236] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:47,977][00221] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:48,702][00222] Worker 4, sleep for 0.200 sec to decorrelate experience collection
-[2025-02-27 20:32:48,904][00222] Worker 4 awakens!
-[2025-02-27 20:32:48,930][00227] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:49,478][00031] Fps is (10 sec: 4096.0, 60 sec: 910.2, 300 sec: 910.2). Total num frames: 40960. Throughput: 0: 220.7. Samples: 9930. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
-[2025-02-27 20:32:49,479][00031] Avg episode reward: [(0, '2.529')]
-[2025-02-27 20:32:49,560][00232] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:49,640][00231] Decorrelating experience for 192 frames...
-[2025-02-27 20:32:49,664][00218] Worker 1, sleep for 0.050 sec to decorrelate experience collection
-[2025-02-27 20:32:49,722][00218] Worker 1 awakens!
-[2025-02-27 20:32:50,249][00220] Worker 3, sleep for 0.150 sec to decorrelate experience collection
-[2025-02-27 20:32:50,400][00220] Worker 3 awakens!
-[2025-02-27 20:32:50,779][00223] Worker 6, sleep for 0.300 sec to decorrelate experience collection
-[2025-02-27 20:32:51,092][00223] Worker 6 awakens!
-[2025-02-27 20:32:51,733][00233] Decorrelating experience for 320 frames...
-[2025-02-27 20:32:51,839][00236] Worker 19, sleep for 0.950 sec to decorrelate experience collection
-[2025-02-27 20:32:52,394][00221] Worker 5, sleep for 0.250 sec to decorrelate experience collection
-[2025-02-27 20:32:52,497][00227] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:52,650][00221] Worker 5 awakens!
-[2025-02-27 20:32:52,667][00232] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:52,800][00236] Worker 19 awakens!
-[2025-02-27 20:32:53,167][00231] Decorrelating experience for 224 frames...
-[2025-02-27 20:32:53,972][00217] Updated weights for policy 0, policy_version 10 (0.0281)
-[2025-02-27 20:32:54,478][00031] Fps is (10 sec: 6553.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 81920. Throughput: 0: 487.1. Samples: 21918. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:32:54,482][00031] Avg episode reward: [(0, '3.531')]
-[2025-02-27 20:32:55,954][00233] Decorrelating experience for 352 frames...
-[2025-02-27 20:32:56,205][00232] Decorrelating experience for 288 frames...
-[2025-02-27 20:32:56,495][00231] Decorrelating experience for 256 frames...
-[2025-02-27 20:32:57,013][00227] Worker 11, sleep for 0.550 sec to decorrelate experience collection
-[2025-02-27 20:32:57,574][00227] Worker 11 awakens!
-[2025-02-27 20:32:59,480][00031] Fps is (10 sec: 8190.5, 60 sec: 2234.1, 300 sec: 2234.1). Total num frames: 122880. Throughput: 0: 774.0. Samples: 34830. Policy #0 lag: (min: 0.0, avg: 1.9, max: 3.0)
-[2025-02-27 20:32:59,481][00031] Avg episode reward: [(0, '4.125')]
-[2025-02-27 20:32:59,485][00196] Saving new best policy, reward=4.125!
-[2025-02-27 20:33:00,630][00231] Decorrelating experience for 288 frames...
-[2025-02-27 20:33:00,676][00232] Decorrelating experience for 320 frames...
-[2025-02-27 20:33:01,382][00233] Worker 16, sleep for 0.800 sec to decorrelate experience collection
-[2025-02-27 20:33:02,190][00233] Worker 16 awakens!
-[2025-02-27 20:33:03,698][00217] Updated weights for policy 0, policy_version 20 (0.0014)
-[2025-02-27 20:33:04,235][00231] Decorrelating experience for 320 frames...
-[2025-02-27 20:33:04,478][00031] Fps is (10 sec: 9011.1, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 912.7. Samples: 41070. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
-[2025-02-27 20:33:04,480][00031] Avg episode reward: [(0, '4.305')]
-[2025-02-27 20:33:04,486][00196] Saving new best policy, reward=4.305!
-[2025-02-27 20:33:06,495][00232] Decorrelating experience for 352 frames...
-[2025-02-27 20:33:08,806][00231] Decorrelating experience for 352 frames...
-[2025-02-27 20:33:09,479][00031] Fps is (10 sec: 9831.0, 60 sec: 3686.3, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 1232.2. Samples: 55452. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:33:09,481][00031] Avg episode reward: [(0, '4.297')]
-[2025-02-27 20:33:10,963][00232] Worker 15, sleep for 0.750 sec to decorrelate experience collection
-[2025-02-27 20:33:11,728][00232] Worker 15 awakens!
-[2025-02-27 20:33:12,027][00217] Updated weights for policy 0, policy_version 30 (0.0014)
-[2025-02-27 20:33:13,635][00231] Worker 14, sleep for 0.700 sec to decorrelate experience collection
-[2025-02-27 20:33:14,361][00231] Worker 14 awakens!
-[2025-02-27 20:33:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 4505.6, 300 sec: 3861.9). Total num frames: 270336. Throughput: 0: 1573.9. Samples: 70824. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
-[2025-02-27 20:33:14,480][00031] Avg episode reward: [(0, '3.975')]
-[2025-02-27 20:33:19,413][00217] Updated weights for policy 0, policy_version 40 (0.0016)
-[2025-02-27 20:33:19,478][00031] Fps is (10 sec: 10650.8, 60 sec: 5461.3, 300 sec: 4369.1). Total num frames: 327680. Throughput: 0: 1746.7. Samples: 78720. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:33:19,481][00031] Avg episode reward: [(0, '4.199')]
-[2025-02-27 20:33:24,478][00031] Fps is (10 sec: 10649.7, 60 sec: 6280.5, 300 sec: 4710.4). Total num frames: 376832. Throughput: 0: 2047.2. Samples: 94590. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:33:24,483][00031] Avg episode reward: [(0, '4.609')]
-[2025-02-27 20:33:24,489][00196] Saving new best policy, reward=4.609!
-[2025-02-27 20:33:27,277][00217] Updated weights for policy 0, policy_version 50 (0.0016)
-[2025-02-27 20:33:29,478][00031] Fps is (10 sec: 9830.5, 60 sec: 7099.7, 300 sec: 5011.6). Total num frames: 425984. Throughput: 0: 2327.3. Samples: 110568. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:33:29,480][00031] Avg episode reward: [(0, '4.592')]
-[2025-02-27 20:33:34,478][00031] Fps is (10 sec: 9830.6, 60 sec: 7919.0, 300 sec: 5279.3). Total num frames: 475136. Throughput: 0: 2399.7. Samples: 117918. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:33:34,479][00031] Avg episode reward: [(0, '4.586')]
-[2025-02-27 20:33:34,486][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_475136.pth...
-[2025-02-27 20:33:35,679][00217] Updated weights for policy 0, policy_version 60 (0.0019)
-[2025-02-27 20:33:39,478][00031] Fps is (10 sec: 9830.4, 60 sec: 8738.1, 300 sec: 5518.8). Total num frames: 524288. Throughput: 0: 2470.3. Samples: 133080. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:33:39,479][00031] Avg episode reward: [(0, '4.637')]
-[2025-02-27 20:33:39,482][00196] Saving new best policy, reward=4.637!
-[2025-02-27 20:33:43,439][00217] Updated weights for policy 0, policy_version 70 (0.0016)
-[2025-02-27 20:33:44,478][00031] Fps is (10 sec: 10649.4, 60 sec: 9420.8, 300 sec: 5816.3). Total num frames: 581632. Throughput: 0: 2531.0. Samples: 148722. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:33:44,480][00031] Avg episode reward: [(0, '4.409')]
-[2025-02-27 20:33:49,478][00031] Fps is (10 sec: 10649.6, 60 sec: 9830.4, 300 sec: 6007.5). Total num frames: 630784. Throughput: 0: 2570.3. Samples: 156732. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:33:49,481][00031] Avg episode reward: [(0, '4.401')]
-[2025-02-27 20:33:51,474][00217] Updated weights for policy 0, policy_version 80 (0.0014)
-[2025-02-27 20:33:54,478][00031] Fps is (10 sec: 9830.6, 60 sec: 9967.0, 300 sec: 6181.2). Total num frames: 679936. Throughput: 0: 2600.3. Samples: 172464. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:33:54,480][00031] Avg episode reward: [(0, '4.385')]
-[2025-02-27 20:33:58,755][00217] Updated weights for policy 0, policy_version 90 (0.0014)
-[2025-02-27 20:33:59,479][00031] Fps is (10 sec: 10648.7, 60 sec: 10240.2, 300 sec: 6411.1). Total num frames: 737280. Throughput: 0: 2612.2. Samples: 188376. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
-[2025-02-27 20:33:59,482][00031] Avg episode reward: [(0, '4.522')]
-[2025-02-27 20:34:04,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.6, 300 sec: 6621.9). Total num frames: 794624. Throughput: 0: 2614.9. Samples: 196392. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:34:04,479][00031] Avg episode reward: [(0, '4.461')]
-[2025-02-27 20:34:07,063][00217] Updated weights for policy 0, policy_version 100 (0.0018)
-[2025-02-27 20:34:09,478][00031] Fps is (10 sec: 9831.3, 60 sec: 10240.2, 300 sec: 6684.7). Total num frames: 835584. Throughput: 0: 2586.1. Samples: 210966. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:34:09,480][00031] Avg episode reward: [(0, '4.434')]
-[2025-02-27 20:34:14,478][00031] Fps is (10 sec: 9829.9, 60 sec: 10376.5, 300 sec: 6868.7). Total num frames: 892928. Throughput: 0: 2581.8. Samples: 226752. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:34:14,480][00031] Avg episode reward: [(0, '4.652')]
-[2025-02-27 20:34:14,488][00196] Saving new best policy, reward=4.652!
-[2025-02-27 20:34:15,199][00217] Updated weights for policy 0, policy_version 110 (0.0018)
-[2025-02-27 20:34:19,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.6, 300 sec: 7039.1). Total num frames: 950272. Throughput: 0: 2594.5. Samples: 234672. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:34:19,480][00031] Avg episode reward: [(0, '4.689')]
-[2025-02-27 20:34:19,483][00196] Saving new best policy, reward=4.689!
-[2025-02-27 20:34:22,656][00217] Updated weights for policy 0, policy_version 120 (0.0014)
-[2025-02-27 20:34:24,478][00031] Fps is (10 sec: 10650.2, 60 sec: 10376.6, 300 sec: 7138.7). Total num frames: 999424. Throughput: 0: 2607.9. Samples: 250434. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:34:24,480][00031] Avg episode reward: [(0, '4.566')]
-[2025-02-27 20:34:29,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 7231.6). Total num frames: 1048576. Throughput: 0: 2614.4. Samples: 266370. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:34:29,481][00031] Avg episode reward: [(0, '4.475')]
-[2025-02-27 20:34:30,511][00217] Updated weights for policy 0, policy_version 130 (0.0015)
-[2025-02-27 20:34:34,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 7372.8). Total num frames: 1105920. Throughput: 0: 2615.7. Samples: 274440. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:34:34,484][00031] Avg episode reward: [(0, '4.625')]
-[2025-02-27 20:34:38,380][00217] Updated weights for policy 0, policy_version 140 (0.0014)
-[2025-02-27 20:34:39,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 7399.2). Total num frames: 1146880. Throughput: 0: 2605.2. Samples: 289698. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
-[2025-02-27 20:34:39,479][00031] Avg episode reward: [(0, '4.614')]
-[2025-02-27 20:34:44,478][00031] Fps is (10 sec: 9830.5, 60 sec: 10376.6, 300 sec: 7526.4). Total num frames: 1204224. Throughput: 0: 2591.1. Samples: 304974. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:34:44,479][00031] Avg episode reward: [(0, '4.553')]
-[2025-02-27 20:34:46,256][00217] Updated weights for policy 0, policy_version 150 (0.0014)
-[2025-02-27 20:34:49,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10513.1, 300 sec: 7645.9). Total num frames: 1261568. Throughput: 0: 2593.7. Samples: 313110. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:34:49,480][00031] Avg episode reward: [(0, '4.166')]
-[2025-02-27 20:34:54,192][00217] Updated weights for policy 0, policy_version 160 (0.0015)
-[2025-02-27 20:34:54,478][00031] Fps is (10 sec: 10649.0, 60 sec: 10513.0, 300 sec: 7710.1). Total num frames: 1310720. Throughput: 0: 2624.6. Samples: 329076. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:34:54,480][00031] Avg episode reward: [(0, '4.633')]
-[2025-02-27 20:34:59,478][00031] Fps is (10 sec: 10649.2, 60 sec: 10513.1, 300 sec: 7817.5). Total num frames: 1368064. Throughput: 0: 2631.1. Samples: 345150. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:34:59,480][00031] Avg episode reward: [(0, '4.519')]
-[2025-02-27 20:35:01,648][00217] Updated weights for policy 0, policy_version 170 (0.0014)
-[2025-02-27 20:35:04,478][00031] Fps is (10 sec: 11469.4, 60 sec: 10513.1, 300 sec: 7918.9). Total num frames: 1425408. Throughput: 0: 2633.9. Samples: 353196. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
-[2025-02-27 20:35:04,481][00031] Avg episode reward: [(0, '4.666')]
-[2025-02-27 20:35:09,356][00217] Updated weights for policy 0, policy_version 180 (0.0017)
-[2025-02-27 20:35:09,481][00031] Fps is (10 sec: 10647.1, 60 sec: 10649.1, 300 sec: 7970.5). Total num frames: 1474560. Throughput: 0: 2638.2. Samples: 369162. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:35:09,490][00031] Avg episode reward: [(0, '4.533')]
-[2025-02-27 20:35:14,478][00031] Fps is (10 sec: 9011.3, 60 sec: 10376.6, 300 sec: 7976.4). Total num frames: 1515520. Throughput: 0: 2610.1. Samples: 383826. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:35:14,480][00031] Avg episode reward: [(0, '4.461')]
-[2025-02-27 20:35:17,532][00217] Updated weights for policy 0, policy_version 190 (0.0014)
-[2025-02-27 20:35:19,478][00031] Fps is (10 sec: 9833.1, 60 sec: 10376.5, 300 sec: 8066.0). Total num frames: 1572864. Throughput: 0: 2608.4. Samples: 391818. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:35:19,479][00031] Avg episode reward: [(0, '4.510')]
-[2025-02-27 20:35:24,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 8110.1). Total num frames: 1622016. Throughput: 0: 2624.9. Samples: 407820. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:35:24,480][00031] Avg episode reward: [(0, '4.663')]
-[2025-02-27 20:35:25,275][00217] Updated weights for policy 0, policy_version 200 (0.0015)
-[2025-02-27 20:35:29,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 8192.0). Total num frames: 1679360. Throughput: 0: 2643.3. Samples: 423924. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:35:29,483][00031] Avg episode reward: [(0, '4.727')]
-[2025-02-27 20:35:29,486][00196] Saving new best policy, reward=4.727!
-[2025-02-27 20:35:32,904][00217] Updated weights for policy 0, policy_version 210 (0.0014)
-[2025-02-27 20:35:34,478][00031] Fps is (10 sec: 11468.4, 60 sec: 10513.0, 300 sec: 8270.0). Total num frames: 1736704. Throughput: 0: 2639.3. Samples: 431880. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:35:34,480][00031] Avg episode reward: [(0, '4.875')]
-[2025-02-27 20:35:34,489][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000212_1736704.pth...
-[2025-02-27 20:35:34,634][00196] Saving new best policy, reward=4.875!
-[2025-02-27 20:35:39,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10649.6, 300 sec: 8306.3). Total num frames: 1785856. Throughput: 0: 2638.7. Samples: 447816. Policy #0 lag: (min: 0.0, avg: 2.3, max: 4.0)
-[2025-02-27 20:35:39,479][00031] Avg episode reward: [(0, '4.750')]
-[2025-02-27 20:35:40,425][00217] Updated weights for policy 0, policy_version 220 (0.0017)
-[2025-02-27 20:35:44,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10513.0, 300 sec: 8340.9). Total num frames: 1835008. Throughput: 0: 2614.5. Samples: 462804. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:35:44,480][00031] Avg episode reward: [(0, '4.601')]
-[2025-02-27 20:35:48,494][00217] Updated weights for policy 0, policy_version 230 (0.0014)
-[2025-02-27 20:35:49,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 8374.0). Total num frames: 1884160. Throughput: 0: 2608.5. Samples: 470580. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:35:49,479][00031] Avg episode reward: [(0, '4.849')]
-[2025-02-27 20:35:54,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 8441.3). Total num frames: 1941504. Throughput: 0: 2610.9. Samples: 486648. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:35:54,482][00031] Avg episode reward: [(0, '4.699')]
-[2025-02-27 20:35:56,520][00217] Updated weights for policy 0, policy_version 240 (0.0014)
-[2025-02-27 20:35:59,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 8470.9). Total num frames: 1990656. Throughput: 0: 2632.3. Samples: 502278. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:35:59,479][00031] Avg episode reward: [(0, '4.618')]
-[2025-02-27 20:36:04,295][00217] Updated weights for policy 0, policy_version 250 (0.0014)
-[2025-02-27 20:36:04,478][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 8533.3). Total num frames: 2048000. Throughput: 0: 2633.6. Samples: 510330. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:36:04,482][00031] Avg episode reward: [(0, '4.634')]
-[2025-02-27 20:36:09,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10377.0, 300 sec: 8559.8). Total num frames: 2097152. Throughput: 0: 2626.3. Samples: 526002. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:36:09,480][00031] Avg episode reward: [(0, '4.544')]
-[2025-02-27 20:36:11,835][00217] Updated weights for policy 0, policy_version 260 (0.0014)
-[2025-02-27 20:36:14,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10649.6, 300 sec: 8618.0). Total num frames: 2154496. Throughput: 0: 2617.2. Samples: 541698. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:36:14,481][00031] Avg episode reward: [(0, '4.605')]
-[2025-02-27 20:36:19,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 8609.6). Total num frames: 2195456. Throughput: 0: 2588.2. Samples: 548346. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:36:19,481][00031] Avg episode reward: [(0, '4.428')]
-[2025-02-27 20:36:20,198][00217] Updated weights for policy 0, policy_version 270 (0.0017)
-[2025-02-27 20:36:24,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10513.0, 300 sec: 8664.6). Total num frames: 2252800. Throughput: 0: 2580.3. Samples: 563928. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:36:24,481][00031] Avg episode reward: [(0, '4.526')]
-[2025-02-27 20:36:28,045][00217] Updated weights for policy 0, policy_version 280 (0.0014)
-[2025-02-27 20:36:29,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 8686.6). Total num frames: 2301952. Throughput: 0: 2595.6. Samples: 579606. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:36:29,480][00031] Avg episode reward: [(0, '4.722')]
-[2025-02-27 20:36:34,478][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.6, 300 sec: 8738.1). Total num frames: 2359296. Throughput: 0: 2597.1. Samples: 587448. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:36:34,482][00031] Avg episode reward: [(0, '4.404')]
-[2025-02-27 20:36:35,995][00217] Updated weights for policy 0, policy_version 290 (0.0014)
-[2025-02-27 20:36:39,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 8758.0). Total num frames: 2408448. Throughput: 0: 2593.4. Samples: 603348. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
-[2025-02-27 20:36:39,481][00031] Avg episode reward: [(0, '4.322')]
-[2025-02-27 20:36:43,780][00217] Updated weights for policy 0, policy_version 300 (0.0016)
-[2025-02-27 20:36:44,478][00031] Fps is (10 sec: 9829.9, 60 sec: 10376.5, 300 sec: 8777.1). Total num frames: 2457600. Throughput: 0: 2595.6. Samples: 619080. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:36:44,481][00031] Avg episode reward: [(0, '4.742')]
-[2025-02-27 20:36:49,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 8795.6). Total num frames: 2506752. Throughput: 0: 2590.1. Samples: 626886. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:36:49,480][00031] Avg episode reward: [(0, '4.705')]
-[2025-02-27 20:36:52,310][00217] Updated weights for policy 0, policy_version 310 (0.0018)
-[2025-02-27 20:36:54,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10376.5, 300 sec: 8841.7). Total num frames: 2564096. Throughput: 0: 2558.0. Samples: 641112. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:36:54,482][00031] Avg episode reward: [(0, '4.680')]
-[2025-02-27 20:36:59,478][00031] Fps is (10 sec: 10649.1, 60 sec: 10376.4, 300 sec: 8858.5). Total num frames: 2613248. Throughput: 0: 2555.4. Samples: 656694. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:36:59,481][00031] Avg episode reward: [(0, '4.502')]
-[2025-02-27 20:37:00,114][00217] Updated weights for policy 0, policy_version 320 (0.0014)
-[2025-02-27 20:37:04,478][00031] Fps is (10 sec: 9831.1, 60 sec: 10240.0, 300 sec: 9025.1). Total num frames: 2662400. Throughput: 0: 2582.8. Samples: 664572. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:37:04,480][00031] Avg episode reward: [(0, '4.686')]
-[2025-02-27 20:37:07,652][00217] Updated weights for policy 0, policy_version 330 (0.0019)
-[2025-02-27 20:37:09,478][00031] Fps is (10 sec: 10650.1, 60 sec: 10376.5, 300 sec: 9219.5). Total num frames: 2719744. Throughput: 0: 2584.5. Samples: 680232. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:09,480][00031] Avg episode reward: [(0, '4.672')]
-[2025-02-27 20:37:14,480][00031] Fps is (10 sec: 10647.7, 60 sec: 10239.7, 300 sec: 9386.0). Total num frames: 2768896. Throughput: 0: 2587.2. Samples: 696036. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:14,481][00031] Avg episode reward: [(0, '4.767')]
-[2025-02-27 20:37:15,642][00217] Updated weights for policy 0, policy_version 340 (0.0017)
-[2025-02-27 20:37:19,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10376.5, 300 sec: 9552.7). Total num frames: 2818048. Throughput: 0: 2587.3. Samples: 703878. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:37:19,483][00031] Avg episode reward: [(0, '4.703')]
-[2025-02-27 20:37:23,860][00217] Updated weights for policy 0, policy_version 350 (0.0014)
-[2025-02-27 20:37:24,478][00031] Fps is (10 sec: 9832.1, 60 sec: 10240.0, 300 sec: 9719.3). Total num frames: 2867200. Throughput: 0: 2557.9. Samples: 718452. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:37:24,479][00031] Avg episode reward: [(0, '4.547')]
-[2025-02-27 20:37:29,478][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.6, 300 sec: 9913.7). Total num frames: 2924544. Throughput: 0: 2554.6. Samples: 734034. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:29,482][00031] Avg episode reward: [(0, '4.802')]
-[2025-02-27 20:37:31,770][00217] Updated weights for policy 0, policy_version 360 (0.0014)
-[2025-02-27 20:37:34,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 2973696. Throughput: 0: 2555.6. Samples: 741888. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:34,481][00031] Avg episode reward: [(0, '4.816')]
-[2025-02-27 20:37:34,496][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000363_2973696.pth...
-[2025-02-27 20:37:34,644][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_475136.pth
-[2025-02-27 20:37:39,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10239.9, 300 sec: 10191.4). Total num frames: 3022848. Throughput: 0: 2575.8. Samples: 757020. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:39,481][00031] Avg episode reward: [(0, '4.824')]
-[2025-02-27 20:37:39,878][00217] Updated weights for policy 0, policy_version 370 (0.0014)
-[2025-02-27 20:37:44,478][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 3072000. Throughput: 0: 2573.5. Samples: 772500. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:37:44,481][00031] Avg episode reward: [(0, '4.840')]
-[2025-02-27 20:37:47,701][00217] Updated weights for policy 0, policy_version 380 (0.0015)
-[2025-02-27 20:37:49,478][00031] Fps is (10 sec: 10649.4, 60 sec: 10376.4, 300 sec: 10330.2). Total num frames: 3129344. Throughput: 0: 2570.6. Samples: 780252. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:37:49,480][00031] Avg episode reward: [(0, '4.370')]
-[2025-02-27 20:37:54,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.1, 300 sec: 10358.1). Total num frames: 3178496. Throughput: 0: 2568.7. Samples: 795822. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
-[2025-02-27 20:37:54,480][00031] Avg episode reward: [(0, '4.772')]
-[2025-02-27 20:37:56,212][00217] Updated weights for policy 0, policy_version 390 (0.0014)
-[2025-02-27 20:37:59,478][00031] Fps is (10 sec: 9831.0, 60 sec: 10240.1, 300 sec: 10358.0). Total num frames: 3227648. Throughput: 0: 2531.3. Samples: 809940. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:37:59,483][00031] Avg episode reward: [(0, '4.954')]
-[2025-02-27 20:37:59,484][00196] Saving new best policy, reward=4.954!
-[2025-02-27 20:38:04,409][00217] Updated weights for policy 0, policy_version 400 (0.0014)
-[2025-02-27 20:38:04,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10358.1). Total num frames: 3276800. Throughput: 0: 2529.9. Samples: 817722. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:38:04,479][00031] Avg episode reward: [(0, '4.756')]
-[2025-02-27 20:38:09,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10103.4, 300 sec: 10358.0). Total num frames: 3325952. Throughput: 0: 2549.5. Samples: 833178. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
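The paired Saving/Removing messages above show checkpoint rotation: each new checkpoint_<version>_<frames>.pth is written and the oldest regular checkpoint beyond the retention limit is deleted (in this log two are kept, alongside the separately tracked best-policy weights). A sketch of that pattern, assuming a hypothetical save_checkpoint helper rather than Sample Factory's own routine:

    import os
    import torch

    def save_checkpoint(state, checkpoint_dir, policy_version, env_steps, keep_last=2):
        """Write checkpoint_<version>_<frames>.pth, then prune the oldest ones."""
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        torch.save(state, os.path.join(checkpoint_dir, name))

        # Zero-padded versions make lexicographic order match chronological order.
        checkpoints = sorted(
            f for f in os.listdir(checkpoint_dir)
            if f.startswith("checkpoint_") and f.endswith(".pth")
        )
        for old in checkpoints[:-keep_last]:
            os.remove(os.path.join(checkpoint_dir, old))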
-[2025-02-27 20:38:09,480][00031] Avg episode reward: [(0, '4.749')]
-[2025-02-27 20:38:12,018][00217] Updated weights for policy 0, policy_version 410 (0.0020)
-[2025-02-27 20:38:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.8, 300 sec: 10330.3). Total num frames: 3375104. Throughput: 0: 2544.4. Samples: 848532. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:38:14,479][00031] Avg episode reward: [(0, '4.848')]
-[2025-02-27 20:38:19,478][00031] Fps is (10 sec: 10649.9, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 3432448. Throughput: 0: 2541.5. Samples: 856254. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:38:19,480][00031] Avg episode reward: [(0, '4.696')]
-[2025-02-27 20:38:19,953][00217] Updated weights for policy 0, policy_version 420 (0.0018)
-[2025-02-27 20:38:24,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 3481600. Throughput: 0: 2551.4. Samples: 871830. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:38:24,479][00031] Avg episode reward: [(0, '5.137')]
-[2025-02-27 20:38:24,490][00196] Saving new best policy, reward=5.137!
-[2025-02-27 20:38:28,409][00217] Updated weights for policy 0, policy_version 430 (0.0016)
-[2025-02-27 20:38:29,479][00031] Fps is (10 sec: 9829.0, 60 sec: 10103.2, 300 sec: 10358.0). Total num frames: 3530752. Throughput: 0: 2519.9. Samples: 885900. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:38:29,481][00031] Avg episode reward: [(0, '5.723')]
-[2025-02-27 20:38:29,484][00196] Saving new best policy, reward=5.723!
-[2025-02-27 20:38:34,479][00031] Fps is (10 sec: 9829.4, 60 sec: 10103.3, 300 sec: 10358.0). Total num frames: 3579904. Throughput: 0: 2518.0. Samples: 893562. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:38:34,481][00031] Avg episode reward: [(0, '5.182')]
-[2025-02-27 20:38:36,500][00217] Updated weights for policy 0, policy_version 440 (0.0015)
-[2025-02-27 20:38:39,478][00031] Fps is (10 sec: 9831.8, 60 sec: 10103.5, 300 sec: 10330.3). Total num frames: 3629056. Throughput: 0: 2516.1. Samples: 909048. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:38:39,482][00031] Avg episode reward: [(0, '4.774')]
-[2025-02-27 20:38:44,478][00031] Fps is (10 sec: 9830.8, 60 sec: 10103.4, 300 sec: 10330.2). Total num frames: 3678208. Throughput: 0: 2546.0. Samples: 924510. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:38:44,480][00031] Avg episode reward: [(0, '5.109')]
-[2025-02-27 20:38:44,488][00217] Updated weights for policy 0, policy_version 450 (0.0014)
-[2025-02-27 20:38:49,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.6, 300 sec: 10358.0). Total num frames: 3735552. Throughput: 0: 2544.4. Samples: 932220. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:38:49,482][00031] Avg episode reward: [(0, '4.840')]
-[2025-02-27 20:38:52,584][00217] Updated weights for policy 0, policy_version 460 (0.0014)
-[2025-02-27 20:38:54,478][00031] Fps is (10 sec: 10650.2, 60 sec: 10103.5, 300 sec: 10330.3). Total num frames: 3784704. Throughput: 0: 2544.8. Samples: 947694. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:38:54,480][00031] Avg episode reward: [(0, '5.046')]
-[2025-02-27 20:38:59,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10302.5). Total num frames: 3833856. Throughput: 0: 2542.8. Samples: 962958. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
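The "Saving new best policy" messages fire whenever the reported average episode reward beats the best value seen so far, which is why they can trigger on consecutive reports (reward=5.137, then reward=5.723 just above). A sketch of that bookkeeping, with a hypothetical save callback:

    class BestPolicyTracker:
        """Persist the policy whenever mean episode reward improves."""

        def __init__(self, save_fn):
            self.best_reward = float("-inf")
            self.save_fn = save_fn  # hypothetical callback that writes the weights

        def update(self, avg_episode_reward):
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                self.save_fn()
                print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")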
-[2025-02-27 20:38:59,480][00031] Avg episode reward: [(0, '5.409')]
-[2025-02-27 20:39:00,661][00217] Updated weights for policy 0, policy_version 470 (0.0017)
-[2025-02-27 20:39:04,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10330.2). Total num frames: 3883008. Throughput: 0: 2508.9. Samples: 969156. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:39:04,480][00031] Avg episode reward: [(0, '5.046')]
-[2025-02-27 20:39:08,728][00217] Updated weights for policy 0, policy_version 480 (0.0015)
-[2025-02-27 20:39:09,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10302.5). Total num frames: 3932160. Throughput: 0: 2508.3. Samples: 984702. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:39:09,479][00031] Avg episode reward: [(0, '5.117')]
-[2025-02-27 20:39:14,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 3989504. Throughput: 0: 2541.3. Samples: 1000254. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:39:14,480][00031] Avg episode reward: [(0, '4.895')]
-[2025-02-27 20:39:16,902][00217] Updated weights for policy 0, policy_version 490 (0.0016)
-[2025-02-27 20:39:19,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10103.4, 300 sec: 10302.5). Total num frames: 4038656. Throughput: 0: 2543.6. Samples: 1008024. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:39:19,480][00031] Avg episode reward: [(0, '4.712')]
-[2025-02-27 20:39:24,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10302.5). Total num frames: 4087808. Throughput: 0: 2546.1. Samples: 1023624. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:39:24,479][00031] Avg episode reward: [(0, '4.918')]
-[2025-02-27 20:39:24,636][00217] Updated weights for policy 0, policy_version 500 (0.0015)
-[2025-02-27 20:39:29,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10240.1, 300 sec: 10302.5). Total num frames: 4145152. Throughput: 0: 2546.7. Samples: 1039110. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:39:29,480][00031] Avg episode reward: [(0, '5.098')]
-[2025-02-27 20:39:32,554][00217] Updated weights for policy 0, policy_version 510 (0.0014)
-[2025-02-27 20:39:34,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.6, 300 sec: 10302.5). Total num frames: 4186112. Throughput: 0: 2545.9. Samples: 1046784. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:39:34,480][00031] Avg episode reward: [(0, '4.985')]
-[2025-02-27 20:39:34,492][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_4186112.pth...
-[2025-02-27 20:39:34,639][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000212_1736704.pth
-[2025-02-27 20:39:39,478][00031] Fps is (10 sec: 9011.7, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 4235264. Throughput: 0: 2513.7. Samples: 1060812. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:39:39,480][00031] Avg episode reward: [(0, '4.526')]
-[2025-02-27 20:39:41,341][00217] Updated weights for policy 0, policy_version 520 (0.0021)
-[2025-02-27 20:39:44,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 4292608. Throughput: 0: 2526.3. Samples: 1076640. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
-[2025-02-27 20:39:44,479][00031] Avg episode reward: [(0, '4.838')]
-[2025-02-27 20:39:48,754][00217] Updated weights for policy 0, policy_version 530 (0.0017)
-[2025-02-27 20:39:49,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 4341760. Throughput: 0: 2561.3. Samples: 1084416. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:39:49,479][00031] Avg episode reward: [(0, '4.682')]
-[2025-02-27 20:39:54,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 4399104. Throughput: 0: 2565.2. Samples: 1100136. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:39:54,480][00031] Avg episode reward: [(0, '4.963')]
-[2025-02-27 20:39:56,921][00217] Updated weights for policy 0, policy_version 540 (0.0017)
-[2025-02-27 20:39:59,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10246.9). Total num frames: 4448256. Throughput: 0: 2566.5. Samples: 1115748. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:39:59,480][00031] Avg episode reward: [(0, '4.968')]
-[2025-02-27 20:40:04,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.0, 300 sec: 10247.0). Total num frames: 4497408. Throughput: 0: 2567.7. Samples: 1123572. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:40:04,480][00031] Avg episode reward: [(0, '5.042')]
-[2025-02-27 20:40:04,614][00217] Updated weights for policy 0, policy_version 550 (0.0014)
-[2025-02-27 20:40:09,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 4546560. Throughput: 0: 2536.1. Samples: 1137750. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:40:09,479][00031] Avg episode reward: [(0, '5.023')]
-[2025-02-27 20:40:12,831][00217] Updated weights for policy 0, policy_version 560 (0.0014)
-[2025-02-27 20:40:14,480][00031] Fps is (10 sec: 9828.7, 60 sec: 10103.2, 300 sec: 10246.9). Total num frames: 4595712. Throughput: 0: 2539.1. Samples: 1153374. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:40:14,481][00031] Avg episode reward: [(0, '5.002')]
-[2025-02-27 20:40:19,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 4653056. Throughput: 0: 2540.1. Samples: 1161090. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:40:19,479][00031] Avg episode reward: [(0, '4.845')]
-[2025-02-27 20:40:20,939][00217] Updated weights for policy 0, policy_version 570 (0.0018)
-[2025-02-27 20:40:24,478][00031] Fps is (10 sec: 10651.5, 60 sec: 10240.0, 300 sec: 10246.9). Total num frames: 4702208. Throughput: 0: 2577.1. Samples: 1176780. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:40:24,479][00031] Avg episode reward: [(0, '5.028')]
-[2025-02-27 20:40:28,656][00217] Updated weights for policy 0, policy_version 580 (0.0014)
-[2025-02-27 20:40:29,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.6, 300 sec: 10219.2). Total num frames: 4751360. Throughput: 0: 2571.5. Samples: 1192356. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:40:29,479][00031] Avg episode reward: [(0, '4.886')]
-[2025-02-27 20:40:34,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 4808704. Throughput: 0: 2574.1. Samples: 1200252. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
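"Policy #0 lag" summarizes, over the samples in the current report, how many learner updates old the acting policy was when each sample was collected: the current policy_version minus the version that produced the sample. min stays at 0.0 for freshly collected data, while avg and max grow with the asynchrony between rollout workers and the learner. A small sketch of the statistic under that reading:

    def policy_lag_stats(current_version, sample_versions):
        """min/avg/max of (learner version - version that collected each sample)."""
        lags = [current_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # e.g. learner at version 590, batch collected by versions 586..590:
    print(policy_lag_stats(590, [590, 589, 588, 588, 586]))
    # (0, 1.8, 4) -- matching a "min: 0.0, avg: 1.8, max: 4.0" line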
-[2025-02-27 20:40:34,479][00031] Avg episode reward: [(0, '5.216')]
-[2025-02-27 20:40:36,389][00217] Updated weights for policy 0, policy_version 590 (0.0017)
-[2025-02-27 20:40:39,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 4857856. Throughput: 0: 2571.3. Samples: 1215846. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:40:39,480][00031] Avg episode reward: [(0, '5.091')]
-[2025-02-27 20:40:44,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10246.9). Total num frames: 4907008. Throughput: 0: 2549.5. Samples: 1230474. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:40:44,480][00031] Avg episode reward: [(0, '5.098')]
-[2025-02-27 20:40:44,876][00217] Updated weights for policy 0, policy_version 600 (0.0014)
-[2025-02-27 20:40:49,480][00031] Fps is (10 sec: 10647.4, 60 sec: 10376.2, 300 sec: 10246.9). Total num frames: 4964352. Throughput: 0: 2553.7. Samples: 1238496. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:40:49,481][00031] Avg episode reward: [(0, '4.929')]
-[2025-02-27 20:40:52,630][00217] Updated weights for policy 0, policy_version 610 (0.0020)
-[2025-02-27 20:40:54,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10246.9). Total num frames: 5013504. Throughput: 0: 2594.7. Samples: 1254510. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:40:54,479][00031] Avg episode reward: [(0, '5.144')]
-[2025-02-27 20:40:59,478][00031] Fps is (10 sec: 10651.8, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5070848. Throughput: 0: 2597.4. Samples: 1270254. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:40:59,480][00031] Avg episode reward: [(0, '5.350')]
-[2025-02-27 20:41:00,457][00217] Updated weights for policy 0, policy_version 620 (0.0018)
-[2025-02-27 20:41:04,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5120000. Throughput: 0: 2604.8. Samples: 1278306. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:41:04,481][00031] Avg episode reward: [(0, '4.967')]
-[2025-02-27 20:41:08,165][00217] Updated weights for policy 0, policy_version 630 (0.0015)
-[2025-02-27 20:41:09,478][00031] Fps is (10 sec: 9830.3, 60 sec: 10376.5, 300 sec: 10219.2). Total num frames: 5169152. Throughput: 0: 2610.4. Samples: 1294248. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
-[2025-02-27 20:41:09,479][00031] Avg episode reward: [(0, '5.300')]
-[2025-02-27 20:41:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.8, 300 sec: 10246.9). Total num frames: 5218304. Throughput: 0: 2582.7. Samples: 1308576. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:41:14,482][00031] Avg episode reward: [(0, '5.334')]
-[2025-02-27 20:41:16,294][00217] Updated weights for policy 0, policy_version 640 (0.0014)
-[2025-02-27 20:41:19,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5275648. Throughput: 0: 2584.4. Samples: 1316550. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:41:19,480][00031] Avg episode reward: [(0, '5.058')]
-[2025-02-27 20:41:24,097][00217] Updated weights for policy 0, policy_version 650 (0.0018)
-[2025-02-27 20:41:24,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5324800. Throughput: 0: 2595.5. Samples: 1332642. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:41:24,481][00031] Avg episode reward: [(0, '5.164')]
-[2025-02-27 20:41:29,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10513.1, 300 sec: 10246.9). Total num frames: 5382144. Throughput: 0: 2624.9. Samples: 1348596. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:41:29,480][00031] Avg episode reward: [(0, '5.411')]
-[2025-02-27 20:41:31,678][00217] Updated weights for policy 0, policy_version 660 (0.0014)
-[2025-02-27 20:41:34,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5431296. Throughput: 0: 2624.7. Samples: 1356600. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:41:34,480][00031] Avg episode reward: [(0, '5.502')]
-[2025-02-27 20:41:34,492][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_5431296.pth...
-[2025-02-27 20:41:34,634][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000363_2973696.pth
-[2025-02-27 20:41:39,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 5480448. Throughput: 0: 2618.1. Samples: 1372326. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:41:39,479][00031] Avg episode reward: [(0, '5.414')]
-[2025-02-27 20:41:39,508][00217] Updated weights for policy 0, policy_version 670 (0.0015)
-[2025-02-27 20:41:44,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10274.7). Total num frames: 5537792. Throughput: 0: 2615.7. Samples: 1387962. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:41:44,479][00031] Avg episode reward: [(0, '5.260')]
-[2025-02-27 20:41:47,811][00217] Updated weights for policy 0, policy_version 680 (0.0015)
-[2025-02-27 20:41:49,478][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.8, 300 sec: 10247.0). Total num frames: 5586944. Throughput: 0: 2591.8. Samples: 1394940. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:41:49,480][00031] Avg episode reward: [(0, '5.360')]
-[2025-02-27 20:41:54,479][00031] Fps is (10 sec: 10648.7, 60 sec: 10512.9, 300 sec: 10274.7). Total num frames: 5644288. Throughput: 0: 2594.8. Samples: 1411014. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:41:54,480][00031] Avg episode reward: [(0, '5.543')]
-[2025-02-27 20:41:55,360][00217] Updated weights for policy 0, policy_version 690 (0.0016)
-[2025-02-27 20:41:59,478][00031] Fps is (10 sec: 10650.1, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 5693440. Throughput: 0: 2628.1. Samples: 1426842. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:41:59,479][00031] Avg episode reward: [(0, '5.460')]
-[2025-02-27 20:42:03,052][00217] Updated weights for policy 0, policy_version 700 (0.0014)
-[2025-02-27 20:42:04,478][00031] Fps is (10 sec: 9831.2, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5742592. Throughput: 0: 2628.4. Samples: 1434828. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:42:04,480][00031] Avg episode reward: [(0, '6.108')]
-[2025-02-27 20:42:04,491][00196] Saving new best policy, reward=6.108!
-[2025-02-27 20:42:09,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10513.1, 300 sec: 10274.8). Total num frames: 5799936. Throughput: 0: 2616.9. Samples: 1450404. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:42:09,482][00031] Avg episode reward: [(0, '5.871')]
-[2025-02-27 20:42:10,564][00217] Updated weights for policy 0, policy_version 710 (0.0015)
-[2025-02-27 20:42:14,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10274.7). Total num frames: 5849088. Throughput: 0: 2615.9. Samples: 1466310. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:42:14,480][00031] Avg episode reward: [(0, '5.970')]
-[2025-02-27 20:42:19,214][00217] Updated weights for policy 0, policy_version 720 (0.0014)
-[2025-02-27 20:42:19,478][00031] Fps is (10 sec: 9830.5, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 5898240. Throughput: 0: 2598.4. Samples: 1473528. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:42:19,480][00031] Avg episode reward: [(0, '5.790')]
-[2025-02-27 20:42:24,478][00031] Fps is (10 sec: 9830.3, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 5947392. Throughput: 0: 2585.2. Samples: 1488660. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:42:24,482][00031] Avg episode reward: [(0, '7.139')]
-[2025-02-27 20:42:24,490][00196] Saving new best policy, reward=7.139!
-[2025-02-27 20:42:26,981][00217] Updated weights for policy 0, policy_version 730 (0.0015)
-[2025-02-27 20:42:29,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 6004736. Throughput: 0: 2587.3. Samples: 1504392. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:42:29,479][00031] Avg episode reward: [(0, '6.757')]
-[2025-02-27 20:42:34,479][00031] Fps is (10 sec: 10648.1, 60 sec: 10376.3, 300 sec: 10274.7). Total num frames: 6053888. Throughput: 0: 2607.1. Samples: 1512264. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:42:34,482][00031] Avg episode reward: [(0, '7.122')]
-[2025-02-27 20:42:34,720][00217] Updated weights for policy 0, policy_version 740 (0.0015)
-[2025-02-27 20:42:39,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10302.5). Total num frames: 6111232. Throughput: 0: 2601.5. Samples: 1528080. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:42:39,480][00031] Avg episode reward: [(0, '7.034')]
-[2025-02-27 20:42:42,383][00217] Updated weights for policy 0, policy_version 750 (0.0016)
-[2025-02-27 20:42:44,478][00031] Fps is (10 sec: 10651.1, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 6160384. Throughput: 0: 2599.5. Samples: 1543818. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:42:44,480][00031] Avg episode reward: [(0, '7.034')]
-[2025-02-27 20:42:49,479][00031] Fps is (10 sec: 9829.5, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 6209536. Throughput: 0: 2596.9. Samples: 1551690. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:42:49,482][00031] Avg episode reward: [(0, '7.271')]
-[2025-02-27 20:42:49,548][00196] Saving new best policy, reward=7.271!
-[2025-02-27 20:42:50,685][00217] Updated weights for policy 0, policy_version 760 (0.0015)
-[2025-02-27 20:42:54,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 6258688. Throughput: 0: 2572.3. Samples: 1566156. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:42:54,480][00031] Avg episode reward: [(0, '6.972')]
-[2025-02-27 20:42:58,586][00217] Updated weights for policy 0, policy_version 770 (0.0015)
-[2025-02-27 20:42:59,478][00031] Fps is (10 sec: 10650.5, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 6316032. Throughput: 0: 2568.1. Samples: 1581876. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
-[2025-02-27 20:42:59,480][00031] Avg episode reward: [(0, '7.761')]
-[2025-02-27 20:42:59,482][00196] Saving new best policy, reward=7.761!
-[2025-02-27 20:43:04,478][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 6365184. Throughput: 0: 2581.7. Samples: 1589706. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:43:04,480][00031] Avg episode reward: [(0, '7.586')]
-[2025-02-27 20:43:06,483][00217] Updated weights for policy 0, policy_version 780 (0.0015)
-[2025-02-27 20:43:09,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 6414336. Throughput: 0: 2593.5. Samples: 1605366. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:43:09,480][00031] Avg episode reward: [(0, '8.108')]
-[2025-02-27 20:43:09,482][00196] Saving new best policy, reward=8.108!
-[2025-02-27 20:43:14,129][00217] Updated weights for policy 0, policy_version 790 (0.0014)
-[2025-02-27 20:43:14,479][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.4, 300 sec: 10302.5). Total num frames: 6471680. Throughput: 0: 2595.8. Samples: 1621206. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:43:14,481][00031] Avg episode reward: [(0, '8.597')]
-[2025-02-27 20:43:14,489][00196] Saving new best policy, reward=8.597!
-[2025-02-27 20:43:19,478][00031] Fps is (10 sec: 10649.4, 60 sec: 10376.4, 300 sec: 10302.5). Total num frames: 6520832. Throughput: 0: 2591.5. Samples: 1628880. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:43:19,480][00031] Avg episode reward: [(0, '9.305')]
-[2025-02-27 20:43:19,483][00196] Saving new best policy, reward=9.305!
-[2025-02-27 20:43:22,529][00217] Updated weights for policy 0, policy_version 800 (0.0032)
-[2025-02-27 20:43:24,478][00031] Fps is (10 sec: 9831.2, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 6569984. Throughput: 0: 2569.9. Samples: 1643724. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
-[2025-02-27 20:43:24,479][00031] Avg episode reward: [(0, '9.154')]
-[2025-02-27 20:43:29,481][00031] Fps is (10 sec: 9828.2, 60 sec: 10239.5, 300 sec: 10302.4). Total num frames: 6619136. Throughput: 0: 2558.5. Samples: 1658958. Policy #0 lag: (min: 0.0, avg: 2.3, max: 4.0)
-[2025-02-27 20:43:29,482][00031] Avg episode reward: [(0, '7.173')]
-[2025-02-27 20:43:30,151][00217] Updated weights for policy 0, policy_version 810 (0.0015)
-[2025-02-27 20:43:34,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.2, 300 sec: 10302.5). Total num frames: 6668288. Throughput: 0: 2561.0. Samples: 1666932. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:43:34,479][00031] Avg episode reward: [(0, '8.265')]
-[2025-02-27 20:43:34,487][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_6668288.pth...
-[2025-02-27 20:43:34,628][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_4186112.pth
-[2025-02-27 20:43:38,255][00217] Updated weights for policy 0, policy_version 820 (0.0015)
-[2025-02-27 20:43:39,478][00031] Fps is (10 sec: 10652.3, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 6725632. Throughput: 0: 2584.7. Samples: 1682466. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:43:39,480][00031] Avg episode reward: [(0, '8.230')]
-[2025-02-27 20:43:44,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 6782976. Throughput: 0: 2584.5. Samples: 1698180. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:43:44,480][00031] Avg episode reward: [(0, '8.142')]
-[2025-02-27 20:43:46,066][00217] Updated weights for policy 0, policy_version 830 (0.0014)
-[2025-02-27 20:43:49,478][00031] Fps is (10 sec: 10649.4, 60 sec: 10376.6, 300 sec: 10330.2). Total num frames: 6832128. Throughput: 0: 2584.9. Samples: 1706028. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:43:49,481][00031] Avg episode reward: [(0, '8.107')]
-[2025-02-27 20:43:53,850][00217] Updated weights for policy 0, policy_version 840 (0.0014)
-[2025-02-27 20:43:54,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 6881280. Throughput: 0: 2591.6. Samples: 1721988. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:43:54,479][00031] Avg episode reward: [(0, '8.638')]
-[2025-02-27 20:43:59,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 6930432. Throughput: 0: 2555.2. Samples: 1736190. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:43:59,480][00031] Avg episode reward: [(0, '10.497')]
-[2025-02-27 20:43:59,482][00196] Saving new best policy, reward=10.497!
-[2025-02-27 20:44:02,434][00217] Updated weights for policy 0, policy_version 850 (0.0015)
-[2025-02-27 20:44:04,478][00031] Fps is (10 sec: 9830.2, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 6979584. Throughput: 0: 2558.8. Samples: 1744026. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:44:04,480][00031] Avg episode reward: [(0, '9.334')]
-[2025-02-27 20:44:09,478][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.6, 300 sec: 10330.2). Total num frames: 7036928. Throughput: 0: 2576.3. Samples: 1759656. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:44:09,480][00031] Avg episode reward: [(0, '9.748')]
-[2025-02-27 20:44:09,903][00217] Updated weights for policy 0, policy_version 860 (0.0016)
-[2025-02-27 20:44:14,478][00031] Fps is (10 sec: 10649.8, 60 sec: 10240.1, 300 sec: 10330.3). Total num frames: 7086080. Throughput: 0: 2585.0. Samples: 1775274. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:44:14,479][00031] Avg episode reward: [(0, '10.777')]
-[2025-02-27 20:44:14,487][00196] Saving new best policy, reward=10.777!
-[2025-02-27 20:44:17,973][00217] Updated weights for policy 0, policy_version 870 (0.0014)
-[2025-02-27 20:44:19,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 7143424. Throughput: 0: 2580.4. Samples: 1783050. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:44:19,479][00031] Avg episode reward: [(0, '10.573')]
-[2025-02-27 20:44:24,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 7192576. Throughput: 0: 2588.7. Samples: 1798956. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:44:24,479][00031] Avg episode reward: [(0, '10.244')]
-[2025-02-27 20:44:25,722][00217] Updated weights for policy 0, policy_version 880 (0.0015)
-[2025-02-27 20:44:29,478][00031] Fps is (10 sec: 9010.7, 60 sec: 10240.4, 300 sec: 10330.2). Total num frames: 7233536. Throughput: 0: 2567.0. Samples: 1813698. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:44:29,481][00031] Avg episode reward: [(0, '10.765')]
-[2025-02-27 20:44:33,919][00217] Updated weights for policy 0, policy_version 890 (0.0015)
-[2025-02-27 20:44:34,478][00031] Fps is (10 sec: 9830.0, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 7290880. Throughput: 0: 2563.6. Samples: 1821390. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
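"Updated weights for policy 0, policy_version N (...)" is the inference worker pulling fresh parameters from the learner; in this log it lands every 10 versions, and the parenthesized value is plausibly the seconds the refresh took. A sketch of version-gated refreshing, assuming a hypothetical shared-state handle (not Sample Factory's actual mechanism):

    import time

    class WeightRefresher:
        """Reload model weights once the published learner version has advanced."""

        def __init__(self, model, shared, min_version_gap=10):
            self.model = model
            self.shared = shared  # hypothetical handle exposing version() / state_dict()
            self.synced_version = 0
            self.min_version_gap = min_version_gap

        def maybe_refresh(self):
            latest = self.shared.version()
            if latest - self.synced_version >= self.min_version_gap:
                t0 = time.monotonic()
                self.model.load_state_dict(self.shared.state_dict())
                self.synced_version = latest
                print(f"Updated weights for policy 0, policy_version {latest} "
                      f"({time.monotonic() - t0:.4f})")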
-[2025-02-27 20:44:34,480][00031] Avg episode reward: [(0, '10.037')]
-[2025-02-27 20:44:39,478][00031] Fps is (10 sec: 11469.0, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 7348224. Throughput: 0: 2558.9. Samples: 1837140. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:44:39,480][00031] Avg episode reward: [(0, '12.242')]
-[2025-02-27 20:44:39,483][00196] Saving new best policy, reward=12.242!
-[2025-02-27 20:44:41,842][00217] Updated weights for policy 0, policy_version 900 (0.0017)
-[2025-02-27 20:44:44,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10239.9, 300 sec: 10358.0). Total num frames: 7397376. Throughput: 0: 2593.3. Samples: 1852890. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:44:44,481][00031] Avg episode reward: [(0, '11.005')]
-[2025-02-27 20:44:49,478][00031] Fps is (10 sec: 9830.7, 60 sec: 10240.1, 300 sec: 10330.3). Total num frames: 7446528. Throughput: 0: 2591.5. Samples: 1860642. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:44:49,479][00031] Avg episode reward: [(0, '11.327')]
-[2025-02-27 20:44:49,636][00217] Updated weights for policy 0, policy_version 910 (0.0014)
-[2025-02-27 20:44:54,478][00031] Fps is (10 sec: 10650.0, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 7503872. Throughput: 0: 2598.3. Samples: 1876578. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:44:54,481][00031] Avg episode reward: [(0, '11.518')]
-[2025-02-27 20:44:57,287][00217] Updated weights for policy 0, policy_version 920 (0.0014)
-[2025-02-27 20:44:59,478][00031] Fps is (10 sec: 10649.4, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 7553024. Throughput: 0: 2599.3. Samples: 1892244. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
-[2025-02-27 20:44:59,481][00031] Avg episode reward: [(0, '11.628')]
-[2025-02-27 20:45:04,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 7602176. Throughput: 0: 2576.1. Samples: 1898976. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:45:04,480][00031] Avg episode reward: [(0, '12.660')]
-[2025-02-27 20:45:04,489][00196] Saving new best policy, reward=12.660!
-[2025-02-27 20:45:05,809][00217] Updated weights for policy 0, policy_version 930 (0.0019)
-[2025-02-27 20:45:09,478][00031] Fps is (10 sec: 9830.3, 60 sec: 10240.0, 300 sec: 10358.1). Total num frames: 7651328. Throughput: 0: 2566.9. Samples: 1914468. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:45:09,480][00031] Avg episode reward: [(0, '13.184')]
-[2025-02-27 20:45:09,497][00196] Saving new best policy, reward=13.184!
-[2025-02-27 20:45:13,802][00217] Updated weights for policy 0, policy_version 940 (0.0015)
-[2025-02-27 20:45:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 7700480. Throughput: 0: 2587.8. Samples: 1930146. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:45:14,479][00031] Avg episode reward: [(0, '12.182')]
-[2025-02-27 20:45:19,478][00031] Fps is (10 sec: 10650.0, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 7757824. Throughput: 0: 2589.1. Samples: 1937898. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
-[2025-02-27 20:45:19,480][00031] Avg episode reward: [(0, '12.604')]
-[2025-02-27 20:45:21,348][00217] Updated weights for policy 0, policy_version 950 (0.0015)
-[2025-02-27 20:45:24,479][00031] Fps is (10 sec: 11467.9, 60 sec: 10376.4, 300 sec: 10385.8). Total num frames: 7815168. Throughput: 0: 2594.4. Samples: 1953888. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:45:24,481][00031] Avg episode reward: [(0, '13.877')]
-[2025-02-27 20:45:24,491][00196] Saving new best policy, reward=13.877!
-[2025-02-27 20:45:29,430][00217] Updated weights for policy 0, policy_version 960 (0.0014)
-[2025-02-27 20:45:29,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10513.1, 300 sec: 10358.0). Total num frames: 7864320. Throughput: 0: 2593.1. Samples: 1969578. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:45:29,480][00031] Avg episode reward: [(0, '14.691')]
-[2025-02-27 20:45:29,482][00196] Saving new best policy, reward=14.691!
-[2025-02-27 20:45:34,478][00031] Fps is (10 sec: 9830.7, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 7913472. Throughput: 0: 2593.6. Samples: 1977354. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
-[2025-02-27 20:45:34,480][00031] Avg episode reward: [(0, '15.123')]
-[2025-02-27 20:45:34,489][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000966_7913472.pth...
-[2025-02-27 20:45:34,663][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_5431296.pth
-[2025-02-27 20:45:34,709][00196] Saving new best policy, reward=15.123!
-[2025-02-27 20:45:37,570][00217] Updated weights for policy 0, policy_version 970 (0.0017)
-[2025-02-27 20:45:39,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.1, 300 sec: 10358.0). Total num frames: 7962624. Throughput: 0: 2548.3. Samples: 1991250. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:45:39,480][00031] Avg episode reward: [(0, '16.671')]
-[2025-02-27 20:45:39,482][00196] Saving new best policy, reward=16.671!
-[2025-02-27 20:45:44,478][00031] Fps is (10 sec: 9830.9, 60 sec: 10240.1, 300 sec: 10330.3). Total num frames: 8011776. Throughput: 0: 2552.1. Samples: 2007090. Policy #0 lag: (min: 0.0, avg: 1.6, max: 5.0)
-[2025-02-27 20:45:44,479][00031] Avg episode reward: [(0, '14.369')]
-[2025-02-27 20:45:45,409][00217] Updated weights for policy 0, policy_version 980 (0.0016)
-[2025-02-27 20:45:49,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 8069120. Throughput: 0: 2579.1. Samples: 2015034. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:45:49,482][00031] Avg episode reward: [(0, '15.333')]
-[2025-02-27 20:45:53,552][00217] Updated weights for policy 0, policy_version 990 (0.0014)
-[2025-02-27 20:45:54,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 8126464. Throughput: 0: 2587.4. Samples: 2030898. Policy #0 lag: (min: 0.0, avg: 1.6, max: 5.0)
-[2025-02-27 20:45:54,482][00031] Avg episode reward: [(0, '17.500')]
-[2025-02-27 20:45:54,489][00196] Saving new best policy, reward=17.500!
-[2025-02-27 20:45:59,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 8167424. Throughput: 0: 2584.7. Samples: 2046456. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
-[2025-02-27 20:45:59,479][00031] Avg episode reward: [(0, '14.918')]
-[2025-02-27 20:46:00,885][00217] Updated weights for policy 0, policy_version 1000 (0.0014)
-[2025-02-27 20:46:04,478][00031] Fps is (10 sec: 9830.1, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 8224768. Throughput: 0: 2590.4. Samples: 2054466. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
-[2025-02-27 20:46:04,481][00031] Avg episode reward: [(0, '14.886')]
-[2025-02-27 20:46:09,268][00217] Updated weights for policy 0, policy_version 1010 (0.0016)
-[2025-02-27 20:46:09,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 8273920. Throughput: 0: 2552.7. Samples: 2068758. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:46:09,480][00031] Avg episode reward: [(0, '15.508')]
-[2025-02-27 20:46:14,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 8323072. Throughput: 0: 2553.7. Samples: 2084496. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:46:14,481][00031] Avg episode reward: [(0, '16.632')]
-[2025-02-27 20:46:17,362][00217] Updated weights for policy 0, policy_version 1020 (0.0017)
-[2025-02-27 20:46:19,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 8372224. Throughput: 0: 2554.4. Samples: 2092302. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:46:19,481][00031] Avg episode reward: [(0, '15.682')]
-[2025-02-27 20:46:24,480][00031] Fps is (10 sec: 10647.6, 60 sec: 10239.8, 300 sec: 10330.2). Total num frames: 8429568. Throughput: 0: 2597.4. Samples: 2108136. Policy #0 lag: (min: 0.0, avg: 1.6, max: 5.0)
-[2025-02-27 20:46:24,482][00031] Avg episode reward: [(0, '15.365')]
-[2025-02-27 20:46:24,629][00217] Updated weights for policy 0, policy_version 1030 (0.0016)
-[2025-02-27 20:46:29,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 8478720. Throughput: 0: 2591.3. Samples: 2123700. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
-[2025-02-27 20:46:29,480][00031] Avg episode reward: [(0, '16.531')]
-[2025-02-27 20:46:32,912][00217] Updated weights for policy 0, policy_version 1040 (0.0014)
-[2025-02-27 20:46:34,478][00031] Fps is (10 sec: 10651.6, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 8536064. Throughput: 0: 2591.5. Samples: 2131650. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:46:34,480][00031] Avg episode reward: [(0, '19.898')]
-[2025-02-27 20:46:34,490][00196] Saving new best policy, reward=19.898!
-[2025-02-27 20:46:39,481][00031] Fps is (10 sec: 10646.7, 60 sec: 10376.1, 300 sec: 10330.2). Total num frames: 8585216. Throughput: 0: 2586.2. Samples: 2147286. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
-[2025-02-27 20:46:39,483][00031] Avg episode reward: [(0, '19.457')]
-[2025-02-27 20:46:41,350][00217] Updated weights for policy 0, policy_version 1050 (0.0019)
-[2025-02-27 20:46:44,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 8634368. Throughput: 0: 2559.9. Samples: 2161650. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:46:44,480][00031] Avg episode reward: [(0, '19.810')]
-[2025-02-27 20:46:49,040][00217] Updated weights for policy 0, policy_version 1060 (0.0014)
-[2025-02-27 20:46:49,478][00031] Fps is (10 sec: 9833.2, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 8683520. Throughput: 0: 2554.3. Samples: 2169408. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:46:49,479][00031] Avg episode reward: [(0, '21.065')]
-[2025-02-27 20:46:49,482][00196] Saving new best policy, reward=21.065!
-[2025-02-27 20:46:54,478][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 10302.5). Total num frames: 8732672. Throughput: 0: 2587.3. Samples: 2185188. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:46:54,481][00031] Avg episode reward: [(0, '20.095')]
-[2025-02-27 20:46:56,945][00217] Updated weights for policy 0, policy_version 1070 (0.0014)
-[2025-02-27 20:46:59,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 8790016. Throughput: 0: 2586.8. Samples: 2200902. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:46:59,479][00031] Avg episode reward: [(0, '17.472')]
-[2025-02-27 20:47:04,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 8839168. Throughput: 0: 2587.5. Samples: 2208738. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:47:04,479][00031] Avg episode reward: [(0, '18.965')]
-[2025-02-27 20:47:04,646][00217] Updated weights for policy 0, policy_version 1080 (0.0015)
-[2025-02-27 20:47:09,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 8896512. Throughput: 0: 2586.2. Samples: 2224512. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:47:09,479][00031] Avg episode reward: [(0, '18.678')]
-[2025-02-27 20:47:12,617][00217] Updated weights for policy 0, policy_version 1090 (0.0015)
-[2025-02-27 20:47:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 8937472. Throughput: 0: 2559.1. Samples: 2238858. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:47:14,481][00031] Avg episode reward: [(0, '18.919')]
-[2025-02-27 20:47:19,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 8994816. Throughput: 0: 2556.0. Samples: 2246670. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:47:19,480][00031] Avg episode reward: [(0, '18.690')]
-[2025-02-27 20:47:20,708][00217] Updated weights for policy 0, policy_version 1100 (0.0014)
-[2025-02-27 20:47:24,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.3, 300 sec: 10302.5). Total num frames: 9043968. Throughput: 0: 2559.6. Samples: 2262462. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:47:24,481][00031] Avg episode reward: [(0, '19.354')]
-[2025-02-27 20:47:28,820][00217] Updated weights for policy 0, policy_version 1110 (0.0015)
-[2025-02-27 20:47:29,479][00031] Fps is (10 sec: 10648.2, 60 sec: 10376.3, 300 sec: 10330.3). Total num frames: 9101312. Throughput: 0: 2588.5. Samples: 2278134. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:47:29,481][00031] Avg episode reward: [(0, '22.058')]
-[2025-02-27 20:47:29,483][00196] Saving new best policy, reward=22.058!
-[2025-02-27 20:47:34,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10239.9, 300 sec: 10302.5). Total num frames: 9150464. Throughput: 0: 2590.1. Samples: 2285964. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:47:34,483][00031] Avg episode reward: [(0, '21.511')]
-[2025-02-27 20:47:34,493][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001117_9150464.pth...
-[2025-02-27 20:47:34,639][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_6668288.pth
-[2025-02-27 20:47:36,311][00217] Updated weights for policy 0, policy_version 1120 (0.0014)
-[2025-02-27 20:47:39,478][00031] Fps is (10 sec: 9831.5, 60 sec: 10240.4, 300 sec: 10302.5). Total num frames: 9199616. Throughput: 0: 2587.7. Samples: 2301636. Policy #0 lag: (min: 0.0, avg: 2.3, max: 4.0)
-[2025-02-27 20:47:39,480][00031] Avg episode reward: [(0, '20.847')]
-[2025-02-27 20:47:44,404][00217] Updated weights for policy 0, policy_version 1130 (0.0016)
-[2025-02-27 20:47:44,479][00031] Fps is (10 sec: 10649.0, 60 sec: 10376.4, 300 sec: 10330.3). Total num frames: 9256960. Throughput: 0: 2583.7. Samples: 2317170. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
-[2025-02-27 20:47:44,482][00031] Avg episode reward: [(0, '21.235')]
-[2025-02-27 20:47:49,478][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 9306112. Throughput: 0: 2559.6. Samples: 2323920. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:47:49,479][00031] Avg episode reward: [(0, '20.125')]
-[2025-02-27 20:47:52,928][00217] Updated weights for policy 0, policy_version 1140 (0.0016)
-[2025-02-27 20:47:54,478][00031] Fps is (10 sec: 9831.2, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 9355264. Throughput: 0: 2561.9. Samples: 2339796. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
-[2025-02-27 20:47:54,480][00031] Avg episode reward: [(0, '21.795')]
-[2025-02-27 20:47:59,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 9404416. Throughput: 0: 2589.7. Samples: 2355396. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
-[2025-02-27 20:47:59,480][00031] Avg episode reward: [(0, '22.336')]
-[2025-02-27 20:47:59,483][00196] Saving new best policy, reward=22.336!
-[2025-02-27 20:48:00,611][00217] Updated weights for policy 0, policy_version 1150 (0.0016)
-[2025-02-27 20:48:04,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 9461760. Throughput: 0: 2590.1. Samples: 2363226. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:48:04,480][00031] Avg episode reward: [(0, '21.030')]
-[2025-02-27 20:48:08,092][00217] Updated weights for policy 0, policy_version 1160 (0.0018)
-[2025-02-27 20:48:09,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 9510912. Throughput: 0: 2588.4. Samples: 2378940. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:48:09,479][00031] Avg episode reward: [(0, '21.586')]
-[2025-02-27 20:48:14,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 9560064. Throughput: 0: 2591.0. Samples: 2394726. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
-[2025-02-27 20:48:14,479][00031] Avg episode reward: [(0, '22.175')]
-[2025-02-27 20:48:15,980][00217] Updated weights for policy 0, policy_version 1170 (0.0016)
-[2025-02-27 20:48:19,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 9609216. Throughput: 0: 2580.3. Samples: 2402076. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
-[2025-02-27 20:48:19,480][00031] Avg episode reward: [(0, '21.573')]
-[2025-02-27 20:48:24,478][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.6). Total num frames: 9658368. Throughput: 0: 2562.8. Samples: 2416962. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
-[2025-02-27 20:48:24,480][00031] Avg episode reward: [(0, '21.401')]
-[2025-02-27 20:48:24,533][00217] Updated weights for policy 0, policy_version 1180 (0.0015)
-[2025-02-27 20:48:29,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.2, 300 sec: 10330.3). Total num frames: 9715712. Throughput: 0: 2567.2. Samples: 2432694. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:48:29,480][00031] Avg episode reward: [(0, '23.477')]
-[2025-02-27 20:48:29,482][00196] Saving new best policy, reward=23.477!
-[2025-02-27 20:48:32,261][00217] Updated weights for policy 0, policy_version 1190 (0.0014)
-[2025-02-27 20:48:34,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.6, 300 sec: 10330.3). Total num frames: 9773056. Throughput: 0: 2594.1. Samples: 2440656. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
-[2025-02-27 20:48:34,479][00031] Avg episode reward: [(0, '23.970')]
-[2025-02-27 20:48:34,489][00196] Saving new best policy, reward=23.970!
-[2025-02-27 20:48:39,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 10302.5). Total num frames: 9822208. Throughput: 0: 2593.7. Samples: 2456514. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:48:39,479][00031] Avg episode reward: [(0, '22.014')]
-[2025-02-27 20:48:39,914][00217] Updated weights for policy 0, policy_version 1200 (0.0014)
-[2025-02-27 20:48:44,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.7, 300 sec: 10330.3). Total num frames: 9879552. Throughput: 0: 2604.4. Samples: 2472594. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:48:44,479][00031] Avg episode reward: [(0, '20.143')]
-[2025-02-27 20:48:47,745][00217] Updated weights for policy 0, policy_version 1210 (0.0018)
-[2025-02-27 20:48:49,478][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 9928704. Throughput: 0: 2606.5. Samples: 2480520. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:48:49,481][00031] Avg episode reward: [(0, '21.854')]
-[2025-02-27 20:48:54,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10376.6, 300 sec: 10330.3). Total num frames: 9977856. Throughput: 0: 2585.6. Samples: 2495292. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:48:54,480][00031] Avg episode reward: [(0, '22.355')]
-[2025-02-27 20:48:55,499][00217] Updated weights for policy 0, policy_version 1220 (0.0016)
-[2025-02-27 20:48:59,478][00031] Fps is (10 sec: 9830.3, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 10027008. Throughput: 0: 2589.3. Samples: 2511246. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:48:59,481][00031] Avg episode reward: [(0, '22.130')]
-[2025-02-27 20:49:03,492][00217] Updated weights for policy 0, policy_version 1230 (0.0014)
-[2025-02-27 20:49:04,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 10084352. Throughput: 0: 2605.6. Samples: 2519328. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
-[2025-02-27 20:49:04,480][00031] Avg episode reward: [(0, '22.923')]
-[2025-02-27 20:49:09,478][00031] Fps is (10 sec: 11468.8, 60 sec: 10513.1, 300 sec: 10358.0). Total num frames: 10141696. Throughput: 0: 2628.8. Samples: 2535258. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
-[2025-02-27 20:49:09,481][00031] Avg episode reward: [(0, '23.075')]
-[2025-02-27 20:49:11,250][00217] Updated weights for policy 0, policy_version 1240 (0.0015)
-[2025-02-27 20:49:14,478][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10330.3). Total num frames: 10190848. Throughput: 0: 2636.7. Samples: 2551344. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:49:14,480][00031] Avg episode reward: [(0, '22.945')]
-[2025-02-27 20:49:18,596][00217] Updated weights for policy 0, policy_version 1250 (0.0016)
-[2025-02-27 20:49:19,478][00031] Fps is (10 sec: 10649.0, 60 sec: 10649.5, 300 sec: 10358.0). Total num frames: 10248192. Throughput: 0: 2636.2. Samples: 2559288. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
-[2025-02-27 20:49:19,481][00031] Avg episode reward: [(0, '23.029')]
-[2025-02-27 20:49:24,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10649.6, 300 sec: 10385.8). Total num frames: 10297344. Throughput: 0: 2628.0. Samples: 2574774. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:49:24,481][00031] Avg episode reward: [(0, '21.584')]
-[2025-02-27 20:49:27,025][00217] Updated weights for policy 0, policy_version 1260 (0.0014)
-[2025-02-27 20:49:29,478][00031] Fps is (10 sec: 9831.1, 60 sec: 10513.1, 300 sec: 10358.0). Total num frames: 10346496. Throughput: 0: 2608.0. Samples: 2589954. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
-[2025-02-27 20:49:29,479][00031] Avg episode reward: [(0, '20.975')]
-[2025-02-27 20:49:34,478][00031] Fps is (10 sec: 9830.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 10395648. Throughput: 0: 2611.2. Samples: 2598024. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:49:34,480][00031] Avg episode reward: [(0, '23.435')]
-[2025-02-27 20:49:34,505][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001270_10403840.pth...
-[2025-02-27 20:49:34,519][00217] Updated weights for policy 0, policy_version 1270 (0.0015)
-[2025-02-27 20:49:34,645][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000966_7913472.pth
-[2025-02-27 20:49:39,478][00031] Fps is (10 sec: 10649.5, 60 sec: 10513.1, 300 sec: 10358.0). Total num frames: 10452992. Throughput: 0: 2636.5. Samples: 2613936. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:49:39,479][00031] Avg episode reward: [(0, '25.116')]
-[2025-02-27 20:49:39,481][00196] Saving new best policy, reward=25.116!
-[2025-02-27 20:49:42,528][00217] Updated weights for policy 0, policy_version 1280 (0.0018)
-[2025-02-27 20:49:44,479][00031] Fps is (10 sec: 11467.7, 60 sec: 10512.9, 300 sec: 10385.8). Total num frames: 10510336. Throughput: 0: 2637.0. Samples: 2629914. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
-[2025-02-27 20:49:44,483][00031] Avg episode reward: [(0, '27.004')]
-[2025-02-27 20:49:44,493][00196] Saving new best policy, reward=27.004!
-[2025-02-27 20:49:49,478][00031] Fps is (10 sec: 10649.3, 60 sec: 10513.0, 300 sec: 10358.0). Total num frames: 10559488. Throughput: 0: 2632.9. Samples: 2637810. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
-[2025-02-27 20:49:49,480][00031] Avg episode reward: [(0, '25.430')]
-[2025-02-27 20:49:49,833][00217] Updated weights for policy 0, policy_version 1290 (0.0017)
-[2025-02-27 20:49:52,127][00031] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 31], exiting...
-[2025-02-27 20:49:52,137][00196] Stopping Batcher_0...
-[2025-02-27 20:49:52,137][00196] Loop batcher_evt_loop terminating...
-[2025-02-27 20:49:52,140][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_10584064.pth...
-[2025-02-27 20:49:52,136][00031] Runner profile tree view:
-main_loop: 1092.3079
-[2025-02-27 20:49:52,145][00031] Collected {0: 10584064}, FPS: 9689.6
-[2025-02-27 20:49:52,261][00217] Weights refcount: 2 0
-[2025-02-27 20:49:52,270][00217] Stopping InferenceWorker_p0-w0...
-[2025-02-27 20:49:52,272][00217] Loop inference_proc0-0_evt_loop terminating...
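The shutdown summary is internally consistent: 10,584,064 collected frames over a 1092.3079 s main loop works out to the reported 9689.6 FPS, as a two-line check confirms:

    total_frames = 10584064        # Collected {0: 10584064}
    main_loop_seconds = 1092.3079  # Runner profile: main_loop

    print(f"{total_frames / main_loop_seconds:.1f}")  # 9689.6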
-[2025-02-27 20:49:52,314][00231] EvtLoop [rollout_proc14_evt_loop, process=rollout_proc14] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance14'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 447, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 508, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 447, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2025-02-27 20:49:52,316][00231] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc14_evt_loop
-[2025-02-27 20:49:52,332][00224] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
-[2025-02-27 20:49:52,351][00224] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
-[2025-02-27 20:49:52,403][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001117_9150464.pth
-[2025-02-27 20:49:52,409][00229] EvtLoop [rollout_proc13_evt_loop, process=rollout_proc13] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance13'), args=(1, 0)
-[2025-02-27 20:49:52,426][00229] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc13_evt_loop
-[2025-02-27 20:49:52,406][00218] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0)
-[2025-02-27 20:49:52,429][00218] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
-[2025-02-27 20:49:52,402][00233] EvtLoop [rollout_proc16_evt_loop, process=rollout_proc16] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance16'), args=(0, 0)
-[2025-02-27 20:49:52,420][00226] EvtLoop [rollout_proc9_evt_loop, process=rollout_proc9] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance9'), args=(1, 0)
-[2025-02-27 20:49:52,435][00233] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc16_evt_loop
-[2025-02-27 20:49:52,436][00226] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc9_evt_loop
-[2025-02-27 20:49:52,423][00196] Stopping LearnerWorker_p0...
-[2025-02-27 20:49:52,438][00196] Loop learner_proc0_evt_loop terminating...
-[2025-02-27 20:49:52,412][00236] EvtLoop [rollout_proc19_evt_loop, process=rollout_proc19] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance19'), args=(0, 0)
-[2025-02-27 20:49:52,446][00236] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc19_evt_loop
-[2025-02-27 20:49:52,432][00221] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0)
-[2025-02-27 20:49:52,446][00221] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
-[2025-02-27 20:49:52,424][00225] EvtLoop [rollout_proc8_evt_loop, process=rollout_proc8] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance8'), args=(0, 0)
-[2025-02-27 20:49:52,448][00225] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc8_evt_loop
-[2025-02-27 20:49:52,500][00220] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0)
-[2025-02-27 20:49:52,519][00220] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
-[2025-02-27 20:49:52,733][00227] EvtLoop [rollout_proc11_evt_loop, process=rollout_proc11] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance11'), args=(0, 0)
-[2025-02-27 20:49:52,745][00227] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc11_evt_loop
-[2025-02-27 20:49:53,009][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
-[2025-02-27 20:49:53,013][00031] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-02-27 20:49:53,015][00031] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-02-27 20:49:53,018][00031] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-02-27 20:49:53,019][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-02-27 20:49:53,021][00031] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-02-27 20:49:53,025][00031] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-02-27 20:49:53,028][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-02-27 20:49:53,029][00031] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-02-27 20:49:53,030][00031] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-02-27 20:49:53,031][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-02-27 20:49:53,032][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-02-27 20:49:53,033][00031] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-02-27 20:49:53,034][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-02-27 20:49:53,034][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-02-27 20:49:53,105][00031] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-02-27 20:49:53,111][00031] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:49:53,114][00031] RunningMeanStd input shape: (1,)
-[2025-02-27 20:49:53,148][00031] ConvEncoder: input_channels=3
-[2025-02-27 20:49:53,479][00031] Conv encoder output size: 512
-[2025-02-27 20:49:53,482][00031] Policy head output size: 512
-[2025-02-27 20:49:53,872][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_10584064.pth...
-[2025-02-27 20:49:55,026][00235] EvtLoop [rollout_proc18_evt_loop, process=rollout_proc18] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance18'), args=(1, 0)
-[2025-02-27 20:49:55,111][00235] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc18_evt_loop
-[2025-02-27 20:49:55,115][00228] EvtLoop [rollout_proc12_evt_loop, process=rollout_proc12] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance12'), args=(0, 0)
-[2025-02-27 20:49:55,117][00228] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc12_evt_loop
-[2025-02-27 20:49:55,094][00230] EvtLoop [rollout_proc10_evt_loop, process=rollout_proc10] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance10'), args=(1, 0)
-[2025-02-27 20:49:55,138][00230] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc10_evt_loop
-[2025-02-27 20:49:55,148][00219] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0)
-[2025-02-27 20:49:55,156][00219] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
-[2025-02-27 20:49:55,259][00223] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(1, 0)
-[2025-02-27 20:49:55,282][00223] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
-[2025-02-27 20:49:55,355][00232] EvtLoop [rollout_proc15_evt_loop, process=rollout_proc15] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance15'), args=(0, 0)
-[2025-02-27 20:49:55,357][00232] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc15_evt_loop
-[2025-02-27 20:49:55,400][00234] EvtLoop [rollout_proc17_evt_loop, process=rollout_proc17] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance17'), args=(1, 0)
-[2025-02-27 20:49:55,402][00234] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc17_evt_loop
-[2025-02-27 20:49:55,469][00031] Num frames 100...
-[2025-02-27 20:49:55,579][00216] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0)
-[2025-02-27 20:49:55,581][00216] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
-[2025-02-27 20:49:55,607][00222] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
-[2025-02-27 20:49:55,619][00222] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
-[2025-02-27 20:49:55,759][00031] Num frames 200...
-[2025-02-27 20:49:56,055][00031] Num frames 300...
-[2025-02-27 20:49:56,333][00031] Num frames 400...
-[2025-02-27 20:49:56,650][00031] Num frames 500...
-[2025-02-27 20:49:56,940][00031] Num frames 600...
-[2025-02-27 20:49:57,216][00031] Num frames 700...
-[2025-02-27 20:49:57,489][00031] Num frames 800...
-[2025-02-27 20:49:57,764][00031] Num frames 900...
-[2025-02-27 20:49:58,026][00031] Num frames 1000...
-[2025-02-27 20:49:58,284][00031] Num frames 1100...
-[2025-02-27 20:49:58,438][00031] Avg episode rewards: #0: 22.520, true rewards: #0: 11.520
-[2025-02-27 20:49:58,440][00031] Avg episode reward: 22.520, avg true_objective: 11.520
-[2025-02-27 20:49:58,540][00031] Num frames 1200...
-[2025-02-27 20:49:58,717][00031] Num frames 1300...
-[2025-02-27 20:49:58,875][00031] Num frames 1400...
-[2025-02-27 20:49:59,004][00031] Num frames 1500...
-[2025-02-27 20:49:59,157][00031] Num frames 1600...
-[2025-02-27 20:49:59,274][00031] Num frames 1700...
-[2025-02-27 20:49:59,394][00031] Num frames 1800...
-[2025-02-27 20:49:59,513][00031] Num frames 1900...
-[2025-02-27 20:49:59,632][00031] Num frames 2000...
-[2025-02-27 20:49:59,751][00031] Num frames 2100...
-[2025-02-27 20:49:59,880][00031] Num frames 2200...
-[2025-02-27 20:50:00,011][00031] Num frames 2300...
-[2025-02-27 20:50:00,134][00031] Num frames 2400...
-[2025-02-27 20:50:00,257][00031] Num frames 2500...
-[2025-02-27 20:50:00,377][00031] Num frames 2600...
-[2025-02-27 20:50:00,494][00031] Num frames 2700...
-[2025-02-27 20:50:00,610][00031] Num frames 2800...
-[2025-02-27 20:50:00,727][00031] Num frames 2900...
-[2025-02-27 20:50:00,846][00031] Num frames 3000...
-[2025-02-27 20:50:00,969][00031] Num frames 3100...
-[2025-02-27 20:50:01,088][00031] Num frames 3200...
-[2025-02-27 20:50:01,203][00031] Avg episode rewards: #0: 41.259, true rewards: #0: 16.260
-[2025-02-27 20:50:01,204][00031] Avg episode reward: 41.259, avg true_objective: 16.260
-[2025-02-27 20:50:01,263][00031] Num frames 3300...
-[2025-02-27 20:50:01,379][00031] Num frames 3400...
-[2025-02-27 20:50:01,494][00031] Num frames 3500...
-[2025-02-27 20:50:01,610][00031] Num frames 3600...
-[2025-02-27 20:50:01,729][00031] Num frames 3700...
-[2025-02-27 20:50:01,848][00031] Num frames 3800...
-[2025-02-27 20:50:01,965][00031] Num frames 3900...
-[2025-02-27 20:50:02,124][00031] Avg episode rewards: #0: 31.960, true rewards: #0: 13.293
-[2025-02-27 20:50:02,125][00031] Avg episode reward: 31.960, avg true_objective: 13.293
-[2025-02-27 20:50:02,142][00031] Num frames 4000...
-[2025-02-27 20:50:02,259][00031] Num frames 4100...
-[2025-02-27 20:50:02,377][00031] Num frames 4200...
-[2025-02-27 20:50:02,447][00031] Avg episode rewards: #0: 24.780, true rewards: #0: 10.530
-[2025-02-27 20:50:02,448][00031] Avg episode reward: 24.780, avg true_objective: 10.530
-[2025-02-27 20:50:02,551][00031] Num frames 4300...
-[2025-02-27 20:50:02,667][00031] Num frames 4400...
-[2025-02-27 20:50:02,783][00031] Num frames 4500...
-[2025-02-27 20:50:02,899][00031] Num frames 4600...
-[2025-02-27 20:50:03,021][00031] Num frames 4700...
-[2025-02-27 20:50:03,152][00031] Num frames 4800...
-[2025-02-27 20:50:03,270][00031] Num frames 4900...
-[2025-02-27 20:50:03,388][00031] Num frames 5000...
-[2025-02-27 20:50:03,503][00031] Num frames 5100...
-[2025-02-27 20:50:03,622][00031] Num frames 5200...
-[2025-02-27 20:50:03,738][00031] Num frames 5300...
-[2025-02-27 20:50:03,852][00031] Num frames 5400...
-[2025-02-27 20:50:03,971][00031] Num frames 5500...
-[2025-02-27 20:50:04,092][00031] Num frames 5600...
-[2025-02-27 20:50:04,211][00031] Num frames 5700...
-[2025-02-27 20:50:04,336][00031] Num frames 5800...
-[2025-02-27 20:50:04,460][00031] Num frames 5900...
-[2025-02-27 20:50:04,588][00031] Num frames 6000...
-[2025-02-27 20:50:04,717][00031] Num frames 6100...
-[2025-02-27 20:50:04,842][00031] Num frames 6200...
-[2025-02-27 20:50:04,976][00031] Num frames 6300...
-[2025-02-27 20:50:05,046][00031] Avg episode rewards: #0: 31.624, true rewards: #0: 12.624
-[2025-02-27 20:50:05,048][00031] Avg episode reward: 31.624, avg true_objective: 12.624
-[2025-02-27 20:50:05,157][00031] Num frames 6400...
-[2025-02-27 20:50:05,282][00031] Num frames 6500...
-[2025-02-27 20:50:05,398][00031] Num frames 6600...
-[2025-02-27 20:50:05,530][00031] Num frames 6700...
-[2025-02-27 20:50:05,653][00031] Num frames 6800...
-[2025-02-27 20:50:05,770][00031] Num frames 6900...
-[2025-02-27 20:50:05,834][00031] Avg episode rewards: #0: 28.013, true rewards: #0: 11.513
-[2025-02-27 20:50:05,835][00031] Avg episode reward: 28.013, avg true_objective: 11.513
-[2025-02-27 20:50:05,946][00031] Num frames 7000...
-[2025-02-27 20:50:06,069][00031] Num frames 7100...
-[2025-02-27 20:50:06,193][00031] Num frames 7200...
-[2025-02-27 20:50:06,310][00031] Num frames 7300...
-[2025-02-27 20:50:06,436][00031] Num frames 7400...
-[2025-02-27 20:50:06,556][00031] Num frames 7500...
-[2025-02-27 20:50:06,630][00031] Avg episode rewards: #0: 25.308, true rewards: #0: 10.737
-[2025-02-27 20:50:06,631][00031] Avg episode reward: 25.308, avg true_objective: 10.737
-[2025-02-27 20:50:06,730][00031] Num frames 7600...
-[2025-02-27 20:50:06,845][00031] Num frames 7700...
-[2025-02-27 20:50:06,970][00031] Num frames 7800...
-[2025-02-27 20:50:07,090][00031] Num frames 7900...
-[2025-02-27 20:50:07,205][00031] Num frames 8000...
-[2025-02-27 20:50:07,321][00031] Num frames 8100...
-[2025-02-27 20:50:07,380][00031] Avg episode rewards: #0: 23.629, true rewards: #0: 10.129
-[2025-02-27 20:50:07,381][00031] Avg episode reward: 23.629, avg true_objective: 10.129
-[2025-02-27 20:50:07,497][00031] Num frames 8200...
-[2025-02-27 20:50:07,616][00031] Num frames 8300...
-[2025-02-27 20:50:07,733][00031] Num frames 8400...
-[2025-02-27 20:50:07,853][00031] Num frames 8500...
-[2025-02-27 20:50:07,981][00031] Num frames 8600...
-[2025-02-27 20:50:08,106][00031] Num frames 8700...
-[2025-02-27 20:50:08,222][00031] Num frames 8800...
-[2025-02-27 20:50:08,340][00031] Num frames 8900...
-[2025-02-27 20:50:08,458][00031] Num frames 9000...
-[2025-02-27 20:50:08,576][00031] Num frames 9100...
-[2025-02-27 20:50:08,694][00031] Avg episode rewards: #0: 23.614, true rewards: #0: 10.170
-[2025-02-27 20:50:08,695][00031] Avg episode reward: 23.614, avg true_objective: 10.170
-[2025-02-27 20:50:08,752][00031] Num frames 9200...
-[2025-02-27 20:50:08,872][00031] Num frames 9300...
-[2025-02-27 20:50:08,993][00031] Num frames 9400...
-[2025-02-27 20:50:09,114][00031] Num frames 9500...
-[2025-02-27 20:50:09,231][00031] Num frames 9600...
-[2025-02-27 20:50:09,350][00031] Num frames 9700...
-[2025-02-27 20:50:09,469][00031] Num frames 9800...
-[2025-02-27 20:50:09,585][00031] Num frames 9900...
-[2025-02-27 20:50:09,703][00031] Num frames 10000...
-[2025-02-27 20:50:09,821][00031] Num frames 10100...
-[2025-02-27 20:50:09,939][00031] Num frames 10200...
-[2025-02-27 20:50:10,059][00031] Num frames 10300...
-[2025-02-27 20:50:10,184][00031] Num frames 10400...
-[2025-02-27 20:50:10,308][00031] Num frames 10500...
-[2025-02-27 20:50:10,432][00031] Num frames 10600...
-[2025-02-27 20:50:10,561][00031] Num frames 10700...
-[2025-02-27 20:50:10,690][00031] Num frames 10800...
-[2025-02-27 20:50:10,816][00031] Num frames 10900...
-[2025-02-27 20:50:10,942][00031] Num frames 11000...
-[2025-02-27 20:50:11,071][00031] Num frames 11100...
-[2025-02-27 20:50:11,123][00031] Avg episode rewards: #0: 26.600, true rewards: #0: 11.100
-[2025-02-27 20:50:11,123][00031] Avg episode reward: 26.600, avg true_objective: 11.100
-[2025-02-27 20:50:45,990][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
-[2025-02-27 20:50:46,798][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
-[2025-02-27 20:50:46,799][00031] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-02-27 20:50:46,800][00031] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-02-27 20:50:46,800][00031] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-02-27 20:50:46,801][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-02-27 20:50:46,802][00031] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-02-27 20:50:46,805][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-02-27 20:50:46,806][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-02-27 20:50:46,807][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-02-27 20:50:46,808][00031] Adding new argument 'hf_repository'='francescosabbarese/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-02-27 20:50:46,809][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-02-27 20:50:46,809][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-02-27 20:50:46,811][00031] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-02-27 20:50:46,812][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-02-27 20:50:46,814][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-02-27 20:50:46,838][00031] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:50:46,839][00031] RunningMeanStd input shape: (1,)
-[2025-02-27 20:50:46,851][00031] ConvEncoder: input_channels=3
-[2025-02-27 20:50:46,895][00031] Conv encoder output size: 512
-[2025-02-27 20:50:46,896][00031] Policy head output size: 512
-[2025-02-27 20:50:46,917][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_10584064.pth...
-[2025-02-27 20:50:47,382][00031] Num frames 100...
-[2025-02-27 20:50:47,509][00031] Num frames 200...
-[2025-02-27 20:50:47,635][00031] Num frames 300...
-[2025-02-27 20:50:47,761][00031] Num frames 400...
-[2025-02-27 20:50:47,886][00031] Num frames 500...
-[2025-02-27 20:50:48,013][00031] Num frames 600...
-[2025-02-27 20:50:48,136][00031] Num frames 700...
-[2025-02-27 20:50:48,254][00031] Num frames 800...
-[2025-02-27 20:50:48,371][00031] Num frames 900...
-[2025-02-27 20:50:48,488][00031] Num frames 1000...
-[2025-02-27 20:50:48,607][00031] Num frames 1100...
-[2025-02-27 20:50:48,723][00031] Num frames 1200...
-[2025-02-27 20:50:48,841][00031] Num frames 1300...
-[2025-02-27 20:50:48,918][00031] Avg episode rewards: #0: 35.190, true rewards: #0: 13.190
-[2025-02-27 20:50:48,919][00031] Avg episode reward: 35.190, avg true_objective: 13.190
-[2025-02-27 20:50:49,020][00031] Num frames 1400...
-[2025-02-27 20:50:49,143][00031] Num frames 1500...
-[2025-02-27 20:50:49,262][00031] Num frames 1600...
-[2025-02-27 20:50:49,383][00031] Num frames 1700...
-[2025-02-27 20:50:49,501][00031] Num frames 1800...
-[2025-02-27 20:50:49,626][00031] Num frames 1900...
-[2025-02-27 20:50:49,750][00031] Avg episode rewards: #0: 24.295, true rewards: #0: 9.795
-[2025-02-27 20:50:49,751][00031] Avg episode reward: 24.295, avg true_objective: 9.795
-[2025-02-27 20:50:49,798][00031] Num frames 2000...
-[2025-02-27 20:50:49,918][00031] Num frames 2100...
-[2025-02-27 20:50:50,039][00031] Num frames 2200...
-[2025-02-27 20:50:50,164][00031] Num frames 2300...
-[2025-02-27 20:50:50,276][00031] Num frames 2400...
-[2025-02-27 20:50:50,392][00031] Num frames 2500...
-[2025-02-27 20:50:50,562][00031] Avg episode rewards: #0: 20.330, true rewards: #0: 8.663
-[2025-02-27 20:50:50,562][00031] Avg episode reward: 20.330, avg true_objective: 8.663
-[2025-02-27 20:50:50,565][00031] Num frames 2600...
-[2025-02-27 20:50:50,687][00031] Num frames 2700...
-[2025-02-27 20:50:50,806][00031] Num frames 2800...
-[2025-02-27 20:50:50,917][00031] Num frames 2900...
-[2025-02-27 20:50:51,036][00031] Num frames 3000...
-[2025-02-27 20:50:51,159][00031] Num frames 3100...
-[2025-02-27 20:50:51,274][00031] Num frames 3200...
-[2025-02-27 20:50:51,393][00031] Num frames 3300...
-[2025-02-27 20:50:51,508][00031] Num frames 3400...
-[2025-02-27 20:50:51,627][00031] Num frames 3500...
-[2025-02-27 20:50:51,745][00031] Num frames 3600...
-[2025-02-27 20:50:51,861][00031] Num frames 3700...
-[2025-02-27 20:50:51,975][00031] Num frames 3800...
-[2025-02-27 20:50:52,122][00031] Avg episode rewards: #0: 22.948, true rewards: #0: 9.697
-[2025-02-27 20:50:52,123][00031] Avg episode reward: 22.948, avg true_objective: 9.697
-[2025-02-27 20:50:52,151][00031] Num frames 3900...
-[2025-02-27 20:50:52,266][00031] Num frames 4000...
-[2025-02-27 20:50:52,383][00031] Num frames 4100...
-[2025-02-27 20:50:52,505][00031] Num frames 4200...
-[2025-02-27 20:50:52,628][00031] Num frames 4300...
-[2025-02-27 20:50:52,748][00031] Num frames 4400...
-[2025-02-27 20:50:52,865][00031] Num frames 4500...
-[2025-02-27 20:50:52,987][00031] Num frames 4600...
-[2025-02-27 20:50:53,117][00031] Num frames 4700...
-[2025-02-27 20:50:53,236][00031] Num frames 4800...
-[2025-02-27 20:50:53,359][00031] Num frames 4900...
-[2025-02-27 20:50:53,482][00031] Num frames 5000...
-[2025-02-27 20:50:53,611][00031] Num frames 5100...
-[2025-02-27 20:50:53,726][00031] Num frames 5200...
-[2025-02-27 20:50:53,844][00031] Num frames 5300...
-[2025-02-27 20:50:53,963][00031] Num frames 5400...
-[2025-02-27 20:50:54,085][00031] Num frames 5500...
-[2025-02-27 20:50:54,160][00031] Avg episode rewards: #0: 28.034, true rewards: #0: 11.034
-[2025-02-27 20:50:54,161][00031] Avg episode reward: 28.034, avg true_objective: 11.034
-[2025-02-27 20:50:54,255][00031] Num frames 5600...
-[2025-02-27 20:50:54,379][00031] Num frames 5700...
-[2025-02-27 20:50:54,503][00031] Num frames 5800...
-[2025-02-27 20:50:54,630][00031] Num frames 5900...
-[2025-02-27 20:50:54,754][00031] Num frames 6000...
-[2025-02-27 20:50:54,880][00031] Num frames 6100...
-[2025-02-27 20:50:54,957][00031] Avg episode rewards: #0: 25.028, true rewards: #0: 10.195
-[2025-02-27 20:50:54,958][00031] Avg episode reward: 25.028, avg true_objective: 10.195
-[2025-02-27 20:50:55,057][00031] Num frames 6200...
-[2025-02-27 20:50:55,177][00031] Num frames 6300...
-[2025-02-27 20:50:55,300][00031] Num frames 6400...
-[2025-02-27 20:50:55,414][00031] Num frames 6500...
-[2025-02-27 20:50:55,542][00031] Num frames 6600...
-[2025-02-27 20:50:55,657][00031] Num frames 6700...
-[2025-02-27 20:50:55,774][00031] Num frames 6800...
-[2025-02-27 20:50:55,896][00031] Num frames 6900...
-[2025-02-27 20:50:56,017][00031] Num frames 7000...
-[2025-02-27 20:50:56,135][00031] Num frames 7100...
-[2025-02-27 20:50:56,260][00031] Num frames 7200...
-[2025-02-27 20:50:56,376][00031] Num frames 7300...
-[2025-02-27 20:50:56,433][00031] Avg episode rewards: #0: 25.001, true rewards: #0: 10.430
-[2025-02-27 20:50:56,434][00031] Avg episode reward: 25.001, avg true_objective: 10.430
-[2025-02-27 20:50:56,546][00031] Num frames 7400...
-[2025-02-27 20:50:56,661][00031] Num frames 7500...
-[2025-02-27 20:50:56,777][00031] Num frames 7600...
-[2025-02-27 20:50:56,929][00031] Avg episode rewards: #0: 22.356, true rewards: #0: 9.606
-[2025-02-27 20:50:56,930][00031] Avg episode reward: 22.356, avg true_objective: 9.606
-[2025-02-27 20:50:56,947][00031] Num frames 7700...
-[2025-02-27 20:50:57,063][00031] Num frames 7800...
-[2025-02-27 20:50:57,183][00031] Num frames 7900...
-[2025-02-27 20:50:57,302][00031] Num frames 8000...
-[2025-02-27 20:50:57,422][00031] Num frames 8100...
-[2025-02-27 20:50:57,549][00031] Num frames 8200...
-[2025-02-27 20:50:57,674][00031] Num frames 8300...
-[2025-02-27 20:50:57,797][00031] Num frames 8400...
-[2025-02-27 20:50:57,921][00031] Num frames 8500...
-[2025-02-27 20:50:58,044][00031] Num frames 8600...
-[2025-02-27 20:50:58,169][00031] Num frames 8700...
-[2025-02-27 20:50:58,296][00031] Num frames 8800...
-[2025-02-27 20:50:58,358][00031] Avg episode rewards: #0: 22.450, true rewards: #0: 9.783
-[2025-02-27 20:50:58,359][00031] Avg episode reward: 22.450, avg true_objective: 9.783
-[2025-02-27 20:50:58,469][00031] Num frames 8900...
-[2025-02-27 20:50:58,585][00031] Num frames 9000...
-[2025-02-27 20:50:58,707][00031] Num frames 9100...
-[2025-02-27 20:50:58,832][00031] Num frames 9200...
-[2025-02-27 20:50:58,956][00031] Num frames 9300...
-[2025-02-27 20:50:59,079][00031] Num frames 9400...
-[2025-02-27 20:50:59,231][00031] Avg episode rewards: #0: 21.577, true rewards: #0: 9.477
-[2025-02-27 20:50:59,232][00031] Avg episode reward: 21.577, avg true_objective: 9.477
-[2025-02-27 20:51:28,387][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
-[2025-02-27 20:51:33,964][00031] The model has been pushed to https://huggingface.co/francescosabbarese/rl_course_vizdoom_health_gathering_supreme
-[2025-02-27 20:51:33,987][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
-[2025-02-27 20:51:33,988][00031] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-02-27 20:51:33,988][00031] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-02-27 20:51:33,989][00031] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-02-27 20:51:33,991][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-02-27 20:51:33,992][00031] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-02-27 20:51:33,992][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-02-27 20:51:33,994][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-02-27 20:51:33,994][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-02-27 20:51:33,995][00031] Adding new argument 'hf_repository'='francescosabbarese/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-02-27 20:51:33,996][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-02-27 20:51:33,997][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-02-27 20:51:33,998][00031] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-02-27 20:51:34,000][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-02-27 20:51:34,000][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-02-27 20:51:34,025][00031] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:51:34,026][00031] RunningMeanStd input shape: (1,)
-[2025-02-27 20:51:34,037][00031] ConvEncoder: input_channels=3
-[2025-02-27 20:51:34,077][00031] Conv encoder output size: 512
-[2025-02-27 20:51:34,078][00031] Policy head output size: 512
-[2025-02-27 20:51:34,104][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_10584064.pth...
-[2025-02-27 20:51:34,563][00031] Num frames 100...
-[2025-02-27 20:51:34,690][00031] Num frames 200...
-[2025-02-27 20:51:34,811][00031] Num frames 300...
-[2025-02-27 20:51:34,937][00031] Num frames 400...
-[2025-02-27 20:51:35,060][00031] Num frames 500...
-[2025-02-27 20:51:35,175][00031] Num frames 600...
-[2025-02-27 20:51:35,292][00031] Num frames 700...
-[2025-02-27 20:51:35,408][00031] Num frames 800...
-[2025-02-27 20:51:35,537][00031] Num frames 900...
-[2025-02-27 20:51:35,655][00031] Num frames 1000...
-[2025-02-27 20:51:35,773][00031] Num frames 1100...
-[2025-02-27 20:51:35,889][00031] Num frames 1200...
-[2025-02-27 20:51:36,009][00031] Num frames 1300...
-[2025-02-27 20:51:36,115][00031] Avg episode rewards: #0: 37.440, true rewards: #0: 13.440
-[2025-02-27 20:51:36,116][00031] Avg episode reward: 37.440, avg true_objective: 13.440
-[2025-02-27 20:51:36,184][00031] Num frames 1400...
-[2025-02-27 20:51:36,303][00031] Num frames 1500...
-[2025-02-27 20:51:36,426][00031] Num frames 1600...
-[2025-02-27 20:51:36,544][00031] Num frames 1700...
-[2025-02-27 20:51:36,662][00031] Num frames 1800...
-[2025-02-27 20:51:36,778][00031] Num frames 1900...
-[2025-02-27 20:51:36,894][00031] Num frames 2000...
-[2025-02-27 20:51:37,010][00031] Num frames 2100...
-[2025-02-27 20:51:37,125][00031] Num frames 2200...
-[2025-02-27 20:51:37,239][00031] Num frames 2300...
-[2025-02-27 20:51:37,356][00031] Num frames 2400...
-[2025-02-27 20:51:37,472][00031] Num frames 2500...
-[2025-02-27 20:51:37,599][00031] Num frames 2600...
-[2025-02-27 20:51:37,725][00031] Num frames 2700...
-[2025-02-27 20:51:37,843][00031] Num frames 2800...
-[2025-02-27 20:51:37,960][00031] Num frames 2900...
-[2025-02-27 20:51:38,085][00031] Num frames 3000...
-[2025-02-27 20:51:38,209][00031] Num frames 3100...
-[2025-02-27 20:51:38,330][00031] Num frames 3200...
-[2025-02-27 20:51:38,450][00031] Num frames 3300...
-[2025-02-27 20:51:38,560][00031] Avg episode rewards: #0: 42.225, true rewards: #0: 16.725
-[2025-02-27 20:51:38,561][00031] Avg episode reward: 42.225, avg true_objective: 16.725
-[2025-02-27 20:51:38,627][00031] Num frames 3400...
-[2025-02-27 20:51:38,746][00031] Num frames 3500...
-[2025-02-27 20:51:38,866][00031] Num frames 3600...
-[2025-02-27 20:51:38,991][00031] Num frames 3700...
-[2025-02-27 20:51:39,110][00031] Num frames 3800...
-[2025-02-27 20:51:39,227][00031] Num frames 3900...
-[2025-02-27 20:51:39,344][00031] Num frames 4000...
-[2025-02-27 20:51:39,459][00031] Num frames 4100...
-[2025-02-27 20:51:39,576][00031] Num frames 4200...
-[2025-02-27 20:51:39,695][00031] Num frames 4300...
-[2025-02-27 20:51:39,811][00031] Num frames 4400...
-[2025-02-27 20:51:39,874][00031] Avg episode rewards: #0: 37.023, true rewards: #0: 14.690
-[2025-02-27 20:51:39,875][00031] Avg episode reward: 37.023, avg true_objective: 14.690
-[2025-02-27 20:51:39,985][00031] Num frames 4500...
-[2025-02-27 20:51:40,106][00031] Num frames 4600...
-[2025-02-27 20:51:40,226][00031] Num frames 4700...
-[2025-02-27 20:51:40,341][00031] Num frames 4800...
-[2025-02-27 20:51:40,459][00031] Num frames 4900...
-[2025-02-27 20:51:40,577][00031] Num frames 5000...
-[2025-02-27 20:51:40,694][00031] Num frames 5100...
-[2025-02-27 20:51:40,812][00031] Num frames 5200...
-[2025-02-27 20:51:40,928][00031] Num frames 5300...
-[2025-02-27 20:51:41,045][00031] Num frames 5400...
-[2025-02-27 20:51:41,159][00031] Num frames 5500...
-[2025-02-27 20:51:41,278][00031] Num frames 5600...
-[2025-02-27 20:51:41,405][00031] Num frames 5700...
-[2025-02-27 20:51:41,530][00031] Num frames 5800...
-[2025-02-27 20:51:41,679][00031] Avg episode rewards: #0: 36.947, true rewards: #0: 14.697
-[2025-02-27 20:51:41,680][00031] Avg episode reward: 36.947, avg true_objective: 14.697
-[2025-02-27 20:51:41,706][00031] Num frames 5900...
-[2025-02-27 20:51:41,827][00031] Num frames 6000...
-[2025-02-27 20:51:41,942][00031] Num frames 6100...
-[2025-02-27 20:51:42,058][00031] Num frames 6200...
-[2025-02-27 20:51:42,174][00031] Num frames 6300...
-[2025-02-27 20:51:42,293][00031] Num frames 6400...
-[2025-02-27 20:51:42,417][00031] Num frames 6500...
-[2025-02-27 20:51:42,541][00031] Num frames 6600...
-[2025-02-27 20:51:42,665][00031] Num frames 6700...
-[2025-02-27 20:51:42,789][00031] Num frames 6800...
-[2025-02-27 20:51:42,914][00031] Num frames 6900...
-[2025-02-27 20:51:43,038][00031] Num frames 7000...
-[2025-02-27 20:51:43,165][00031] Num frames 7100...
-[2025-02-27 20:51:43,287][00031] Num frames 7200...
-[2025-02-27 20:51:43,402][00031] Num frames 7300...
-[2025-02-27 20:51:43,517][00031] Num frames 7400...
-[2025-02-27 20:51:43,683][00031] Avg episode rewards: #0: 38.792, true rewards: #0: 14.992
-[2025-02-27 20:51:43,684][00031] Avg episode reward: 38.792, avg true_objective: 14.992
-[2025-02-27 20:51:43,690][00031] Num frames 7500...
-[2025-02-27 20:51:43,814][00031] Num frames 7600...
-[2025-02-27 20:51:43,937][00031] Num frames 7700...
-[2025-02-27 20:51:44,060][00031] Num frames 7800...
-[2025-02-27 20:51:44,182][00031] Num frames 7900...
-[2025-02-27 20:51:44,305][00031] Num frames 8000...
-[2025-02-27 20:51:44,425][00031] Num frames 8100...
-[2025-02-27 20:51:44,525][00031] Avg episode rewards: #0: 34.560, true rewards: #0: 13.560
-[2025-02-27 20:51:44,526][00031] Avg episode reward: 34.560, avg true_objective: 13.560
-[2025-02-27 20:51:44,605][00031] Num frames 8200...
-[2025-02-27 20:51:44,721][00031] Num frames 8300...
-[2025-02-27 20:51:44,837][00031] Num frames 8400...
-[2025-02-27 20:51:44,968][00031] Num frames 8500...
-[2025-02-27 20:51:45,084][00031] Num frames 8600...
-[2025-02-27 20:51:45,208][00031] Num frames 8700...
-[2025-02-27 20:51:45,326][00031] Num frames 8800...
-[2025-02-27 20:51:45,448][00031] Num frames 8900...
-[2025-02-27 20:51:45,620][00031] Avg episode rewards: #0: 32.271, true rewards: #0: 12.843
-[2025-02-27 20:51:45,621][00031] Avg episode reward: 32.271, avg true_objective: 12.843
-[2025-02-27 20:51:45,634][00031] Num frames 9000...
-[2025-02-27 20:51:45,754][00031] Num frames 9100...
-[2025-02-27 20:51:45,869][00031] Num frames 9200...
-[2025-02-27 20:51:45,988][00031] Num frames 9300...
-[2025-02-27 20:51:46,105][00031] Num frames 9400...
-[2025-02-27 20:51:46,219][00031] Num frames 9500...
-[2025-02-27 20:51:46,339][00031] Num frames 9600...
-[2025-02-27 20:51:46,451][00031] Num frames 9700...
-[2025-02-27 20:51:46,569][00031] Num frames 9800...
-[2025-02-27 20:51:46,685][00031] Num frames 9900...
-[2025-02-27 20:51:46,804][00031] Num frames 10000...
-[2025-02-27 20:51:46,924][00031] Num frames 10100...
-[2025-02-27 20:51:47,045][00031] Num frames 10200...
-[2025-02-27 20:51:47,164][00031] Num frames 10300...
-[2025-02-27 20:51:47,280][00031] Num frames 10400...
-[2025-02-27 20:51:47,406][00031] Num frames 10500...
-[2025-02-27 20:51:47,524][00031] Num frames 10600...
-[2025-02-27 20:51:47,646][00031] Num frames 10700...
-[2025-02-27 20:51:47,762][00031] Num frames 10800...
-[2025-02-27 20:51:47,899][00031] Avg episode rewards: #0: 34.712, true rewards: #0: 13.587
-[2025-02-27 20:51:47,900][00031] Avg episode reward: 34.712, avg true_objective: 13.587
-[2025-02-27 20:51:47,936][00031] Num frames 10900...
-[2025-02-27 20:51:48,061][00031] Num frames 11000...
-[2025-02-27 20:51:48,180][00031] Num frames 11100...
-[2025-02-27 20:51:48,296][00031] Num frames 11200...
-[2025-02-27 20:51:48,413][00031] Num frames 11300...
-[2025-02-27 20:51:48,532][00031] Num frames 11400...
-[2025-02-27 20:51:48,655][00031] Avg episode rewards: #0: 32.286, true rewards: #0: 12.731
-[2025-02-27 20:51:48,656][00031] Avg episode reward: 32.286, avg true_objective: 12.731
-[2025-02-27 20:51:48,707][00031] Num frames 11500...
-[2025-02-27 20:51:48,826][00031] Num frames 11600...
-[2025-02-27 20:51:48,946][00031] Num frames 11700...
-[2025-02-27 20:51:49,068][00031] Num frames 11800...
-[2025-02-27 20:51:49,196][00031] Num frames 11900...
-[2025-02-27 20:51:49,323][00031] Num frames 12000...
-[2025-02-27 20:51:49,459][00031] Avg episode rewards: #0: 30.366, true rewards: #0: 12.066
-[2025-02-27 20:51:49,460][00031] Avg episode reward: 30.366, avg true_objective: 12.066
-[2025-02-27 20:52:26,202][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
-[2025-02-27 20:52:29,435][00031] The model has been pushed to https://huggingface.co/francescosabbarese/rl_course_vizdoom_health_gathering_supreme
-[2025-02-27 20:52:29,469][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
-[2025-02-27 20:52:29,469][00031] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-02-27 20:52:29,470][00031] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-02-27 20:52:29,471][00031] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-02-27 20:52:29,472][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-02-27 20:52:29,473][00031] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-02-27 20:52:29,475][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-02-27 20:52:29,475][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-02-27 20:52:29,476][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-02-27 20:52:29,478][00031] Adding new argument 'hf_repository'='francescosabbarese/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-02-27 20:52:29,478][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-02-27 20:52:29,479][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-02-27 20:52:29,480][00031] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-02-27 20:52:29,481][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-02-27 20:52:29,482][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-02-27 20:52:29,505][00031] RunningMeanStd input shape: (3, 72, 128)
-[2025-02-27 20:52:29,506][00031] RunningMeanStd input shape: (1,)
-[2025-02-27 20:52:29,517][00031] ConvEncoder: input_channels=3
-[2025-02-27 20:52:29,557][00031] Conv encoder output size: 512
-[2025-02-27 20:52:29,558][00031] Policy head output size: 512
-[2025-02-27 20:52:29,579][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_10584064.pth...
-[2025-02-27 20:52:30,035][00031] Num frames 100...
-[2025-02-27 20:52:30,159][00031] Num frames 200...
-[2025-02-27 20:52:30,276][00031] Num frames 300...
-[2025-02-27 20:52:30,391][00031] Num frames 400...
-[2025-02-27 20:52:30,507][00031] Num frames 500...
-[2025-02-27 20:52:30,626][00031] Num frames 600...
-[2025-02-27 20:52:30,743][00031] Num frames 700...
-[2025-02-27 20:52:30,861][00031] Num frames 800...
-[2025-02-27 20:52:30,984][00031] Num frames 900...
-[2025-02-27 20:52:31,109][00031] Num frames 1000...
-[2025-02-27 20:52:31,233][00031] Num frames 1100...
-[2025-02-27 20:52:31,357][00031] Num frames 1200...
-[2025-02-27 20:52:31,481][00031] Num frames 1300...
-[2025-02-27 20:52:31,603][00031] Num frames 1400...
-[2025-02-27 20:52:31,722][00031] Num frames 1500...
-[2025-02-27 20:52:31,843][00031] Num frames 1600...
-[2025-02-27 20:52:31,966][00031] Num frames 1700...
-[2025-02-27 20:52:32,088][00031] Num frames 1800...
-[2025-02-27 20:52:32,249][00031] Avg episode rewards: #0: 52.849, true rewards: #0: 18.850
-[2025-02-27 20:52:32,250][00031] Avg episode reward: 52.849, avg true_objective: 18.850
-[2025-02-27 20:52:32,268][00031] Num frames 1900...
-[2025-02-27 20:52:32,391][00031] Num frames 2000...
-[2025-02-27 20:52:32,513][00031] Num frames 2100...
-[2025-02-27 20:52:32,631][00031] Num frames 2200...
-[2025-02-27 20:52:32,744][00031] Avg episode rewards: #0: 30.250, true rewards: #0: 11.250
-[2025-02-27 20:52:32,745][00031] Avg episode reward: 30.250, avg true_objective: 11.250
-[2025-02-27 20:52:32,803][00031] Num frames 2300...
-[2025-02-27 20:52:32,917][00031] Num frames 2400...
-[2025-02-27 20:52:33,034][00031] Num frames 2500...
-[2025-02-27 20:52:33,156][00031] Num frames 2600...
-[2025-02-27 20:52:33,272][00031] Num frames 2700...
-[2025-02-27 20:52:33,388][00031] Num frames 2800...
-[2025-02-27 20:52:33,507][00031] Num frames 2900...
-[2025-02-27 20:52:33,661][00031] Avg episode rewards: #0: 24.953, true rewards: #0: 9.953
-[2025-02-27 20:52:33,662][00031] Avg episode reward: 24.953, avg true_objective: 9.953
-[2025-02-27 20:52:33,679][00031] Num frames 3000...
-[2025-02-27 20:52:33,793][00031] Num frames 3100...
-[2025-02-27 20:52:33,914][00031] Num frames 3200...
-[2025-02-27 20:52:34,033][00031] Num frames 3300...
-[2025-02-27 20:52:34,151][00031] Num frames 3400...
-[2025-02-27 20:52:34,265][00031] Num frames 3500...
-[2025-02-27 20:52:34,382][00031] Num frames 3600...
-[2025-02-27 20:52:34,498][00031] Num frames 3700...
-[2025-02-27 20:52:34,615][00031] Num frames 3800...
-[2025-02-27 20:52:34,691][00031] Avg episode rewards: #0: 22.545, true rewards: #0: 9.545
-[2025-02-27 20:52:34,692][00031] Avg episode reward: 22.545, avg true_objective: 9.545
-[2025-02-27 20:52:34,785][00031] Num frames 3900...
-[2025-02-27 20:52:34,924][00031] Num frames 4000...
-[2025-02-27 20:52:35,082][00031] Num frames 4100...
-[2025-02-27 20:52:35,222][00031] Num frames 4200...
-[2025-02-27 20:52:35,371][00031] Num frames 4300...
-[2025-02-27 20:52:35,498][00031] Num frames 4400...
-[2025-02-27 20:52:35,676][00031] Avg episode rewards: #0: 20.780, true rewards: #0: 8.980
-[2025-02-27 20:52:35,677][00031] Avg episode reward: 20.780, avg true_objective: 8.980
-[2025-02-27 20:52:35,691][00031] Num frames 4500...
-[2025-02-27 20:52:35,819][00031] Num frames 4600...
-[2025-02-27 20:52:35,949][00031] Num frames 4700...
-[2025-02-27 20:52:36,074][00031] Num frames 4800...
-[2025-02-27 20:52:36,200][00031] Num frames 4900...
-[2025-02-27 20:52:36,324][00031] Num frames 5000...
-[2025-02-27 20:52:36,440][00031] Num frames 5100...
-[2025-02-27 20:52:36,565][00031] Num frames 5200...
-[2025-02-27 20:52:36,692][00031] Num frames 5300...
-[2025-02-27 20:52:36,854][00031] Avg episode rewards: #0: 20.643, true rewards: #0: 8.977
-[2025-02-27 20:52:36,854][00031] Avg episode reward: 20.643, avg true_objective: 8.977
-[2025-02-27 20:52:36,871][00031] Num frames 5400...
-[2025-02-27 20:52:36,997][00031] Num frames 5500...
-[2025-02-27 20:52:37,124][00031] Num frames 5600...
-[2025-02-27 20:52:37,250][00031] Num frames 5700...
-[2025-02-27 20:52:37,377][00031] Num frames 5800...
-[2025-02-27 20:52:37,501][00031] Num frames 5900...
-[2025-02-27 20:52:37,619][00031] Num frames 6000...
-[2025-02-27 20:52:37,737][00031] Num frames 6100...
-[2025-02-27 20:52:37,854][00031] Num frames 6200...
-[2025-02-27 20:52:37,970][00031] Num frames 6300...
-[2025-02-27 20:52:38,086][00031] Num frames 6400...
-[2025-02-27 20:52:38,206][00031] Num frames 6500...
-[2025-02-27 20:52:38,367][00031] Avg episode rewards: #0: 21.559, true rewards: #0: 9.416
-[2025-02-27 20:52:38,368][00031] Avg episode reward: 21.559, avg true_objective: 9.416
-[2025-02-27 20:52:38,380][00031] Num frames 6600...
-[2025-02-27 20:52:38,499][00031] Num frames 6700...
-[2025-02-27 20:52:38,626][00031] Num frames 6800...
-[2025-02-27 20:52:38,750][00031] Num frames 6900...
-[2025-02-27 20:52:38,871][00031] Num frames 7000...
-[2025-02-27 20:52:38,987][00031] Num frames 7100...
-[2025-02-27 20:52:39,105][00031] Num frames 7200...
-[2025-02-27 20:52:39,269][00031] Avg episode rewards: #0: 20.736, true rewards: #0: 9.111
-[2025-02-27 20:52:39,270][00031] Avg episode reward: 20.736, avg true_objective: 9.111
-[2025-02-27 20:52:39,283][00031] Num frames 7300...
-[2025-02-27 20:52:39,404][00031] Num frames 7400...
-[2025-02-27 20:52:39,529][00031] Num frames 7500...
-[2025-02-27 20:52:39,653][00031] Num frames 7600...
-[2025-02-27 20:52:39,769][00031] Num frames 7700...
-[2025-02-27 20:52:39,891][00031] Num frames 7800...
-[2025-02-27 20:52:40,011][00031] Num frames 7900...
-[2025-02-27 20:52:40,138][00031] Num frames 8000...
-[2025-02-27 20:52:40,264][00031] Num frames 8100...
-[2025-02-27 20:52:40,391][00031] Num frames 8200...
-[2025-02-27 20:52:40,515][00031] Num frames 8300...
-[2025-02-27 20:52:40,641][00031] Num frames 8400...
-[2025-02-27 20:52:40,769][00031] Num frames 8500...
-[2025-02-27 20:52:40,897][00031] Num frames 8600...
-[2025-02-27 20:52:41,020][00031] Num frames 8700...
-[2025-02-27 20:52:41,143][00031] Num frames 8800...
-[2025-02-27 20:52:41,267][00031] Num frames 8900...
-[2025-02-27 20:52:41,391][00031] Num frames 9000...
-[2025-02-27 20:52:41,513][00031] Num frames 9100...
-[2025-02-27 20:52:41,635][00031] Num frames 9200...
-[2025-02-27 20:52:41,767][00031] Avg episode rewards: #0: 24.734, true rewards: #0: 10.290
-[2025-02-27 20:52:41,768][00031] Avg episode reward: 24.734, avg true_objective: 10.290
-[2025-02-27 20:52:41,815][00031] Num frames 9300...
-[2025-02-27 20:52:41,937][00031] Num frames 9400...
-[2025-02-27 20:52:42,063][00031] Num frames 9500...
-[2025-02-27 20:52:42,186][00031] Num frames 9600...
-[2025-02-27 20:52:42,311][00031] Num frames 9700...
-[2025-02-27 20:52:42,434][00031] Num frames 9800...
-[2025-02-27 20:52:42,558][00031] Num frames 9900...
-[2025-02-27 20:52:42,681][00031] Num frames 10000...
-[2025-02-27 20:52:42,803][00031] Num frames 10100...
-[2025-02-27 20:52:42,926][00031] Num frames 10200...
-[2025-02-27 20:52:43,046][00031] Avg episode rewards: #0: 24.353, true rewards: #0: 10.253
-[2025-02-27 20:52:43,047][00031] Avg episode reward: 24.353, avg true_objective: 10.253
-[2025-02-27 20:53:00,256][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
+[2025-02-27 20:55:33,281][00196] Using optimizer
+[2025-02-27 20:55:35,096][00196] No checkpoints found
+[2025-02-27 20:55:35,097][00196] Did not load from checkpoint, starting from scratch!
+[2025-02-27 20:55:35,098][00196] Initialized policy 0 weights for model version 0
+[2025-02-27 20:55:35,103][00196] LearnerWorker_p0 finished initialization!
+[2025-02-27 20:55:35,103][00196] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-02-27 20:55:35,104][00031] Heartbeat connected on LearnerWorker_p0
+[2025-02-27 20:55:35,190][00216] RunningMeanStd input shape: (3, 72, 128)
+[2025-02-27 20:55:35,191][00216] RunningMeanStd input shape: (1,)
+[2025-02-27 20:55:35,210][00216] ConvEncoder: input_channels=3
+[2025-02-27 20:55:35,332][00216] Conv encoder output size: 512
+[2025-02-27 20:55:35,333][00216] Policy head output size: 512
+[2025-02-27 20:55:35,397][00031] Inference worker 0-0 is ready!
+[2025-02-27 20:55:35,399][00031] All inference workers are ready! Signal rollout workers to start!
+[2025-02-27 20:55:35,704][00236] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,703][00228] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,711][00231] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,701][00221] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,712][00220] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,715][00217] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,715][00226] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,717][00223] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,720][00225] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,719][00219] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,721][00229] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,721][00235] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,720][00224] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,723][00233] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,725][00232] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,727][00234] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,724][00222] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,730][00227] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,734][00218] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:35,740][00230] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 20:55:37,053][00031] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:55:37,735][00217] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,066][00219] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,068][00235] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,088][00222] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,125][00218] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,499][00235] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:38,514][00236] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,507][00224] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,513][00228] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,520][00229] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:38,966][00236] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:39,002][00223] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,222][00233] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,340][00220] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,421][00224] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:39,448][00222] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:39,490][00218] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:39,665][00225] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,870][00231] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,919][00227] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:39,924][00230] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:40,144][00235] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:40,314][00228] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:40,365][00224] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:40,393][00233] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:40,618][00229] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:40,622][00230] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:40,712][00225] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:41,059][00222] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:41,135][00231] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:41,245][00227] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:41,344][00234] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:41,617][00234] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:41,711][00235] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:41,961][00225] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:41,966][00229] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:42,018][00236] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:42,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:55:42,272][00223] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:42,365][00231] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:42,477][00224] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:42,792][00233] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:42,870][00220] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:43,035][00227] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:43,183][00225] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:43,187][00230] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:43,277][00235] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:43,345][00231] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:43,455][00232] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:43,647][00222] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:43,708][00220] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:43,853][00236] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:44,396][00229] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:44,412][00224] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:44,481][00218] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:44,826][00226] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:44,833][00227] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:44,872][00230] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:44,903][00235] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:44,906][00231] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:45,303][00233] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:45,493][00227] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:45,971][00225] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:46,046][00229] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:46,147][00221] Decorrelating experience for 0 frames...
+[2025-02-27 20:55:46,166][00228] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:46,335][00233] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:46,709][00225] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:46,760][00224] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:46,773][00232] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:46,867][00236] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:47,040][00227] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:47,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:55:47,275][00231] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:47,492][00218] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:47,533][00222] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:48,164][00221] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:48,264][00223] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:48,376][00217] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:48,437][00225] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:48,684][00218] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:48,704][00229] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:48,861][00228] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:49,266][00236] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:49,287][00219] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:49,389][00232] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:49,625][00217] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:49,626][00223] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:49,899][00221] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:50,154][00235] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:50,600][00219] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:50,624][00227] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:50,630][00218] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:50,684][00228] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:50,715][00217] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:50,834][00222] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:50,890][00231] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:51,508][00224] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:51,642][00220] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:51,847][00233] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:52,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:55:52,129][00221] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:52,241][00223] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:52,305][00227] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:52,334][00235] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:52,402][00236] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:53,171][00217] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:53,240][00231] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:53,410][00219] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:53,508][00229] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:53,514][00222] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:53,598][00232] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:53,640][00218] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:53,777][00224] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:54,358][00228] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:54,743][00223] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:54,792][00234] Decorrelating experience for 64 frames...
+[2025-02-27 20:55:54,903][00227] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:54,991][00220] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:55,235][00233] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:55,277][00229] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:55,330][00235] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:55,696][00231] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:56,138][00232] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:56,230][00219] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:56,273][00236] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:56,300][00220] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:56,477][00230] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:56,799][00234] Decorrelating experience for 96 frames...
+[2025-02-27 20:55:56,996][00224] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:57,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:55:57,253][00233] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:57,347][00218] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:57,495][00226] Decorrelating experience for 32 frames...
+[2025-02-27 20:55:57,596][00219] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:57,780][00222] Decorrelating experience for 224 frames...
+[2025-02-27 20:55:58,001][00228] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:58,169][00227] Decorrelating experience for 288 frames...
+[2025-02-27 20:55:58,281][00232] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:58,534][00220] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:58,552][00223] Decorrelating experience for 192 frames...
+[2025-02-27 20:55:59,003][00234] Decorrelating experience for 128 frames...
+[2025-02-27 20:55:59,132][00236] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:59,198][00235] Decorrelating experience for 288 frames...
+[2025-02-27 20:55:59,344][00217] Decorrelating experience for 160 frames...
+[2025-02-27 20:55:59,469][00218] Decorrelating experience for 256 frames...
+[2025-02-27 20:55:59,684][00231] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:00,024][00221] Decorrelating experience for 128 frames...
+[2025-02-27 20:56:00,324][00226] Decorrelating experience for 64 frames...
+[2025-02-27 20:56:00,393][00224] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:00,732][00220] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:00,938][00232] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:01,158][00225] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:01,209][00221] Decorrelating experience for 160 frames...
+[2025-02-27 20:56:01,519][00219] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:01,589][00229] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:01,671][00226] Decorrelating experience for 96 frames...
+[2025-02-27 20:56:02,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:56:02,119][00224] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:02,156][00218] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:02,251][00231] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:02,621][00236] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:02,693][00233] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:03,302][00217] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:03,545][00219] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:03,823][00230] Decorrelating experience for 160 frames...
+[2025-02-27 20:56:03,888][00225] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:04,215][00231] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:04,306][00227] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:04,520][00232] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:04,672][00228] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:05,463][00229] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:05,575][00218] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:05,783][00233] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:05,792][00235] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:06,067][00224] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:06,239][00236] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:06,372][00217] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:06,572][00226] Decorrelating experience for 128 frames...
+[2025-02-27 20:56:06,965][00230] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:07,055][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.6. Samples: 18. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:56:07,394][00232] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:07,470][00220] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:07,616][00219] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:07,616][00225] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:07,690][00230] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:07,900][00223] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:08,443][00230] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:09,085][00221] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:09,176][00231] Worker 14, sleep for 0.700 sec to decorrelate experience collection
+[2025-02-27 20:56:09,180][00233] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:09,181][00224] Worker 7, sleep for 0.350 sec to decorrelate experience collection
+[2025-02-27 20:56:09,194][00229] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:09,317][00236] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:09,371][00226] Decorrelating experience for 160 frames...
+[2025-02-27 20:56:09,468][00235] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:09,536][00224] Worker 7 awakens!
+[2025-02-27 20:56:09,887][00231] Worker 14 awakens!
+[2025-02-27 20:56:09,978][00232] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:10,426][00234] Decorrelating experience for 160 frames...
+[2025-02-27 20:56:10,428][00227] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:10,709][00225] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:10,862][00219] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:11,008][00230] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:11,347][00220] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:11,612][00217] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:11,817][00221] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:12,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 40.6. Samples: 1422. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:56:12,056][00031] Avg episode reward: [(0, '1.598')]
+[2025-02-27 20:56:12,257][00218] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:12,791][00223] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:12,928][00236] Worker 19, sleep for 0.950 sec to decorrelate experience collection
+[2025-02-27 20:56:12,962][00233] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:13,243][00229] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:13,452][00226] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:13,884][00236] Worker 19 awakens!
+[2025-02-27 20:56:14,010][00230] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:14,082][00234] Decorrelating experience for 192 frames...
+[2025-02-27 20:56:14,136][00235] Worker 18, sleep for 0.900 sec to decorrelate experience collection
+[2025-02-27 20:56:14,273][00227] Worker 9, sleep for 0.450 sec to decorrelate experience collection
+[2025-02-27 20:56:14,701][00219] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:14,732][00227] Worker 9 awakens!
+[2025-02-27 20:56:14,822][00217] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:15,045][00235] Worker 18 awakens!
+[2025-02-27 20:56:15,360][00221] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:15,398][00225] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:16,097][00228] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:16,320][00218] Worker 1, sleep for 0.050 sec to decorrelate experience collection
+[2025-02-27 20:56:16,376][00218] Worker 1 awakens!
+[2025-02-27 20:56:16,620][00196] Signal inference workers to stop experience collection...
+[2025-02-27 20:56:16,672][00216] InferenceWorker_p0-w0: stopping experience collection
+[2025-02-27 20:56:17,053][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 91.1. Samples: 3642. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-02-27 20:56:17,054][00031] Avg episode reward: [(0, '2.011')]
+[2025-02-27 20:56:17,184][00217] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:17,185][00232] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:17,281][00234] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:17,338][00230] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:17,743][00222] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:17,894][00219] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:18,024][00221] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:18,306][00196] Signal inference workers to resume experience collection...
+[2025-02-27 20:56:18,307][00216] InferenceWorker_p0-w0: resuming experience collection
+[2025-02-27 20:56:19,023][00223] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:19,230][00233] Worker 16, sleep for 0.800 sec to decorrelate experience collection
+[2025-02-27 20:56:19,400][00228] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:19,629][00229] Worker 12, sleep for 0.600 sec to decorrelate experience collection
+[2025-02-27 20:56:19,866][00217] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:20,035][00233] Worker 16 awakens!
+[2025-02-27 20:56:20,109][00234] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:20,236][00229] Worker 12 awakens!
+[2025-02-27 20:56:20,692][00225] Worker 8, sleep for 0.400 sec to decorrelate experience collection
+[2025-02-27 20:56:21,097][00225] Worker 8 awakens!
+[2025-02-27 20:56:21,344][00222] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:21,503][00230] Worker 13, sleep for 0.650 sec to decorrelate experience collection
+[2025-02-27 20:56:21,625][00221] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:21,649][00226] Decorrelating experience for 224 frames...
+[2025-02-27 20:56:21,738][00219] Worker 2, sleep for 0.100 sec to decorrelate experience collection
+[2025-02-27 20:56:21,847][00219] Worker 2 awakens!
+[2025-02-27 20:56:22,053][00031] Fps is (10 sec: 3276.8, 60 sec: 728.2, 300 sec: 728.2). Total num frames: 32768. Throughput: 0: 197.6. Samples: 8892. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-02-27 20:56:22,055][00031] Avg episode reward: [(0, '2.422')]
+[2025-02-27 20:56:22,164][00230] Worker 13 awakens!
+[2025-02-27 20:56:22,838][00223] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:23,730][00234] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:23,912][00232] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:23,919][00228] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:25,167][00222] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:25,738][00221] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:26,317][00226] Decorrelating experience for 256 frames...
+[2025-02-27 20:56:26,407][00220] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:26,932][00223] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:27,053][00031] Fps is (10 sec: 5734.4, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 57344. Throughput: 0: 405.3. Samples: 18240. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
+[2025-02-27 20:56:27,054][00031] Avg episode reward: [(0, '3.462')]
+[2025-02-27 20:56:27,753][00234] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:28,108][00228] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:28,983][00232] Worker 15, sleep for 0.750 sec to decorrelate experience collection
+[2025-02-27 20:56:29,192][00216] Updated weights for policy 0, policy_version 10 (0.0017)
+[2025-02-27 20:56:29,359][00222] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:29,739][00232] Worker 15 awakens!
+[2025-02-27 20:56:29,994][00226] Decorrelating experience for 288 frames...
+[2025-02-27 20:56:30,562][00221] Worker 3, sleep for 0.150 sec to decorrelate experience collection
+[2025-02-27 20:56:30,714][00221] Worker 3 awakens!
+[2025-02-27 20:56:30,850][00220] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:32,053][00031] Fps is (10 sec: 6553.7, 60 sec: 1787.3, 300 sec: 1787.3). Total num frames: 98304. Throughput: 0: 518.9. Samples: 23352. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-02-27 20:56:32,055][00031] Avg episode reward: [(0, '3.936')]
+[2025-02-27 20:56:32,092][00223] Worker 6, sleep for 0.300 sec to decorrelate experience collection
+[2025-02-27 20:56:32,407][00223] Worker 6 awakens!
+[2025-02-27 20:56:33,068][00228] Worker 11, sleep for 0.550 sec to decorrelate experience collection
+[2025-02-27 20:56:33,071][00234] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:33,623][00228] Worker 11 awakens!
+[2025-02-27 20:56:34,022][00222] Worker 5, sleep for 0.250 sec to decorrelate experience collection
+[2025-02-27 20:56:34,273][00226] Decorrelating experience for 320 frames...
+[2025-02-27 20:56:34,278][00222] Worker 5 awakens!
+[2025-02-27 20:56:36,352][00220] Worker 4, sleep for 0.200 sec to decorrelate experience collection
+[2025-02-27 20:56:36,556][00220] Worker 4 awakens!
+[2025-02-27 20:56:37,053][00031] Fps is (10 sec: 7372.5, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 131072. Throughput: 0: 799.1. Samples: 35958. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-02-27 20:56:37,058][00031] Avg episode reward: [(0, '3.692')]
+[2025-02-27 20:56:37,114][00196] Saving new best policy, reward=3.692!
+[2025-02-27 20:56:38,717][00226] Decorrelating experience for 352 frames...
+[2025-02-27 20:56:39,023][00234] Worker 17, sleep for 0.850 sec to decorrelate experience collection
+[2025-02-27 20:56:39,889][00234] Worker 17 awakens!
+[2025-02-27 20:56:40,077][00216] Updated weights for policy 0, policy_version 20 (0.0019)
+[2025-02-27 20:56:42,053][00031] Fps is (10 sec: 8191.9, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 1075.3. Samples: 48390. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:56:42,055][00031] Avg episode reward: [(0, '4.196')]
+[2025-02-27 20:56:42,056][00196] Saving new best policy, reward=4.196!
+[2025-02-27 20:56:43,469][00226] Worker 10, sleep for 0.500 sec to decorrelate experience collection
+[2025-02-27 20:56:43,976][00226] Worker 10 awakens!
+[2025-02-27 20:56:47,055][00031] Fps is (10 sec: 9828.9, 60 sec: 3822.8, 300 sec: 3276.7). Total num frames: 229376. Throughput: 0: 1232.9. Samples: 55482. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 20:56:47,059][00031] Avg episode reward: [(0, '4.344')]
+[2025-02-27 20:56:47,067][00196] Saving new best policy, reward=4.344!
+[2025-02-27 20:56:48,334][00216] Updated weights for policy 0, policy_version 30 (0.0017)
+[2025-02-27 20:56:52,053][00031] Fps is (10 sec: 9011.2, 60 sec: 4505.6, 300 sec: 3604.5). Total num frames: 270336. Throughput: 0: 1552.7. Samples: 69888. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 20:56:52,056][00031] Avg episode reward: [(0, '4.344')]
+[2025-02-27 20:56:56,895][00216] Updated weights for policy 0, policy_version 40 (0.0022)
+[2025-02-27 20:56:57,053][00031] Fps is (10 sec: 9832.3, 60 sec: 5461.3, 300 sec: 4096.0). Total num frames: 327680. Throughput: 0: 1847.7. Samples: 84570. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
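The "sleep ... to decorrelate" lines above follow a clear pattern: each rollout worker sleeps worker_index * 0.05 sec before collecting (worker 1 -> 0.050, worker 9 -> 0.450, worker 19 -> 0.950), so the workers start out of phase instead of producing correlated, lockstep trajectories. A minimal sketch of that staggering, assuming the 0.05 sec step inferred from the log; this is an illustration, not Sample Factory's actual code:

    import time

    DECORRELATION_STEP_SEC = 0.05  # assumed constant, inferred from the sleep times above

    def decorrelate_worker(worker_idx: int) -> None:
        # Stagger workers so their episode boundaries do not line up.
        sleep_sec = worker_idx * DECORRELATION_STEP_SEC
        print(f"Worker {worker_idx}, sleep for {sleep_sec:.3f} sec to decorrelate experience collection")
        time.sleep(sleep_sec)
        print(f"Worker {worker_idx} awakens!")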
+[2025-02-27 20:56:57,055][00031] Avg episode reward: [(0, '4.420')]
+[2025-02-27 20:56:57,065][00196] Saving new best policy, reward=4.420!
+[2025-02-27 20:57:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 6144.0, 300 sec: 4336.9). Total num frames: 368640. Throughput: 0: 1958.3. Samples: 91764. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:57:02,055][00031] Avg episode reward: [(0, '4.477')]
+[2025-02-27 20:57:02,056][00196] Saving new best policy, reward=4.477!
+[2025-02-27 20:57:05,623][00216] Updated weights for policy 0, policy_version 50 (0.0016)
+[2025-02-27 20:57:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 7100.0, 300 sec: 4733.2). Total num frames: 425984. Throughput: 0: 2166.8. Samples: 106398. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:57:07,056][00031] Avg episode reward: [(0, '4.454')]
+[2025-02-27 20:57:07,065][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000052_425984.pth...
+[2025-02-27 20:57:12,053][00031] Fps is (10 sec: 9830.1, 60 sec: 7782.4, 300 sec: 4915.2). Total num frames: 466944. Throughput: 0: 2261.6. Samples: 120012. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 20:57:12,055][00031] Avg episode reward: [(0, '4.553')]
+[2025-02-27 20:57:12,058][00196] Saving new best policy, reward=4.553!
+[2025-02-27 20:57:14,343][00216] Updated weights for policy 0, policy_version 60 (0.0022)
+[2025-02-27 20:57:17,054][00031] Fps is (10 sec: 9009.9, 60 sec: 8601.4, 300 sec: 5160.9). Total num frames: 516096. Throughput: 0: 2311.0. Samples: 127350. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 20:57:17,056][00031] Avg episode reward: [(0, '4.298')]
+[2025-02-27 20:57:22,053][00031] Fps is (10 sec: 9830.5, 60 sec: 8874.6, 300 sec: 5383.3). Total num frames: 565248. Throughput: 0: 2379.5. Samples: 143034. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:57:22,054][00031] Avg episode reward: [(0, '4.590')]
+[2025-02-27 20:57:22,057][00196] Saving new best policy, reward=4.590!
+[2025-02-27 20:57:22,295][00216] Updated weights for policy 0, policy_version 70 (0.0017)
+[2025-02-27 20:57:27,054][00031] Fps is (10 sec: 10649.6, 60 sec: 9420.6, 300 sec: 5659.9). Total num frames: 622592. Throughput: 0: 2441.3. Samples: 158250. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 20:57:27,057][00031] Avg episode reward: [(0, '4.576')]
+[2025-02-27 20:57:30,107][00216] Updated weights for policy 0, policy_version 80 (0.0016)
+[2025-02-27 20:57:32,053][00031] Fps is (10 sec: 10649.9, 60 sec: 9557.3, 300 sec: 5841.3). Total num frames: 671744. Throughput: 0: 2454.1. Samples: 165912. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 20:57:32,056][00031] Avg episode reward: [(0, '4.488')]
+[2025-02-27 20:57:37,053][00031] Fps is (10 sec: 9831.6, 60 sec: 9830.4, 300 sec: 6007.5). Total num frames: 720896. Throughput: 0: 2474.8. Samples: 181254. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:57:37,056][00031] Avg episode reward: [(0, '4.539')]
+[2025-02-27 20:57:38,102][00216] Updated weights for policy 0, policy_version 90 (0.0016)
+[2025-02-27 20:57:42,053][00031] Fps is (10 sec: 10649.2, 60 sec: 9966.9, 300 sec: 6225.9). Total num frames: 778240. Throughput: 0: 2497.1. Samples: 196938. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:57:42,059][00031] Avg episode reward: [(0, '4.610')]
+[2025-02-27 20:57:42,063][00196] Saving new best policy, reward=4.610!
+[2025-02-27 20:57:46,901][00216] Updated weights for policy 0, policy_version 100 (0.0018)
+[2025-02-27 20:57:47,053][00031] Fps is (10 sec: 9830.5, 60 sec: 9830.7, 300 sec: 6301.5). Total num frames: 819200. Throughput: 0: 2474.9. Samples: 203136. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:57:47,058][00031] Avg episode reward: [(0, '4.738')]
+[2025-02-27 20:57:47,066][00196] Saving new best policy, reward=4.738!
+[2025-02-27 20:57:52,054][00031] Fps is (10 sec: 9010.1, 60 sec: 9966.7, 300 sec: 6432.2). Total num frames: 868352. Throughput: 0: 2490.3. Samples: 218466. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:57:52,057][00031] Avg episode reward: [(0, '4.732')]
+[2025-02-27 20:57:54,558][00216] Updated weights for policy 0, policy_version 110 (0.0016)
+[2025-02-27 20:57:57,053][00031] Fps is (10 sec: 9830.5, 60 sec: 9830.4, 300 sec: 6553.6). Total num frames: 917504. Throughput: 0: 2529.8. Samples: 233850. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 20:57:57,055][00031] Avg episode reward: [(0, '4.845')]
+[2025-02-27 20:57:57,077][00196] Saving new best policy, reward=4.845!
+[2025-02-27 20:58:02,053][00031] Fps is (10 sec: 10651.3, 60 sec: 10103.5, 300 sec: 6723.1). Total num frames: 974848. Throughput: 0: 2538.1. Samples: 241560. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:58:02,055][00031] Avg episode reward: [(0, '4.601')]
+[2025-02-27 20:58:02,842][00216] Updated weights for policy 0, policy_version 120 (0.0016)
+[2025-02-27 20:58:07,053][00031] Fps is (10 sec: 10649.6, 60 sec: 9966.9, 300 sec: 6826.7). Total num frames: 1024000. Throughput: 0: 2528.9. Samples: 256836. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 20:58:07,055][00031] Avg episode reward: [(0, '4.720')]
+[2025-02-27 20:58:10,872][00216] Updated weights for policy 0, policy_version 130 (0.0016)
+[2025-02-27 20:58:12,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 6923.6). Total num frames: 1073152. Throughput: 0: 2535.9. Samples: 272364. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:58:12,055][00031] Avg episode reward: [(0, '4.546')]
+[2025-02-27 20:58:17,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.7, 300 sec: 7014.4). Total num frames: 1122304. Throughput: 0: 2532.8. Samples: 279888. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:58:17,054][00031] Avg episode reward: [(0, '4.948')]
+[2025-02-27 20:58:17,063][00196] Saving new best policy, reward=4.948!
+[2025-02-27 20:58:19,219][00216] Updated weights for policy 0, policy_version 140 (0.0021)
+[2025-02-27 20:58:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 7099.7). Total num frames: 1171456. Throughput: 0: 2500.4. Samples: 293772. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 20:58:22,055][00031] Avg episode reward: [(0, '4.726')]
+[2025-02-27 20:58:27,053][00031] Fps is (10 sec: 9830.0, 60 sec: 9967.1, 300 sec: 7180.0). Total num frames: 1220608. Throughput: 0: 2493.1. Samples: 309126. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:58:27,056][00031] Avg episode reward: [(0, '4.745')]
+[2025-02-27 20:58:27,222][00216] Updated weights for policy 0, policy_version 150 (0.0016)
+[2025-02-27 20:58:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 7255.8). Total num frames: 1269760. Throughput: 0: 2524.9. Samples: 316758. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
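Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports throughput of the "Total num frames" counter over three trailing windows, which is why the 300-sec figure ramps up slowly while the 10-sec figure jumps around. (The frame counter also grows at roughly 4x the sample count, consistent with an environment frame-skip of 4; that is an inference from the numbers, not stated in the log.) A hedged reconstruction of the windowed-FPS bookkeeping, not the library's implementation:

    import time
    from collections import deque

    class FpsTracker:
        """Frames-per-second over trailing windows, mimicking the log lines above."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (timestamp, total_frames) pairs

        def record(self, total_frames: int) -> dict:
            now = time.monotonic()
            self.samples.append((now, total_frames))
            # Keep only samples that can still fall inside the largest window.
            while now - self.samples[0][0] > max(self.windows):
                self.samples.popleft()
            report = {}
            for w in self.windows:
                # Oldest retained sample still inside this window.
                t0, f0 = next(s for s in self.samples if now - s[0] <= w)
                elapsed = now - t0
                report[w] = (total_frames - f0) / elapsed if elapsed > 0 else 0.0
            return report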
+[2025-02-27 20:58:32,055][00031] Avg episode reward: [(0, '4.899')]
+[2025-02-27 20:58:35,388][00216] Updated weights for policy 0, policy_version 160 (0.0016)
+[2025-02-27 20:58:37,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10103.5, 300 sec: 7372.8). Total num frames: 1327104. Throughput: 0: 2521.7. Samples: 331938. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:58:37,055][00031] Avg episode reward: [(0, '4.856')]
+[2025-02-27 20:58:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 9967.0, 300 sec: 7439.2). Total num frames: 1376256. Throughput: 0: 2524.4. Samples: 347448. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 20:58:42,054][00031] Avg episode reward: [(0, '4.964')]
+[2025-02-27 20:58:42,057][00196] Saving new best policy, reward=4.964!
+[2025-02-27 20:58:43,410][00216] Updated weights for policy 0, policy_version 170 (0.0016)
+[2025-02-27 20:58:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 7502.1). Total num frames: 1425408. Throughput: 0: 2518.9. Samples: 354912. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:58:47,057][00031] Avg episode reward: [(0, '5.122')]
+[2025-02-27 20:58:47,065][00196] Saving new best policy, reward=5.122!
+[2025-02-27 20:58:51,765][00216] Updated weights for policy 0, policy_version 180 (0.0019)
+[2025-02-27 20:58:52,057][00031] Fps is (10 sec: 9826.8, 60 sec: 10103.1, 300 sec: 7561.7). Total num frames: 1474560. Throughput: 0: 2496.6. Samples: 369192. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 20:58:52,058][00031] Avg episode reward: [(0, '5.314')]
+[2025-02-27 20:58:52,060][00196] Saving new best policy, reward=5.314!
+[2025-02-27 20:58:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 7618.6). Total num frames: 1523712. Throughput: 0: 2487.1. Samples: 384282. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:58:57,055][00031] Avg episode reward: [(0, '5.807')]
+[2025-02-27 20:58:57,063][00196] Saving new best policy, reward=5.807!
+[2025-02-27 20:58:59,776][00216] Updated weights for policy 0, policy_version 190 (0.0016)
+[2025-02-27 20:59:02,053][00031] Fps is (10 sec: 9834.0, 60 sec: 9966.9, 300 sec: 7672.5). Total num frames: 1572864. Throughput: 0: 2491.3. Samples: 391998. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 20:59:02,055][00031] Avg episode reward: [(0, '5.882')]
+[2025-02-27 20:59:02,057][00196] Saving new best policy, reward=5.882!
+[2025-02-27 20:59:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 7723.9). Total num frames: 1622016. Throughput: 0: 2525.6. Samples: 407424. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:59:07,056][00031] Avg episode reward: [(0, '5.361')]
+[2025-02-27 20:59:07,066][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_1622016.pth...
+[2025-02-27 20:59:07,829][00216] Updated weights for policy 0, policy_version 200 (0.0017)
+[2025-02-27 20:59:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 7772.9). Total num frames: 1671168. Throughput: 0: 2525.9. Samples: 422790. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 20:59:12,055][00031] Avg episode reward: [(0, '5.541')]
+[2025-02-27 20:59:15,992][00216] Updated weights for policy 0, policy_version 210 (0.0016)
+[2025-02-27 20:59:17,054][00031] Fps is (10 sec: 10647.9, 60 sec: 10103.2, 300 sec: 7856.8). Total num frames: 1728512. Throughput: 0: 2522.7. Samples: 430284. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:59:17,056][00031] Avg episode reward: [(0, '6.329')]
+[2025-02-27 20:59:17,067][00196] Saving new best policy, reward=6.329!
+[2025-02-27 20:59:22,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10103.5, 300 sec: 7900.7). Total num frames: 1777664. Throughput: 0: 2528.0. Samples: 445698. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:59:22,055][00031] Avg episode reward: [(0, '6.442')]
+[2025-02-27 20:59:22,056][00196] Saving new best policy, reward=6.442!
+[2025-02-27 20:59:24,901][00216] Updated weights for policy 0, policy_version 220 (0.0016)
+[2025-02-27 20:59:27,053][00031] Fps is (10 sec: 9831.9, 60 sec: 10103.5, 300 sec: 7942.7). Total num frames: 1826816. Throughput: 0: 2486.8. Samples: 459354. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 20:59:27,054][00031] Avg episode reward: [(0, '6.465')]
+[2025-02-27 20:59:27,064][00196] Saving new best policy, reward=6.465!
+[2025-02-27 20:59:32,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10103.4, 300 sec: 7982.8). Total num frames: 1875968. Throughput: 0: 2490.8. Samples: 466998. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 20:59:32,054][00031] Avg episode reward: [(0, '6.982')]
+[2025-02-27 20:59:32,056][00196] Saving new best policy, reward=6.982!
+[2025-02-27 20:59:32,571][00216] Updated weights for policy 0, policy_version 230 (0.0016)
+[2025-02-27 20:59:37,053][00031] Fps is (10 sec: 9829.9, 60 sec: 9966.9, 300 sec: 8021.3). Total num frames: 1925120. Throughput: 0: 2514.2. Samples: 482322. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 20:59:37,055][00031] Avg episode reward: [(0, '6.285')]
+[2025-02-27 20:59:40,378][00216] Updated weights for policy 0, policy_version 240 (0.0016)
+[2025-02-27 20:59:42,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10103.4, 300 sec: 8091.7). Total num frames: 1982464. Throughput: 0: 2525.5. Samples: 497928. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 20:59:42,055][00031] Avg episode reward: [(0, '7.285')]
+[2025-02-27 20:59:42,059][00196] Saving new best policy, reward=7.285!
+[2025-02-27 20:59:47,053][00031] Fps is (10 sec: 9830.9, 60 sec: 9966.9, 300 sec: 8093.7). Total num frames: 2023424. Throughput: 0: 2520.1. Samples: 505404. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 20:59:47,055][00031] Avg episode reward: [(0, '7.573')]
+[2025-02-27 20:59:47,097][00196] Saving new best policy, reward=7.573!
+[2025-02-27 20:59:48,371][00216] Updated weights for policy 0, policy_version 250 (0.0016)
+[2025-02-27 20:59:52,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10104.1, 300 sec: 8159.9). Total num frames: 2080768. Throughput: 0: 2524.4. Samples: 521022. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 20:59:52,056][00031] Avg episode reward: [(0, '7.968')]
+[2025-02-27 20:59:52,059][00196] Saving new best policy, reward=7.968!
+[2025-02-27 20:59:56,989][00216] Updated weights for policy 0, policy_version 260 (0.0017)
+[2025-02-27 20:59:57,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 8192.0). Total num frames: 2129920. Throughput: 0: 2498.4. Samples: 535218. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 20:59:57,055][00031] Avg episode reward: [(0, '6.951')]
+[2025-02-27 21:00:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 8222.9). Total num frames: 2179072. Throughput: 0: 2495.4. Samples: 542574. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
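The "Saving new best policy, reward=..." lines fire only when the average episode reward exceeds every previous value (4.610 -> 4.738 -> 4.845 -> ... in this stretch); when the reward dips, no save happens. The bookkeeping amounts to a running maximum; a minimal sketch, where save_policy_fn is a hypothetical callback rather than a Sample Factory API:

    class BestPolicyTracker:
        def __init__(self):
            self.best_reward = float("-inf")

        def update(self, avg_episode_reward: float, save_policy_fn) -> bool:
            # Persist only on a strict improvement over the best reward so far.
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
                save_policy_fn()
                return True
            return False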
+[2025-02-27 21:00:02,055][00031] Avg episode reward: [(0, '7.770')]
+[2025-02-27 21:00:05,194][00216] Updated weights for policy 0, policy_version 270 (0.0017)
+[2025-02-27 21:00:07,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 8252.7). Total num frames: 2228224. Throughput: 0: 2488.5. Samples: 557682. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:00:07,058][00031] Avg episode reward: [(0, '7.871')]
+[2025-02-27 21:00:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 8281.4). Total num frames: 2277376. Throughput: 0: 2523.6. Samples: 572916. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:00:12,055][00031] Avg episode reward: [(0, '7.805')]
+[2025-02-27 21:00:13,154][00216] Updated weights for policy 0, policy_version 280 (0.0017)
+[2025-02-27 21:00:17,053][00031] Fps is (10 sec: 9830.3, 60 sec: 9967.2, 300 sec: 8309.0). Total num frames: 2326528. Throughput: 0: 2520.2. Samples: 580404. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:00:17,056][00031] Avg episode reward: [(0, '8.371')]
+[2025-02-27 21:00:17,066][00196] Saving new best policy, reward=8.371!
+[2025-02-27 21:00:21,141][00216] Updated weights for policy 0, policy_version 290 (0.0016)
+[2025-02-27 21:00:22,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10103.4, 300 sec: 8364.5). Total num frames: 2383872. Throughput: 0: 2525.1. Samples: 595950. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:00:22,056][00031] Avg episode reward: [(0, '8.579')]
+[2025-02-27 21:00:22,060][00196] Saving new best policy, reward=8.579!
+[2025-02-27 21:00:27,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10103.5, 300 sec: 8389.7). Total num frames: 2433024. Throughput: 0: 2517.2. Samples: 611202. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:00:27,055][00031] Avg episode reward: [(0, '7.757')]
+[2025-02-27 21:00:30,079][00216] Updated weights for policy 0, policy_version 300 (0.0016)
+[2025-02-27 21:00:32,054][00031] Fps is (10 sec: 9010.9, 60 sec: 9966.9, 300 sec: 8386.4). Total num frames: 2473984. Throughput: 0: 2496.6. Samples: 617754. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:00:32,056][00031] Avg episode reward: [(0, '9.375')]
+[2025-02-27 21:00:32,059][00196] Saving new best policy, reward=9.375!
+[2025-02-27 21:00:37,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10103.5, 300 sec: 8580.8). Total num frames: 2531328. Throughput: 0: 2486.0. Samples: 632892. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:00:37,056][00031] Avg episode reward: [(0, '9.001')]
+[2025-02-27 21:00:37,691][00216] Updated weights for policy 0, policy_version 310 (0.0016)
+[2025-02-27 21:00:42,053][00031] Fps is (10 sec: 10650.3, 60 sec: 9967.0, 300 sec: 8747.4). Total num frames: 2580480. Throughput: 0: 2517.2. Samples: 648492. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:00:42,055][00031] Avg episode reward: [(0, '9.145')]
+[2025-02-27 21:00:45,848][00216] Updated weights for policy 0, policy_version 320 (0.0016)
+[2025-02-27 21:00:47,054][00031] Fps is (10 sec: 9830.1, 60 sec: 10103.4, 300 sec: 8914.0). Total num frames: 2629632. Throughput: 0: 2520.4. Samples: 655992. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:00:47,055][00031] Avg episode reward: [(0, '8.989')]
+[2025-02-27 21:00:52,053][00031] Fps is (10 sec: 9830.3, 60 sec: 9966.9, 300 sec: 9080.6). Total num frames: 2678784. Throughput: 0: 2528.7. Samples: 671472. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:00:52,054][00031] Avg episode reward: [(0, '9.938')]
+[2025-02-27 21:00:52,058][00196] Saving new best policy, reward=9.938!
+[2025-02-27 21:00:53,620][00216] Updated weights for policy 0, policy_version 330 (0.0016)
+[2025-02-27 21:00:57,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10103.4, 300 sec: 9275.0). Total num frames: 2736128. Throughput: 0: 2530.9. Samples: 686808. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:00:57,056][00031] Avg episode reward: [(0, '8.637')]
+[2025-02-27 21:01:01,794][00216] Updated weights for policy 0, policy_version 340 (0.0016)
+[2025-02-27 21:01:02,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10103.5, 300 sec: 9441.7). Total num frames: 2785280. Throughput: 0: 2537.9. Samples: 694608. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:01:02,055][00031] Avg episode reward: [(0, '9.586')]
+[2025-02-27 21:01:07,053][00031] Fps is (10 sec: 9011.7, 60 sec: 9966.9, 300 sec: 9580.5). Total num frames: 2826240. Throughput: 0: 2500.3. Samples: 708462. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:01:07,054][00031] Avg episode reward: [(0, '10.333')]
+[2025-02-27 21:01:07,063][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_2826240.pth...
+[2025-02-27 21:01:07,197][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000052_425984.pth
+[2025-02-27 21:01:07,213][00196] Saving new best policy, reward=10.333!
+[2025-02-27 21:01:10,712][00216] Updated weights for policy 0, policy_version 350 (0.0016)
+[2025-02-27 21:01:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 9774.9). Total num frames: 2883584. Throughput: 0: 2503.3. Samples: 723852. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:01:12,055][00031] Avg episode reward: [(0, '9.726')]
+[2025-02-27 21:01:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 9830.4). Total num frames: 2932736. Throughput: 0: 2524.2. Samples: 731340. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:01:17,055][00031] Avg episode reward: [(0, '10.423')]
+[2025-02-27 21:01:17,062][00196] Saving new best policy, reward=10.423!
+[2025-02-27 21:01:18,310][00216] Updated weights for policy 0, policy_version 360 (0.0016)
+[2025-02-27 21:01:22,053][00031] Fps is (10 sec: 9830.3, 60 sec: 9967.0, 300 sec: 9913.7). Total num frames: 2981888. Throughput: 0: 2530.3. Samples: 746754. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
+[2025-02-27 21:01:22,054][00031] Avg episode reward: [(0, '10.990')]
+[2025-02-27 21:01:22,058][00196] Saving new best policy, reward=10.990!
+[2025-02-27 21:01:26,479][00216] Updated weights for policy 0, policy_version 370 (0.0019)
+[2025-02-27 21:01:27,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 9941.5). Total num frames: 3031040. Throughput: 0: 2503.9. Samples: 761166. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:01:27,054][00031] Avg episode reward: [(0, '10.853')]
+[2025-02-27 21:01:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.6, 300 sec: 9997.0). Total num frames: 3080192. Throughput: 0: 2502.6. Samples: 768606. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:01:32,055][00031] Avg episode reward: [(0, '11.454')]
+[2025-02-27 21:01:32,058][00196] Saving new best policy, reward=11.454!
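The Saving/Removing pairs above (checkpoint_000000345 saved, checkpoint_000000052 removed, and likewise through the rest of the log) implement keep-last-N rotation: at no point does the run hold more than the two most recent periodic checkpoints, with best-policy snapshots tracked separately. A sketch of that rotation under those assumptions; illustrative only, not the library's code:

    from pathlib import Path

    def save_and_rotate(ckpt_dir: Path, new_ckpt_name: str, keep_last: int = 2) -> None:
        new_ckpt = ckpt_dir / new_ckpt_name
        print(f"Saving {new_ckpt}...")
        new_ckpt.touch()  # stand-in for the real serialization call
        # Filenames embed zero-padded policy versions, so lexicographic order
        # is chronological; delete everything but the newest keep_last files.
        checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for stale in checkpoints[:-keep_last]:
            print(f"Removing {stale}")
            stale.unlink()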
+[2025-02-27 21:01:36,099][00216] Updated weights for policy 0, policy_version 380 (0.0017)
+[2025-02-27 21:01:37,053][00031] Fps is (10 sec: 9011.2, 60 sec: 9830.5, 300 sec: 9969.2). Total num frames: 3121152. Throughput: 0: 2457.2. Samples: 782046. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:01:37,057][00031] Avg episode reward: [(0, '12.694')]
+[2025-02-27 21:01:37,069][00196] Saving new best policy, reward=12.694!
+[2025-02-27 21:01:42,053][00031] Fps is (10 sec: 9011.2, 60 sec: 9830.4, 300 sec: 9969.3). Total num frames: 3170304. Throughput: 0: 2450.4. Samples: 797076. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:01:42,057][00031] Avg episode reward: [(0, '12.083')]
+[2025-02-27 21:01:43,369][00216] Updated weights for policy 0, policy_version 390 (0.0017)
+[2025-02-27 21:01:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 9967.0, 300 sec: 10024.8). Total num frames: 3227648. Throughput: 0: 2447.9. Samples: 804762. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:01:47,056][00031] Avg episode reward: [(0, '12.834')]
+[2025-02-27 21:01:47,066][00196] Saving new best policy, reward=12.834!
+[2025-02-27 21:01:51,524][00216] Updated weights for policy 0, policy_version 400 (0.0016)
+[2025-02-27 21:01:52,053][00031] Fps is (10 sec: 10649.5, 60 sec: 9966.9, 300 sec: 9997.0). Total num frames: 3276800. Throughput: 0: 2488.8. Samples: 820458. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 21:01:52,055][00031] Avg episode reward: [(0, '12.093')]
+[2025-02-27 21:01:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9830.5, 300 sec: 10024.8). Total num frames: 3325952. Throughput: 0: 2490.3. Samples: 835914. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:01:57,056][00031] Avg episode reward: [(0, '12.107')]
+[2025-02-27 21:01:59,395][00216] Updated weights for policy 0, policy_version 410 (0.0019)
+[2025-02-27 21:02:02,053][00031] Fps is (10 sec: 10649.6, 60 sec: 9966.9, 300 sec: 10024.8). Total num frames: 3383296. Throughput: 0: 2498.5. Samples: 843774. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:02:02,056][00031] Avg episode reward: [(0, '12.954')]
+[2025-02-27 21:02:02,058][00196] Saving new best policy, reward=12.954!
+[2025-02-27 21:02:07,055][00031] Fps is (10 sec: 10647.1, 60 sec: 10103.1, 300 sec: 10052.5). Total num frames: 3432448. Throughput: 0: 2500.9. Samples: 859302. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:02:07,058][00031] Avg episode reward: [(0, '12.909')]
+[2025-02-27 21:02:07,479][00216] Updated weights for policy 0, policy_version 420 (0.0018)
+[2025-02-27 21:02:12,053][00031] Fps is (10 sec: 9830.0, 60 sec: 9966.8, 300 sec: 10052.6). Total num frames: 3481600. Throughput: 0: 2497.6. Samples: 873558. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:02:12,055][00031] Avg episode reward: [(0, '12.894')]
+[2025-02-27 21:02:16,006][00216] Updated weights for policy 0, policy_version 430 (0.0019)
+[2025-02-27 21:02:17,054][00031] Fps is (10 sec: 9831.6, 60 sec: 9966.7, 300 sec: 10052.5). Total num frames: 3530752. Throughput: 0: 2497.7. Samples: 881004. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:02:17,056][00031] Avg episode reward: [(0, '14.298')]
+[2025-02-27 21:02:17,064][00196] Saving new best policy, reward=14.298!
+[2025-02-27 21:02:22,053][00031] Fps is (10 sec: 9830.9, 60 sec: 9966.9, 300 sec: 10024.8). Total num frames: 3579904. Throughput: 0: 2545.5. Samples: 896592. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:02:22,054][00031] Avg episode reward: [(0, '13.585')]
+[2025-02-27 21:02:23,600][00216] Updated weights for policy 0, policy_version 440 (0.0016)
+[2025-02-27 21:02:27,053][00031] Fps is (10 sec: 10650.9, 60 sec: 10103.5, 300 sec: 10052.6). Total num frames: 3637248. Throughput: 0: 2556.1. Samples: 912102. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:02:27,054][00031] Avg episode reward: [(0, '13.482')]
+[2025-02-27 21:02:31,769][00216] Updated weights for policy 0, policy_version 450 (0.0017)
+[2025-02-27 21:02:32,053][00031] Fps is (10 sec: 10649.4, 60 sec: 10103.5, 300 sec: 10052.6). Total num frames: 3686400. Throughput: 0: 2559.9. Samples: 919956. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:02:32,054][00031] Avg episode reward: [(0, '15.748')]
+[2025-02-27 21:02:32,057][00196] Saving new best policy, reward=15.748!
+[2025-02-27 21:02:37,053][00031] Fps is (10 sec: 9829.9, 60 sec: 10239.9, 300 sec: 10024.8). Total num frames: 3735552. Throughput: 0: 2552.4. Samples: 935316. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:02:37,055][00031] Avg episode reward: [(0, '16.655')]
+[2025-02-27 21:02:37,069][00196] Saving new best policy, reward=16.655!
+[2025-02-27 21:02:39,846][00216] Updated weights for policy 0, policy_version 460 (0.0022)
+[2025-02-27 21:02:42,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10052.6). Total num frames: 3784704. Throughput: 0: 2533.5. Samples: 949920. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:02:42,054][00031] Avg episode reward: [(0, '15.488')]
+[2025-02-27 21:02:47,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10103.4, 300 sec: 10052.6). Total num frames: 3833856. Throughput: 0: 2518.1. Samples: 957090. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:02:47,054][00031] Avg episode reward: [(0, '15.498')]
+[2025-02-27 21:02:47,865][00216] Updated weights for policy 0, policy_version 470 (0.0016)
+[2025-02-27 21:02:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 3891200. Throughput: 0: 2522.3. Samples: 972798. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:02:52,055][00031] Avg episode reward: [(0, '16.224')]
+[2025-02-27 21:02:56,031][00216] Updated weights for policy 0, policy_version 480 (0.0015)
+[2025-02-27 21:02:57,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10052.6). Total num frames: 3940352. Throughput: 0: 2549.0. Samples: 988260. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:02:57,056][00031] Avg episode reward: [(0, '17.028')]
+[2025-02-27 21:02:57,064][00196] Saving new best policy, reward=17.028!
+[2025-02-27 21:03:02,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 10052.6). Total num frames: 3989504. Throughput: 0: 2552.6. Samples: 995868. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:03:02,057][00031] Avg episode reward: [(0, '16.315')]
+[2025-02-27 21:03:03,973][00216] Updated weights for policy 0, policy_version 490 (0.0019)
+[2025-02-27 21:03:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.9, 300 sec: 10052.6). Total num frames: 4038656. Throughput: 0: 2547.6. Samples: 1011234. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:03:07,055][00031] Avg episode reward: [(0, '17.911')]
+[2025-02-27 21:03:07,069][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_4038656.pth...
+[2025-02-27 21:03:07,183][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_1622016.pth
+[2025-02-27 21:03:07,200][00196] Saving new best policy, reward=17.911!
+[2025-02-27 21:03:11,930][00216] Updated weights for policy 0, policy_version 500 (0.0016)
+[2025-02-27 21:03:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10080.3). Total num frames: 4096000. Throughput: 0: 2548.5. Samples: 1026786. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:03:12,056][00031] Avg episode reward: [(0, '16.816')]
+[2025-02-27 21:03:17,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.7, 300 sec: 10052.6). Total num frames: 4136960. Throughput: 0: 2522.1. Samples: 1033452. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:03:17,056][00031] Avg episode reward: [(0, '18.081')]
+[2025-02-27 21:03:17,063][00196] Saving new best policy, reward=18.081!
+[2025-02-27 21:03:20,330][00216] Updated weights for policy 0, policy_version 510 (0.0037)
+[2025-02-27 21:03:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 4194304. Throughput: 0: 2513.0. Samples: 1048398. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:03:22,054][00031] Avg episode reward: [(0, '18.049')]
+[2025-02-27 21:03:27,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4243456. Throughput: 0: 2532.7. Samples: 1063890. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:03:27,055][00031] Avg episode reward: [(0, '16.730')]
+[2025-02-27 21:03:28,541][00216] Updated weights for policy 0, policy_version 520 (0.0021)
+[2025-02-27 21:03:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10052.6). Total num frames: 4292608. Throughput: 0: 2547.2. Samples: 1071714. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:03:32,055][00031] Avg episode reward: [(0, '16.469')]
+[2025-02-27 21:03:36,327][00216] Updated weights for policy 0, policy_version 530 (0.0017)
+[2025-02-27 21:03:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10052.6). Total num frames: 4341760. Throughput: 0: 2541.9. Samples: 1087182. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:03:37,055][00031] Avg episode reward: [(0, '15.963')]
+[2025-02-27 21:03:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 4399104. Throughput: 0: 2540.8. Samples: 1102596. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:03:42,055][00031] Avg episode reward: [(0, '16.213')]
+[2025-02-27 21:03:44,504][00216] Updated weights for policy 0, policy_version 540 (0.0020)
+[2025-02-27 21:03:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10080.5). Total num frames: 4448256. Throughput: 0: 2539.5. Samples: 1110144. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:03:47,055][00031] Avg episode reward: [(0, '17.244')]
+[2025-02-27 21:03:52,053][00031] Fps is (10 sec: 9011.1, 60 sec: 9966.9, 300 sec: 10052.6). Total num frames: 4489216. Throughput: 0: 2508.8. Samples: 1124130. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:03:52,055][00031] Avg episode reward: [(0, '16.841')]
+[2025-02-27 21:03:52,844][00216] Updated weights for policy 0, policy_version 550 (0.0024)
+[2025-02-27 21:03:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4546560. Throughput: 0: 2505.7. Samples: 1139544. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:03:57,055][00031] Avg episode reward: [(0, '16.777')]
+[2025-02-27 21:04:00,793][00216] Updated weights for policy 0, policy_version 560 (0.0023)
+[2025-02-27 21:04:02,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4595712. Throughput: 0: 2528.7. Samples: 1147242. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:04:02,055][00031] Avg episode reward: [(0, '18.425')]
+[2025-02-27 21:04:02,059][00196] Saving new best policy, reward=18.425!
+[2025-02-27 21:04:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4644864. Throughput: 0: 2538.8. Samples: 1162644. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:04:07,056][00031] Avg episode reward: [(0, '19.079')]
+[2025-02-27 21:04:07,101][00196] Saving new best policy, reward=19.079!
+[2025-02-27 21:04:08,906][00216] Updated weights for policy 0, policy_version 570 (0.0018)
+[2025-02-27 21:04:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 10052.6). Total num frames: 4694016. Throughput: 0: 2537.7. Samples: 1178088. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:04:12,055][00031] Avg episode reward: [(0, '20.069')]
+[2025-02-27 21:04:12,058][00196] Saving new best policy, reward=20.069!
+[2025-02-27 21:04:17,028][00216] Updated weights for policy 0, policy_version 580 (0.0016)
+[2025-02-27 21:04:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 4751360. Throughput: 0: 2529.6. Samples: 1185546. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:04:17,057][00031] Avg episode reward: [(0, '18.647')]
+[2025-02-27 21:04:22,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4800512. Throughput: 0: 2513.5. Samples: 1200288. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:04:22,054][00031] Avg episode reward: [(0, '16.620')]
+[2025-02-27 21:04:25,530][00216] Updated weights for policy 0, policy_version 590 (0.0016)
+[2025-02-27 21:04:27,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4849664. Throughput: 0: 2501.7. Samples: 1215174. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:04:27,054][00031] Avg episode reward: [(0, '16.683')]
+[2025-02-27 21:04:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 4898816. Throughput: 0: 2509.3. Samples: 1223064. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:04:32,055][00031] Avg episode reward: [(0, '17.913')]
+[2025-02-27 21:04:33,281][00216] Updated weights for policy 0, policy_version 600 (0.0017)
+[2025-02-27 21:04:37,054][00031] Fps is (10 sec: 9829.3, 60 sec: 10103.3, 300 sec: 10052.5). Total num frames: 4947968. Throughput: 0: 2539.4. Samples: 1238406. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:04:37,058][00031] Avg episode reward: [(0, '19.522')]
+[2025-02-27 21:04:41,469][00216] Updated weights for policy 0, policy_version 610 (0.0016)
+[2025-02-27 21:04:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 10080.3). Total num frames: 4997120. Throughput: 0: 2541.9. Samples: 1253928. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:04:42,055][00031] Avg episode reward: [(0, '19.820')]
+[2025-02-27 21:04:47,053][00031] Fps is (10 sec: 10650.8, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5054464. Throughput: 0: 2537.9. Samples: 1261446. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:04:47,056][00031] Avg episode reward: [(0, '17.844')]
+[2025-02-27 21:04:49,130][00216] Updated weights for policy 0, policy_version 620 (0.0016)
+[2025-02-27 21:04:52,053][00031] Fps is (10 sec: 10649.0, 60 sec: 10239.9, 300 sec: 10080.3). Total num frames: 5103616. Throughput: 0: 2539.4. Samples: 1276920. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:04:52,056][00031] Avg episode reward: [(0, '19.019')]
+[2025-02-27 21:04:57,053][00031] Fps is (10 sec: 9011.0, 60 sec: 9966.9, 300 sec: 10052.5). Total num frames: 5144576. Throughput: 0: 2505.3. Samples: 1290828. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:04:57,054][00031] Avg episode reward: [(0, '20.295')]
+[2025-02-27 21:04:57,120][00196] Saving new best policy, reward=20.295!
+[2025-02-27 21:04:57,770][00216] Updated weights for policy 0, policy_version 630 (0.0017)
+[2025-02-27 21:05:02,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5201920. Throughput: 0: 2513.6. Samples: 1298658. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:05:02,054][00031] Avg episode reward: [(0, '20.842')]
+[2025-02-27 21:05:02,058][00196] Saving new best policy, reward=20.842!
+[2025-02-27 21:05:06,179][00216] Updated weights for policy 0, policy_version 640 (0.0017)
+[2025-02-27 21:05:07,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5251072. Throughput: 0: 2525.2. Samples: 1313922. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:05:07,056][00031] Avg episode reward: [(0, '21.120')]
+[2025-02-27 21:05:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_5251072.pth...
+[2025-02-27 21:05:07,181][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_2826240.pth
+[2025-02-27 21:05:07,194][00196] Saving new best policy, reward=21.120!
+[2025-02-27 21:05:12,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.4, 300 sec: 10080.3). Total num frames: 5300224. Throughput: 0: 2539.5. Samples: 1329450. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:05:12,055][00031] Avg episode reward: [(0, '21.059')]
+[2025-02-27 21:05:13,750][00216] Updated weights for policy 0, policy_version 650 (0.0015)
+[2025-02-27 21:05:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5357568. Throughput: 0: 2530.4. Samples: 1336932. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:05:17,054][00031] Avg episode reward: [(0, '20.703')]
+[2025-02-27 21:05:21,615][00216] Updated weights for policy 0, policy_version 660 (0.0016)
+[2025-02-27 21:05:22,055][00031] Fps is (10 sec: 10647.7, 60 sec: 10103.2, 300 sec: 10080.3). Total num frames: 5406720. Throughput: 0: 2537.2. Samples: 1352580. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-02-27 21:05:22,056][00031] Avg episode reward: [(0, '21.326')]
+[2025-02-27 21:05:22,058][00196] Saving new best policy, reward=21.326!
+[2025-02-27 21:05:27,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10103.4, 300 sec: 10108.1). Total num frames: 5455872. Throughput: 0: 2515.6. Samples: 1367130. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:05:27,055][00031] Avg episode reward: [(0, '19.442')]
+[2025-02-27 21:05:30,542][00216] Updated weights for policy 0, policy_version 670 (0.0021)
+[2025-02-27 21:05:32,053][00031] Fps is (10 sec: 9832.3, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5505024. Throughput: 0: 2509.9. Samples: 1374390. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
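The "Policy #0 lag" triple is most naturally read as min/avg/max, over the samples in the current training batch, of how many policy versions old the weights that generated each sample were; with roughly twenty rollout workers feeding one learner asynchronously, batches mix versions, hence the max lags of 4-5 seen here. Under that reading (my interpretation of the statistic, not a documented formula):

    def policy_lag(current_version: int, sample_versions: list[int]) -> tuple[int, float, int]:
        # Version gap between the learner's policy and the policy that collected each sample.
        lags = [current_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # e.g. policy_lag(620, [620, 619, 618, 616]) -> (0, 1.75, 4)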
+[2025-02-27 21:05:32,056][00031] Avg episode reward: [(0, '19.251')]
+[2025-02-27 21:05:37,054][00031] Fps is (10 sec: 10649.1, 60 sec: 10240.0, 300 sec: 10108.1). Total num frames: 5562368. Throughput: 0: 2512.4. Samples: 1389978. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:05:37,057][00031] Avg episode reward: [(0, '20.920')]
+[2025-02-27 21:05:37,626][00216] Updated weights for policy 0, policy_version 680 (0.0018)
+[2025-02-27 21:05:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10080.3). Total num frames: 5603328. Throughput: 0: 2552.5. Samples: 1405692. Policy #0 lag: (min: 0.0, avg: 1.5, max: 5.0)
+[2025-02-27 21:05:42,056][00031] Avg episode reward: [(0, '20.402')]
+[2025-02-27 21:05:46,018][00216] Updated weights for policy 0, policy_version 690 (0.0016)
+[2025-02-27 21:05:47,053][00031] Fps is (10 sec: 9831.2, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 5660672. Throughput: 0: 2547.9. Samples: 1413312. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:05:47,057][00031] Avg episode reward: [(0, '17.324')]
+[2025-02-27 21:05:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.6, 300 sec: 10080.3). Total num frames: 5709824. Throughput: 0: 2552.1. Samples: 1428768. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:05:52,058][00031] Avg episode reward: [(0, '18.385')]
+[2025-02-27 21:05:54,107][00216] Updated weights for policy 0, policy_version 700 (0.0016)
+[2025-02-27 21:05:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10080.3). Total num frames: 5758976. Throughput: 0: 2549.7. Samples: 1444188. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:05:57,055][00031] Avg episode reward: [(0, '21.264')]
+[2025-02-27 21:06:02,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.4, 300 sec: 10108.1). Total num frames: 5808128. Throughput: 0: 2538.3. Samples: 1451154. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:06:02,055][00031] Avg episode reward: [(0, '23.041')]
+[2025-02-27 21:06:02,057][00196] Saving new best policy, reward=23.041!
+[2025-02-27 21:06:02,551][00216] Updated weights for policy 0, policy_version 710 (0.0017)
+[2025-02-27 21:06:07,054][00031] Fps is (10 sec: 9829.1, 60 sec: 10103.3, 300 sec: 10080.3). Total num frames: 5857280. Throughput: 0: 2516.6. Samples: 1465824. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:06:07,056][00031] Avg episode reward: [(0, '21.629')]
+[2025-02-27 21:06:10,150][00216] Updated weights for policy 0, policy_version 720 (0.0016)
+[2025-02-27 21:06:12,053][00031] Fps is (10 sec: 10649.4, 60 sec: 10240.0, 300 sec: 10108.1). Total num frames: 5914624. Throughput: 0: 2541.1. Samples: 1481478. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:06:12,056][00031] Avg episode reward: [(0, '22.736')]
+[2025-02-27 21:06:17,053][00031] Fps is (10 sec: 10651.0, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 5963776. Throughput: 0: 2550.3. Samples: 1489152. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:06:17,055][00031] Avg episode reward: [(0, '19.948')]
+[2025-02-27 21:06:18,367][00216] Updated weights for policy 0, policy_version 730 (0.0016)
+[2025-02-27 21:06:22,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10103.7, 300 sec: 10108.1). Total num frames: 6012928. Throughput: 0: 2552.3. Samples: 1504830. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:06:22,055][00031] Avg episode reward: [(0, '16.834')]
+[2025-02-27 21:06:25,875][00216] Updated weights for policy 0, policy_version 740 (0.0016)
+[2025-02-27 21:06:27,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.1, 300 sec: 10135.9). Total num frames: 6070272. Throughput: 0: 2550.3. Samples: 1520454. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:06:27,056][00031] Avg episode reward: [(0, '16.129')]
+[2025-02-27 21:06:32,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 6119424. Throughput: 0: 2556.3. Samples: 1528344. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:06:32,055][00031] Avg episode reward: [(0, '19.288')]
+[2025-02-27 21:06:34,797][00216] Updated weights for policy 0, policy_version 750 (0.0017)
+[2025-02-27 21:06:37,054][00031] Fps is (10 sec: 9010.2, 60 sec: 9966.9, 300 sec: 10135.8). Total num frames: 6160384. Throughput: 0: 2523.8. Samples: 1542342. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:06:37,055][00031] Avg episode reward: [(0, '22.582')]
+[2025-02-27 21:06:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 6217728. Throughput: 0: 2531.7. Samples: 1558116. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:06:42,058][00031] Avg episode reward: [(0, '22.190')]
+[2025-02-27 21:06:42,649][00216] Updated weights for policy 0, policy_version 760 (0.0016)
+[2025-02-27 21:06:47,053][00031] Fps is (10 sec: 11470.1, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 6275072. Throughput: 0: 2546.3. Samples: 1565736. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:06:47,055][00031] Avg episode reward: [(0, '19.273')]
+[2025-02-27 21:06:50,799][00216] Updated weights for policy 0, policy_version 770 (0.0016)
+[2025-02-27 21:06:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 6324224. Throughput: 0: 2568.2. Samples: 1581390. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:06:52,056][00031] Avg episode reward: [(0, '18.694')]
+[2025-02-27 21:06:57,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 6373376. Throughput: 0: 2562.0. Samples: 1596768. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
+[2025-02-27 21:06:57,054][00031] Avg episode reward: [(0, '22.081')]
+[2025-02-27 21:06:57,950][00216] Updated weights for policy 0, policy_version 780 (0.0019)
+[2025-02-27 21:07:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 6422528. Throughput: 0: 2566.8. Samples: 1604658. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:07:02,056][00031] Avg episode reward: [(0, '23.564')]
+[2025-02-27 21:07:02,061][00196] Saving new best policy, reward=23.564!
+[2025-02-27 21:07:06,796][00216] Updated weights for policy 0, policy_version 790 (0.0016)
+[2025-02-27 21:07:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.2, 300 sec: 10135.9). Total num frames: 6471680. Throughput: 0: 2542.7. Samples: 1619250. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:07:07,059][00031] Avg episode reward: [(0, '21.389')]
+[2025-02-27 21:07:07,069][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_6471680.pth...
+[2025-02-27 21:07:07,196][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_4038656.pth
+[2025-02-27 21:07:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 6520832. Throughput: 0: 2529.7. Samples: 1634292. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:07:12,054][00031] Avg episode reward: [(0, '20.219')]
+[2025-02-27 21:07:14,848][00216] Updated weights for policy 0, policy_version 800 (0.0016)
+[2025-02-27 21:07:17,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 6578176. Throughput: 0: 2524.5. Samples: 1641948. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:07:17,056][00031] Avg episode reward: [(0, '19.695')]
+[2025-02-27 21:07:22,054][00031] Fps is (10 sec: 10648.3, 60 sec: 10239.9, 300 sec: 10135.8). Total num frames: 6627328. Throughput: 0: 2565.9. Samples: 1657806. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:07:22,056][00031] Avg episode reward: [(0, '21.060')]
+[2025-02-27 21:07:22,474][00216] Updated weights for policy 0, policy_version 810 (0.0018)
+[2025-02-27 21:07:27,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 6676480. Throughput: 0: 2560.5. Samples: 1673340. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:07:27,055][00031] Avg episode reward: [(0, '22.339')]
+[2025-02-27 21:07:30,487][00216] Updated weights for policy 0, policy_version 820 (0.0016)
+[2025-02-27 21:07:32,053][00031] Fps is (10 sec: 10650.8, 60 sec: 10240.0, 300 sec: 10163.7). Total num frames: 6733824. Throughput: 0: 2565.9. Samples: 1681200. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:07:32,055][00031] Avg episode reward: [(0, '22.475')]
+[2025-02-27 21:07:37,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.7, 300 sec: 10163.6). Total num frames: 6782976. Throughput: 0: 2565.1. Samples: 1696818. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-02-27 21:07:37,055][00031] Avg episode reward: [(0, '23.457')]
+[2025-02-27 21:07:38,376][00216] Updated weights for policy 0, policy_version 830 (0.0015)
+[2025-02-27 21:07:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 6832128. Throughput: 0: 2542.0. Samples: 1711158. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:07:42,057][00031] Avg episode reward: [(0, '23.610')]
+[2025-02-27 21:07:42,058][00196] Saving new best policy, reward=23.610!
+[2025-02-27 21:07:46,801][00216] Updated weights for policy 0, policy_version 840 (0.0015)
+[2025-02-27 21:07:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 6881280. Throughput: 0: 2535.7. Samples: 1718766. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
+[2025-02-27 21:07:47,054][00031] Avg episode reward: [(0, '22.984')]
+[2025-02-27 21:07:52,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10103.4, 300 sec: 10135.9). Total num frames: 6930432. Throughput: 0: 2561.7. Samples: 1734528. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:07:52,056][00031] Avg episode reward: [(0, '22.400')]
+[2025-02-27 21:07:54,342][00216] Updated weights for policy 0, policy_version 850 (0.0020)
+[2025-02-27 21:07:57,054][00031] Fps is (10 sec: 10648.1, 60 sec: 10239.8, 300 sec: 10163.6). Total num frames: 6987776. Throughput: 0: 2574.9. Samples: 1750164. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:07:57,056][00031] Avg episode reward: [(0, '21.652')]
+[2025-02-27 21:08:02,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 7036928. Throughput: 0: 2580.9. Samples: 1758090. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:08:02,055][00031] Avg episode reward: [(0, '21.488')]
+[2025-02-27 21:08:02,350][00216] Updated weights for policy 0, policy_version 860 (0.0016)
+[2025-02-27 21:08:07,053][00031] Fps is (10 sec: 9831.6, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 7086080. Throughput: 0: 2576.2. Samples: 1773732. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:08:07,057][00031] Avg episode reward: [(0, '20.541')]
+[2025-02-27 21:08:09,902][00216] Updated weights for policy 0, policy_version 870 (0.0016)
+[2025-02-27 21:08:12,054][00031] Fps is (10 sec: 10648.3, 60 sec: 10376.3, 300 sec: 10191.4). Total num frames: 7143424. Throughput: 0: 2568.1. Samples: 1788906. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:08:12,056][00031] Avg episode reward: [(0, '20.566')]
+[2025-02-27 21:08:17,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 7192576. Throughput: 0: 2550.1. Samples: 1795956. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:08:17,054][00031] Avg episode reward: [(0, '22.570')]
+[2025-02-27 21:08:18,346][00216] Updated weights for policy 0, policy_version 880 (0.0016)
+[2025-02-27 21:08:22,053][00031] Fps is (10 sec: 10650.9, 60 sec: 10376.7, 300 sec: 10191.4). Total num frames: 7249920. Throughput: 0: 2554.3. Samples: 1811760. Policy #0 lag: (min: 0.0, avg: 1.5, max: 5.0)
+[2025-02-27 21:08:22,055][00031] Avg episode reward: [(0, '23.759')]
+[2025-02-27 21:08:22,057][00196] Saving new best policy, reward=23.759!
+[2025-02-27 21:08:26,336][00216] Updated weights for policy 0, policy_version 890 (0.0016)
+[2025-02-27 21:08:27,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 7290880. Throughput: 0: 2581.7. Samples: 1827336. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:08:27,055][00031] Avg episode reward: [(0, '23.531')]
+[2025-02-27 21:08:32,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10239.9, 300 sec: 10191.4). Total num frames: 7348224. Throughput: 0: 2590.6. Samples: 1835346. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:08:32,055][00031] Avg episode reward: [(0, '22.203')]
+[2025-02-27 21:08:33,749][00216] Updated weights for policy 0, policy_version 900 (0.0016)
+[2025-02-27 21:08:37,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.5, 300 sec: 10191.4). Total num frames: 7405568. Throughput: 0: 2589.5. Samples: 1851054. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:08:37,054][00031] Avg episode reward: [(0, '22.756')]
+[2025-02-27 21:08:41,873][00216] Updated weights for policy 0, policy_version 910 (0.0016)
+[2025-02-27 21:08:42,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10376.5, 300 sec: 10191.4). Total num frames: 7454720. Throughput: 0: 2588.2. Samples: 1866630. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:08:42,055][00031] Avg episode reward: [(0, '22.102')]
+[2025-02-27 21:08:47,053][00031] Fps is (10 sec: 9011.3, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 7495680. Throughput: 0: 2563.1. Samples: 1873428. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:08:47,055][00031] Avg episode reward: [(0, '22.515')]
+[2025-02-27 21:08:50,449][00216] Updated weights for policy 0, policy_version 920 (0.0016)
+[2025-02-27 21:08:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10191.4). Total num frames: 7553024. Throughput: 0: 2542.7. Samples: 1888152. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:08:52,054][00031] Avg episode reward: [(0, '21.892')]
+[2025-02-27 21:08:57,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.2, 300 sec: 10191.4). Total num frames: 7602176. Throughput: 0: 2548.9. Samples: 1903602. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:08:57,055][00031] Avg episode reward: [(0, '23.155')]
+[2025-02-27 21:08:58,526][00216] Updated weights for policy 0, policy_version 930 (0.0020)
+[2025-02-27 21:09:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 7651328. Throughput: 0: 2567.9. Samples: 1911510. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:09:02,054][00031] Avg episode reward: [(0, '21.920')]
+[2025-02-27 21:09:06,537][00216] Updated weights for policy 0, policy_version 940 (0.0015)
+[2025-02-27 21:09:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 7700480. Throughput: 0: 2562.0. Samples: 1927050. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:09:07,055][00031] Avg episode reward: [(0, '21.053')]
+[2025-02-27 21:09:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000940_7700480.pth...
+[2025-02-27 21:09:07,237][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_5251072.pth
+[2025-02-27 21:09:12,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10240.2, 300 sec: 10191.4). Total num frames: 7757824. Throughput: 0: 2562.9. Samples: 1942668. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 21:09:12,057][00031] Avg episode reward: [(0, '22.921')]
+[2025-02-27 21:09:14,187][00216] Updated weights for policy 0, policy_version 950 (0.0016)
+[2025-02-27 21:09:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 7806976. Throughput: 0: 2552.4. Samples: 1950204. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:09:17,055][00031] Avg episode reward: [(0, '24.306')]
+[2025-02-27 21:09:17,071][00196] Saving new best policy, reward=24.306!
+[2025-02-27 21:09:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 7856128. Throughput: 0: 2518.7. Samples: 1964394. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:09:22,057][00031] Avg episode reward: [(0, '24.854')]
+[2025-02-27 21:09:22,059][00196] Saving new best policy, reward=24.854!
+[2025-02-27 21:09:22,645][00216] Updated weights for policy 0, policy_version 960 (0.0016)
+[2025-02-27 21:09:27,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10239.9, 300 sec: 10191.4). Total num frames: 7905280. Throughput: 0: 2514.6. Samples: 1979790. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:09:27,057][00031] Avg episode reward: [(0, '24.394')]
+[2025-02-27 21:09:30,652][00216] Updated weights for policy 0, policy_version 970 (0.0015)
+[2025-02-27 21:09:32,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10103.5, 300 sec: 10191.4). Total num frames: 7954432. Throughput: 0: 2535.3. Samples: 1987518. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
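The checkpoint filenames above encode both the policy version and the frame count, and throughout this run frames equal policy_version * 8192 (940 * 8192 = 7700480, 641 * 8192 = 5251072), i.e. every policy version corresponds to one 8192-frame training batch. The naming is reproducible from that observation alone:

    FRAMES_PER_POLICY_VERSION = 8192  # inferred from the filenames in this log

    def checkpoint_name(policy_version: int) -> str:
        frames = policy_version * FRAMES_PER_POLICY_VERSION
        return f"checkpoint_{policy_version:09d}_{frames}.pth"

    assert checkpoint_name(940) == "checkpoint_000000940_7700480.pth"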
Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:09:32,057][00031] Avg episode reward: [(0, '24.487')] +[2025-02-27 21:09:37,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10103.5, 300 sec: 10219.2). Total num frames: 8011776. Throughput: 0: 2548.5. Samples: 2002836. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:09:37,054][00031] Avg episode reward: [(0, '24.013')] +[2025-02-27 21:09:38,763][00216] Updated weights for policy 0, policy_version 980 (0.0015) +[2025-02-27 21:09:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10191.4). Total num frames: 8060928. Throughput: 0: 2550.0. Samples: 2018352. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:09:42,054][00031] Avg episode reward: [(0, '20.971')] +[2025-02-27 21:09:46,637][00216] Updated weights for policy 0, policy_version 990 (0.0019) +[2025-02-27 21:09:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 8110080. Throughput: 0: 2540.4. Samples: 2025828. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:09:47,054][00031] Avg episode reward: [(0, '20.214')] +[2025-02-27 21:09:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10219.2). Total num frames: 8159232. Throughput: 0: 2529.1. Samples: 2040858. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:09:52,056][00031] Avg episode reward: [(0, '19.532')] +[2025-02-27 21:09:54,822][00216] Updated weights for policy 0, policy_version 1000 (0.0017) +[2025-02-27 21:09:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10191.4). Total num frames: 8208384. Throughput: 0: 2512.1. Samples: 2055714. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:09:57,055][00031] Avg episode reward: [(0, '20.597')] +[2025-02-27 21:10:02,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 8257536. Throughput: 0: 2518.3. Samples: 2063526. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:10:02,055][00031] Avg episode reward: [(0, '22.155')] +[2025-02-27 21:10:02,986][00216] Updated weights for policy 0, policy_version 1010 (0.0016) +[2025-02-27 21:10:07,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10219.2). Total num frames: 8314880. Throughput: 0: 2544.4. Samples: 2078892. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:10:07,055][00031] Avg episode reward: [(0, '22.189')] +[2025-02-27 21:10:10,866][00216] Updated weights for policy 0, policy_version 1020 (0.0017) +[2025-02-27 21:10:12,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10103.5, 300 sec: 10191.4). Total num frames: 8364032. Throughput: 0: 2545.8. Samples: 2094348. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0) +[2025-02-27 21:10:12,055][00031] Avg episode reward: [(0, '20.874')] +[2025-02-27 21:10:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10103.5, 300 sec: 10191.5). Total num frames: 8413184. Throughput: 0: 2540.4. Samples: 2101836. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:10:17,055][00031] Avg episode reward: [(0, '21.522')] +[2025-02-27 21:10:18,862][00216] Updated weights for policy 0, policy_version 1030 (0.0015) +[2025-02-27 21:10:22,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 8462336. Throughput: 0: 2550.0. Samples: 2117586. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:10:22,057][00031] Avg episode reward: [(0, '23.856')] +[2025-02-27 21:10:27,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 10191.4). 
Total num frames: 8511488. Throughput: 0: 2514.7. Samples: 2131512. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:10:27,056][00031] Avg episode reward: [(0, '24.262')] +[2025-02-27 21:10:27,208][00216] Updated weights for policy 0, policy_version 1040 (0.0016) +[2025-02-27 21:10:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.4, 300 sec: 10163.7). Total num frames: 8560640. Throughput: 0: 2520.0. Samples: 2139228. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:10:32,055][00031] Avg episode reward: [(0, '23.993')] +[2025-02-27 21:10:35,444][00216] Updated weights for policy 0, policy_version 1050 (0.0017) +[2025-02-27 21:10:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 10191.4). Total num frames: 8609792. Throughput: 0: 2526.0. Samples: 2154528. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0) +[2025-02-27 21:10:37,056][00031] Avg episode reward: [(0, '24.020')] +[2025-02-27 21:10:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 8667136. Throughput: 0: 2544.1. Samples: 2170200. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:10:42,055][00031] Avg episode reward: [(0, '25.559')] +[2025-02-27 21:10:42,056][00196] Saving new best policy, reward=25.559! +[2025-02-27 21:10:43,206][00216] Updated weights for policy 0, policy_version 1060 (0.0018) +[2025-02-27 21:10:47,053][00031] Fps is (10 sec: 11468.9, 60 sec: 10240.0, 300 sec: 10219.2). Total num frames: 8724480. Throughput: 0: 2538.8. Samples: 2177772. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:10:47,055][00031] Avg episode reward: [(0, '25.507')] +[2025-02-27 21:10:51,529][00216] Updated weights for policy 0, policy_version 1070 (0.0016) +[2025-02-27 21:10:52,054][00031] Fps is (10 sec: 10648.7, 60 sec: 10239.8, 300 sec: 10219.1). Total num frames: 8773632. Throughput: 0: 2543.5. Samples: 2193354. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0) +[2025-02-27 21:10:52,057][00031] Avg episode reward: [(0, '24.214')] +[2025-02-27 21:10:57,053][00031] Fps is (10 sec: 9011.2, 60 sec: 10103.5, 300 sec: 10191.4). Total num frames: 8814592. Throughput: 0: 2530.3. Samples: 2208210. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:10:57,054][00031] Avg episode reward: [(0, '22.932')] +[2025-02-27 21:10:59,628][00216] Updated weights for policy 0, policy_version 1080 (0.0016) +[2025-02-27 21:11:02,053][00031] Fps is (10 sec: 9831.2, 60 sec: 10240.0, 300 sec: 10219.2). Total num frames: 8871936. Throughput: 0: 2518.8. Samples: 2215182. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:11:02,055][00031] Avg episode reward: [(0, '22.830')] +[2025-02-27 21:11:07,053][00031] Fps is (10 sec: 9829.9, 60 sec: 9966.9, 300 sec: 10163.6). Total num frames: 8912896. Throughput: 0: 2510.8. Samples: 2230572. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:11:07,055][00031] Avg episode reward: [(0, '22.567')] +[2025-02-27 21:11:07,115][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001089_8921088.pth... +[2025-02-27 21:11:07,258][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_6471680.pth +[2025-02-27 21:11:07,458][00216] Updated weights for policy 0, policy_version 1090 (0.0018) +[2025-02-27 21:11:12,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 8970240. Throughput: 0: 2546.0. Samples: 2246082. 
Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:11:12,054][00031] Avg episode reward: [(0, '24.574')] +[2025-02-27 21:11:15,875][00216] Updated weights for policy 0, policy_version 1100 (0.0016) +[2025-02-27 21:11:17,054][00031] Fps is (10 sec: 10648.7, 60 sec: 10103.2, 300 sec: 10191.4). Total num frames: 9019392. Throughput: 0: 2540.9. Samples: 2253570. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:11:17,057][00031] Avg episode reward: [(0, '24.189')] +[2025-02-27 21:11:22,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10240.1, 300 sec: 10191.4). Total num frames: 9076736. Throughput: 0: 2548.7. Samples: 2269218. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:11:22,057][00031] Avg episode reward: [(0, '22.902')] +[2025-02-27 21:11:23,627][00216] Updated weights for policy 0, policy_version 1110 (0.0016) +[2025-02-27 21:11:27,053][00031] Fps is (10 sec: 10650.8, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 9125888. Throughput: 0: 2535.1. Samples: 2284278. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:11:27,057][00031] Avg episode reward: [(0, '23.553')] +[2025-02-27 21:11:32,053][00031] Fps is (10 sec: 9010.6, 60 sec: 10103.4, 300 sec: 10191.4). Total num frames: 9166848. Throughput: 0: 2516.9. Samples: 2291034. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:11:32,056][00031] Avg episode reward: [(0, '24.375')] +[2025-02-27 21:11:32,440][00216] Updated weights for policy 0, policy_version 1120 (0.0016) +[2025-02-27 21:11:37,054][00031] Fps is (10 sec: 9010.4, 60 sec: 10103.3, 300 sec: 10163.6). Total num frames: 9216000. Throughput: 0: 2475.3. Samples: 2304744. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:11:37,056][00031] Avg episode reward: [(0, '25.225')] +[2025-02-27 21:11:40,833][00216] Updated weights for policy 0, policy_version 1130 (0.0021) +[2025-02-27 21:11:42,053][00031] Fps is (10 sec: 9831.0, 60 sec: 9967.0, 300 sec: 10135.9). Total num frames: 9265152. Throughput: 0: 2471.5. Samples: 2319426. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:11:42,054][00031] Avg episode reward: [(0, '22.805')] +[2025-02-27 21:11:47,053][00031] Fps is (10 sec: 9831.5, 60 sec: 9830.4, 300 sec: 10135.9). Total num frames: 9314304. Throughput: 0: 2481.8. Samples: 2326860. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:11:47,055][00031] Avg episode reward: [(0, '21.464')] +[2025-02-27 21:11:49,091][00216] Updated weights for policy 0, policy_version 1140 (0.0016) +[2025-02-27 21:11:52,053][00031] Fps is (10 sec: 10649.4, 60 sec: 9967.1, 300 sec: 10163.6). Total num frames: 9371648. Throughput: 0: 2489.5. Samples: 2342598. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:11:52,057][00031] Avg episode reward: [(0, '22.656')] +[2025-02-27 21:11:57,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10103.5, 300 sec: 10163.6). Total num frames: 9420800. Throughput: 0: 2491.6. Samples: 2358204. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:11:57,055][00216] Updated weights for policy 0, policy_version 1150 (0.0017) +[2025-02-27 21:11:57,055][00031] Avg episode reward: [(0, '22.957')] +[2025-02-27 21:12:02,053][00031] Fps is (10 sec: 9830.5, 60 sec: 9967.0, 300 sec: 10163.6). Total num frames: 9469952. Throughput: 0: 2503.0. Samples: 2366202. 
Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:12:02,054][00031] Avg episode reward: [(0, '23.553')] +[2025-02-27 21:12:05,030][00216] Updated weights for policy 0, policy_version 1160 (0.0016) +[2025-02-27 21:12:07,053][00031] Fps is (10 sec: 9829.9, 60 sec: 10103.5, 300 sec: 10163.6). Total num frames: 9519104. Throughput: 0: 2467.4. Samples: 2380254. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:12:07,055][00031] Avg episode reward: [(0, '25.985')] +[2025-02-27 21:12:07,063][00196] Saving new best policy, reward=25.985! +[2025-02-27 21:12:12,053][00031] Fps is (10 sec: 9830.1, 60 sec: 9966.9, 300 sec: 10135.9). Total num frames: 9568256. Throughput: 0: 2481.1. Samples: 2395926. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:12:12,056][00031] Avg episode reward: [(0, '27.444')] +[2025-02-27 21:12:12,058][00196] Saving new best policy, reward=27.444! +[2025-02-27 21:12:12,940][00216] Updated weights for policy 0, policy_version 1170 (0.0016) +[2025-02-27 21:12:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 9967.1, 300 sec: 10135.9). Total num frames: 9617408. Throughput: 0: 2497.2. Samples: 2403408. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:12:17,058][00031] Avg episode reward: [(0, '22.993')] +[2025-02-27 21:12:20,839][00216] Updated weights for policy 0, policy_version 1180 (0.0020) +[2025-02-27 21:12:22,055][00031] Fps is (10 sec: 10647.8, 60 sec: 9966.6, 300 sec: 10163.6). Total num frames: 9674752. Throughput: 0: 2541.3. Samples: 2419104. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:12:22,059][00031] Avg episode reward: [(0, '23.036')] +[2025-02-27 21:12:27,053][00031] Fps is (10 sec: 10649.7, 60 sec: 9966.9, 300 sec: 10135.9). Total num frames: 9723904. Throughput: 0: 2556.1. Samples: 2434452. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:12:27,056][00031] Avg episode reward: [(0, '24.310')] +[2025-02-27 21:12:28,968][00216] Updated weights for policy 0, policy_version 1190 (0.0018) +[2025-02-27 21:12:32,053][00031] Fps is (10 sec: 10651.7, 60 sec: 10240.1, 300 sec: 10163.6). Total num frames: 9781248. Throughput: 0: 2566.5. Samples: 2442354. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:12:32,056][00031] Avg episode reward: [(0, '24.033')] +[2025-02-27 21:12:37,056][00031] Fps is (10 sec: 9827.7, 60 sec: 10103.1, 300 sec: 10135.8). Total num frames: 9822208. Throughput: 0: 2551.0. Samples: 2457402. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:12:37,057][00031] Avg episode reward: [(0, '21.836')] +[2025-02-27 21:12:37,645][00216] Updated weights for policy 0, policy_version 1200 (0.0016) +[2025-02-27 21:12:42,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10239.9, 300 sec: 10163.6). Total num frames: 9879552. Throughput: 0: 2534.1. Samples: 2472240. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:12:42,056][00031] Avg episode reward: [(0, '20.930')] +[2025-02-27 21:12:45,126][00216] Updated weights for policy 0, policy_version 1210 (0.0016) +[2025-02-27 21:12:47,061][00031] Fps is (10 sec: 10644.7, 60 sec: 10238.7, 300 sec: 10163.4). Total num frames: 9928704. Throughput: 0: 2527.3. Samples: 2479950. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:12:47,068][00031] Avg episode reward: [(0, '20.915')] +[2025-02-27 21:12:52,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 9977856. Throughput: 0: 2561.9. Samples: 2495538. 
Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:12:52,054][00031] Avg episode reward: [(0, '21.483')] +[2025-02-27 21:12:53,278][00216] Updated weights for policy 0, policy_version 1220 (0.0016) +[2025-02-27 21:12:57,053][00031] Fps is (10 sec: 10657.6, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 10035200. Throughput: 0: 2557.7. Samples: 2511024. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:12:57,058][00031] Avg episode reward: [(0, '22.249')] +[2025-02-27 21:13:01,112][00216] Updated weights for policy 0, policy_version 1230 (0.0016) +[2025-02-27 21:13:02,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 10076160. Throughput: 0: 2566.2. Samples: 2518884. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:13:02,055][00031] Avg episode reward: [(0, '24.160')] +[2025-02-27 21:13:07,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.1, 300 sec: 10135.9). Total num frames: 10133504. Throughput: 0: 2559.4. Samples: 2534274. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:13:07,055][00031] Avg episode reward: [(0, '24.215')] +[2025-02-27 21:13:07,067][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001237_10133504.pth... +[2025-02-27 21:13:07,209][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000940_7700480.pth +[2025-02-27 21:13:09,267][00216] Updated weights for policy 0, policy_version 1240 (0.0026) +[2025-02-27 21:13:12,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10240.0, 300 sec: 10135.8). Total num frames: 10182656. Throughput: 0: 2532.9. Samples: 2548434. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:13:12,056][00031] Avg episode reward: [(0, '25.633')] +[2025-02-27 21:13:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.1, 300 sec: 10108.1). Total num frames: 10231808. Throughput: 0: 2525.6. Samples: 2556006. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:13:17,054][00031] Avg episode reward: [(0, '23.289')] +[2025-02-27 21:13:17,613][00216] Updated weights for policy 0, policy_version 1250 (0.0016) +[2025-02-27 21:13:22,053][00031] Fps is (10 sec: 9830.8, 60 sec: 10103.8, 300 sec: 10135.9). Total num frames: 10280960. Throughput: 0: 2541.5. Samples: 2571762. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:13:22,054][00031] Avg episode reward: [(0, '24.797')] +[2025-02-27 21:13:25,489][00216] Updated weights for policy 0, policy_version 1260 (0.0016) +[2025-02-27 21:13:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10135.9). Total num frames: 10338304. Throughput: 0: 2555.5. Samples: 2587236. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:13:27,056][00031] Avg episode reward: [(0, '24.255')] +[2025-02-27 21:13:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 9966.9, 300 sec: 10080.3). Total num frames: 10379264. Throughput: 0: 2558.3. Samples: 2595054. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:13:32,055][00031] Avg episode reward: [(0, '21.989')] +[2025-02-27 21:13:33,181][00216] Updated weights for policy 0, policy_version 1270 (0.0015) +[2025-02-27 21:13:37,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10240.5, 300 sec: 10108.1). Total num frames: 10436608. Throughput: 0: 2557.7. Samples: 2610636. 
Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:13:37,055][00031] Avg episode reward: [(0, '23.947')] +[2025-02-27 21:13:41,461][00216] Updated weights for policy 0, policy_version 1280 (0.0015) +[2025-02-27 21:13:42,053][00031] Fps is (10 sec: 11468.4, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 10493952. Throughput: 0: 2552.0. Samples: 2625864. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:13:42,055][00031] Avg episode reward: [(0, '24.169')] +[2025-02-27 21:13:47,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10104.7, 300 sec: 10108.1). Total num frames: 10534912. Throughput: 0: 2524.5. Samples: 2632488. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:13:47,056][00031] Avg episode reward: [(0, '23.227')] +[2025-02-27 21:13:49,727][00216] Updated weights for policy 0, policy_version 1290 (0.0023) +[2025-02-27 21:13:52,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10239.9, 300 sec: 10135.8). Total num frames: 10592256. Throughput: 0: 2525.2. Samples: 2647908. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:13:52,055][00031] Avg episode reward: [(0, '22.718')] +[2025-02-27 21:13:57,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 10641408. Throughput: 0: 2551.5. Samples: 2663250. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:13:57,057][00031] Avg episode reward: [(0, '26.048')] +[2025-02-27 21:13:57,750][00216] Updated weights for policy 0, policy_version 1300 (0.0016) +[2025-02-27 21:14:02,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 10690560. Throughput: 0: 2552.5. Samples: 2670870. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:14:02,057][00031] Avg episode reward: [(0, '28.657')] +[2025-02-27 21:14:02,060][00196] Saving new best policy, reward=28.657! +[2025-02-27 21:14:05,672][00216] Updated weights for policy 0, policy_version 1310 (0.0018) +[2025-02-27 21:14:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 10739712. Throughput: 0: 2542.7. Samples: 2686182. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:14:07,055][00031] Avg episode reward: [(0, '26.389')] +[2025-02-27 21:14:12,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 10797056. Throughput: 0: 2544.8. Samples: 2701752. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:14:12,056][00031] Avg episode reward: [(0, '24.436')] +[2025-02-27 21:14:13,605][00216] Updated weights for policy 0, policy_version 1320 (0.0016) +[2025-02-27 21:14:17,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 10838016. Throughput: 0: 2530.7. Samples: 2708934. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:14:17,054][00031] Avg episode reward: [(0, '26.078')] +[2025-02-27 21:14:22,053][00031] Fps is (10 sec: 9011.5, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 10887168. Throughput: 0: 2510.7. Samples: 2723616. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:14:22,054][00031] Avg episode reward: [(0, '25.838')] +[2025-02-27 21:14:22,336][00216] Updated weights for policy 0, policy_version 1330 (0.0018) +[2025-02-27 21:14:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 10944512. Throughput: 0: 2515.9. Samples: 2739078. 
Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:14:27,055][00031] Avg episode reward: [(0, '24.264')] +[2025-02-27 21:14:29,923][00216] Updated weights for policy 0, policy_version 1340 (0.0017) +[2025-02-27 21:14:32,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10108.1). Total num frames: 10993664. Throughput: 0: 2543.9. Samples: 2746962. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:14:32,055][00031] Avg episode reward: [(0, '20.321')] +[2025-02-27 21:14:37,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11042816. Throughput: 0: 2543.2. Samples: 2762352. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:14:37,055][00031] Avg episode reward: [(0, '22.070')] +[2025-02-27 21:14:37,969][00216] Updated weights for policy 0, policy_version 1350 (0.0017) +[2025-02-27 21:14:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 11100160. Throughput: 0: 2547.5. Samples: 2777886. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:14:42,055][00031] Avg episode reward: [(0, '25.154')] +[2025-02-27 21:14:45,823][00216] Updated weights for policy 0, policy_version 1360 (0.0017) +[2025-02-27 21:14:47,053][00031] Fps is (10 sec: 10649.4, 60 sec: 10240.0, 300 sec: 10135.8). Total num frames: 11149312. Throughput: 0: 2548.0. Samples: 2785530. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:14:47,061][00031] Avg episode reward: [(0, '25.157')] +[2025-02-27 21:14:52,053][00031] Fps is (10 sec: 9011.2, 60 sec: 9967.0, 300 sec: 10108.1). Total num frames: 11190272. Throughput: 0: 2522.4. Samples: 2799690. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:14:52,056][00031] Avg episode reward: [(0, '27.164')] +[2025-02-27 21:14:54,644][00216] Updated weights for policy 0, policy_version 1370 (0.0016) +[2025-02-27 21:14:57,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10103.4, 300 sec: 10135.9). Total num frames: 11247616. Throughput: 0: 2519.2. Samples: 2815116. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:14:57,056][00031] Avg episode reward: [(0, '27.300')] +[2025-02-27 21:15:02,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11296768. Throughput: 0: 2533.7. Samples: 2822952. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:15:02,056][00031] Avg episode reward: [(0, '26.127')] +[2025-02-27 21:15:02,161][00216] Updated weights for policy 0, policy_version 1380 (0.0015) +[2025-02-27 21:15:07,054][00031] Fps is (10 sec: 9830.2, 60 sec: 10103.4, 300 sec: 10108.1). Total num frames: 11345920. Throughput: 0: 2552.2. Samples: 2838468. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:15:07,056][00031] Avg episode reward: [(0, '23.999')] +[2025-02-27 21:15:07,103][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001386_11354112.pth... +[2025-02-27 21:15:07,245][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001089_8921088.pth +[2025-02-27 21:15:10,347][00216] Updated weights for policy 0, policy_version 1390 (0.0017) +[2025-02-27 21:15:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 11403264. Throughput: 0: 2554.5. Samples: 2854032. 
Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:15:12,055][00031] Avg episode reward: [(0, '22.375')] +[2025-02-27 21:15:17,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 11452416. Throughput: 0: 2544.0. Samples: 2861442. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:15:17,057][00031] Avg episode reward: [(0, '23.920')] +[2025-02-27 21:15:18,481][00216] Updated weights for policy 0, policy_version 1400 (0.0016) +[2025-02-27 21:15:22,053][00031] Fps is (10 sec: 9830.2, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 11501568. Throughput: 0: 2540.9. Samples: 2876694. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:15:22,054][00031] Avg episode reward: [(0, '26.913')] +[2025-02-27 21:15:26,667][00216] Updated weights for policy 0, policy_version 1410 (0.0015) +[2025-02-27 21:15:27,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 11550720. Throughput: 0: 2516.1. Samples: 2891112. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:15:27,054][00031] Avg episode reward: [(0, '26.659')] +[2025-02-27 21:15:32,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10103.5, 300 sec: 10135.9). Total num frames: 11599872. Throughput: 0: 2519.5. Samples: 2898906. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:15:32,055][00031] Avg episode reward: [(0, '25.655')] +[2025-02-27 21:15:34,581][00216] Updated weights for policy 0, policy_version 1420 (0.0016) +[2025-02-27 21:15:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11649024. Throughput: 0: 2548.3. Samples: 2914362. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:15:37,054][00031] Avg episode reward: [(0, '23.715')] +[2025-02-27 21:15:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11706368. Throughput: 0: 2552.2. Samples: 2929962. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:15:42,054][00031] Avg episode reward: [(0, '26.137')] +[2025-02-27 21:15:42,488][00216] Updated weights for policy 0, policy_version 1430 (0.0016) +[2025-02-27 21:15:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11755520. Throughput: 0: 2546.0. Samples: 2937522. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:15:47,055][00031] Avg episode reward: [(0, '27.125')] +[2025-02-27 21:15:50,477][00216] Updated weights for policy 0, policy_version 1440 (0.0016) +[2025-02-27 21:15:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10163.6). Total num frames: 11812864. Throughput: 0: 2551.9. Samples: 2953302. Policy #0 lag: (min: 0.0, avg: 2.1, max: 6.0) +[2025-02-27 21:15:52,054][00031] Avg episode reward: [(0, '24.146')] +[2025-02-27 21:15:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 11853824. Throughput: 0: 2517.3. Samples: 2967312. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:15:57,054][00031] Avg episode reward: [(0, '22.778')] +[2025-02-27 21:15:59,106][00216] Updated weights for policy 0, policy_version 1450 (0.0021) +[2025-02-27 21:16:02,053][00031] Fps is (10 sec: 9010.7, 60 sec: 10103.4, 300 sec: 10135.9). Total num frames: 11902976. Throughput: 0: 2527.5. Samples: 2975178. 
Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:16:02,057][00031] Avg episode reward: [(0, '24.490')] +[2025-02-27 21:16:06,549][00216] Updated weights for policy 0, policy_version 1460 (0.0020) +[2025-02-27 21:16:07,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10135.9). Total num frames: 11960320. Throughput: 0: 2532.8. Samples: 2990670. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:16:07,054][00031] Avg episode reward: [(0, '25.121')] +[2025-02-27 21:16:12,053][00031] Fps is (10 sec: 11469.5, 60 sec: 10240.0, 300 sec: 10163.7). Total num frames: 12017664. Throughput: 0: 2562.9. Samples: 3006444. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:16:12,055][00031] Avg episode reward: [(0, '26.343')] +[2025-02-27 21:16:14,678][00216] Updated weights for policy 0, policy_version 1470 (0.0016) +[2025-02-27 21:16:17,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10108.1). Total num frames: 12058624. Throughput: 0: 2558.4. Samples: 3014034. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:16:17,054][00031] Avg episode reward: [(0, '25.825')] +[2025-02-27 21:16:22,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10240.0, 300 sec: 10135.9). Total num frames: 12115968. Throughput: 0: 2564.5. Samples: 3029766. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:16:22,056][00031] Avg episode reward: [(0, '25.192')] +[2025-02-27 21:16:22,567][00216] Updated weights for policy 0, policy_version 1480 (0.0015) +[2025-02-27 21:16:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10163.7). Total num frames: 12165120. Throughput: 0: 2563.6. Samples: 3045324. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:16:27,056][00031] Avg episode reward: [(0, '26.584')] +[2025-02-27 21:16:30,843][00216] Updated weights for policy 0, policy_version 1490 (0.0015) +[2025-02-27 21:16:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10163.7). Total num frames: 12214272. Throughput: 0: 2545.3. Samples: 3052062. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:16:32,055][00031] Avg episode reward: [(0, '25.165')] +[2025-02-27 21:16:37,054][00031] Fps is (10 sec: 10648.6, 60 sec: 10376.4, 300 sec: 10191.4). Total num frames: 12271616. Throughput: 0: 2546.2. Samples: 3067884. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:16:37,055][00031] Avg episode reward: [(0, '25.735')] +[2025-02-27 21:16:38,587][00216] Updated weights for policy 0, policy_version 1500 (0.0015) +[2025-02-27 21:16:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 12320768. Throughput: 0: 2591.1. Samples: 3083910. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:16:42,055][00031] Avg episode reward: [(0, '27.815')] +[2025-02-27 21:16:46,339][00216] Updated weights for policy 0, policy_version 1510 (0.0016) +[2025-02-27 21:16:47,053][00031] Fps is (10 sec: 9831.3, 60 sec: 10240.0, 300 sec: 10163.6). Total num frames: 12369920. Throughput: 0: 2589.8. Samples: 3091716. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:16:47,055][00031] Avg episode reward: [(0, '25.079')] +[2025-02-27 21:16:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 12427264. Throughput: 0: 2599.1. Samples: 3107628. 
Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:16:52,055][00031] Avg episode reward: [(0, '22.881')] +[2025-02-27 21:16:54,089][00216] Updated weights for policy 0, policy_version 1520 (0.0015) +[2025-02-27 21:16:57,054][00031] Fps is (10 sec: 11467.6, 60 sec: 10512.9, 300 sec: 10219.1). Total num frames: 12484608. Throughput: 0: 2599.9. Samples: 3123444. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:16:57,058][00031] Avg episode reward: [(0, '21.441')] +[2025-02-27 21:17:02,053][00031] Fps is (10 sec: 9830.2, 60 sec: 10376.6, 300 sec: 10191.4). Total num frames: 12525568. Throughput: 0: 2603.6. Samples: 3131196. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:17:02,055][00031] Avg episode reward: [(0, '23.626')] +[2025-02-27 21:17:02,321][00216] Updated weights for policy 0, policy_version 1530 (0.0016) +[2025-02-27 21:17:07,053][00031] Fps is (10 sec: 9831.5, 60 sec: 10376.5, 300 sec: 10219.2). Total num frames: 12582912. Throughput: 0: 2577.2. Samples: 3145740. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:17:07,056][00031] Avg episode reward: [(0, '26.098')] +[2025-02-27 21:17:07,065][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001536_12582912.pth... +[2025-02-27 21:17:07,198][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001237_10133504.pth +[2025-02-27 21:17:09,972][00216] Updated weights for policy 0, policy_version 1540 (0.0015) +[2025-02-27 21:17:12,054][00031] Fps is (10 sec: 10649.0, 60 sec: 10239.9, 300 sec: 10219.2). Total num frames: 12632064. Throughput: 0: 2582.4. Samples: 3161532. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:17:12,057][00031] Avg episode reward: [(0, '29.033')] +[2025-02-27 21:17:12,064][00196] Saving new best policy, reward=29.033! +[2025-02-27 21:17:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10376.5, 300 sec: 10191.5). Total num frames: 12681216. Throughput: 0: 2602.7. Samples: 3169182. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:17:17,056][00031] Avg episode reward: [(0, '29.433')] +[2025-02-27 21:17:17,065][00196] Saving new best policy, reward=29.433! +[2025-02-27 21:17:17,946][00216] Updated weights for policy 0, policy_version 1550 (0.0016) +[2025-02-27 21:17:22,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 10219.2). Total num frames: 12738560. Throughput: 0: 2602.0. Samples: 3184974. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:17:22,055][00031] Avg episode reward: [(0, '27.738')] +[2025-02-27 21:17:25,939][00216] Updated weights for policy 0, policy_version 1560 (0.0015) +[2025-02-27 21:17:27,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10191.4). Total num frames: 12787712. Throughput: 0: 2599.3. Samples: 3200880. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:17:27,054][00031] Avg episode reward: [(0, '27.301')] +[2025-02-27 21:17:32,054][00031] Fps is (10 sec: 10648.7, 60 sec: 10512.8, 300 sec: 10247.0). Total num frames: 12845056. Throughput: 0: 2603.0. Samples: 3208854. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:17:32,057][00031] Avg episode reward: [(0, '22.144')] +[2025-02-27 21:17:33,752][00216] Updated weights for policy 0, policy_version 1570 (0.0015) +[2025-02-27 21:17:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.2, 300 sec: 10191.4). Total num frames: 12886016. Throughput: 0: 2569.1. Samples: 3223236. 
Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:17:37,055][00031] Avg episode reward: [(0, '19.838')] +[2025-02-27 21:17:41,547][00216] Updated weights for policy 0, policy_version 1580 (0.0016) +[2025-02-27 21:17:42,053][00031] Fps is (10 sec: 9831.7, 60 sec: 10376.5, 300 sec: 10219.4). Total num frames: 12943360. Throughput: 0: 2573.4. Samples: 3239244. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:17:42,055][00031] Avg episode reward: [(0, '22.612')] +[2025-02-27 21:17:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10219.2). Total num frames: 12992512. Throughput: 0: 2572.9. Samples: 3246978. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:17:47,056][00031] Avg episode reward: [(0, '25.633')] +[2025-02-27 21:17:49,709][00216] Updated weights for policy 0, policy_version 1590 (0.0016) +[2025-02-27 21:17:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10191.4). Total num frames: 13041664. Throughput: 0: 2598.1. Samples: 3262656. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:17:52,057][00031] Avg episode reward: [(0, '25.560')] +[2025-02-27 21:17:57,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10240.1, 300 sec: 10246.9). Total num frames: 13099008. Throughput: 0: 2598.9. Samples: 3278484. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:17:57,055][00031] Avg episode reward: [(0, '26.736')] +[2025-02-27 21:17:57,489][00216] Updated weights for policy 0, policy_version 1600 (0.0015) +[2025-02-27 21:18:02,053][00031] Fps is (10 sec: 11468.7, 60 sec: 10513.1, 300 sec: 10246.9). Total num frames: 13156352. Throughput: 0: 2606.0. Samples: 3286452. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:18:02,056][00031] Avg episode reward: [(0, '26.865')] +[2025-02-27 21:18:05,255][00216] Updated weights for policy 0, policy_version 1610 (0.0022) +[2025-02-27 21:18:07,053][00031] Fps is (10 sec: 10650.2, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 13205504. Throughput: 0: 2602.2. Samples: 3302070. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:18:07,054][00031] Avg episode reward: [(0, '26.269')] +[2025-02-27 21:18:12,053][00031] Fps is (10 sec: 9011.3, 60 sec: 10240.1, 300 sec: 10219.2). Total num frames: 13246464. Throughput: 0: 2574.7. Samples: 3316740. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:18:12,055][00031] Avg episode reward: [(0, '25.857')] +[2025-02-27 21:18:13,266][00216] Updated weights for policy 0, policy_version 1620 (0.0017) +[2025-02-27 21:18:17,054][00031] Fps is (10 sec: 9829.7, 60 sec: 10376.4, 300 sec: 10246.9). Total num frames: 13303808. Throughput: 0: 2569.4. Samples: 3324474. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:18:17,055][00031] Avg episode reward: [(0, '26.919')] +[2025-02-27 21:18:21,299][00216] Updated weights for policy 0, policy_version 1630 (0.0016) +[2025-02-27 21:18:22,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.6, 300 sec: 10246.9). Total num frames: 13361152. Throughput: 0: 2607.9. Samples: 3340590. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:18:22,056][00031] Avg episode reward: [(0, '27.533')] +[2025-02-27 21:18:27,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 13410304. Throughput: 0: 2603.7. Samples: 3356412. 
Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:18:27,055][00031] Avg episode reward: [(0, '25.297')] +[2025-02-27 21:18:29,149][00216] Updated weights for policy 0, policy_version 1640 (0.0017) +[2025-02-27 21:18:32,053][00031] Fps is (10 sec: 9830.2, 60 sec: 10240.2, 300 sec: 10247.0). Total num frames: 13459456. Throughput: 0: 2608.8. Samples: 3364374. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:18:32,055][00031] Avg episode reward: [(0, '25.719')] +[2025-02-27 21:18:36,407][00216] Updated weights for policy 0, policy_version 1650 (0.0018) +[2025-02-27 21:18:37,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.0, 300 sec: 10246.9). Total num frames: 13516800. Throughput: 0: 2608.2. Samples: 3380028. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:18:37,057][00031] Avg episode reward: [(0, '26.269')] +[2025-02-27 21:18:42,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 13565952. Throughput: 0: 2584.3. Samples: 3394776. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:18:42,055][00031] Avg episode reward: [(0, '26.301')] +[2025-02-27 21:18:44,803][00216] Updated weights for policy 0, policy_version 1660 (0.0016) +[2025-02-27 21:18:47,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 13615104. Throughput: 0: 2582.3. Samples: 3402654. Policy #0 lag: (min: 0.0, avg: 2.3, max: 4.0) +[2025-02-27 21:18:47,055][00031] Avg episode reward: [(0, '24.512')] +[2025-02-27 21:18:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10274.7). Total num frames: 13672448. Throughput: 0: 2581.5. Samples: 3418236. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-02-27 21:18:52,054][00031] Avg episode reward: [(0, '24.411')] +[2025-02-27 21:18:53,113][00216] Updated weights for policy 0, policy_version 1670 (0.0016) +[2025-02-27 21:18:57,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10376.6, 300 sec: 10274.7). Total num frames: 13721600. Throughput: 0: 2598.6. Samples: 3433680. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:18:57,055][00031] Avg episode reward: [(0, '24.646')] +[2025-02-27 21:19:00,746][00216] Updated weights for policy 0, policy_version 1680 (0.0015) +[2025-02-27 21:19:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 13770752. Throughput: 0: 2601.9. Samples: 3441558. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:19:02,055][00031] Avg episode reward: [(0, '26.147')] +[2025-02-27 21:19:07,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 13828096. Throughput: 0: 2588.4. Samples: 3457068. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:19:07,056][00031] Avg episode reward: [(0, '26.418')] +[2025-02-27 21:19:07,067][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001688_13828096.pth... +[2025-02-27 21:19:07,207][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001386_11354112.pth +[2025-02-27 21:19:08,345][00216] Updated weights for policy 0, policy_version 1690 (0.0016) +[2025-02-27 21:19:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10302.5). Total num frames: 13877248. Throughput: 0: 2587.1. Samples: 3472830. 
Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:19:12,055][00031] Avg episode reward: [(0, '25.336')] +[2025-02-27 21:19:17,053][00031] Fps is (10 sec: 9011.2, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 13918208. Throughput: 0: 2555.9. Samples: 3479388. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:19:17,054][00031] Avg episode reward: [(0, '28.069')] +[2025-02-27 21:19:17,068][00216] Updated weights for policy 0, policy_version 1700 (0.0018) +[2025-02-27 21:19:22,053][00031] Fps is (10 sec: 9829.9, 60 sec: 10239.9, 300 sec: 10274.7). Total num frames: 13975552. Throughput: 0: 2565.2. Samples: 3495462. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:19:22,055][00031] Avg episode reward: [(0, '27.249')] +[2025-02-27 21:19:24,813][00216] Updated weights for policy 0, policy_version 1710 (0.0017) +[2025-02-27 21:19:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 14024704. Throughput: 0: 2562.3. Samples: 3510078. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:19:27,055][00031] Avg episode reward: [(0, '23.288')] +[2025-02-27 21:19:32,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10376.6, 300 sec: 10302.5). Total num frames: 14082048. Throughput: 0: 2562.1. Samples: 3517950. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:19:32,054][00031] Avg episode reward: [(0, '24.864')] +[2025-02-27 21:19:33,113][00216] Updated weights for policy 0, policy_version 1720 (0.0016) +[2025-02-27 21:19:37,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 14131200. Throughput: 0: 2556.0. Samples: 3533256. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:19:37,055][00031] Avg episode reward: [(0, '26.609')] +[2025-02-27 21:19:40,395][00216] Updated weights for policy 0, policy_version 1730 (0.0016) +[2025-02-27 21:19:42,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 14180352. Throughput: 0: 2557.1. Samples: 3548748. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:19:42,055][00031] Avg episode reward: [(0, '25.389')] +[2025-02-27 21:19:47,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 14229504. Throughput: 0: 2547.3. Samples: 3556188. Policy #0 lag: (min: 0.0, avg: 2.5, max: 5.0) +[2025-02-27 21:19:47,055][00031] Avg episode reward: [(0, '24.927')] +[2025-02-27 21:19:49,116][00216] Updated weights for policy 0, policy_version 1740 (0.0015) +[2025-02-27 21:19:52,055][00031] Fps is (10 sec: 10647.3, 60 sec: 10239.6, 300 sec: 10302.4). Total num frames: 14286848. Throughput: 0: 2525.4. Samples: 3570714. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:19:52,058][00031] Avg episode reward: [(0, '27.226')] +[2025-02-27 21:19:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 14327808. Throughput: 0: 2509.3. Samples: 3585750. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:19:57,056][00031] Avg episode reward: [(0, '25.549')] +[2025-02-27 21:19:57,508][00216] Updated weights for policy 0, policy_version 1750 (0.0016) +[2025-02-27 21:20:02,053][00031] Fps is (10 sec: 9013.1, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 14376960. Throughput: 0: 2537.6. Samples: 3593580. 
Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:20:02,055][00031] Avg episode reward: [(0, '25.393')] +[2025-02-27 21:20:05,429][00216] Updated weights for policy 0, policy_version 1760 (0.0016) +[2025-02-27 21:20:07,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10103.4, 300 sec: 10274.7). Total num frames: 14434304. Throughput: 0: 2529.2. Samples: 3609276. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:20:07,055][00031] Avg episode reward: [(0, '26.914')] +[2025-02-27 21:20:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 14483456. Throughput: 0: 2556.1. Samples: 3625104. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0) +[2025-02-27 21:20:12,055][00031] Avg episode reward: [(0, '27.948')] +[2025-02-27 21:20:13,047][00216] Updated weights for policy 0, policy_version 1770 (0.0016) +[2025-02-27 21:20:17,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 14532608. Throughput: 0: 2547.5. Samples: 3632586. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:20:17,055][00031] Avg episode reward: [(0, '28.478')] +[2025-02-27 21:20:21,265][00216] Updated weights for policy 0, policy_version 1780 (0.0017) +[2025-02-27 21:20:22,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.1, 300 sec: 10302.5). Total num frames: 14589952. Throughput: 0: 2522.7. Samples: 3646776. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:20:22,057][00031] Avg episode reward: [(0, '26.251')] +[2025-02-27 21:20:27,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10103.4, 300 sec: 10274.7). Total num frames: 14630912. Throughput: 0: 2520.4. Samples: 3662166. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:20:27,056][00031] Avg episode reward: [(0, '25.395')] +[2025-02-27 21:20:29,690][00216] Updated weights for policy 0, policy_version 1790 (0.0016) +[2025-02-27 21:20:32,053][00031] Fps is (10 sec: 9011.3, 60 sec: 9966.9, 300 sec: 10274.7). Total num frames: 14680064. Throughput: 0: 2530.0. Samples: 3670038. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:20:32,054][00031] Avg episode reward: [(0, '25.457')] +[2025-02-27 21:20:37,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 14737408. Throughput: 0: 2550.1. Samples: 3685464. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0) +[2025-02-27 21:20:37,056][00031] Avg episode reward: [(0, '25.046')] +[2025-02-27 21:20:37,556][00216] Updated weights for policy 0, policy_version 1800 (0.0017) +[2025-02-27 21:20:42,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 14794752. Throughput: 0: 2563.7. Samples: 3701118. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:20:42,055][00031] Avg episode reward: [(0, '25.554')] +[2025-02-27 21:20:45,144][00216] Updated weights for policy 0, policy_version 1810 (0.0019) +[2025-02-27 21:20:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 14843904. Throughput: 0: 2558.9. Samples: 3708732. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0) +[2025-02-27 21:20:47,056][00031] Avg episode reward: [(0, '26.044')] +[2025-02-27 21:20:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.8, 300 sec: 10302.5). Total num frames: 14893056. Throughput: 0: 2556.3. Samples: 3724308. 
Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:20:52,056][00031] Avg episode reward: [(0, '27.076')] +[2025-02-27 21:20:53,787][00216] Updated weights for policy 0, policy_version 1820 (0.0018) +[2025-02-27 21:20:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 14942208. Throughput: 0: 2519.6. Samples: 3738486. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0) +[2025-02-27 21:20:57,055][00031] Avg episode reward: [(0, '26.785')] +[2025-02-27 21:21:01,540][00216] Updated weights for policy 0, policy_version 1830 (0.0015) +[2025-02-27 21:21:02,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10239.9, 300 sec: 10274.7). Total num frames: 14991360. Throughput: 0: 2527.9. Samples: 3746340. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:21:02,056][00031] Avg episode reward: [(0, '28.650')] +[2025-02-27 21:21:07,054][00031] Fps is (10 sec: 9829.1, 60 sec: 10103.3, 300 sec: 10246.9). Total num frames: 15040512. Throughput: 0: 2552.5. Samples: 3761640. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:21:07,058][00031] Avg episode reward: [(0, '28.532')] +[2025-02-27 21:21:07,068][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001836_15040512.pth... +[2025-02-27 21:21:07,210][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001536_12582912.pth +[2025-02-27 21:21:09,824][00216] Updated weights for policy 0, policy_version 1840 (0.0019) +[2025-02-27 21:21:12,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 15097856. Throughput: 0: 2553.2. Samples: 3777060. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:21:12,055][00031] Avg episode reward: [(0, '27.549')] +[2025-02-27 21:21:17,053][00031] Fps is (10 sec: 10650.6, 60 sec: 10239.9, 300 sec: 10274.7). Total num frames: 15147008. Throughput: 0: 2545.8. Samples: 3784602. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:21:17,056][00031] Avg episode reward: [(0, '26.325')] +[2025-02-27 21:21:17,369][00216] Updated weights for policy 0, policy_version 1850 (0.0015) +[2025-02-27 21:21:22,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10103.5, 300 sec: 10274.7). Total num frames: 15196160. Throughput: 0: 2552.0. Samples: 3800304. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:21:22,055][00031] Avg episode reward: [(0, '25.863')] +[2025-02-27 21:21:26,174][00216] Updated weights for policy 0, policy_version 1860 (0.0019) +[2025-02-27 21:21:27,053][00031] Fps is (10 sec: 9830.8, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 15245312. Throughput: 0: 2510.4. Samples: 3814086. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0) +[2025-02-27 21:21:27,054][00031] Avg episode reward: [(0, '26.497')] +[2025-02-27 21:21:32,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10247.0). Total num frames: 15294464. Throughput: 0: 2515.7. Samples: 3821940. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:21:32,058][00031] Avg episode reward: [(0, '28.600')] +[2025-02-27 21:21:34,202][00216] Updated weights for policy 0, policy_version 1870 (0.0016) +[2025-02-27 21:21:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10103.5, 300 sec: 10246.9). Total num frames: 15343616. Throughput: 0: 2511.5. Samples: 3837324. Policy #0 lag: (min: 0.0, avg: 2.6, max: 6.0) +[2025-02-27 21:21:37,056][00031] Avg episode reward: [(0, '28.768')] +[2025-02-27 21:21:42,053][00031] Fps is (10 sec: 9830.5, 60 sec: 9966.9, 300 sec: 10246.9). 
Total num frames: 15392768. Throughput: 0: 2545.5. Samples: 3853032. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:21:42,055][00031] Avg episode reward: [(0, '26.891')] +[2025-02-27 21:21:42,073][00216] Updated weights for policy 0, policy_version 1880 (0.0016) +[2025-02-27 21:21:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10103.5, 300 sec: 10246.9). Total num frames: 15450112. Throughput: 0: 2542.5. Samples: 3860754. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0) +[2025-02-27 21:21:47,055][00031] Avg episode reward: [(0, '26.039')] +[2025-02-27 21:21:49,536][00216] Updated weights for policy 0, policy_version 1890 (0.0017) +[2025-02-27 21:21:52,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10240.0, 300 sec: 10247.0). Total num frames: 15507456. Throughput: 0: 2551.3. Samples: 3876444. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0) +[2025-02-27 21:21:52,057][00031] Avg episode reward: [(0, '25.776')] +[2025-02-27 21:21:57,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 15556608. Throughput: 0: 2557.9. Samples: 3892164. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:21:57,056][00031] Avg episode reward: [(0, '25.948')] +[2025-02-27 21:21:57,684][00216] Updated weights for policy 0, policy_version 1900 (0.0015) +[2025-02-27 21:22:02,053][00031] Fps is (10 sec: 9011.2, 60 sec: 10103.5, 300 sec: 10219.2). Total num frames: 15597568. Throughput: 0: 2536.6. Samples: 3898746. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:22:02,054][00031] Avg episode reward: [(0, '25.656')] +[2025-02-27 21:22:05,799][00216] Updated weights for policy 0, policy_version 1910 (0.0015) +[2025-02-27 21:22:07,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.2, 300 sec: 10247.0). Total num frames: 15654912. Throughput: 0: 2539.2. Samples: 3914568. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:22:07,055][00031] Avg episode reward: [(0, '25.714')] +[2025-02-27 21:22:12,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 15712256. Throughput: 0: 2588.1. Samples: 3930552. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0) +[2025-02-27 21:22:12,055][00031] Avg episode reward: [(0, '25.632')] +[2025-02-27 21:22:13,824][00216] Updated weights for policy 0, policy_version 1920 (0.0020) +[2025-02-27 21:22:17,054][00031] Fps is (10 sec: 10648.1, 60 sec: 10239.8, 300 sec: 10246.9). Total num frames: 15761408. Throughput: 0: 2585.9. Samples: 3938310. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0) +[2025-02-27 21:22:17,056][00031] Avg episode reward: [(0, '26.285')] +[2025-02-27 21:22:21,443][00216] Updated weights for policy 0, policy_version 1930 (0.0016) +[2025-02-27 21:22:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10246.9). Total num frames: 15810560. Throughput: 0: 2597.9. Samples: 3954228. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:22:22,055][00031] Avg episode reward: [(0, '25.491')] +[2025-02-27 21:22:27,053][00031] Fps is (10 sec: 10650.8, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 15867904. Throughput: 0: 2598.1. Samples: 3969948. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:22:27,055][00031] Avg episode reward: [(0, '26.536')] +[2025-02-27 21:22:29,167][00216] Updated weights for policy 0, policy_version 1940 (0.0016) +[2025-02-27 21:22:32,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 15917056. Throughput: 0: 2604.8. Samples: 3977970. 
Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0) +[2025-02-27 21:22:32,055][00031] Avg episode reward: [(0, '26.414')] +[2025-02-27 21:22:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 15966208. Throughput: 0: 2574.1. Samples: 3992280. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:22:37,056][00031] Avg episode reward: [(0, '28.802')] +[2025-02-27 21:22:37,336][00216] Updated weights for policy 0, policy_version 1950 (0.0015) +[2025-02-27 21:22:42,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 16015360. Throughput: 0: 2580.1. Samples: 4008270. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:22:42,055][00031] Avg episode reward: [(0, '27.700')] +[2025-02-27 21:22:45,298][00216] Updated weights for policy 0, policy_version 1960 (0.0016) +[2025-02-27 21:22:47,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 16072704. Throughput: 0: 2606.8. Samples: 4016052. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0) +[2025-02-27 21:22:47,055][00031] Avg episode reward: [(0, '26.490')] +[2025-02-27 21:22:52,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10240.0, 300 sec: 10247.0). Total num frames: 16121856. Throughput: 0: 2607.3. Samples: 4031898. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:22:52,056][00031] Avg episode reward: [(0, '27.007')] +[2025-02-27 21:22:52,722][00216] Updated weights for policy 0, policy_version 1970 (0.0016) +[2025-02-27 21:22:57,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 16179200. Throughput: 0: 2601.9. Samples: 4047636. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0) +[2025-02-27 21:22:57,055][00031] Avg episode reward: [(0, '28.583')] +[2025-02-27 21:23:00,808][00216] Updated weights for policy 0, policy_version 1980 (0.0016) +[2025-02-27 21:23:02,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10246.9). Total num frames: 16228352. Throughput: 0: 2607.3. Samples: 4055634. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:23:02,055][00031] Avg episode reward: [(0, '27.386')] +[2025-02-27 21:23:07,053][00031] Fps is (10 sec: 9830.2, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 16277504. Throughput: 0: 2572.8. Samples: 4070004. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0) +[2025-02-27 21:23:07,055][00031] Avg episode reward: [(0, '26.161')] +[2025-02-27 21:23:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001987_16277504.pth... +[2025-02-27 21:23:07,209][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001688_13828096.pth +[2025-02-27 21:23:09,174][00216] Updated weights for policy 0, policy_version 1990 (0.0015) +[2025-02-27 21:23:12,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10240.0, 300 sec: 10247.0). Total num frames: 16326656. Throughput: 0: 2575.2. Samples: 4085832. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0) +[2025-02-27 21:23:12,054][00031] Avg episode reward: [(0, '27.249')] +[2025-02-27 21:23:16,878][00216] Updated weights for policy 0, policy_version 2000 (0.0018) +[2025-02-27 21:23:17,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.8, 300 sec: 10246.9). Total num frames: 16384000. Throughput: 0: 2568.9. Samples: 4093572. 
+[2025-02-27 21:23:17,055][00031] Avg episode reward: [(0, '26.017')]
+[2025-02-27 21:23:22,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.4, 300 sec: 10246.9). Total num frames: 16433152. Throughput: 0: 2605.6. Samples: 4109532. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:23:22,056][00031] Avg episode reward: [(0, '26.210')]
+[2025-02-27 21:23:24,472][00216] Updated weights for policy 0, policy_version 2010 (0.0016)
+[2025-02-27 21:23:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 10274.7). Total num frames: 16490496. Throughput: 0: 2601.3. Samples: 4125330. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:23:27,057][00031] Avg episode reward: [(0, '25.634')]
+[2025-02-27 21:23:32,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 16539648. Throughput: 0: 2605.5. Samples: 4133298. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:23:32,056][00031] Avg episode reward: [(0, '23.438')]
+[2025-02-27 21:23:32,353][00216] Updated weights for policy 0, policy_version 2020 (0.0017)
+[2025-02-27 21:23:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10246.9). Total num frames: 16588800. Throughput: 0: 2604.4. Samples: 4149096. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:23:37,055][00031] Avg episode reward: [(0, '23.753')]
+[2025-02-27 21:23:40,658][00216] Updated weights for policy 0, policy_version 2030 (0.0017)
+[2025-02-27 21:23:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10246.9). Total num frames: 16637952. Throughput: 0: 2576.3. Samples: 4163568. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:23:42,054][00031] Avg episode reward: [(0, '25.457')]
+[2025-02-27 21:23:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10219.2). Total num frames: 16687104. Throughput: 0: 2569.1. Samples: 4171242. Policy #0 lag: (min: 0.0, avg: 2.1, max: 6.0)
+[2025-02-27 21:23:47,055][00031] Avg episode reward: [(0, '24.943')]
+[2025-02-27 21:23:48,342][00216] Updated weights for policy 0, policy_version 2040 (0.0016)
+[2025-02-27 21:23:52,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 16744448. Throughput: 0: 2604.3. Samples: 4187196. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:23:52,056][00031] Avg episode reward: [(0, '27.153')]
+[2025-02-27 21:23:56,404][00216] Updated weights for policy 0, policy_version 2050 (0.0015)
+[2025-02-27 21:23:57,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 16801792. Throughput: 0: 2600.8. Samples: 4202868. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:23:57,054][00031] Avg episode reward: [(0, '27.789')]
+[2025-02-27 21:24:02,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 16850944. Throughput: 0: 2605.5. Samples: 4210818. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:24:02,055][00031] Avg episode reward: [(0, '28.860')]
+[2025-02-27 21:24:04,035][00216] Updated weights for policy 0, policy_version 2060 (0.0017)
+[2025-02-27 21:24:07,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10376.5, 300 sec: 10246.9). Total num frames: 16900096. Throughput: 0: 2601.2. Samples: 4226586. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
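Editor's note: "Updated weights for policy 0, policy_version N (0.00xx)" is the inference worker picking up fresh learner weights, here roughly every 10 policy versions (about every 8 s); the number in parentheses is presumably the time the update took, in seconds. A minimal sketch of version-checked weight refresh, assuming a get_latest() accessor for the learner's state (a simplification; Sample Factory shares parameters between processes rather than passing state dicts around):

    class WeightSync:
        """Refresh local model weights whenever the learner's version advances."""

        def __init__(self, model, get_latest):
            # get_latest() -> (version, state_dict) is an assumed accessor.
            self.model = model
            self.get_latest = get_latest
            self.local_version = -1

        def maybe_update(self):
            version, state_dict = self.get_latest()
            if version <= self.local_version:
                return False
            self.model.load_state_dict(state_dict)  # torch.nn.Module API
            self.local_version = version
            return True  # corresponds to an 'Updated weights ...' line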
+[2025-02-27 21:24:07,058][00031] Avg episode reward: [(0, '28.155')]
+[2025-02-27 21:24:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 16949248. Throughput: 0: 2575.1. Samples: 4241208. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:24:12,054][00031] Avg episode reward: [(0, '26.886')]
+[2025-02-27 21:24:12,308][00216] Updated weights for policy 0, policy_version 2070 (0.0019)
+[2025-02-27 21:24:17,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 17006592. Throughput: 0: 2571.3. Samples: 4249008. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:24:17,055][00031] Avg episode reward: [(0, '25.757')]
+[2025-02-27 21:24:20,201][00216] Updated weights for policy 0, policy_version 2080 (0.0017)
+[2025-02-27 21:24:22,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.6, 300 sec: 10274.7). Total num frames: 17055744. Throughput: 0: 2572.9. Samples: 4264878. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:24:22,055][00031] Avg episode reward: [(0, '26.484')]
+[2025-02-27 21:24:27,060][00031] Fps is (10 sec: 10642.7, 60 sec: 10375.4, 300 sec: 10274.5). Total num frames: 17113088. Throughput: 0: 2599.1. Samples: 4280544. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:24:27,065][00031] Avg episode reward: [(0, '27.824')]
+[2025-02-27 21:24:27,966][00216] Updated weights for policy 0, policy_version 2090 (0.0015)
+[2025-02-27 21:24:32,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.5, 300 sec: 10274.7). Total num frames: 17162240. Throughput: 0: 2605.1. Samples: 4288470. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:24:32,054][00031] Avg episode reward: [(0, '29.651')]
+[2025-02-27 21:24:32,056][00196] Saving new best policy, reward=29.651!
+[2025-02-27 21:24:35,508][00216] Updated weights for policy 0, policy_version 2100 (0.0016)
+[2025-02-27 21:24:37,053][00031] Fps is (10 sec: 9837.0, 60 sec: 10376.6, 300 sec: 10274.7). Total num frames: 17211392. Throughput: 0: 2597.5. Samples: 4304082. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:24:37,054][00031] Avg episode reward: [(0, '27.538')]
+[2025-02-27 21:24:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10302.5). Total num frames: 17268736. Throughput: 0: 2603.6. Samples: 4320030. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 21:24:42,056][00031] Avg episode reward: [(0, '27.283')]
+[2025-02-27 21:24:44,301][00216] Updated weights for policy 0, policy_version 2110 (0.0017)
+[2025-02-27 21:24:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10247.0). Total num frames: 17309696. Throughput: 0: 2568.7. Samples: 4326408. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:24:47,054][00031] Avg episode reward: [(0, '27.324')]
+[2025-02-27 21:24:51,857][00216] Updated weights for policy 0, policy_version 2120 (0.0018)
+[2025-02-27 21:24:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10302.5). Total num frames: 17367040. Throughput: 0: 2571.5. Samples: 4342302. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:24:52,056][00031] Avg episode reward: [(0, '27.420')]
+[2025-02-27 21:24:57,053][00031] Fps is (10 sec: 11468.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 17424384. Throughput: 0: 2595.9. Samples: 4358022. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
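Editor's note: "Saving new best policy, reward=29.651!" fires when the reported average episode reward exceeds the best value seen so far; in this section the best climbs from 29.651 through 29.847, 30.713 and 30.997 to 31.850. A minimal sketch of that bookkeeping (a hypothetical helper; the save callback stands in for writing the best-policy file):

    class BestPolicyTracker:
        """Snapshot the policy whenever the average episode reward improves."""

        def __init__(self, save_fn, best=float("-inf")):
            self.save_fn = save_fn  # e.g. lambda r: torch.save(model.state_dict(), path)
            self.best = best

        def report(self, avg_episode_reward):
            if avg_episode_reward <= self.best:
                return False
            self.best = avg_episode_reward
            self.save_fn(avg_episode_reward)  # 'Saving new best policy, reward=...!'
            return True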
+[2025-02-27 21:24:57,057][00031] Avg episode reward: [(0, '28.275')]
+[2025-02-27 21:24:59,490][00216] Updated weights for policy 0, policy_version 2130 (0.0016)
+[2025-02-27 21:25:02,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 17473536. Throughput: 0: 2599.3. Samples: 4365978. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:25:02,054][00031] Avg episode reward: [(0, '29.847')]
+[2025-02-27 21:25:02,056][00196] Saving new best policy, reward=29.847!
+[2025-02-27 21:25:07,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10376.6, 300 sec: 10302.5). Total num frames: 17522688. Throughput: 0: 2592.9. Samples: 4381560. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:25:07,056][00031] Avg episode reward: [(0, '26.242')]
+[2025-02-27 21:25:07,063][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002139_17522688.pth...
+[2025-02-27 21:25:07,188][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001836_15040512.pth
+[2025-02-27 21:25:07,548][00216] Updated weights for policy 0, policy_version 2140 (0.0016)
+[2025-02-27 21:25:12,055][00031] Fps is (10 sec: 9828.6, 60 sec: 10376.2, 300 sec: 10302.4). Total num frames: 17571840. Throughput: 0: 2597.9. Samples: 4397436. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:25:12,056][00031] Avg episode reward: [(0, '28.358')]
+[2025-02-27 21:25:15,554][00216] Updated weights for policy 0, policy_version 2150 (0.0016)
+[2025-02-27 21:25:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 17620992. Throughput: 0: 2594.3. Samples: 4405212. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:25:17,054][00031] Avg episode reward: [(0, '28.148')]
+[2025-02-27 21:25:22,053][00031] Fps is (10 sec: 10651.5, 60 sec: 10376.6, 300 sec: 10330.3). Total num frames: 17678336. Throughput: 0: 2567.5. Samples: 4419618. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:25:22,056][00031] Avg episode reward: [(0, '25.778')]
+[2025-02-27 21:25:23,567][00216] Updated weights for policy 0, policy_version 2160 (0.0015)
+[2025-02-27 21:25:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10241.1, 300 sec: 10330.3). Total num frames: 17727488. Throughput: 0: 2561.1. Samples: 4435278. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:25:27,055][00031] Avg episode reward: [(0, '27.456')]
+[2025-02-27 21:25:31,278][00216] Updated weights for policy 0, policy_version 2170 (0.0016)
+[2025-02-27 21:25:32,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 17784832. Throughput: 0: 2596.9. Samples: 4443270. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:25:32,054][00031] Avg episode reward: [(0, '28.147')]
+[2025-02-27 21:25:37,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 17833984. Throughput: 0: 2588.4. Samples: 4458780. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:25:37,054][00031] Avg episode reward: [(0, '30.713')]
+[2025-02-27 21:25:37,062][00196] Saving new best policy, reward=30.713!
+[2025-02-27 21:25:39,104][00216] Updated weights for policy 0, policy_version 2180 (0.0017)
+[2025-02-27 21:25:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 17883136. Throughput: 0: 2589.1. Samples: 4474530. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:25:42,054][00031] Avg episode reward: [(0, '29.656')]
+[2025-02-27 21:25:47,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 17932288. Throughput: 0: 2584.0. Samples: 4482258. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:25:47,054][00031] Avg episode reward: [(0, '26.522')]
+[2025-02-27 21:25:47,331][00216] Updated weights for policy 0, policy_version 2190 (0.0015)
+[2025-02-27 21:25:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 17989632. Throughput: 0: 2558.5. Samples: 4496694. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:25:52,055][00031] Avg episode reward: [(0, '27.745')]
+[2025-02-27 21:25:55,400][00216] Updated weights for policy 0, policy_version 2200 (0.0017)
+[2025-02-27 21:25:57,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 18038784. Throughput: 0: 2553.6. Samples: 4512342. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:25:57,056][00031] Avg episode reward: [(0, '29.472')]
+[2025-02-27 21:26:02,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 18087936. Throughput: 0: 2557.1. Samples: 4520280. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:26:02,055][00031] Avg episode reward: [(0, '28.129')]
+[2025-02-27 21:26:02,955][00216] Updated weights for policy 0, policy_version 2210 (0.0018)
+[2025-02-27 21:26:07,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 18137088. Throughput: 0: 2585.3. Samples: 4535958. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
+[2025-02-27 21:26:07,054][00031] Avg episode reward: [(0, '27.983')]
+[2025-02-27 21:26:11,247][00216] Updated weights for policy 0, policy_version 2220 (0.0016)
+[2025-02-27 21:26:12,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.8, 300 sec: 10330.3). Total num frames: 18194432. Throughput: 0: 2588.0. Samples: 4551738. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:26:12,055][00031] Avg episode reward: [(0, '28.724')]
+[2025-02-27 21:26:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 18243584. Throughput: 0: 2582.5. Samples: 4559484. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:26:17,056][00031] Avg episode reward: [(0, '29.550')]
+[2025-02-27 21:26:19,174][00216] Updated weights for policy 0, policy_version 2230 (0.0019)
+[2025-02-27 21:26:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 18292736. Throughput: 0: 2591.3. Samples: 4575390. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:26:22,056][00031] Avg episode reward: [(0, '30.997')]
+[2025-02-27 21:26:22,060][00196] Saving new best policy, reward=30.997!
+[2025-02-27 21:26:27,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 18341888. Throughput: 0: 2556.3. Samples: 4589562. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:26:27,056][00031] Avg episode reward: [(0, '30.443')]
+[2025-02-27 21:26:27,484][00216] Updated weights for policy 0, policy_version 2240 (0.0015)
+[2025-02-27 21:26:32,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 18399232. Throughput: 0: 2562.4. Samples: 4597566. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:26:32,054][00031] Avg episode reward: [(0, '28.720')]
+[2025-02-27 21:26:34,756][00216] Updated weights for policy 0, policy_version 2250 (0.0015)
+[2025-02-27 21:26:37,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 18448384. Throughput: 0: 2592.0. Samples: 4613334. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:26:37,055][00031] Avg episode reward: [(0, '27.124')]
+[2025-02-27 21:26:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 18505728. Throughput: 0: 2598.4. Samples: 4629270. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:26:42,055][00031] Avg episode reward: [(0, '27.503')]
+[2025-02-27 21:26:42,743][00216] Updated weights for policy 0, policy_version 2260 (0.0016)
+[2025-02-27 21:26:47,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 18554880. Throughput: 0: 2594.0. Samples: 4637010. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:26:47,056][00031] Avg episode reward: [(0, '26.882')]
+[2025-02-27 21:26:50,650][00216] Updated weights for policy 0, policy_version 2270 (0.0018)
+[2025-02-27 21:26:52,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 18612224. Throughput: 0: 2599.7. Samples: 4652946. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:26:52,057][00031] Avg episode reward: [(0, '27.349')]
+[2025-02-27 21:26:57,053][00031] Fps is (10 sec: 9830.9, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 18653184. Throughput: 0: 2564.9. Samples: 4667160. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:26:57,054][00031] Avg episode reward: [(0, '27.495')]
+[2025-02-27 21:26:58,783][00216] Updated weights for policy 0, policy_version 2280 (0.0016)
+[2025-02-27 21:27:02,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 18710528. Throughput: 0: 2570.0. Samples: 4675134. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:27:02,055][00031] Avg episode reward: [(0, '26.570')]
+[2025-02-27 21:27:06,564][00216] Updated weights for policy 0, policy_version 2290 (0.0016)
+[2025-02-27 21:27:07,054][00031] Fps is (10 sec: 10648.6, 60 sec: 10376.4, 300 sec: 10330.2). Total num frames: 18759680. Throughput: 0: 2564.1. Samples: 4690776. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:27:07,056][00031] Avg episode reward: [(0, '27.186')]
+[2025-02-27 21:27:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002290_18759680.pth...
+[2025-02-27 21:27:07,183][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000001987_16277504.pth
+[2025-02-27 21:27:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 18808832. Throughput: 0: 2600.5. Samples: 4706586. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:27:12,055][00031] Avg episode reward: [(0, '28.992')]
+[2025-02-27 21:27:14,487][00216] Updated weights for policy 0, policy_version 2300 (0.0017)
+[2025-02-27 21:27:17,053][00031] Fps is (10 sec: 10650.6, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 18866176. Throughput: 0: 2592.7. Samples: 4714236. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
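Editor's note: "Policy #0 lag: (min/avg/max)" summarizes, over the transitions in the current training batch, how many policy versions behind the learner the acting policy was when each transition was collected; with asynchronous rollouts, an average of about 2 versions with occasional spikes to 5 or 6, as seen throughout this log, is unremarkable. A minimal sketch of the statistic, assuming each transition records the policy_version that produced it:

    def policy_lag_stats(sample_versions, learner_version):
        """Min/avg/max of (learner_version - version_that_acted) over a batch."""
        lags = [learner_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # Hypothetical batch: learner at version 2250, samples from versions 2245-2250.
    print(policy_lag_stats([2250, 2249, 2248, 2248, 2245], 2250))  # (0, 2.0, 5)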
+[2025-02-27 21:27:17,055][00031] Avg episode reward: [(0, '29.004')]
+[2025-02-27 21:27:22,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 18915328. Throughput: 0: 2597.4. Samples: 4730220. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:27:22,056][00031] Avg episode reward: [(0, '26.304')]
+[2025-02-27 21:27:22,396][00216] Updated weights for policy 0, policy_version 2310 (0.0015)
+[2025-02-27 21:27:27,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 18964480. Throughput: 0: 2590.3. Samples: 4745832. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:27:27,056][00031] Avg episode reward: [(0, '26.676')]
+[2025-02-27 21:27:30,858][00216] Updated weights for policy 0, policy_version 2320 (0.0016)
+[2025-02-27 21:27:32,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 19021824. Throughput: 0: 2562.0. Samples: 4752300. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:27:32,055][00031] Avg episode reward: [(0, '28.401')]
+[2025-02-27 21:27:37,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 19070976. Throughput: 0: 2555.5. Samples: 4767942. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:27:37,055][00031] Avg episode reward: [(0, '28.608')]
+[2025-02-27 21:27:38,875][00216] Updated weights for policy 0, policy_version 2330 (0.0016)
+[2025-02-27 21:27:42,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 19120128. Throughput: 0: 2586.8. Samples: 4783566. Policy #0 lag: (min: 0.0, avg: 2.3, max: 6.0)
+[2025-02-27 21:27:42,055][00031] Avg episode reward: [(0, '29.512')]
+[2025-02-27 21:27:46,460][00216] Updated weights for policy 0, policy_version 2340 (0.0026)
+[2025-02-27 21:27:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 19169280. Throughput: 0: 2578.8. Samples: 4791180. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:27:47,056][00031] Avg episode reward: [(0, '29.658')]
+[2025-02-27 21:27:52,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 19226624. Throughput: 0: 2580.7. Samples: 4806906. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:27:52,056][00031] Avg episode reward: [(0, '29.791')]
+[2025-02-27 21:27:54,202][00216] Updated weights for policy 0, policy_version 2350 (0.0020)
+[2025-02-27 21:27:57,053][00031] Fps is (10 sec: 10649.9, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 19275776. Throughput: 0: 2575.5. Samples: 4822482. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:27:57,055][00031] Avg episode reward: [(0, '30.404')]
+[2025-02-27 21:28:02,053][00031] Fps is (10 sec: 9830.8, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 19324928. Throughput: 0: 2579.9. Samples: 4830330. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:28:02,057][00031] Avg episode reward: [(0, '28.869')]
+[2025-02-27 21:28:02,578][00216] Updated weights for policy 0, policy_version 2360 (0.0017)
+[2025-02-27 21:28:07,053][00031] Fps is (10 sec: 9830.0, 60 sec: 10240.1, 300 sec: 10330.2). Total num frames: 19374080. Throughput: 0: 2539.6. Samples: 4844502. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:28:07,055][00031] Avg episode reward: [(0, '26.420')]
+[2025-02-27 21:28:10,580][00216] Updated weights for policy 0, policy_version 2370 (0.0016)
+[2025-02-27 21:28:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 19423232. Throughput: 0: 2542.9. Samples: 4860264. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:28:12,055][00031] Avg episode reward: [(0, '25.662')]
+[2025-02-27 21:28:17,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10240.0, 300 sec: 10330.3). Total num frames: 19480576. Throughput: 0: 2569.5. Samples: 4867926. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
+[2025-02-27 21:28:17,055][00031] Avg episode reward: [(0, '26.066')]
+[2025-02-27 21:28:18,467][00216] Updated weights for policy 0, policy_version 2380 (0.0017)
+[2025-02-27 21:28:22,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10302.5). Total num frames: 19529728. Throughput: 0: 2573.4. Samples: 4883742. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:28:22,055][00031] Avg episode reward: [(0, '24.953')]
+[2025-02-27 21:28:26,313][00216] Updated weights for policy 0, policy_version 2390 (0.0016)
+[2025-02-27 21:28:27,054][00031] Fps is (10 sec: 10648.3, 60 sec: 10376.3, 300 sec: 10330.2). Total num frames: 19587072. Throughput: 0: 2572.9. Samples: 4899348. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:28:27,057][00031] Avg episode reward: [(0, '26.479')]
+[2025-02-27 21:28:32,053][00031] Fps is (10 sec: 10649.4, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 19636224. Throughput: 0: 2579.5. Samples: 4907256. Policy #0 lag: (min: 0.0, avg: 1.5, max: 5.0)
+[2025-02-27 21:28:32,055][00031] Avg episode reward: [(0, '27.166')]
+[2025-02-27 21:28:34,778][00216] Updated weights for policy 0, policy_version 2400 (0.0015)
+[2025-02-27 21:28:37,053][00031] Fps is (10 sec: 9831.6, 60 sec: 10240.1, 300 sec: 10330.2). Total num frames: 19685376. Throughput: 0: 2550.2. Samples: 4921662. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-02-27 21:28:37,054][00031] Avg episode reward: [(0, '27.538')]
+[2025-02-27 21:28:42,025][00216] Updated weights for policy 0, policy_version 2410 (0.0016)
+[2025-02-27 21:28:42,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 19742720. Throughput: 0: 2552.0. Samples: 4937322. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:28:42,056][00031] Avg episode reward: [(0, '28.911')]
+[2025-02-27 21:28:47,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.1, 300 sec: 10302.5). Total num frames: 19783680. Throughput: 0: 2549.3. Samples: 4945050. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
+[2025-02-27 21:28:47,056][00031] Avg episode reward: [(0, '26.996')]
+[2025-02-27 21:28:50,354][00216] Updated weights for policy 0, policy_version 2420 (0.0015)
+[2025-02-27 21:28:52,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 19841024. Throughput: 0: 2586.8. Samples: 4960908. Policy #0 lag: (min: 0.0, avg: 2.5, max: 4.0)
+[2025-02-27 21:28:52,055][00031] Avg episode reward: [(0, '25.897')]
+[2025-02-27 21:28:57,054][00031] Fps is (10 sec: 11467.9, 60 sec: 10376.4, 300 sec: 10330.2). Total num frames: 19898368. Throughput: 0: 2584.8. Samples: 4976580. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
+[2025-02-27 21:28:57,058][00031] Avg episode reward: [(0, '27.367')]
+[2025-02-27 21:28:58,089][00216] Updated weights for policy 0, policy_version 2430 (0.0016)
+[2025-02-27 21:29:02,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 19947520. Throughput: 0: 2590.1. Samples: 4984482. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
+[2025-02-27 21:29:02,057][00031] Avg episode reward: [(0, '26.486')]
+[2025-02-27 21:29:05,532][00216] Updated weights for policy 0, policy_version 2440 (0.0016)
+[2025-02-27 21:29:07,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 19996672. Throughput: 0: 2588.4. Samples: 5000220. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
+[2025-02-27 21:29:07,056][00031] Avg episode reward: [(0, '27.541')]
+[2025-02-27 21:29:07,065][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002441_19996672.pth...
+[2025-02-27 21:29:07,212][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002139_17522688.pth
+[2025-02-27 21:29:12,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 20045824. Throughput: 0: 2559.7. Samples: 5014530. Policy #0 lag: (min: 0.0, avg: 1.4, max: 5.0)
+[2025-02-27 21:29:12,054][00031] Avg episode reward: [(0, '26.707')]
+[2025-02-27 21:29:14,319][00216] Updated weights for policy 0, policy_version 2450 (0.0015)
+[2025-02-27 21:29:17,053][00031] Fps is (10 sec: 9830.8, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 20094976. Throughput: 0: 2553.3. Samples: 5022156. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:29:17,054][00031] Avg episode reward: [(0, '29.297')]
+[2025-02-27 21:29:22,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10274.9). Total num frames: 20144128. Throughput: 0: 2586.0. Samples: 5038032. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:29:22,054][00031] Avg episode reward: [(0, '28.352')]
+[2025-02-27 21:29:22,263][00216] Updated weights for policy 0, policy_version 2460 (0.0015)
+[2025-02-27 21:29:27,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.2, 300 sec: 10302.5). Total num frames: 20201472. Throughput: 0: 2580.7. Samples: 5053452. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:29:27,055][00031] Avg episode reward: [(0, '29.214')]
+[2025-02-27 21:29:30,104][00216] Updated weights for policy 0, policy_version 2470 (0.0015)
+[2025-02-27 21:29:32,053][00031] Fps is (10 sec: 11468.6, 60 sec: 10376.6, 300 sec: 10330.2). Total num frames: 20258816. Throughput: 0: 2584.1. Samples: 5061336. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:29:32,055][00031] Avg episode reward: [(0, '28.151')]
+[2025-02-27 21:29:37,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 20307968. Throughput: 0: 2574.6. Samples: 5076762. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
+[2025-02-27 21:29:37,055][00031] Avg episode reward: [(0, '27.326')]
+[2025-02-27 21:29:37,527][00216] Updated weights for policy 0, policy_version 2480 (0.0017)
+[2025-02-27 21:29:42,053][00031] Fps is (10 sec: 9011.3, 60 sec: 10103.5, 300 sec: 10302.5). Total num frames: 20348928. Throughput: 0: 2552.7. Samples: 5091450. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-02-27 21:29:42,054][00031] Avg episode reward: [(0, '29.511')]
+[2025-02-27 21:29:46,244][00216] Updated weights for policy 0, policy_version 2490 (0.0015)
+[2025-02-27 21:29:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 20406272. Throughput: 0: 2544.0. Samples: 5098962. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:29:47,054][00031] Avg episode reward: [(0, '28.538')]
+[2025-02-27 21:29:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 20455424. Throughput: 0: 2546.7. Samples: 5114820. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:29:52,058][00031] Avg episode reward: [(0, '26.165')]
+[2025-02-27 21:29:54,311][00216] Updated weights for policy 0, policy_version 2500 (0.0016)
+[2025-02-27 21:29:57,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10103.6, 300 sec: 10274.7). Total num frames: 20504576. Throughput: 0: 2574.8. Samples: 5130396. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:29:57,056][00031] Avg episode reward: [(0, '25.369')]
+[2025-02-27 21:30:01,883][00216] Updated weights for policy 0, policy_version 2510 (0.0015)
+[2025-02-27 21:30:02,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 20561920. Throughput: 0: 2582.0. Samples: 5138346. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:30:02,054][00031] Avg episode reward: [(0, '26.415')]
+[2025-02-27 21:30:07,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10240.1, 300 sec: 10302.5). Total num frames: 20611072. Throughput: 0: 2575.9. Samples: 5153946. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:30:07,056][00031] Avg episode reward: [(0, '28.690')]
+[2025-02-27 21:30:09,574][00216] Updated weights for policy 0, policy_version 2520 (0.0015)
+[2025-02-27 21:30:12,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 20668416. Throughput: 0: 2588.5. Samples: 5169936. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:30:12,054][00031] Avg episode reward: [(0, '27.610')]
+[2025-02-27 21:30:17,053][00031] Fps is (10 sec: 9829.9, 60 sec: 10239.9, 300 sec: 10274.7). Total num frames: 20709376. Throughput: 0: 2560.0. Samples: 5176536. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:30:17,055][00031] Avg episode reward: [(0, '29.799')]
+[2025-02-27 21:30:17,980][00216] Updated weights for policy 0, policy_version 2530 (0.0015)
+[2025-02-27 21:30:22,053][00031] Fps is (10 sec: 9829.9, 60 sec: 10376.4, 300 sec: 10302.5). Total num frames: 20766720. Throughput: 0: 2563.3. Samples: 5192112. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:30:22,056][00031] Avg episode reward: [(0, '27.913')]
+[2025-02-27 21:30:25,864][00216] Updated weights for policy 0, policy_version 2540 (0.0016)
+[2025-02-27 21:30:27,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10239.9, 300 sec: 10274.7). Total num frames: 20815872. Throughput: 0: 2585.6. Samples: 5207802. Policy #0 lag: (min: 0.0, avg: 2.0, max: 6.0)
+[2025-02-27 21:30:27,055][00031] Avg episode reward: [(0, '28.111')]
+[2025-02-27 21:30:32,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 20873216. Throughput: 0: 2594.8. Samples: 5215728. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:30:32,055][00031] Avg episode reward: [(0, '29.800')]
+[2025-02-27 21:30:33,436][00216] Updated weights for policy 0, policy_version 2550 (0.0021)
+[2025-02-27 21:30:37,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 20922368. Throughput: 0: 2590.8. Samples: 5231406. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:30:37,055][00031] Avg episode reward: [(0, '29.567')]
+[2025-02-27 21:30:41,465][00216] Updated weights for policy 0, policy_version 2560 (0.0017)
+[2025-02-27 21:30:42,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 20971520. Throughput: 0: 2597.7. Samples: 5247294. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 21:30:42,059][00031] Avg episode reward: [(0, '28.795')]
+[2025-02-27 21:30:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 21020672. Throughput: 0: 2592.9. Samples: 5255028. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:30:47,055][00031] Avg episode reward: [(0, '27.303')]
+[2025-02-27 21:30:49,310][00216] Updated weights for policy 0, policy_version 2570 (0.0016)
+[2025-02-27 21:30:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 21078016. Throughput: 0: 2567.7. Samples: 5269494. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:30:52,054][00031] Avg episode reward: [(0, '27.020')]
+[2025-02-27 21:30:57,055][00031] Fps is (10 sec: 10647.6, 60 sec: 10376.2, 300 sec: 10302.4). Total num frames: 21127168. Throughput: 0: 2563.0. Samples: 5285274. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:30:57,058][00031] Avg episode reward: [(0, '29.006')]
+[2025-02-27 21:30:57,510][00216] Updated weights for policy 0, policy_version 2580 (0.0015)
+[2025-02-27 21:31:02,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 21176320. Throughput: 0: 2592.4. Samples: 5293194. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:31:02,055][00031] Avg episode reward: [(0, '28.745')]
+[2025-02-27 21:31:05,735][00216] Updated weights for policy 0, policy_version 2590 (0.0015)
+[2025-02-27 21:31:07,053][00031] Fps is (10 sec: 10651.5, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 21233664. Throughput: 0: 2593.8. Samples: 5308830. Policy #0 lag: (min: 0.0, avg: 2.4, max: 6.0)
+[2025-02-27 21:31:07,054][00031] Avg episode reward: [(0, '26.988')]
+[2025-02-27 21:31:07,066][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002592_21233664.pth...
+[2025-02-27 21:31:07,205][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002290_18759680.pth
+[2025-02-27 21:31:12,053][00031] Fps is (10 sec: 10649.0, 60 sec: 10239.9, 300 sec: 10302.5). Total num frames: 21282816. Throughput: 0: 2595.7. Samples: 5324610. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:31:12,055][00031] Avg episode reward: [(0, '30.545')]
+[2025-02-27 21:31:13,186][00216] Updated weights for policy 0, policy_version 2600 (0.0017)
+[2025-02-27 21:31:17,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10513.1, 300 sec: 10330.2). Total num frames: 21340160. Throughput: 0: 2591.1. Samples: 5332326. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:31:17,054][00031] Avg episode reward: [(0, '30.568')]
+[2025-02-27 21:31:21,708][00216] Updated weights for policy 0, policy_version 2610 (0.0017)
+[2025-02-27 21:31:22,053][00031] Fps is (10 sec: 10650.1, 60 sec: 10376.6, 300 sec: 10330.3). Total num frames: 21389312. Throughput: 0: 2569.9. Samples: 5347050. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:31:22,057][00031] Avg episode reward: [(0, '28.804')]
+[2025-02-27 21:31:27,053][00031] Fps is (10 sec: 9011.3, 60 sec: 10240.1, 300 sec: 10274.7). Total num frames: 21430272. Throughput: 0: 2557.9. Samples: 5362398. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:31:27,055][00031] Avg episode reward: [(0, '29.615')]
+[2025-02-27 21:31:29,218][00216] Updated weights for policy 0, policy_version 2620 (0.0015)
+[2025-02-27 21:31:32,053][00031] Fps is (10 sec: 9830.3, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 21487616. Throughput: 0: 2563.1. Samples: 5370366. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:31:32,055][00031] Avg episode reward: [(0, '30.786')]
+[2025-02-27 21:31:37,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10240.0, 300 sec: 10274.7). Total num frames: 21536768. Throughput: 0: 2586.0. Samples: 5385864. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:31:37,056][00031] Avg episode reward: [(0, '30.989')]
+[2025-02-27 21:31:37,122][00216] Updated weights for policy 0, policy_version 2630 (0.0016)
+[2025-02-27 21:31:42,053][00031] Fps is (10 sec: 10649.8, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 21594112. Throughput: 0: 2589.0. Samples: 5401776. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:31:42,054][00031] Avg episode reward: [(0, '29.093')]
+[2025-02-27 21:31:45,055][00216] Updated weights for policy 0, policy_version 2640 (0.0019)
+[2025-02-27 21:31:47,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10513.1, 300 sec: 10302.5). Total num frames: 21651456. Throughput: 0: 2583.1. Samples: 5409432. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:31:47,056][00031] Avg episode reward: [(0, '28.603')]
+[2025-02-27 21:31:52,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 21700608. Throughput: 0: 2587.2. Samples: 5425254. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:31:52,055][00031] Avg episode reward: [(0, '27.731')]
+[2025-02-27 21:31:52,620][00216] Updated weights for policy 0, policy_version 2650 (0.0016)
+[2025-02-27 21:31:57,053][00031] Fps is (10 sec: 9011.2, 60 sec: 10240.3, 300 sec: 10274.7). Total num frames: 21741568. Throughput: 0: 2553.1. Samples: 5439498. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:31:57,056][00031] Avg episode reward: [(0, '28.673')]
+[2025-02-27 21:32:01,132][00216] Updated weights for policy 0, policy_version 2660 (0.0016)
+[2025-02-27 21:32:02,055][00031] Fps is (10 sec: 9828.3, 60 sec: 10376.2, 300 sec: 10302.4). Total num frames: 21798912. Throughput: 0: 2559.2. Samples: 5447496. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:32:02,057][00031] Avg episode reward: [(0, '28.970')]
+[2025-02-27 21:32:07,053][00031] Fps is (10 sec: 10649.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 21848064. Throughput: 0: 2582.5. Samples: 5463264. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:32:07,056][00031] Avg episode reward: [(0, '29.136')]
+[2025-02-27 21:32:08,900][00216] Updated weights for policy 0, policy_version 2670 (0.0015)
+[2025-02-27 21:32:12,053][00031] Fps is (10 sec: 10651.9, 60 sec: 10376.6, 300 sec: 10302.5). Total num frames: 21905408. Throughput: 0: 2595.1. Samples: 5479176. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:32:12,054][00031] Avg episode reward: [(0, '28.901')]
+[2025-02-27 21:32:16,577][00216] Updated weights for policy 0, policy_version 2680 (0.0016)
+[2025-02-27 21:32:17,053][00031] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 21954560. Throughput: 0: 2590.7. Samples: 5486946. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:32:17,057][00031] Avg episode reward: [(0, '29.503')]
+[2025-02-27 21:32:22,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 22011904. Throughput: 0: 2599.7. Samples: 5502852. Policy #0 lag: (min: 0.0, avg: 2.3, max: 6.0)
+[2025-02-27 21:32:22,054][00031] Avg episode reward: [(0, '28.803')]
+[2025-02-27 21:32:24,100][00216] Updated weights for policy 0, policy_version 2690 (0.0016)
+[2025-02-27 21:32:27,056][00031] Fps is (10 sec: 9827.4, 60 sec: 10376.0, 300 sec: 10274.6). Total num frames: 22052864. Throughput: 0: 2571.7. Samples: 5517510. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
+[2025-02-27 21:32:27,058][00031] Avg episode reward: [(0, '28.295')]
+[2025-02-27 21:32:32,053][00031] Fps is (10 sec: 9830.2, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 22110208. Throughput: 0: 2570.3. Samples: 5525094. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
+[2025-02-27 21:32:32,057][00031] Avg episode reward: [(0, '29.357')]
+[2025-02-27 21:32:33,059][00216] Updated weights for policy 0, policy_version 2700 (0.0019)
+[2025-02-27 21:32:37,053][00031] Fps is (10 sec: 11472.4, 60 sec: 10513.1, 300 sec: 10330.3). Total num frames: 22167552. Throughput: 0: 2570.0. Samples: 5540904. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
+[2025-02-27 21:32:37,057][00031] Avg episode reward: [(0, '29.827')]
+[2025-02-27 21:32:40,208][00216] Updated weights for policy 0, policy_version 2710 (0.0017)
+[2025-02-27 21:32:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 22216704. Throughput: 0: 2607.6. Samples: 5556840. Policy #0 lag: (min: 0.0, avg: 2.5, max: 5.0)
+[2025-02-27 21:32:42,055][00031] Avg episode reward: [(0, '31.850')]
+[2025-02-27 21:32:42,058][00196] Saving new best policy, reward=31.850!
+[2025-02-27 21:32:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 22265856. Throughput: 0: 2600.0. Samples: 5564490. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:32:47,054][00031] Avg episode reward: [(0, '29.808')]
+[2025-02-27 21:32:48,240][00216] Updated weights for policy 0, policy_version 2720 (0.0015)
+[2025-02-27 21:32:52,054][00031] Fps is (10 sec: 9829.3, 60 sec: 10239.8, 300 sec: 10302.4). Total num frames: 22315008. Throughput: 0: 2603.8. Samples: 5580438. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:32:52,057][00031] Avg episode reward: [(0, '29.669')]
+[2025-02-27 21:32:55,789][00216] Updated weights for policy 0, policy_version 2730 (0.0016)
+[2025-02-27 21:32:57,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10302.5). Total num frames: 22364160. Throughput: 0: 2604.5. Samples: 5596380. Policy #0 lag: (min: 0.0, avg: 1.8, max: 4.0)
+[2025-02-27 21:32:57,055][00031] Avg episode reward: [(0, '29.837')]
+[2025-02-27 21:33:02,053][00031] Fps is (10 sec: 10651.1, 60 sec: 10376.9, 300 sec: 10330.3). Total num frames: 22421504. Throughput: 0: 2587.3. Samples: 5603376. Policy #0 lag: (min: 0.0, avg: 1.7, max: 5.0)
+[2025-02-27 21:33:02,056][00031] Avg episode reward: [(0, '28.385')]
+[2025-02-27 21:33:04,485][00216] Updated weights for policy 0, policy_version 2740 (0.0018)
+[2025-02-27 21:33:07,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 10330.3). Total num frames: 22470656. Throughput: 0: 2576.3. Samples: 5618784. Policy #0 lag: (min: 0.0, avg: 2.6, max: 5.0)
+[2025-02-27 21:33:07,055][00031] Avg episode reward: [(0, '27.303')]
+[2025-02-27 21:33:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002743_22470656.pth...
+[2025-02-27 21:33:07,187][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002441_19996672.pth
+[2025-02-27 21:33:11,769][00216] Updated weights for policy 0, policy_version 2750 (0.0015)
+[2025-02-27 21:33:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 22528000. Throughput: 0: 2602.9. Samples: 5634630. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:33:12,058][00031] Avg episode reward: [(0, '26.875')]
+[2025-02-27 21:33:17,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 22577152. Throughput: 0: 2602.4. Samples: 5642202. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
+[2025-02-27 21:33:17,055][00031] Avg episode reward: [(0, '24.711')]
+[2025-02-27 21:33:19,547][00216] Updated weights for policy 0, policy_version 2760 (0.0017)
+[2025-02-27 21:33:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 22626304. Throughput: 0: 2603.9. Samples: 5658078. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:33:22,054][00031] Avg episode reward: [(0, '23.916')]
+[2025-02-27 21:33:27,056][00031] Fps is (10 sec: 10647.0, 60 sec: 10513.1, 300 sec: 10330.2). Total num frames: 22683648. Throughput: 0: 2597.3. Samples: 5673726. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:33:27,058][00031] Avg episode reward: [(0, '29.016')]
+[2025-02-27 21:33:27,710][00216] Updated weights for policy 0, policy_version 2770 (0.0018)
+[2025-02-27 21:33:32,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.5, 300 sec: 10330.2). Total num frames: 22732800. Throughput: 0: 2603.0. Samples: 5681628. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:33:32,055][00031] Avg episode reward: [(0, '31.098')]
+[2025-02-27 21:33:36,136][00216] Updated weights for policy 0, policy_version 2780 (0.0015)
+[2025-02-27 21:33:37,053][00031] Fps is (10 sec: 9833.0, 60 sec: 10240.0, 300 sec: 10302.5). Total num frames: 22781952. Throughput: 0: 2568.2. Samples: 5696004. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:33:37,056][00031] Avg episode reward: [(0, '29.245')]
+[2025-02-27 21:33:42,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10240.0, 300 sec: 10330.2). Total num frames: 22831104. Throughput: 0: 2570.9. Samples: 5712072. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:33:42,054][00031] Avg episode reward: [(0, '27.564')]
+[2025-02-27 21:33:43,649][00216] Updated weights for policy 0, policy_version 2790 (0.0015)
+[2025-02-27 21:33:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 22888448. Throughput: 0: 2590.3. Samples: 5719938. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:33:47,054][00031] Avg episode reward: [(0, '26.673')]
+[2025-02-27 21:33:51,676][00216] Updated weights for policy 0, policy_version 2800 (0.0015)
+[2025-02-27 21:33:52,054][00031] Fps is (10 sec: 11467.9, 60 sec: 10513.2, 300 sec: 10330.2). Total num frames: 22945792. Throughput: 0: 2605.0. Samples: 5736012. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:33:52,056][00031] Avg episode reward: [(0, '27.994')]
+[2025-02-27 21:33:57,053][00031] Fps is (10 sec: 10649.3, 60 sec: 10513.0, 300 sec: 10330.2). Total num frames: 22994944. Throughput: 0: 2606.4. Samples: 5751918. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:33:57,056][00031] Avg episode reward: [(0, '29.184')]
+[2025-02-27 21:33:59,446][00216] Updated weights for policy 0, policy_version 2810 (0.0015)
+[2025-02-27 21:34:02,053][00031] Fps is (10 sec: 9831.2, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 23044096. Throughput: 0: 2619.2. Samples: 5760066. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:34:02,056][00031] Avg episode reward: [(0, '28.293')]
+[2025-02-27 21:34:07,053][00031] Fps is (10 sec: 9830.7, 60 sec: 10376.5, 300 sec: 10330.3). Total num frames: 23093248. Throughput: 0: 2598.3. Samples: 5775000. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:34:07,055][00031] Avg episode reward: [(0, '29.357')]
+[2025-02-27 21:34:07,150][00216] Updated weights for policy 0, policy_version 2820 (0.0021)
+[2025-02-27 21:34:12,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 23150592. Throughput: 0: 2597.0. Samples: 5790582. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:34:12,054][00031] Avg episode reward: [(0, '29.116')]
+[2025-02-27 21:34:14,997][00216] Updated weights for policy 0, policy_version 2830 (0.0015)
+[2025-02-27 21:34:17,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.6, 300 sec: 10358.0). Total num frames: 23199744. Throughput: 0: 2595.1. Samples: 5798406. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:34:17,054][00031] Avg episode reward: [(0, '26.165')]
+[2025-02-27 21:34:22,055][00031] Fps is (10 sec: 10647.2, 60 sec: 10512.7, 300 sec: 10357.9). Total num frames: 23257088. Throughput: 0: 2629.2. Samples: 5814324. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:34:22,056][00031] Avg episode reward: [(0, '29.824')]
+[2025-02-27 21:34:22,991][00216] Updated weights for policy 0, policy_version 2840 (0.0017)
+[2025-02-27 21:34:27,053][00031] Fps is (10 sec: 10649.2, 60 sec: 10376.9, 300 sec: 10330.2). Total num frames: 23306240. Throughput: 0: 2623.0. Samples: 5830110. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:34:27,055][00031] Avg episode reward: [(0, '31.076')]
+[2025-02-27 21:34:30,454][00216] Updated weights for policy 0, policy_version 2850 (0.0017)
+[2025-02-27 21:34:32,053][00031] Fps is (10 sec: 10652.0, 60 sec: 10513.1, 300 sec: 10358.0). Total num frames: 23363584. Throughput: 0: 2625.3. Samples: 5838078. Policy #0 lag: (min: 0.0, avg: 2.0, max: 5.0)
+[2025-02-27 21:34:32,055][00031] Avg episode reward: [(0, '30.014')]
+[2025-02-27 21:34:37,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10513.1, 300 sec: 10385.8). Total num frames: 23412736. Throughput: 0: 2622.0. Samples: 5854002. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:34:37,057][00031] Avg episode reward: [(0, '28.025')]
+[2025-02-27 21:34:38,207][00216] Updated weights for policy 0, policy_version 2860 (0.0018)
+[2025-02-27 21:34:42,053][00031] Fps is (10 sec: 9830.1, 60 sec: 10513.0, 300 sec: 10358.0). Total num frames: 23461888. Throughput: 0: 2591.6. Samples: 5868540. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:34:42,055][00031] Avg episode reward: [(0, '27.371')]
+[2025-02-27 21:34:46,465][00216] Updated weights for policy 0, policy_version 2870 (0.0016)
+[2025-02-27 21:34:47,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10385.8). Total num frames: 23519232. Throughput: 0: 2583.5. Samples: 5876322. Policy #0 lag: (min: 0.0, avg: 2.1, max: 6.0)
+[2025-02-27 21:34:47,055][00031] Avg episode reward: [(0, '28.926')]
+[2025-02-27 21:34:52,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10376.7, 300 sec: 10385.8). Total num frames: 23568384. Throughput: 0: 2606.1. Samples: 5892276. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:34:52,054][00031] Avg episode reward: [(0, '30.796')]
+[2025-02-27 21:34:54,420][00216] Updated weights for policy 0, policy_version 2880 (0.0016)
+[2025-02-27 21:34:57,054][00031] Fps is (10 sec: 9829.7, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 23617536. Throughput: 0: 2612.5. Samples: 5908146. Policy #0 lag: (min: 0.0, avg: 2.5, max: 5.0)
+[2025-02-27 21:34:57,057][00031] Avg episode reward: [(0, '30.507')]
+[2025-02-27 21:35:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 23666688. Throughput: 0: 2616.0. Samples: 5916126. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:35:02,055][00031] Avg episode reward: [(0, '28.652')]
+[2025-02-27 21:35:02,160][00216] Updated weights for policy 0, policy_version 2890 (0.0016)
+[2025-02-27 21:35:07,053][00031] Fps is (10 sec: 11469.6, 60 sec: 10649.6, 300 sec: 10385.8). Total num frames: 23732224. Throughput: 0: 2615.5. Samples: 5932014. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:35:07,054][00031] Avg episode reward: [(0, '26.421')]
+[2025-02-27 21:35:07,064][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002897_23732224.pth...
+[2025-02-27 21:35:07,201][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002592_21233664.pth
+[2025-02-27 21:35:09,394][00216] Updated weights for policy 0, policy_version 2900 (0.0017)
+[2025-02-27 21:35:12,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 23764992. Throughput: 0: 2599.1. Samples: 5947068. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:35:12,055][00031] Avg episode reward: [(0, '29.545')]
+[2025-02-27 21:35:17,053][00031] Fps is (10 sec: 9011.2, 60 sec: 10376.5, 300 sec: 10358.0). Total num frames: 23822336. Throughput: 0: 2584.9. Samples: 5954400. Policy #0 lag: (min: 0.0, avg: 2.4, max: 4.0)
+[2025-02-27 21:35:17,055][00031] Avg episode reward: [(0, '31.449')]
+[2025-02-27 21:35:17,962][00216] Updated weights for policy 0, policy_version 2910 (0.0018)
+[2025-02-27 21:35:22,053][00031] Fps is (10 sec: 11468.5, 60 sec: 10376.9, 300 sec: 10385.8). Total num frames: 23879680. Throughput: 0: 2588.4. Samples: 5970480. Policy #0 lag: (min: 0.0, avg: 1.4, max: 5.0)
+[2025-02-27 21:35:22,055][00031] Avg episode reward: [(0, '31.270')]
+[2025-02-27 21:35:26,143][00216] Updated weights for policy 0, policy_version 2920 (0.0016)
+[2025-02-27 21:35:27,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10513.1, 300 sec: 10385.8). Total num frames: 23937024. Throughput: 0: 2617.5. Samples: 5986326. Policy #0 lag: (min: 0.0, avg: 2.4, max: 5.0)
+[2025-02-27 21:35:27,055][00031] Avg episode reward: [(0, '30.061')]
+[2025-02-27 21:35:32,053][00031] Fps is (10 sec: 9830.6, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 23977984. Throughput: 0: 2623.3. Samples: 5994372. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:35:32,055][00031] Avg episode reward: [(0, '29.039')]
+[2025-02-27 21:35:33,124][00216] Updated weights for policy 0, policy_version 2930 (0.0016)
+[2025-02-27 21:35:37,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10385.8). Total num frames: 24035328. Throughput: 0: 2620.8. Samples: 6010212. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:35:37,055][00031] Avg episode reward: [(0, '26.491')]
+[2025-02-27 21:35:40,851][00216] Updated weights for policy 0, policy_version 2940 (0.0015)
+[2025-02-27 21:35:42,053][00031] Fps is (10 sec: 11468.8, 60 sec: 10513.1, 300 sec: 10413.6). Total num frames: 24092672. Throughput: 0: 2625.9. Samples: 6026310. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:35:42,055][00031] Avg episode reward: [(0, '25.735')]
+[2025-02-27 21:35:47,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10240.0, 300 sec: 10358.0). Total num frames: 24133632. Throughput: 0: 2600.5. Samples: 6033150. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:35:47,054][00031] Avg episode reward: [(0, '27.142')]
+[2025-02-27 21:35:49,253][00216] Updated weights for policy 0, policy_version 2950 (0.0016)
+[2025-02-27 21:35:52,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.5, 300 sec: 10385.9). Total num frames: 24190976. Throughput: 0: 2593.1. Samples: 6048702. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:35:52,057][00031] Avg episode reward: [(0, '28.114')]
+[2025-02-27 21:35:57,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.7, 300 sec: 10385.8). Total num frames: 24240128. Throughput: 0: 2608.5. Samples: 6064452. Policy #0 lag: (min: 0.0, avg: 2.2, max: 6.0)
+[2025-02-27 21:35:57,054][00031] Avg episode reward: [(0, '29.562')]
+[2025-02-27 21:35:57,081][00216] Updated weights for policy 0, policy_version 2960 (0.0016)
+[2025-02-27 21:36:02,053][00031] Fps is (10 sec: 10649.1, 60 sec: 10513.0, 300 sec: 10385.8). Total num frames: 24297472. Throughput: 0: 2622.6. Samples: 6072420. Policy #0 lag: (min: 0.0, avg: 2.3, max: 5.0)
+[2025-02-27 21:36:02,055][00031] Avg episode reward: [(0, '29.925')]
+[2025-02-27 21:36:05,013][00216] Updated weights for policy 0, policy_version 2970 (0.0017)
+[2025-02-27 21:36:07,053][00031] Fps is (10 sec: 11468.7, 60 sec: 10376.5, 300 sec: 10413.6). Total num frames: 24354816. Throughput: 0: 2617.7. Samples: 6088278. Policy #0 lag: (min: 0.0, avg: 2.1, max: 4.0)
+[2025-02-27 21:36:07,056][00031] Avg episode reward: [(0, '29.327')]
+[2025-02-27 21:36:12,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10649.6, 300 sec: 10385.8). Total num frames: 24403968. Throughput: 0: 2622.0. Samples: 6104316. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-02-27 21:36:12,056][00031] Avg episode reward: [(0, '30.547')]
+[2025-02-27 21:36:12,637][00216] Updated weights for policy 0, policy_version 2980 (0.0015)
+[2025-02-27 21:36:17,053][00031] Fps is (10 sec: 9830.5, 60 sec: 10513.1, 300 sec: 10385.8). Total num frames: 24453120. Throughput: 0: 2615.6. Samples: 6112074. Policy #0 lag: (min: 0.0, avg: 1.9, max: 5.0)
+[2025-02-27 21:36:17,055][00031] Avg episode reward: [(0, '29.417')]
+[2025-02-27 21:36:20,741][00216] Updated weights for policy 0, policy_version 2990 (0.0018)
+[2025-02-27 21:36:22,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10413.6). Total num frames: 24502272. Throughput: 0: 2590.4. Samples: 6126780. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:36:22,054][00031] Avg episode reward: [(0, '29.186')]
+[2025-02-27 21:36:27,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10413.6). Total num frames: 24559616. Throughput: 0: 2583.6. Samples: 6142572. Policy #0 lag: (min: 0.0, avg: 2.2, max: 4.0)
+[2025-02-27 21:36:27,055][00031] Avg episode reward: [(0, '28.307')]
+[2025-02-27 21:36:28,124][00216] Updated weights for policy 0, policy_version 3000 (0.0015)
+[2025-02-27 21:36:32,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10513.1, 300 sec: 10413.6). Total num frames: 24608768. Throughput: 0: 2610.8. Samples: 6150636. Policy #0 lag: (min: 0.0, avg: 1.7, max: 4.0)
+[2025-02-27 21:36:32,055][00031] Avg episode reward: [(0, '26.471')]
+[2025-02-27 21:36:36,443][00216] Updated weights for policy 0, policy_version 3010 (0.0015)
+[2025-02-27 21:36:37,054][00031] Fps is (10 sec: 10648.3, 60 sec: 10512.8, 300 sec: 10413.5). Total num frames: 24666112. Throughput: 0: 2618.3. Samples: 6166530. Policy #0 lag: (min: 0.0, avg: 2.1, max: 5.0)
+[2025-02-27 21:36:37,057][00031] Avg episode reward: [(0, '28.432')]
+[2025-02-27 21:36:42,053][00031] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10385.8). Total num frames: 24715264. Throughput: 0: 2625.2. Samples: 6182586. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:36:42,055][00031] Avg episode reward: [(0, '28.953')]
+[2025-02-27 21:36:43,704][00216] Updated weights for policy 0, policy_version 3020 (0.0016)
+[2025-02-27 21:36:47,053][00031] Fps is (10 sec: 9831.3, 60 sec: 10513.0, 300 sec: 10385.8). Total num frames: 24764416. Throughput: 0: 2620.5. Samples: 6190344. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-02-27 21:36:47,059][00031] Avg episode reward: [(0, '29.524')]
+[2025-02-27 21:36:51,958][00216] Updated weights for policy 0, policy_version 3030 (0.0018)
+[2025-02-27 21:36:52,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10513.1, 300 sec: 10441.3). Total num frames: 24821760. Throughput: 0: 2606.0. Samples: 6205548. Policy #0 lag: (min: 0.0, avg: 2.0, max: 4.0)
+[2025-02-27 21:36:52,055][00031] Avg episode reward: [(0, '30.358')]
+[2025-02-27 21:36:57,053][00031] Fps is (10 sec: 10650.0, 60 sec: 10513.1, 300 sec: 10413.6). Total num frames: 24870912. Throughput: 0: 2588.8. Samples: 6220812. Policy #0 lag: (min: 0.0, avg: 1.8, max: 5.0)
+[2025-02-27 21:36:57,055][00031] Avg episode reward: [(0, '29.560')]
+[2025-02-27 21:36:59,808][00216] Updated weights for policy 0, policy_version 3040 (0.0018)
+[2025-02-27 21:37:02,053][00031] Fps is (10 sec: 9830.4, 60 sec: 10376.6, 300 sec: 10413.6). Total num frames: 24920064. Throughput: 0: 2595.1. Samples: 6228852. Policy #0 lag: (min: 0.0, avg: 2.2, max: 5.0)
+[2025-02-27 21:37:02,055][00031] Avg episode reward: [(0, '27.647')]
+[2025-02-27 21:37:07,053][00031] Fps is (10 sec: 10649.5, 60 sec: 10376.5, 300 sec: 10413.6). Total num frames: 24977408. Throughput: 0: 2618.8. Samples: 6244626. Policy #0 lag: (min: 0.0, avg: 1.9, max: 4.0)
+[2025-02-27 21:37:07,054][00031] Avg episode reward: [(0, '27.914')]
+[2025-02-27 21:37:07,063][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000003049_24977408.pth...
+[2025-02-27 21:37:07,198][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002743_22470656.pth
+[2025-02-27 21:37:07,810][00216] Updated weights for policy 0, policy_version 3050 (0.0018)
+[2025-02-27 21:37:09,812][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000003053_25010176.pth...
+[2025-02-27 21:37:09,820][00031] Component Batcher_0 stopped!
+[2025-02-27 21:37:09,819][00196] Stopping Batcher_0...
+[2025-02-27 21:37:09,825][00196] Loop batcher_evt_loop terminating...
+[2025-02-27 21:37:09,861][00216] Weights refcount: 2 0
+[2025-02-27 21:37:09,865][00216] Stopping InferenceWorker_p0-w0...
+[2025-02-27 21:37:09,866][00216] Loop inference_proc0-0_evt_loop terminating...
+[2025-02-27 21:37:09,866][00031] Component InferenceWorker_p0-w0 stopped!
+[2025-02-27 21:37:09,936][00196] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000002897_23732224.pth
+[2025-02-27 21:37:09,952][00196] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000003053_25010176.pth...
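The checkpoint filenames above encode (policy_version, env_frames): checkpoint_000003053_25010176.pth is policy version 3053 at 25010176 frames, and the learner prunes older files (checkpoint_000002743_* and checkpoint_000002897_* are removed) as new ones are written. In this run the two numbers are locked together at 8192 frames per policy version (3053 * 8192 = 25010176, and likewise 3049 * 8192 = 24977408). A minimal sketch for cross-checking a filename; the 8192 ratio is inferred from this particular log, not a general Sample Factory constant:

# Sketch: parse a checkpoint filename and cross-check that
# env_frames == policy_version * 8192 (ratio observed in this run).
import re

def parse_checkpoint_name(name):
    # e.g. "checkpoint_000003053_25010176.pth" -> (3053, 25010176)
    m = re.match(r"checkpoint_(\d+)_(\d+)\.pth$", name)
    if m is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name("checkpoint_000003053_25010176.pth")
assert frames == version * 8192  # holds for both checkpoints saved above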
+[2025-02-27 21:37:10,103][00196] Stopping LearnerWorker_p0...
+[2025-02-27 21:37:10,104][00196] Loop learner_proc0_evt_loop terminating...
+[2025-02-27 21:37:10,105][00031] Component LearnerWorker_p0 stopped!
+[2025-02-27 21:37:10,594][00031] Component RolloutWorker_w18 stopped!
+[2025-02-27 21:37:10,597][00235] Stopping RolloutWorker_w18...
+[2025-02-27 21:37:10,598][00235] Loop rollout_proc18_evt_loop terminating...
+[2025-02-27 21:37:10,620][00234] Stopping RolloutWorker_w17...
+[2025-02-27 21:37:10,621][00234] Loop rollout_proc17_evt_loop terminating...
+[2025-02-27 21:37:10,620][00031] Component RolloutWorker_w17 stopped!
+[2025-02-27 21:37:10,629][00031] Component RolloutWorker_w6 stopped!
+[2025-02-27 21:37:10,629][00223] Stopping RolloutWorker_w6...
+[2025-02-27 21:37:10,633][00223] Loop rollout_proc6_evt_loop terminating...
+[2025-02-27 21:37:10,633][00232] Stopping RolloutWorker_w15...
+[2025-02-27 21:37:10,634][00031] Component RolloutWorker_w15 stopped!
+[2025-02-27 21:37:10,635][00232] Loop rollout_proc15_evt_loop terminating...
+[2025-02-27 21:37:10,721][00224] Stopping RolloutWorker_w7...
+[2025-02-27 21:37:10,722][00224] Loop rollout_proc7_evt_loop terminating...
+[2025-02-27 21:37:10,721][00031] Component RolloutWorker_w7 stopped!
+[2025-02-27 21:37:10,755][00219] Stopping RolloutWorker_w2...
+[2025-02-27 21:37:10,755][00219] Loop rollout_proc2_evt_loop terminating...
+[2025-02-27 21:37:10,755][00031] Component RolloutWorker_w2 stopped!
+[2025-02-27 21:37:10,824][00221] Stopping RolloutWorker_w3...
+[2025-02-27 21:37:10,824][00221] Loop rollout_proc3_evt_loop terminating...
+[2025-02-27 21:37:10,824][00031] Component RolloutWorker_w3 stopped!
+[2025-02-27 21:37:10,859][00236] Stopping RolloutWorker_w19...
+[2025-02-27 21:37:10,860][00236] Loop rollout_proc19_evt_loop terminating...
+[2025-02-27 21:37:10,860][00031] Component RolloutWorker_w19 stopped!
+[2025-02-27 21:37:10,875][00031] Component RolloutWorker_w9 stopped!
+[2025-02-27 21:37:10,875][00227] Stopping RolloutWorker_w9...
+[2025-02-27 21:37:10,878][00227] Loop rollout_proc9_evt_loop terminating...
+[2025-02-27 21:37:10,928][00031] Component RolloutWorker_w5 stopped!
+[2025-02-27 21:37:10,927][00222] Stopping RolloutWorker_w5...
+[2025-02-27 21:37:10,931][00222] Loop rollout_proc5_evt_loop terminating...
+[2025-02-27 21:37:10,977][00031] Component RolloutWorker_w13 stopped!
+[2025-02-27 21:37:10,979][00228] Stopping RolloutWorker_w11...
+[2025-02-27 21:37:10,980][00228] Loop rollout_proc11_evt_loop terminating...
+[2025-02-27 21:37:10,979][00031] Component RolloutWorker_w11 stopped!
+[2025-02-27 21:37:10,977][00230] Stopping RolloutWorker_w13...
+[2025-02-27 21:37:10,983][00230] Loop rollout_proc13_evt_loop terminating...
+[2025-02-27 21:37:10,994][00225] Stopping RolloutWorker_w8...
+[2025-02-27 21:37:10,994][00225] Loop rollout_proc8_evt_loop terminating...
+[2025-02-27 21:37:10,994][00031] Component RolloutWorker_w8 stopped!
+[2025-02-27 21:37:11,010][00231] Stopping RolloutWorker_w14...
+[2025-02-27 21:37:11,011][00231] Loop rollout_proc14_evt_loop terminating...
+[2025-02-27 21:37:11,011][00031] Component RolloutWorker_w14 stopped!
+[2025-02-27 21:37:11,022][00031] Component RolloutWorker_w1 stopped!
+[2025-02-27 21:37:11,022][00218] Stopping RolloutWorker_w1...
+[2025-02-27 21:37:11,025][00218] Loop rollout_proc1_evt_loop terminating...
+[2025-02-27 21:37:11,067][00226] Stopping RolloutWorker_w10...
+[2025-02-27 21:37:11,067][00226] Loop rollout_proc10_evt_loop terminating...
+[2025-02-27 21:37:11,067][00031] Component RolloutWorker_w10 stopped!
+[2025-02-27 21:37:11,181][00217] Stopping RolloutWorker_w0...
+[2025-02-27 21:37:11,182][00217] Loop rollout_proc0_evt_loop terminating...
+[2025-02-27 21:37:11,182][00031] Component RolloutWorker_w0 stopped!
+[2025-02-27 21:37:11,210][00233] Stopping RolloutWorker_w16...
+[2025-02-27 21:37:11,210][00233] Loop rollout_proc16_evt_loop terminating...
+[2025-02-27 21:37:11,210][00031] Component RolloutWorker_w16 stopped!
+[2025-02-27 21:37:11,249][00229] Stopping RolloutWorker_w12...
+[2025-02-27 21:37:11,250][00229] Loop rollout_proc12_evt_loop terminating...
+[2025-02-27 21:37:11,252][00031] Component RolloutWorker_w12 stopped!
+[2025-02-27 21:37:11,307][00220] Stopping RolloutWorker_w4...
+[2025-02-27 21:37:11,308][00220] Loop rollout_proc4_evt_loop terminating...
+[2025-02-27 21:37:11,310][00031] Component RolloutWorker_w4 stopped!
+[2025-02-27 21:37:11,311][00031] Waiting for process learner_proc0 to stop...
+[2025-02-27 21:37:11,920][00031] Waiting for process inference_proc0-0 to join...
+[2025-02-27 21:37:11,922][00031] Waiting for process rollout_proc0 to join...
+[2025-02-27 21:37:15,264][00031] Waiting for process rollout_proc1 to join...
+[2025-02-27 21:37:15,267][00031] Waiting for process rollout_proc2 to join...
+[2025-02-27 21:37:15,268][00031] Waiting for process rollout_proc3 to join...
+[2025-02-27 21:37:15,269][00031] Waiting for process rollout_proc4 to join...
+[2025-02-27 21:37:15,307][00031] Waiting for process rollout_proc5 to join...
+[2025-02-27 21:37:15,308][00031] Waiting for process rollout_proc6 to join...
+[2025-02-27 21:37:15,309][00031] Waiting for process rollout_proc7 to join...
+[2025-02-27 21:37:15,310][00031] Waiting for process rollout_proc8 to join...
+[2025-02-27 21:37:15,312][00031] Waiting for process rollout_proc9 to join...
+[2025-02-27 21:37:15,313][00031] Waiting for process rollout_proc10 to join...
+[2025-02-27 21:37:15,314][00031] Waiting for process rollout_proc11 to join...
+[2025-02-27 21:37:15,316][00031] Waiting for process rollout_proc12 to join...
+[2025-02-27 21:37:15,316][00031] Waiting for process rollout_proc13 to join...
+[2025-02-27 21:37:15,318][00031] Waiting for process rollout_proc14 to join...
+[2025-02-27 21:37:15,319][00031] Waiting for process rollout_proc15 to join...
+[2025-02-27 21:37:15,320][00031] Waiting for process rollout_proc16 to join...
+[2025-02-27 21:37:15,321][00031] Waiting for process rollout_proc17 to join...
+[2025-02-27 21:37:15,322][00031] Waiting for process rollout_proc18 to join...
+[2025-02-27 21:37:15,323][00031] Waiting for process rollout_proc19 to join...
+[2025-02-27 21:37:15,324][00031] Batcher 0 profile tree view:
+batching: 140.2837, releasing_batches: 0.1004
+[2025-02-27 21:37:15,325][00031] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0001
+ wait_policy_total: 1439.4642
+update_model: 13.7280
+ weight_update: 0.0018
+one_step: 0.0058
+ handle_policy_step: 988.8719
+ deserialize: 48.9574, stack: 4.6095, obs_to_device_normalize: 195.5420, forward: 554.7885, send_messages: 38.1304
+ prepare_outputs: 114.0456
+ to_cpu: 63.7743
+[2025-02-27 21:37:15,326][00031] Learner 0 profile tree view:
+misc: 0.0161, prepare_batch: 31.8228
+train: 177.0378
+ epoch_init: 0.0260, minibatch_init: 0.0307, losses_postprocess: 1.0217, kl_divergence: 1.3678, after_optimizer: 64.6695
+ calculate_losses: 63.2578
+ losses_init: 0.0207, forward_head: 2.9061, bptt_initial: 39.7562, tail: 3.0017, advantages_returns: 0.8008, losses: 5.4053
+ bptt: 10.3721
+ bptt_forward_core: 10.0922
+ update: 44.8229
+ clip: 2.8333
+[2025-02-27 21:37:15,327][00031] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4210, enqueue_policy_requests: 58.0016, env_step: 2308.1572, overhead: 22.0099, complete_rollouts: 3.7025
+save_policy_outputs: 26.9566
+ split_output_tensors: 10.5655
+[2025-02-27 21:37:15,329][00031] RolloutWorker_w19 profile tree view:
+wait_for_trajectories: 0.4130, enqueue_policy_requests: 54.9748, env_step: 2318.6679, overhead: 22.3060, complete_rollouts: 4.6932
+save_policy_outputs: 26.8443
+ split_output_tensors: 10.5471
+[2025-02-27 21:37:15,330][00031] Loop Runner_EvtLoop terminating...
+[2025-02-27 21:37:15,331][00031] Runner profile tree view:
+main_loop: 2522.9489
+[2025-02-27 21:37:15,333][00031] Collected {0: 25010176}, FPS: 9913.1
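That closes the training phase: 25010176 env frames collected at an average 9913.1 FPS over a 2522.9-second main loop. For context, a run like this is normally launched through Sample Factory's run_rl; a minimal sketch in the style of sf_examples/vizdoom/train_vizdoom.py follows. The env name is inferred from the hf_repository logged further down, num_workers=20 matches the twenty rollout workers stopped above, and the remaining argv values are illustrative assumptions, not values recorded in this log.

# Minimal launch sketch (see assumptions in the note above).
from sample_factory.train import run_rl
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # inferred from the hub repo name below
        "--num_workers=20",                     # matches RolloutWorker_w0..w19 above
        "--num_envs_per_worker=4",              # assumption
        "--train_for_env_steps=25000000",       # the run stops near 25M frames
    ]
)
status = run_rl(cfg)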
+[2025-02-27 21:37:15,680][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
+[2025-02-27 21:37:15,681][00031] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-02-27 21:37:15,682][00031] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-02-27 21:37:15,683][00031] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-02-27 21:37:15,683][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-02-27 21:37:15,684][00031] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-02-27 21:37:15,685][00031] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-02-27 21:37:15,687][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-02-27 21:37:15,687][00031] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-02-27 21:37:15,689][00031] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-02-27 21:37:15,690][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-02-27 21:37:15,691][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-02-27 21:37:15,692][00031] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-02-27 21:37:15,692][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-02-27 21:37:15,693][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-02-27 21:37:15,723][00031] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-02-27 21:37:15,726][00031] RunningMeanStd input shape: (3, 72, 128)
+[2025-02-27 21:37:15,727][00031] RunningMeanStd input shape: (1,)
+[2025-02-27 21:37:15,741][00031] ConvEncoder: input_channels=3
+[2025-02-27 21:37:15,849][00031] Conv encoder output size: 512
+[2025-02-27 21:37:15,850][00031] Policy head output size: 512
+[2025-02-27 21:37:16,013][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000003053_25010176.pth...
+[2025-02-27 21:37:16,794][00031] Num frames 100...
+[2025-02-27 21:37:16,915][00031] Num frames 200...
+[2025-02-27 21:37:17,036][00031] Num frames 300...
+[2025-02-27 21:37:17,156][00031] Num frames 400...
+[2025-02-27 21:37:17,276][00031] Num frames 500...
+[2025-02-27 21:37:17,396][00031] Num frames 600...
+[2025-02-27 21:37:17,524][00031] Num frames 700...
+[2025-02-27 21:37:17,652][00031] Num frames 800...
+[2025-02-27 21:37:17,778][00031] Num frames 900...
+[2025-02-27 21:37:17,869][00031] Avg episode rewards: #0: 19.280, true rewards: #0: 9.280
+[2025-02-27 21:37:17,870][00031] Avg episode reward: 19.280, avg true_objective: 9.280
+[2025-02-27 21:37:17,958][00031] Num frames 1000...
+[2025-02-27 21:37:18,077][00031] Num frames 1100...
+[2025-02-27 21:37:18,197][00031] Num frames 1200...
+[2025-02-27 21:37:18,317][00031] Num frames 1300...
+[2025-02-27 21:37:18,438][00031] Num frames 1400...
+[2025-02-27 21:37:18,559][00031] Num frames 1500...
+[2025-02-27 21:37:18,681][00031] Num frames 1600...
+[2025-02-27 21:37:18,802][00031] Num frames 1700...
+[2025-02-27 21:37:18,922][00031] Num frames 1800...
+[2025-02-27 21:37:19,040][00031] Num frames 1900...
+[2025-02-27 21:37:19,165][00031] Avg episode rewards: #0: 21.285, true rewards: #0: 9.785
+[2025-02-27 21:37:19,166][00031] Avg episode reward: 21.285, avg true_objective: 9.785
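These evaluation summaries are running means over completed episodes, so per-episode scores can be recovered from consecutive lines (worked arithmetic, not part of the original log): episode 1 scored 19.280, and an average of 21.285 after two episodes means episode 2 scored 2 * 21.285 - 19.280 = 23.290 (similarly 2 * 9.785 - 9.280 = 10.290 for the true objective). The "true rewards" column appears to track the scenario's underlying objective, while the raw reward includes the shaping used during training.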
+[2025-02-27 21:37:19,218][00031] Num frames 2000...
+[2025-02-27 21:37:19,336][00031] Num frames 2100...
+[2025-02-27 21:37:19,454][00031] Num frames 2200...
+[2025-02-27 21:37:19,572][00031] Num frames 2300...
+[2025-02-27 21:37:19,691][00031] Num frames 2400...
+[2025-02-27 21:37:19,807][00031] Num frames 2500...
+[2025-02-27 21:37:19,926][00031] Num frames 2600...
+[2025-02-27 21:37:20,047][00031] Num frames 2700...
+[2025-02-27 21:37:20,125][00031] Avg episode rewards: #0: 19.060, true rewards: #0: 9.060
+[2025-02-27 21:37:20,126][00031] Avg episode reward: 19.060, avg true_objective: 9.060
+[2025-02-27 21:37:20,224][00031] Num frames 2800...
+[2025-02-27 21:37:20,349][00031] Num frames 2900...
+[2025-02-27 21:37:20,471][00031] Num frames 3000...
+[2025-02-27 21:37:20,590][00031] Num frames 3100...
+[2025-02-27 21:37:20,710][00031] Num frames 3200...
+[2025-02-27 21:37:20,829][00031] Num frames 3300...
+[2025-02-27 21:37:20,948][00031] Num frames 3400...
+[2025-02-27 21:37:21,070][00031] Num frames 3500...
+[2025-02-27 21:37:21,189][00031] Num frames 3600...
+[2025-02-27 21:37:21,310][00031] Num frames 3700...
+[2025-02-27 21:37:21,430][00031] Num frames 3800...
+[2025-02-27 21:37:21,550][00031] Num frames 3900...
+[2025-02-27 21:37:21,672][00031] Num frames 4000...
+[2025-02-27 21:37:21,797][00031] Num frames 4100...
+[2025-02-27 21:37:21,936][00031] Num frames 4200...
+[2025-02-27 21:37:22,083][00031] Num frames 4300...
+[2025-02-27 21:37:22,242][00031] Num frames 4400...
+[2025-02-27 21:37:22,393][00031] Num frames 4500...
+[2025-02-27 21:37:22,525][00031] Num frames 4600...
+[2025-02-27 21:37:22,657][00031] Num frames 4700...
+[2025-02-27 21:37:22,797][00031] Num frames 4800...
+[2025-02-27 21:37:22,876][00031] Avg episode rewards: #0: 27.045, true rewards: #0: 12.045
+[2025-02-27 21:37:22,877][00031] Avg episode reward: 27.045, avg true_objective: 12.045
+[2025-02-27 21:37:22,989][00031] Num frames 4900...
+[2025-02-27 21:37:23,129][00031] Num frames 5000...
+[2025-02-27 21:37:23,261][00031] Num frames 5100...
+[2025-02-27 21:37:23,383][00031] Num frames 5200...
+[2025-02-27 21:37:23,501][00031] Num frames 5300...
+[2025-02-27 21:37:23,622][00031] Num frames 5400...
+[2025-02-27 21:37:23,748][00031] Num frames 5500...
+[2025-02-27 21:37:23,875][00031] Num frames 5600...
+[2025-02-27 21:37:24,021][00031] Num frames 5700...
+[2025-02-27 21:37:24,148][00031] Num frames 5800...
+[2025-02-27 21:37:24,269][00031] Num frames 5900...
+[2025-02-27 21:37:24,390][00031] Num frames 6000...
+[2025-02-27 21:37:24,510][00031] Num frames 6100...
+[2025-02-27 21:37:24,629][00031] Num frames 6200...
+[2025-02-27 21:37:24,755][00031] Avg episode rewards: #0: 27.316, true rewards: #0: 12.516
+[2025-02-27 21:37:24,756][00031] Avg episode reward: 27.316, avg true_objective: 12.516
+[2025-02-27 21:37:24,809][00031] Num frames 6300...
+[2025-02-27 21:37:24,925][00031] Num frames 6400...
+[2025-02-27 21:37:25,053][00031] Num frames 6500...
+[2025-02-27 21:37:25,172][00031] Num frames 6600...
+[2025-02-27 21:37:25,295][00031] Num frames 6700...
+[2025-02-27 21:37:25,416][00031] Num frames 6800...
+[2025-02-27 21:37:25,539][00031] Num frames 6900...
+[2025-02-27 21:37:25,666][00031] Num frames 7000...
+[2025-02-27 21:37:25,795][00031] Num frames 7100...
+[2025-02-27 21:37:25,921][00031] Num frames 7200...
+[2025-02-27 21:37:26,042][00031] Num frames 7300...
+[2025-02-27 21:37:26,165][00031] Num frames 7400...
+[2025-02-27 21:37:26,287][00031] Num frames 7500...
+[2025-02-27 21:37:26,409][00031] Num frames 7600...
+[2025-02-27 21:37:26,531][00031] Num frames 7700...
+[2025-02-27 21:37:26,652][00031] Num frames 7800...
+[2025-02-27 21:37:26,756][00031] Avg episode rewards: #0: 28.400, true rewards: #0: 13.067
+[2025-02-27 21:37:26,757][00031] Avg episode reward: 28.400, avg true_objective: 13.067
+[2025-02-27 21:37:26,828][00031] Num frames 7900...
+[2025-02-27 21:37:26,945][00031] Num frames 8000...
+[2025-02-27 21:37:27,062][00031] Num frames 8100...
+[2025-02-27 21:37:27,182][00031] Num frames 8200...
+[2025-02-27 21:37:27,302][00031] Num frames 8300...
+[2025-02-27 21:37:27,420][00031] Num frames 8400...
+[2025-02-27 21:37:27,539][00031] Num frames 8500...
+[2025-02-27 21:37:27,657][00031] Num frames 8600...
+[2025-02-27 21:37:27,777][00031] Num frames 8700...
+[2025-02-27 21:37:27,898][00031] Num frames 8800...
+[2025-02-27 21:37:27,950][00031] Avg episode rewards: #0: 26.857, true rewards: #0: 12.571
+[2025-02-27 21:37:27,951][00031] Avg episode reward: 26.857, avg true_objective: 12.571
+[2025-02-27 21:37:28,068][00031] Num frames 8900...
+[2025-02-27 21:37:28,186][00031] Num frames 9000...
+[2025-02-27 21:37:28,306][00031] Num frames 9100...
+[2025-02-27 21:37:28,427][00031] Num frames 9200...
+[2025-02-27 21:37:28,546][00031] Num frames 9300...
+[2025-02-27 21:37:28,664][00031] Num frames 9400...
+[2025-02-27 21:37:28,783][00031] Num frames 9500...
+[2025-02-27 21:37:28,904][00031] Num frames 9600...
+[2025-02-27 21:37:29,021][00031] Num frames 9700...
+[2025-02-27 21:37:29,143][00031] Num frames 9800...
+[2025-02-27 21:37:29,261][00031] Num frames 9900...
+[2025-02-27 21:37:29,381][00031] Num frames 10000...
+[2025-02-27 21:37:29,500][00031] Num frames 10100...
+[2025-02-27 21:37:29,618][00031] Num frames 10200...
+[2025-02-27 21:37:29,735][00031] Num frames 10300...
+[2025-02-27 21:37:29,858][00031] Num frames 10400...
+[2025-02-27 21:37:29,910][00031] Avg episode rewards: #0: 27.875, true rewards: #0: 13.000
+[2025-02-27 21:37:29,911][00031] Avg episode reward: 27.875, avg true_objective: 13.000
+[2025-02-27 21:37:30,028][00031] Num frames 10500...
+[2025-02-27 21:37:30,148][00031] Num frames 10600...
+[2025-02-27 21:37:30,262][00031] Num frames 10700...
+[2025-02-27 21:37:30,380][00031] Num frames 10800...
+[2025-02-27 21:37:30,500][00031] Num frames 10900...
+[2025-02-27 21:37:30,618][00031] Num frames 11000...
+[2025-02-27 21:37:30,736][00031] Num frames 11100...
+[2025-02-27 21:37:30,857][00031] Num frames 11200...
+[2025-02-27 21:37:30,950][00031] Avg episode rewards: #0: 26.702, true rewards: #0: 12.480
+[2025-02-27 21:37:30,951][00031] Avg episode reward: 26.702, avg true_objective: 12.480
+[2025-02-27 21:37:31,033][00031] Num frames 11300...
+[2025-02-27 21:37:31,152][00031] Num frames 11400...
+[2025-02-27 21:37:31,270][00031] Num frames 11500...
+[2025-02-27 21:37:31,390][00031] Num frames 11600...
+[2025-02-27 21:37:31,511][00031] Num frames 11700...
+[2025-02-27 21:37:31,631][00031] Num frames 11800...
+[2025-02-27 21:37:31,749][00031] Num frames 11900...
+[2025-02-27 21:37:31,869][00031] Num frames 12000...
+[2025-02-27 21:37:31,990][00031] Num frames 12100...
+[2025-02-27 21:37:32,111][00031] Num frames 12200...
+[2025-02-27 21:37:32,204][00031] Avg episode rewards: #0: 26.332, true rewards: #0: 12.232
+[2025-02-27 21:37:32,205][00031] Avg episode reward: 26.332, avg true_objective: 12.232
+[2025-02-27 21:38:11,276][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
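This first pass (10 episodes, final averages 26.332 raw / 12.232 true objective) and the saved replay correspond to Sample Factory's enjoy entry point. A sketch of an invocation matching the arguments logged above, assuming the parse_vizdoom_cfg/register_vizdoom_components helpers from sf_examples/vizdoom/train_vizdoom.py; evaluation=True is what injects the eval-only arguments reported as "not in the saved config file":

# Evaluation/replay sketch matching the logged arguments.
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # inferred from the hub repo name below
        "--num_workers=1",        # logged: overriding 'num_workers' with 1
        "--save_video",           # logged: save_video=True
        "--no_render",            # logged: no_render=True
        "--max_num_episodes=10",  # logged: max_num_episodes=10
    ],
    evaluation=True,
)
status = enjoy(cfg)  # loads the latest checkpoint and writes replay.mp4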
+[2025-02-27 21:38:12,084][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
+[2025-02-27 21:38:12,085][00031] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-02-27 21:38:12,086][00031] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-02-27 21:38:12,087][00031] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-02-27 21:38:12,087][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-02-27 21:38:12,088][00031] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-02-27 21:38:12,089][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-02-27 21:38:12,090][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-02-27 21:38:12,091][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-02-27 21:38:12,092][00031] Adding new argument 'hf_repository'='francescosabbarese/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-02-27 21:38:12,094][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-02-27 21:38:12,095][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-02-27 21:38:12,096][00031] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-02-27 21:38:12,097][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-02-27 21:38:12,097][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-02-27 21:38:12,122][00031] RunningMeanStd input shape: (3, 72, 128)
+[2025-02-27 21:38:12,124][00031] RunningMeanStd input shape: (1,)
+[2025-02-27 21:38:12,135][00031] ConvEncoder: input_channels=3
+[2025-02-27 21:38:12,191][00031] Conv encoder output size: 512
+[2025-02-27 21:38:12,192][00031] Policy head output size: 512
+[2025-02-27 21:38:12,209][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000003053_25010176.pth...
+[2025-02-27 21:38:12,666][00031] Num frames 100...
+[2025-02-27 21:38:12,786][00031] Num frames 200...
+[2025-02-27 21:38:12,906][00031] Num frames 300...
+[2025-02-27 21:38:12,964][00031] Avg episode rewards: #0: 8.010, true rewards: #0: 3.010
+[2025-02-27 21:38:12,966][00031] Avg episode reward: 8.010, avg true_objective: 3.010
+[2025-02-27 21:38:13,082][00031] Num frames 400...
+[2025-02-27 21:38:13,201][00031] Num frames 500...
+[2025-02-27 21:38:13,319][00031] Num frames 600...
+[2025-02-27 21:38:13,437][00031] Num frames 700...
+[2025-02-27 21:38:13,558][00031] Num frames 800...
+[2025-02-27 21:38:13,673][00031] Num frames 900...
+[2025-02-27 21:38:13,787][00031] Num frames 1000...
+[2025-02-27 21:38:13,934][00031] Num frames 1100...
+[2025-02-27 21:38:14,070][00031] Num frames 1200...
+[2025-02-27 21:38:14,236][00031] Avg episode rewards: #0: 16.465, true rewards: #0: 6.465
+[2025-02-27 21:38:14,237][00031] Avg episode reward: 16.465, avg true_objective: 6.465
+[2025-02-27 21:38:14,246][00031] Num frames 1300...
+[2025-02-27 21:38:14,371][00031] Num frames 1400...
+[2025-02-27 21:38:14,497][00031] Num frames 1500...
+[2025-02-27 21:38:14,619][00031] Num frames 1600...
+[2025-02-27 21:38:14,743][00031] Num frames 1700...
+[2025-02-27 21:38:14,868][00031] Num frames 1800...
+[2025-02-27 21:38:14,995][00031] Num frames 1900...
+[2025-02-27 21:38:15,122][00031] Num frames 2000...
+[2025-02-27 21:38:15,250][00031] Num frames 2100...
+[2025-02-27 21:38:15,375][00031] Num frames 2200...
+[2025-02-27 21:38:15,499][00031] Num frames 2300...
+[2025-02-27 21:38:15,618][00031] Num frames 2400...
+[2025-02-27 21:38:15,736][00031] Num frames 2500...
+[2025-02-27 21:38:15,855][00031] Num frames 2600...
+[2025-02-27 21:38:15,971][00031] Num frames 2700...
+[2025-02-27 21:38:16,098][00031] Num frames 2800...
+[2025-02-27 21:38:16,228][00031] Num frames 2900...
+[2025-02-27 21:38:16,354][00031] Num frames 3000...
+[2025-02-27 21:38:16,479][00031] Num frames 3100...
+[2025-02-27 21:38:16,603][00031] Num frames 3200...
+[2025-02-27 21:38:16,721][00031] Num frames 3300...
+[2025-02-27 21:38:16,885][00031] Avg episode rewards: #0: 31.310, true rewards: #0: 11.310
+[2025-02-27 21:38:16,886][00031] Avg episode reward: 31.310, avg true_objective: 11.310
+[2025-02-27 21:38:16,895][00031] Num frames 3400...
+[2025-02-27 21:38:17,019][00031] Num frames 3500...
+[2025-02-27 21:38:17,138][00031] Num frames 3600...
+[2025-02-27 21:38:17,262][00031] Num frames 3700...
+[2025-02-27 21:38:17,385][00031] Num frames 3800...
+[2025-02-27 21:38:17,509][00031] Num frames 3900...
+[2025-02-27 21:38:17,593][00031] Avg episode rewards: #0: 25.810, true rewards: #0: 9.810
+[2025-02-27 21:38:17,594][00031] Avg episode reward: 25.810, avg true_objective: 9.810
+[2025-02-27 21:38:17,688][00031] Num frames 4000...
+[2025-02-27 21:38:17,811][00031] Num frames 4100...
+[2025-02-27 21:38:17,936][00031] Num frames 4200...
+[2025-02-27 21:38:18,060][00031] Num frames 4300...
+[2025-02-27 21:38:18,184][00031] Num frames 4400...
+[2025-02-27 21:38:18,308][00031] Num frames 4500...
+[2025-02-27 21:38:18,432][00031] Num frames 4600...
+[2025-02-27 21:38:18,557][00031] Num frames 4700...
+[2025-02-27 21:38:18,683][00031] Num frames 4800...
+[2025-02-27 21:38:18,807][00031] Num frames 4900...
+[2025-02-27 21:38:18,932][00031] Num frames 5000...
+[2025-02-27 21:38:19,052][00031] Num frames 5100...
+[2025-02-27 21:38:19,171][00031] Num frames 5200...
+[2025-02-27 21:38:19,291][00031] Num frames 5300...
+[2025-02-27 21:38:19,419][00031] Num frames 5400...
+[2025-02-27 21:38:19,538][00031] Num frames 5500...
+[2025-02-27 21:38:19,657][00031] Num frames 5600...
+[2025-02-27 21:38:19,774][00031] Num frames 5700...
+[2025-02-27 21:38:19,894][00031] Num frames 5800...
+[2025-02-27 21:38:20,015][00031] Num frames 5900...
+[2025-02-27 21:38:20,134][00031] Num frames 6000...
+[2025-02-27 21:38:20,216][00031] Avg episode rewards: #0: 32.848, true rewards: #0: 12.048
+[2025-02-27 21:38:20,217][00031] Avg episode reward: 32.848, avg true_objective: 12.048
+[2025-02-27 21:38:20,307][00031] Num frames 6100...
+[2025-02-27 21:38:20,429][00031] Num frames 6200...
+[2025-02-27 21:38:20,547][00031] Num frames 6300...
+[2025-02-27 21:38:20,668][00031] Num frames 6400...
+[2025-02-27 21:38:20,768][00031] Avg episode rewards: #0: 28.400, true rewards: #0: 10.733
+[2025-02-27 21:38:20,769][00031] Avg episode reward: 28.400, avg true_objective: 10.733
+[2025-02-27 21:38:20,839][00031] Num frames 6500...
+[2025-02-27 21:38:20,953][00031] Num frames 6600...
+[2025-02-27 21:38:21,071][00031] Num frames 6700...
+[2025-02-27 21:38:21,189][00031] Num frames 6800...
+[2025-02-27 21:38:21,308][00031] Num frames 6900...
+[2025-02-27 21:38:21,423][00031] Num frames 7000...
+[2025-02-27 21:38:21,541][00031] Num frames 7100...
+[2025-02-27 21:38:21,663][00031] Num frames 7200...
+[2025-02-27 21:38:21,765][00031] Avg episode rewards: #0: 26.914, true rewards: #0: 10.343
+[2025-02-27 21:38:21,766][00031] Avg episode reward: 26.914, avg true_objective: 10.343
+[2025-02-27 21:38:21,837][00031] Num frames 7300...
+[2025-02-27 21:38:21,953][00031] Num frames 7400...
+[2025-02-27 21:38:22,073][00031] Num frames 7500...
+[2025-02-27 21:38:22,190][00031] Num frames 7600...
+[2025-02-27 21:38:22,308][00031] Num frames 7700...
+[2025-02-27 21:38:22,428][00031] Num frames 7800...
+[2025-02-27 21:38:22,548][00031] Num frames 7900...
+[2025-02-27 21:38:22,668][00031] Num frames 8000...
+[2025-02-27 21:38:22,786][00031] Num frames 8100...
+[2025-02-27 21:38:22,907][00031] Num frames 8200...
+[2025-02-27 21:38:23,026][00031] Num frames 8300...
+[2025-02-27 21:38:23,150][00031] Num frames 8400...
+[2025-02-27 21:38:23,270][00031] Num frames 8500...
+[2025-02-27 21:38:23,390][00031] Num frames 8600...
+[2025-02-27 21:38:23,508][00031] Num frames 8700...
+[2025-02-27 21:38:23,630][00031] Num frames 8800...
+[2025-02-27 21:38:23,745][00031] Num frames 8900...
+[2025-02-27 21:38:23,870][00031] Num frames 9000...
+[2025-02-27 21:38:24,013][00031] Num frames 9100...
+[2025-02-27 21:38:24,142][00031] Num frames 9200...
+[2025-02-27 21:38:24,272][00031] Num frames 9300...
+[2025-02-27 21:38:24,378][00031] Avg episode rewards: #0: 30.675, true rewards: #0: 11.675
+[2025-02-27 21:38:24,379][00031] Avg episode reward: 30.675, avg true_objective: 11.675
+[2025-02-27 21:38:24,453][00031] Num frames 9400...
+[2025-02-27 21:38:24,574][00031] Num frames 9500...
+[2025-02-27 21:38:24,694][00031] Num frames 9600...
+[2025-02-27 21:38:24,812][00031] Num frames 9700...
+[2025-02-27 21:38:24,936][00031] Num frames 9800...
+[2025-02-27 21:38:25,060][00031] Num frames 9900...
+[2025-02-27 21:38:25,187][00031] Num frames 10000...
+[2025-02-27 21:38:25,318][00031] Num frames 10100...
+[2025-02-27 21:38:25,438][00031] Num frames 10200...
+[2025-02-27 21:38:25,555][00031] Num frames 10300...
+[2025-02-27 21:38:25,685][00031] Num frames 10400...
+[2025-02-27 21:38:25,800][00031] Avg episode rewards: #0: 30.052, true rewards: #0: 11.608
+[2025-02-27 21:38:25,801][00031] Avg episode reward: 30.052, avg true_objective: 11.608
+[2025-02-27 21:38:25,866][00031] Num frames 10500...
+[2025-02-27 21:38:25,985][00031] Num frames 10600...
+[2025-02-27 21:38:26,107][00031] Num frames 10700...
+[2025-02-27 21:38:26,227][00031] Num frames 10800...
+[2025-02-27 21:38:26,346][00031] Num frames 10900...
+[2025-02-27 21:38:26,465][00031] Num frames 11000...
+[2025-02-27 21:38:26,608][00031] Num frames 11100...
+[2025-02-27 21:38:26,752][00031] Num frames 11200...
+[2025-02-27 21:38:26,889][00031] Num frames 11300...
+[2025-02-27 21:38:27,036][00031] Num frames 11400...
+[2025-02-27 21:38:27,182][00031] Num frames 11500...
+[2025-02-27 21:38:27,308][00031] Num frames 11600...
+[2025-02-27 21:38:27,437][00031] Num frames 11700...
+[2025-02-27 21:38:27,584][00031] Num frames 11800...
+[2025-02-27 21:38:27,716][00031] Num frames 11900...
+[2025-02-27 21:38:27,844][00031] Num frames 12000...
+[2025-02-27 21:38:27,970][00031] Num frames 12100...
+[2025-02-27 21:38:28,092][00031] Num frames 12200...
+[2025-02-27 21:38:28,212][00031] Num frames 12300...
+[2025-02-27 21:38:28,337][00031] Num frames 12400...
+[2025-02-27 21:38:28,457][00031] Num frames 12500...
+[2025-02-27 21:38:28,568][00031] Avg episode rewards: #0: 33.147, true rewards: #0: 12.547
+[2025-02-27 21:38:28,570][00031] Avg episode reward: 33.147, avg true_objective: 12.547
+[2025-02-27 21:39:08,176][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
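The second pass repeats the evaluation with push_to_hub enabled, uploading the model, config, and the fresh replay.mp4 to the Hub repository named in the log. A sketch of that final invocation, using only the flags visible above and the same helper assumptions as the previous sketch:

# Final evaluate-and-upload sketch mirroring the logged flags.
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # inferred from the repo name
        "--num_workers=1",
        "--save_video",
        "--no_render",
        "--max_num_episodes=10",
        "--max_num_frames=100000",  # logged: max_num_frames=100000
        "--push_to_hub",            # logged: push_to_hub=True
        "--hf_repository=francescosabbarese/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # evaluates 10 episodes, saves replay.mp4, pushes to the Hub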