[2025-03-07 02:04:46,508][01872] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-03-07 02:04:46,510][01872] Rollout worker 0 uses device cpu
[2025-03-07 02:04:46,511][01872] Rollout worker 1 uses device cpu
[2025-03-07 02:04:46,512][01872] Rollout worker 2 uses device cpu
[2025-03-07 02:04:46,513][01872] Rollout worker 3 uses device cpu
[2025-03-07 02:04:46,514][01872] Rollout worker 4 uses device cpu
[2025-03-07 02:04:46,514][01872] Rollout worker 5 uses device cpu
[2025-03-07 02:04:46,515][01872] Rollout worker 6 uses device cpu
[2025-03-07 02:04:46,516][01872] Rollout worker 7 uses device cpu
[2025-03-07 02:04:46,658][01872] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-07 02:04:46,659][01872] InferenceWorker_p0-w0: min num requests: 2
[2025-03-07 02:04:46,690][01872] Starting all processes...
[2025-03-07 02:04:46,691][01872] Starting process learner_proc0
[2025-03-07 02:04:46,758][01872] Starting all processes...
[2025-03-07 02:04:46,765][01872] Starting process inference_proc0-0
[2025-03-07 02:04:46,766][01872] Starting process rollout_proc0
[2025-03-07 02:04:46,766][01872] Starting process rollout_proc1
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc2
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc3
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc4
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc5
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc6
[2025-03-07 02:04:46,767][01872] Starting process rollout_proc7
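
The startup sequence above is Sample Factory's asynchronous-PPO layout: eight CPU rollout workers, one GPU inference worker, and one GPU learner. A minimal launch sketch that would produce this layout, assuming the stock sf_examples VizDoom entry point and environment name (neither is printed in this log); the worker count and train_dir do match the lines above:

# Hypothetical launch; the entry-point module and --env value are assumptions.
import subprocess

subprocess.run([
    "python", "-m", "sf_examples.vizdoom.train_vizdoom",  # assumed entry point
    "--env=doom_health_gathering_supreme",                # assumed env name
    "--num_workers=8",                     # matches "Rollout worker 0..7" above
    "--train_dir=/content/train_dir",      # matches the config path above
    "--experiment=default_experiment",
], check=True)
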
[2025-03-07 02:05:01,782][06307] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-07 02:05:01,783][06307] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-03-07 02:05:01,837][06307] Num visible devices: 1
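
Setting CUDA_VISIBLE_DEVICES to '0', as logged above, is plain environment-variable masking: the variable must be set before CUDA initializes, after which cuda:0 inside the process maps to physical GPU 0 and only one device is visible. A sketch of the same effect:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # set before CUDA is initialized

import torch  # noqa: E402

print(torch.cuda.device_count())  # 1, i.e. "Num visible devices: 1"
device = torch.device("cuda:0")   # physical GPU 0 inside this process
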
[2025-03-07 02:05:01,884][06307] Starting seed is not provided
[2025-03-07 02:05:01,885][06307] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-07 02:05:01,885][06307] Initializing actor-critic model on device cuda:0
[2025-03-07 02:05:01,886][06307] RunningMeanStd input shape: (3, 72, 128)
[2025-03-07 02:05:01,889][06307] RunningMeanStd input shape: (1,)
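
The two RunningMeanStd instances above keep running statistics for observations (shape (3, 72, 128), channel-first frames) and for returns (shape (1,)), used to normalize inputs and value targets. A compact sketch of the batch-merge update behind such a module (schematic, not Sample Factory's exact code):

import numpy as np

class RunningMeanStd:
    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # parallel-variance formula: merge batch moments into running moments
        b_mean, b_var, b_n = batch.mean(0), batch.var(0), batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_n
        new_mean = self.mean + delta * b_n / tot
        m2 = self.var * self.count + b_var * b_n + delta**2 * self.count * b_n / tot
        self.mean, self.var, self.count = new_mean, m2 / tot, tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

obs_rms = RunningMeanStd((3, 72, 128))  # matches the first log line
ret_rms = RunningMeanStd((1,))          # matches the second
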
[2025-03-07 02:05:01,933][06320] Worker 1 uses CPU cores [1]
[2025-03-07 02:05:01,950][06307] ConvEncoder: input_channels=3
[2025-03-07 02:05:02,047][06322] Worker 2 uses CPU cores [0]
[2025-03-07 02:05:02,112][06321] Worker 0 uses CPU cores [0]
[2025-03-07 02:05:02,410][06323] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-07 02:05:02,411][06329] Worker 3 uses CPU cores [1]
[2025-03-07 02:05:02,415][06323] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-03-07 02:05:02,511][06323] Num visible devices: 1
[2025-03-07 02:05:02,527][06332] Worker 5 uses CPU cores [1]
[2025-03-07 02:05:02,562][06307] Conv encoder output size: 512
[2025-03-07 02:05:02,563][06307] Policy head output size: 512
[2025-03-07 02:05:02,609][06330] Worker 6 uses CPU cores [0]
[2025-03-07 02:05:02,610][06331] Worker 7 uses CPU cores [1]
[2025-03-07 02:05:02,659][06307] Created Actor Critic model with architecture:
[2025-03-07 02:05:02,659][06307] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
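
A sketch of the same actor-critic in plain PyTorch, assuming Sample Factory's default "convnet_simple" filter spec (32/64/128 channels with kernels 8/4/3 and strides 4/2/2 are assumptions; the log prints only the module names). With a 3x72x128 input the conv head flattens to 2304 features, and the MLP maps those to the encoder output size of 512 reported above:

import torch
import torch.nn as nn

conv_head = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
)
with torch.no_grad():
    n_flat = conv_head(torch.zeros(1, 3, 72, 128)).numel()  # 2304

encoder_mlp = nn.Sequential(nn.Flatten(), nn.Linear(n_flat, 512), nn.ELU())
core = nn.GRU(512, 512)            # (core): GRU(512, 512)
critic = nn.Linear(512, 1)         # value head
action_logits = nn.Linear(512, 5)  # 5-way discrete action distribution
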
[2025-03-07 02:05:02,674][06328] Worker 4 uses CPU cores [0]
[2025-03-07 02:05:02,900][06307] Using optimizer <class 'torch.optim.adam.Adam'>
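
The learner reports only the optimizer class; a sketch with Sample Factory's usual defaults (lr=1e-4, betas=(0.9, 0.999), eps=1e-6 are assumptions, not printed in the log):

import torch

model = torch.nn.Linear(512, 5)  # stand-in for the actor-critic above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-6)
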
[2025-03-07 02:05:06,651][01872] Heartbeat connected on Batcher_0
[2025-03-07 02:05:06,659][01872] Heartbeat connected on InferenceWorker_p0-w0
[2025-03-07 02:05:06,668][01872] Heartbeat connected on RolloutWorker_w0
[2025-03-07 02:05:06,670][01872] Heartbeat connected on RolloutWorker_w1
[2025-03-07 02:05:06,675][01872] Heartbeat connected on RolloutWorker_w2
[2025-03-07 02:05:06,677][01872] Heartbeat connected on RolloutWorker_w3
[2025-03-07 02:05:06,682][01872] Heartbeat connected on RolloutWorker_w4
[2025-03-07 02:05:06,683][01872] Heartbeat connected on RolloutWorker_w5
[2025-03-07 02:05:06,687][01872] Heartbeat connected on RolloutWorker_w6
[2025-03-07 02:05:06,690][01872] Heartbeat connected on RolloutWorker_w7
[2025-03-07 02:05:07,543][06307] No checkpoints found
[2025-03-07 02:05:07,543][06307] Did not load from checkpoint, starting from scratch!
[2025-03-07 02:05:07,543][06307] Initialized policy 0 weights for model version 0
[2025-03-07 02:05:07,553][06307] LearnerWorker_p0 finished initialization!
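
A sketch of the resume logic behind "No checkpoints found ... starting from scratch": look for the newest checkpoint_*.pth under the experiment's checkpoint directory and restore it if present. The path follows the log; the loading code is schematic:

from pathlib import Path
import torch

ckpt_dir = Path("/content/train_dir/default_experiment/checkpoint_p0")
ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth")) if ckpt_dir.is_dir() else []
if ckpts:
    state = torch.load(ckpts[-1], map_location="cpu")  # newest checkpoint
    print(f"Resuming from {ckpts[-1]}")
else:
    print("No checkpoints found -- starting from scratch, policy version 0")
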
[2025-03-07 02:05:07,555][06307] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-07 02:05:07,554][01872] Heartbeat connected on LearnerWorker_p0
[2025-03-07 02:05:07,766][06323] RunningMeanStd input shape: (3, 72, 128)
[2025-03-07 02:05:07,768][06323] RunningMeanStd input shape: (1,)
[2025-03-07 02:05:07,785][06323] ConvEncoder: input_channels=3
[2025-03-07 02:05:07,938][06323] Conv encoder output size: 512
[2025-03-07 02:05:07,940][06323] Policy head output size: 512
[2025-03-07 02:05:07,989][01872] Inference worker 0-0 is ready!
[2025-03-07 02:05:07,990][01872] All inference workers are ready! Signal rollout workers to start!
[2025-03-07 02:05:08,333][06329] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,349][06328] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,375][06320] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,401][06330] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,403][06322] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,424][06332] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,435][06331] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:05:08,459][06321] Doom resolution: 160x120, resize resolution: (128, 72)
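
A sketch of the observation pipeline implied by "Doom resolution: 160x120, resize resolution: (128, 72)": native frames are downscaled, then used channel-first as the (3, 72, 128) encoder input reported earlier. Note cv2.resize takes (width, height); the interpolation choice is an assumption:

import cv2
import numpy as np

frame = np.zeros((120, 160, 3), dtype=np.uint8)               # native Doom frame
small = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)
obs = np.transpose(small, (2, 0, 1))                          # HWC -> CHW
assert obs.shape == (3, 72, 128)
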
[2025-03-07 02:05:10,071][06330] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,070][06328] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,068][06320] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,072][06321] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,073][06322] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,070][06331] Decorrelating experience for 0 frames...
[2025-03-07 02:05:10,071][06329] Decorrelating experience for 0 frames...
[2025-03-07 02:05:11,225][06322] Decorrelating experience for 32 frames...
[2025-03-07 02:05:11,227][06321] Decorrelating experience for 32 frames...
[2025-03-07 02:05:11,235][06331] Decorrelating experience for 32 frames...
[2025-03-07 02:05:11,232][06332] Decorrelating experience for 0 frames...
[2025-03-07 02:05:11,237][06320] Decorrelating experience for 32 frames...
[2025-03-07 02:05:11,237][06330] Decorrelating experience for 32 frames...
[2025-03-07 02:05:11,437][01872] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-07 02:05:12,567][06332] Decorrelating experience for 32 frames...
[2025-03-07 02:05:12,574][06328] Decorrelating experience for 32 frames...
[2025-03-07 02:05:12,587][06329] Decorrelating experience for 32 frames...
[2025-03-07 02:05:12,919][06322] Decorrelating experience for 64 frames...
[2025-03-07 02:05:12,922][06321] Decorrelating experience for 64 frames...
[2025-03-07 02:05:13,064][06331] Decorrelating experience for 64 frames...
[2025-03-07 02:05:13,662][06320] Decorrelating experience for 64 frames...
[2025-03-07 02:05:13,913][06329] Decorrelating experience for 64 frames...
[2025-03-07 02:05:14,274][06320] Decorrelating experience for 96 frames...
[2025-03-07 02:05:14,435][06328] Decorrelating experience for 64 frames...
[2025-03-07 02:05:14,544][06322] Decorrelating experience for 96 frames...
[2025-03-07 02:05:14,547][06321] Decorrelating experience for 96 frames...
[2025-03-07 02:05:14,605][06330] Decorrelating experience for 64 frames...
[2025-03-07 02:05:15,448][06331] Decorrelating experience for 96 frames...
[2025-03-07 02:05:15,650][06329] Decorrelating experience for 96 frames...
[2025-03-07 02:05:16,362][06328] Decorrelating experience for 96 frames...
[2025-03-07 02:05:16,437][01872] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-07 02:05:16,439][01872] Avg episode reward: [(0, '0.480')]
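
How to read the recurring "Fps is (...)" lines: environment frames per second averaged over 10 s / 60 s / 300 s sliding windows, cumulative frame count, sampling throughput, total samples, and the policy-lag spread (how many versions behind the learner the collecting policy was; -1.0 means no data yet, hence the initial nan readings). A minimal sliding-window rate sketch:

import time
from collections import deque

class WindowedRate:
    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()  # (timestamp, frame_count) pairs

    def record(self, frames: int) -> None:
        now = time.monotonic()
        self.events.append((now, frames))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def fps(self) -> float:
        if len(self.events) < 2:
            return float("nan")  # matches the first "Fps is (10 sec: nan ...)"
        span = self.events[-1][0] - self.events[0][0]
        return sum(f for _, f in self.events) / span if span else float("nan")
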
[2025-03-07 02:05:16,616][06330] Decorrelating experience for 96 frames...
[2025-03-07 02:05:19,177][06332] Decorrelating experience for 64 frames...
[2025-03-07 02:05:20,036][06307] Signal inference workers to stop experience collection...
[2025-03-07 02:05:20,072][06323] InferenceWorker_p0-w0: stopping experience collection
[2025-03-07 02:05:20,759][06332] Decorrelating experience for 96 frames...
[2025-03-07 02:05:21,437][01872] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 234.2. Samples: 2342. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-07 02:05:21,438][01872] Avg episode reward: [(0, '2.448')]
[2025-03-07 02:05:22,058][06307] Signal inference workers to resume experience collection...
[2025-03-07 02:05:22,058][06323] InferenceWorker_p0-w0: resuming experience collection
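
The "Decorrelating experience" phase above staggers the rollout workers: each steps its environments for an increasing number of frames (0, 32, 64, 96) with random actions before regular collection, so the eight workers do not stream identical, phase-locked trajectories; the learner pauses and resumes collection around its own initialization. A sketch of the pattern (illustrative, not Sample Factory's exact code):

import gymnasium as gym

def decorrelate(env: gym.Env, frames_per_stage: int = 32, stages: int = 4) -> None:
    env.reset()
    for stage in range(stages):
        print(f"Decorrelating experience for {stage * frames_per_stage} frames...")
        for _ in range(frames_per_stage):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
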
[2025-03-07 02:05:26,437][01872] Fps is (10 sec: 2457.6, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 24576. Throughput: 0: 452.1. Samples: 6782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:05:26,444][01872] Avg episode reward: [(0, '3.445')]
[2025-03-07 02:05:29,375][06323] Updated weights for policy 0, policy_version 10 (0.0019)
[2025-03-07 02:05:31,437][01872] Fps is (10 sec: 4915.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 509.9. Samples: 10198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:05:31,439][01872] Avg episode reward: [(0, '4.227')]
[2025-03-07 02:05:36,437][01872] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 622.6. Samples: 15566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:05:36,439][01872] Avg episode reward: [(0, '4.372')]
[2025-03-07 02:05:40,808][06323] Updated weights for policy 0, policy_version 20 (0.0027)
[2025-03-07 02:05:41,437][01872] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 714.8. Samples: 21444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:05:41,439][01872] Avg episode reward: [(0, '4.383')]
[2025-03-07 02:05:46,440][01872] Fps is (10 sec: 4504.4, 60 sec: 3042.5, 300 sec: 3042.5). Total num frames: 106496. Throughput: 0: 712.3. Samples: 24934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:05:46,451][01872] Avg episode reward: [(0, '4.284')]
[2025-03-07 02:05:46,456][06307] Saving new best policy, reward=4.284!
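
The "Saving new best policy, reward=..." lines are simple bookkeeping: the learner tracks the best average episode reward seen so far and snapshots the policy whenever the current average beats it. A schematic sketch (file name is illustrative):

import torch

best_reward = float("-inf")

def maybe_save_best(policy: torch.nn.Module, avg_episode_reward: float) -> None:
    global best_reward
    if avg_episode_reward > best_reward:
        best_reward = avg_episode_reward
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
        torch.save(policy.state_dict(), "best_policy.pth")  # illustrative name
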
[2025-03-07 02:05:51,437][01872] Fps is (10 sec: 3686.4, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 749.1. Samples: 29964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:05:51,441][01872] Avg episode reward: [(0, '4.299')]
[2025-03-07 02:05:51,463][06307] Saving new best policy, reward=4.299!
[2025-03-07 02:05:51,478][06323] Updated weights for policy 0, policy_version 30 (0.0022)
[2025-03-07 02:05:56,437][01872] Fps is (10 sec: 3687.4, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 813.0. Samples: 36586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:05:56,443][01872] Avg episode reward: [(0, '4.351')]
[2025-03-07 02:05:56,449][06307] Saving new best policy, reward=4.351!
[2025-03-07 02:06:00,393][06323] Updated weights for policy 0, policy_version 40 (0.0015)
[2025-03-07 02:06:01,438][01872] Fps is (10 sec: 4915.1, 60 sec: 3358.7, 300 sec: 3358.7). Total num frames: 167936. Throughput: 0: 887.9. Samples: 39956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:06:01,441][01872] Avg episode reward: [(0, '4.628')]
[2025-03-07 02:06:01,447][06307] Saving new best policy, reward=4.628!
[2025-03-07 02:06:06,439][01872] Fps is (10 sec: 3685.6, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 180224. Throughput: 0: 942.7. Samples: 44764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:06:06,441][01872] Avg episode reward: [(0, '4.693')]
[2025-03-07 02:06:06,444][06307] Saving new best policy, reward=4.693!
[2025-03-07 02:06:11,437][01872] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 196608. Throughput: 0: 947.8. Samples: 49432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:06:11,441][01872] Avg episode reward: [(0, '4.582')]
[2025-03-07 02:06:13,110][06323] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-03-07 02:06:16,437][01872] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 951.2. Samples: 53000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:06:16,440][01872] Avg episode reward: [(0, '4.649')]
[2025-03-07 02:06:21,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 949.5. Samples: 58294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-07 02:06:21,439][01872] Avg episode reward: [(0, '4.807')]
[2025-03-07 02:06:21,449][06307] Saving new best policy, reward=4.807!
[2025-03-07 02:06:23,751][06323] Updated weights for policy 0, policy_version 60 (0.0020)
[2025-03-07 02:06:26,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 963.3. Samples: 64794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:06:26,439][01872] Avg episode reward: [(0, '4.621')]
[2025-03-07 02:06:31,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 278528. Throughput: 0: 964.1. Samples: 68316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:06:31,443][01872] Avg episode reward: [(0, '4.553')]
[2025-03-07 02:06:33,185][06323] Updated weights for policy 0, policy_version 70 (0.0019)
[2025-03-07 02:06:36,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 969.8. Samples: 73606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:06:36,439][01872] Avg episode reward: [(0, '4.517')]
[2025-03-07 02:06:41,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 319488. Throughput: 0: 969.7. Samples: 80222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:06:41,441][01872] Avg episode reward: [(0, '4.328')]
[2025-03-07 02:06:41,451][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
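
The checkpoint file name encodes the training state at save time: policy version 78 (zero-padded to nine digits) and 319,488 environment frames:

policy_version, env_frames = 78, 319488
name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
assert name == "checkpoint_000000078_319488.pth"  # matches the log line above
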
[2025-03-07 02:06:43,112][06323] Updated weights for policy 0, policy_version 80 (0.0022)
[2025-03-07 02:06:46,438][01872] Fps is (10 sec: 4505.4, 60 sec: 3891.3, 300 sec: 3578.6). Total num frames: 339968. Throughput: 0: 970.3. Samples: 83622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:06:46,439][01872] Avg episode reward: [(0, '4.388')]
[2025-03-07 02:06:51,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3563.5). Total num frames: 356352. Throughput: 0: 977.7. Samples: 88760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:06:51,439][01872] Avg episode reward: [(0, '4.321')]
[2025-03-07 02:06:53,556][06323] Updated weights for policy 0, policy_version 90 (0.0014)
[2025-03-07 02:06:56,437][01872] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3627.9). Total num frames: 380928. Throughput: 0: 1027.3. Samples: 95660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:06:56,439][01872] Avg episode reward: [(0, '4.375')]
[2025-03-07 02:07:01,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3649.2). Total num frames: 401408. Throughput: 0: 1026.4. Samples: 99190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:07:01,440][01872] Avg episode reward: [(0, '4.424')]
[2025-03-07 02:07:03,265][06323] Updated weights for policy 0, policy_version 100 (0.0023)
[2025-03-07 02:07:06,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3633.0). Total num frames: 417792. Throughput: 0: 1019.5. Samples: 104170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:07:06,441][01872] Avg episode reward: [(0, '4.445')]
[2025-03-07 02:07:11,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3686.4). Total num frames: 442368. Throughput: 0: 1029.9. Samples: 111138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:07:11,443][01872] Avg episode reward: [(0, '4.428')]
[2025-03-07 02:07:12,875][06323] Updated weights for policy 0, policy_version 110 (0.0027)
[2025-03-07 02:07:16,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3702.8). Total num frames: 462848. Throughput: 0: 1029.6. Samples: 114650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:07:16,439][01872] Avg episode reward: [(0, '4.609')]
[2025-03-07 02:07:21,438][01872] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 3686.4). Total num frames: 479232. Throughput: 0: 1020.6. Samples: 119532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:07:21,444][01872] Avg episode reward: [(0, '4.643')]
[2025-03-07 02:07:23,181][06323] Updated weights for policy 0, policy_version 120 (0.0017)
[2025-03-07 02:07:26,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3731.9). Total num frames: 503808. Throughput: 0: 1029.8. Samples: 126562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:07:26,443][01872] Avg episode reward: [(0, '4.495')]
[2025-03-07 02:07:31,441][01872] Fps is (10 sec: 4504.2, 60 sec: 4095.8, 300 sec: 3744.8). Total num frames: 524288. Throughput: 0: 1033.6. Samples: 130136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:07:31,445][01872] Avg episode reward: [(0, '4.646')]
[2025-03-07 02:07:33,454][06323] Updated weights for policy 0, policy_version 130 (0.0017)
[2025-03-07 02:07:36,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3757.0). Total num frames: 544768. Throughput: 0: 1028.8. Samples: 135056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:07:36,439][01872] Avg episode reward: [(0, '4.595')]
[2025-03-07 02:07:41,437][01872] Fps is (10 sec: 4097.4, 60 sec: 4096.0, 300 sec: 3768.3). Total num frames: 565248. Throughput: 0: 1031.0. Samples: 142054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:07:41,442][01872] Avg episode reward: [(0, '4.475')]
[2025-03-07 02:07:42,508][06323] Updated weights for policy 0, policy_version 140 (0.0017)
[2025-03-07 02:07:46,440][01872] Fps is (10 sec: 4095.1, 60 sec: 4095.9, 300 sec: 3778.8). Total num frames: 585728. Throughput: 0: 1031.0. Samples: 145586. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:07:46,443][01872] Avg episode reward: [(0, '4.459')]
[2025-03-07 02:07:51,438][01872] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 3788.8). Total num frames: 606208. Throughput: 0: 1031.1. Samples: 150570. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:07:51,440][01872] Avg episode reward: [(0, '4.220')]
[2025-03-07 02:07:52,973][06323] Updated weights for policy 0, policy_version 150 (0.0023)
[2025-03-07 02:07:56,437][01872] Fps is (10 sec: 4506.7, 60 sec: 4164.3, 300 sec: 3822.9). Total num frames: 630784. Throughput: 0: 1032.0. Samples: 157576. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:07:56,440][01872] Avg episode reward: [(0, '4.228')]
[2025-03-07 02:08:01,437][01872] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3806.9). Total num frames: 647168. Throughput: 0: 1032.5. Samples: 161112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:01,440][01872] Avg episode reward: [(0, '4.532')]
[2025-03-07 02:08:03,288][06323] Updated weights for policy 0, policy_version 160 (0.0015)
[2025-03-07 02:08:06,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3815.1). Total num frames: 667648. Throughput: 0: 1036.7. Samples: 166184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:06,440][01872] Avg episode reward: [(0, '4.607')]
[2025-03-07 02:08:11,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3845.7). Total num frames: 692224. Throughput: 0: 1040.2. Samples: 173372. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:11,441][01872] Avg episode reward: [(0, '4.400')]
[2025-03-07 02:08:11,999][06323] Updated weights for policy 0, policy_version 170 (0.0013)
[2025-03-07 02:08:16,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3830.3). Total num frames: 708608. Throughput: 0: 1031.1. Samples: 176534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:16,440][01872] Avg episode reward: [(0, '4.386')]
[2025-03-07 02:08:21,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3837.3). Total num frames: 729088. Throughput: 0: 1039.2. Samples: 181820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:21,439][01872] Avg episode reward: [(0, '4.488')]
[2025-03-07 02:08:22,540][06323] Updated weights for policy 0, policy_version 180 (0.0015)
[2025-03-07 02:08:26,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3864.9). Total num frames: 753664. Throughput: 0: 1042.8. Samples: 188980. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:26,442][01872] Avg episode reward: [(0, '4.641')]
[2025-03-07 02:08:31,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 3850.2). Total num frames: 770048. Throughput: 0: 1035.6. Samples: 192186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:08:31,439][01872] Avg episode reward: [(0, '4.912')]
[2025-03-07 02:08:31,458][06307] Saving new best policy, reward=4.912!
[2025-03-07 02:08:32,730][06323] Updated weights for policy 0, policy_version 190 (0.0028)
[2025-03-07 02:08:36,438][01872] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 3876.2). Total num frames: 794624. Throughput: 0: 1043.7. Samples: 197536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:36,441][01872] Avg episode reward: [(0, '4.816')]
[2025-03-07 02:08:41,438][01872] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 3881.4). Total num frames: 815104. Throughput: 0: 1046.3. Samples: 204658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:08:41,444][01872] Avg episode reward: [(0, '4.395')]
[2025-03-07 02:08:41,455][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth...
[2025-03-07 02:08:41,717][06323] Updated weights for policy 0, policy_version 200 (0.0012)
[2025-03-07 02:08:46,440][01872] Fps is (10 sec: 3685.3, 60 sec: 4096.0, 300 sec: 3867.3). Total num frames: 831488. Throughput: 0: 1028.1. Samples: 207380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:08:46,442][01872] Avg episode reward: [(0, '4.397')]
[2025-03-07 02:08:51,437][01872] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 3891.2). Total num frames: 856064. Throughput: 0: 1038.5. Samples: 212916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:08:51,442][01872] Avg episode reward: [(0, '4.499')]
[2025-03-07 02:08:52,257][06323] Updated weights for policy 0, policy_version 210 (0.0031)
[2025-03-07 02:08:56,437][01872] Fps is (10 sec: 4507.0, 60 sec: 4096.0, 300 sec: 3895.8). Total num frames: 876544. Throughput: 0: 1037.4. Samples: 220054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:08:56,439][01872] Avg episode reward: [(0, '4.696')]
[2025-03-07 02:09:01,439][01872] Fps is (10 sec: 4095.3, 60 sec: 4164.2, 300 sec: 3900.1). Total num frames: 897024. Throughput: 0: 1028.4. Samples: 222812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:09:01,440][01872] Avg episode reward: [(0, '4.651')]
[2025-03-07 02:09:02,490][06323] Updated weights for policy 0, policy_version 220 (0.0018)
[2025-03-07 02:09:06,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3904.3). Total num frames: 917504. Throughput: 0: 1041.8. Samples: 228702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:09:06,441][01872] Avg episode reward: [(0, '4.737')]
[2025-03-07 02:09:11,437][01872] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 3908.3). Total num frames: 937984. Throughput: 0: 1026.0. Samples: 235150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:09:11,440][01872] Avg episode reward: [(0, '4.600')]
[2025-03-07 02:09:12,611][06323] Updated weights for policy 0, policy_version 230 (0.0016)
[2025-03-07 02:09:16,437][01872] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3878.7). Total num frames: 950272. Throughput: 0: 995.5. Samples: 236982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:16,441][01872] Avg episode reward: [(0, '4.491')]
[2025-03-07 02:09:21,437][01872] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3883.0). Total num frames: 970752. Throughput: 0: 992.8. Samples: 242214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:21,441][01872] Avg episode reward: [(0, '4.487')]
[2025-03-07 02:09:23,546][06323] Updated weights for policy 0, policy_version 240 (0.0017)
[2025-03-07 02:09:26,438][01872] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3903.2). Total num frames: 995328. Throughput: 0: 987.7. Samples: 249106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:09:26,442][01872] Avg episode reward: [(0, '4.630')]
[2025-03-07 02:09:31,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3891.2). Total num frames: 1011712. Throughput: 0: 988.6. Samples: 251864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:09:31,441][01872] Avg episode reward: [(0, '4.681')]
[2025-03-07 02:09:34,181][06323] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-03-07 02:09:36,437][01872] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3895.1). Total num frames: 1032192. Throughput: 0: 990.8. Samples: 257500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:36,439][01872] Avg episode reward: [(0, '4.713')]
[2025-03-07 02:09:41,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3914.0). Total num frames: 1056768. Throughput: 0: 987.4. Samples: 264488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:41,442][01872] Avg episode reward: [(0, '4.594')]
[2025-03-07 02:09:43,797][06323] Updated weights for policy 0, policy_version 260 (0.0031)
[2025-03-07 02:09:46,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3887.5). Total num frames: 1069056. Throughput: 0: 981.1. Samples: 266958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:09:46,440][01872] Avg episode reward: [(0, '4.744')]
[2025-03-07 02:09:51,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3905.8). Total num frames: 1093632. Throughput: 0: 980.6. Samples: 272828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:51,444][01872] Avg episode reward: [(0, '4.776')]
[2025-03-07 02:09:53,862][06323] Updated weights for policy 0, policy_version 270 (0.0022)
[2025-03-07 02:09:56,438][01872] Fps is (10 sec: 4915.0, 60 sec: 4027.7, 300 sec: 3923.5). Total num frames: 1118208. Throughput: 0: 988.1. Samples: 279616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:09:56,439][01872] Avg episode reward: [(0, '5.052')]
[2025-03-07 02:09:56,441][06307] Saving new best policy, reward=5.052!
[2025-03-07 02:10:01,440][01872] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3898.2). Total num frames: 1130496. Throughput: 0: 1002.1. Samples: 282080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:10:01,447][01872] Avg episode reward: [(0, '5.223')]
[2025-03-07 02:10:01,456][06307] Saving new best policy, reward=5.223!
[2025-03-07 02:10:04,394][06323] Updated weights for policy 0, policy_version 280 (0.0036)
[2025-03-07 02:10:06,437][01872] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 1014.1. Samples: 287850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:10:06,439][01872] Avg episode reward: [(0, '5.290')]
[2025-03-07 02:10:06,441][06307] Saving new best policy, reward=5.290!
[2025-03-07 02:10:11,437][01872] Fps is (10 sec: 4506.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1003.4. Samples: 294260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:10:11,441][01872] Avg episode reward: [(0, '5.354')]
[2025-03-07 02:10:11,449][06307] Saving new best policy, reward=5.354!
[2025-03-07 02:10:15,748][06323] Updated weights for policy 0, policy_version 290 (0.0036)
[2025-03-07 02:10:16,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1187840. Throughput: 0: 989.9. Samples: 296410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:10:16,441][01872] Avg episode reward: [(0, '5.403')]
[2025-03-07 02:10:16,443][06307] Saving new best policy, reward=5.403!
[2025-03-07 02:10:21,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1208320. Throughput: 0: 985.5. Samples: 301848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:10:21,441][01872] Avg episode reward: [(0, '5.235')]
[2025-03-07 02:10:25,483][06323] Updated weights for policy 0, policy_version 300 (0.0020)
[2025-03-07 02:10:26,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1232896. Throughput: 0: 972.0. Samples: 308226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:10:26,439][01872] Avg episode reward: [(0, '5.262')]
[2025-03-07 02:10:31,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1245184. Throughput: 0: 967.1. Samples: 310478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:10:31,441][01872] Avg episode reward: [(0, '5.688')]
[2025-03-07 02:10:31,448][06307] Saving new best policy, reward=5.688!
[2025-03-07 02:10:36,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1265664. Throughput: 0: 957.5. Samples: 315916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:10:36,439][01872] Avg episode reward: [(0, '6.334')]
[2025-03-07 02:10:36,441][06307] Saving new best policy, reward=6.334!
[2025-03-07 02:10:36,803][06323] Updated weights for policy 0, policy_version 310 (0.0021)
[2025-03-07 02:10:41,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1286144. Throughput: 0: 949.9. Samples: 322362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:10:41,441][01872] Avg episode reward: [(0, '6.487')]
[2025-03-07 02:10:41,456][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth...
[2025-03-07 02:10:41,634][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2025-03-07 02:10:41,648][06307] Saving new best policy, reward=6.487!
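
The Saving/Removing pairs above show checkpoint rotation: after writing a new checkpoint, the oldest ones are pruned so only the most recent few remain (two, judging from this log; the exact setting is an assumption). A sketch:

from pathlib import Path

def prune_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
    # zero-padded version numbers make lexicographic order chronological
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        old.unlink()
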
[2025-03-07 02:10:46,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1298432. Throughput: 0: 938.4. Samples: 324308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:10:46,442][01872] Avg episode reward: [(0, '6.442')]
[2025-03-07 02:10:48,693][06323] Updated weights for policy 0, policy_version 320 (0.0027)
[2025-03-07 02:10:51,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1323008. Throughput: 0: 931.5. Samples: 329768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:10:51,439][01872] Avg episode reward: [(0, '6.687')]
[2025-03-07 02:10:51,445][06307] Saving new best policy, reward=6.687!
[2025-03-07 02:10:56,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3984.9). Total num frames: 1343488. Throughput: 0: 932.1. Samples: 336204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:10:56,440][01872] Avg episode reward: [(0, '6.781')]
[2025-03-07 02:10:56,444][06307] Saving new best policy, reward=6.781!
[2025-03-07 02:10:59,230][06323] Updated weights for policy 0, policy_version 330 (0.0026)
[2025-03-07 02:11:01,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3984.9). Total num frames: 1355776. Throughput: 0: 928.4. Samples: 338188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:11:01,446][01872] Avg episode reward: [(0, '6.895')]
[2025-03-07 02:11:01,454][06307] Saving new best policy, reward=6.895!
[2025-03-07 02:11:06,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3998.8). Total num frames: 1376256. Throughput: 0: 924.9. Samples: 343470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:11:06,441][01872] Avg episode reward: [(0, '7.115')]
[2025-03-07 02:11:06,445][06307] Saving new best policy, reward=7.115!
[2025-03-07 02:11:09,900][06323] Updated weights for policy 0, policy_version 340 (0.0023)
[2025-03-07 02:11:11,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3998.8). Total num frames: 1396736. Throughput: 0: 922.0. Samples: 349718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:11:11,443][01872] Avg episode reward: [(0, '7.155')]
[2025-03-07 02:11:11,457][06307] Saving new best policy, reward=7.155!
[2025-03-07 02:11:16,438][01872] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3984.9). Total num frames: 1409024. Throughput: 0: 913.8. Samples: 351600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:11:16,442][01872] Avg episode reward: [(0, '7.269')]
[2025-03-07 02:11:16,445][06307] Saving new best policy, reward=7.269!
[2025-03-07 02:11:21,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3971.0). Total num frames: 1429504. Throughput: 0: 911.3. Samples: 356926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:11:21,441][01872] Avg episode reward: [(0, '7.336')]
[2025-03-07 02:11:21,448][06307] Saving new best policy, reward=7.336!
[2025-03-07 02:11:21,854][06323] Updated weights for policy 0, policy_version 350 (0.0019)
[2025-03-07 02:11:26,437][01872] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3971.0). Total num frames: 1449984. Throughput: 0: 911.5. Samples: 363380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:11:26,441][01872] Avg episode reward: [(0, '7.373')]
[2025-03-07 02:11:26,442][06307] Saving new best policy, reward=7.373!
[2025-03-07 02:11:31,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3971.0). Total num frames: 1466368. Throughput: 0: 913.7. Samples: 365426. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-07 02:11:31,441][01872] Avg episode reward: [(0, '8.061')]
[2025-03-07 02:11:31,448][06307] Saving new best policy, reward=8.061!
[2025-03-07 02:11:32,964][06323] Updated weights for policy 0, policy_version 360 (0.0021)
[2025-03-07 02:11:36,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3957.2). Total num frames: 1486848. Throughput: 0: 923.4. Samples: 371320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:11:36,442][01872] Avg episode reward: [(0, '8.647')]
[2025-03-07 02:11:36,447][06307] Saving new best policy, reward=8.647!
[2025-03-07 02:11:41,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3957.2). Total num frames: 1507328. Throughput: 0: 921.2. Samples: 377660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:11:41,440][01872] Avg episode reward: [(0, '8.470')]
[2025-03-07 02:11:43,918][06323] Updated weights for policy 0, policy_version 370 (0.0022)
[2025-03-07 02:11:46,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 1523712. Throughput: 0: 919.1. Samples: 379546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:11:46,439][01872] Avg episode reward: [(0, '8.167')]
[2025-03-07 02:11:51,438][01872] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3943.3). Total num frames: 1544192. Throughput: 0: 935.6. Samples: 385570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:11:51,439][01872] Avg episode reward: [(0, '8.030')]
[2025-03-07 02:11:53,785][06323] Updated weights for policy 0, policy_version 380 (0.0018)
[2025-03-07 02:11:56,443][01872] Fps is (10 sec: 4093.9, 60 sec: 3686.1, 300 sec: 3943.2). Total num frames: 1564672. Throughput: 0: 939.1. Samples: 391982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:11:56,447][01872] Avg episode reward: [(0, '7.994')]
[2025-03-07 02:12:01,438][01872] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1581056. Throughput: 0: 940.6. Samples: 393928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:12:01,443][01872] Avg episode reward: [(0, '7.585')]
[2025-03-07 02:12:04,814][06323] Updated weights for policy 0, policy_version 390 (0.0022)
[2025-03-07 02:12:06,437][01872] Fps is (10 sec: 3688.3, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1601536. Throughput: 0: 958.4. Samples: 400054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:12:06,442][01872] Avg episode reward: [(0, '7.719')]
[2025-03-07 02:12:11,437][01872] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1622016. Throughput: 0: 956.8. Samples: 406438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:12:11,439][01872] Avg episode reward: [(0, '8.709')]
[2025-03-07 02:12:11,450][06307] Saving new best policy, reward=8.709!
[2025-03-07 02:12:15,916][06323] Updated weights for policy 0, policy_version 400 (0.0020)
[2025-03-07 02:12:16,440][01872] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3929.4). Total num frames: 1638400. Throughput: 0: 952.7. Samples: 408300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:12:16,445][01872] Avg episode reward: [(0, '9.493')]
[2025-03-07 02:12:16,450][06307] Saving new best policy, reward=9.493!
[2025-03-07 02:12:21,437][01872] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1650688. Throughput: 0: 924.4. Samples: 412920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:12:21,443][01872] Avg episode reward: [(0, '9.496')]
[2025-03-07 02:12:21,463][06307] Saving new best policy, reward=9.496!
[2025-03-07 02:12:26,437][01872] Fps is (10 sec: 3277.6, 60 sec: 3686.4, 300 sec: 3887.8). Total num frames: 1671168. Throughput: 0: 916.7. Samples: 418912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:12:26,441][01872] Avg episode reward: [(0, '10.006')]
[2025-03-07 02:12:26,444][06307] Saving new best policy, reward=10.006!
[2025-03-07 02:12:28,454][06323] Updated weights for policy 0, policy_version 410 (0.0031)
[2025-03-07 02:12:31,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1687552. Throughput: 0: 918.0. Samples: 420854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:12:31,442][01872] Avg episode reward: [(0, '9.511')]
[2025-03-07 02:12:36,439][01872] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3887.7). Total num frames: 1712128. Throughput: 0: 929.9. Samples: 427418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:12:36,443][01872] Avg episode reward: [(0, '10.779')]
[2025-03-07 02:12:36,446][06307] Saving new best policy, reward=10.779!
[2025-03-07 02:12:37,968][06323] Updated weights for policy 0, policy_version 420 (0.0015)
[2025-03-07 02:12:41,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3887.8). Total num frames: 1732608. Throughput: 0: 927.5. Samples: 433716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:12:41,440][01872] Avg episode reward: [(0, '10.452')]
[2025-03-07 02:12:41,450][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth...
[2025-03-07 02:12:41,615][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth
[2025-03-07 02:12:46,437][01872] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1748992. Throughput: 0: 931.4. Samples: 435840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:12:46,442][01872] Avg episode reward: [(0, '10.750')]
[2025-03-07 02:12:48,535][06323] Updated weights for policy 0, policy_version 430 (0.0019)
[2025-03-07 02:12:51,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1773568. Throughput: 0: 947.3. Samples: 442682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:12:51,442][01872] Avg episode reward: [(0, '11.059')]
[2025-03-07 02:12:51,451][06307] Saving new best policy, reward=11.059!
[2025-03-07 02:12:56,438][01872] Fps is (10 sec: 4505.2, 60 sec: 3823.2, 300 sec: 3887.7). Total num frames: 1794048. Throughput: 0: 944.2. Samples: 448930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:12:56,443][01872] Avg episode reward: [(0, '11.108')]
[2025-03-07 02:12:56,446][06307] Saving new best policy, reward=11.108!
[2025-03-07 02:12:59,267][06323] Updated weights for policy 0, policy_version 440 (0.0034)
[2025-03-07 02:13:01,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1810432. Throughput: 0: 947.6. Samples: 450938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:13:01,439][01872] Avg episode reward: [(0, '11.174')]
[2025-03-07 02:13:01,447][06307] Saving new best policy, reward=11.174!
[2025-03-07 02:13:06,439][01872] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1835008. Throughput: 0: 996.3. Samples: 457754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:13:06,441][01872] Avg episode reward: [(0, '10.772')]
[2025-03-07 02:13:08,103][06323] Updated weights for policy 0, policy_version 450 (0.0025)
[2025-03-07 02:13:11,439][01872] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3873.8). Total num frames: 1851392. Throughput: 0: 1001.8. Samples: 463996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:13:11,442][01872] Avg episode reward: [(0, '10.779')]
[2025-03-07 02:13:16,437][01872] Fps is (10 sec: 3687.1, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 1871872. Throughput: 0: 1007.2. Samples: 466180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:13:16,439][01872] Avg episode reward: [(0, '12.266')]
[2025-03-07 02:13:16,441][06307] Saving new best policy, reward=12.266!
[2025-03-07 02:13:18,763][06323] Updated weights for policy 0, policy_version 460 (0.0025)
[2025-03-07 02:13:21,437][01872] Fps is (10 sec: 4506.4, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 1896448. Throughput: 0: 1014.6. Samples: 473072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:13:21,439][01872] Avg episode reward: [(0, '12.852')]
[2025-03-07 02:13:21,446][06307] Saving new best policy, reward=12.852!
[2025-03-07 02:13:26,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1912832. Throughput: 0: 1008.0. Samples: 479076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:13:26,440][01872] Avg episode reward: [(0, '13.182')]
[2025-03-07 02:13:26,445][06307] Saving new best policy, reward=13.182!
[2025-03-07 02:13:29,348][06323] Updated weights for policy 0, policy_version 470 (0.0021)
[2025-03-07 02:13:31,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 1933312. Throughput: 0: 1013.5. Samples: 481448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:13:31,443][01872] Avg episode reward: [(0, '13.792')]
[2025-03-07 02:13:31,450][06307] Saving new best policy, reward=13.792!
[2025-03-07 02:13:36,438][01872] Fps is (10 sec: 4505.5, 60 sec: 4096.1, 300 sec: 3873.8). Total num frames: 1957888. Throughput: 0: 1018.7. Samples: 488524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:13:36,443][01872] Avg episode reward: [(0, '14.605')]
[2025-03-07 02:13:36,448][06307] Saving new best policy, reward=14.605!
[2025-03-07 02:13:38,144][06323] Updated weights for policy 0, policy_version 480 (0.0015)
[2025-03-07 02:13:41,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.9). Total num frames: 1974272. Throughput: 0: 1010.6. Samples: 494406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:13:41,445][01872] Avg episode reward: [(0, '14.707')]
[2025-03-07 02:13:41,454][06307] Saving new best policy, reward=14.707!
[2025-03-07 02:13:46,438][01872] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 1994752. Throughput: 0: 1021.5. Samples: 496904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:13:46,439][01872] Avg episode reward: [(0, '14.390')]
[2025-03-07 02:13:48,710][06323] Updated weights for policy 0, policy_version 490 (0.0016)
[2025-03-07 02:13:51,438][01872] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 2019328. Throughput: 0: 1024.2. Samples: 503840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:13:51,439][01872] Avg episode reward: [(0, '14.480')]
[2025-03-07 02:13:56,439][01872] Fps is (10 sec: 4095.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2035712. Throughput: 0: 1012.8. Samples: 509572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:13:56,440][01872] Avg episode reward: [(0, '14.496')]
[2025-03-07 02:13:59,166][06323] Updated weights for policy 0, policy_version 500 (0.0014)
[2025-03-07 02:14:01,437][01872] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 2056192. Throughput: 0: 1025.4. Samples: 512322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:01,438][01872] Avg episode reward: [(0, '14.272')]
[2025-03-07 02:14:06,437][01872] Fps is (10 sec: 4506.2, 60 sec: 4096.1, 300 sec: 3873.8). Total num frames: 2080768. Throughput: 0: 1029.0. Samples: 519378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:06,439][01872] Avg episode reward: [(0, '15.949')]
[2025-03-07 02:14:06,444][06307] Saving new best policy, reward=15.949!
[2025-03-07 02:14:08,055][06323] Updated weights for policy 0, policy_version 510 (0.0029)
[2025-03-07 02:14:11,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3887.7). Total num frames: 2097152. Throughput: 0: 1018.9. Samples: 524928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:11,439][01872] Avg episode reward: [(0, '16.674')]
[2025-03-07 02:14:11,449][06307] Saving new best policy, reward=16.674!
[2025-03-07 02:14:16,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2117632. Throughput: 0: 1028.8. Samples: 527742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:16,439][01872] Avg episode reward: [(0, '17.181')]
[2025-03-07 02:14:16,442][06307] Saving new best policy, reward=17.181!
[2025-03-07 02:14:18,743][06323] Updated weights for policy 0, policy_version 520 (0.0017)
[2025-03-07 02:14:21,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2142208. Throughput: 0: 1022.4. Samples: 534530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:14:21,439][01872] Avg episode reward: [(0, '17.512')]
[2025-03-07 02:14:21,448][06307] Saving new best policy, reward=17.512!
[2025-03-07 02:14:26,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2158592. Throughput: 0: 1011.2. Samples: 539912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:14:26,440][01872] Avg episode reward: [(0, '16.117')]
[2025-03-07 02:14:29,291][06323] Updated weights for policy 0, policy_version 530 (0.0024)
[2025-03-07 02:14:31,438][01872] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2179072. Throughput: 0: 1023.5. Samples: 542962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:31,439][01872] Avg episode reward: [(0, '17.361')]
[2025-03-07 02:14:36,437][01872] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2203648. Throughput: 0: 1028.1. Samples: 550102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:14:36,444][01872] Avg episode reward: [(0, '17.367')]
[2025-03-07 02:14:38,536][06323] Updated weights for policy 0, policy_version 540 (0.0018)
[2025-03-07 02:14:41,437][01872] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2215936. Throughput: 0: 1018.0. Samples: 555380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:14:41,446][01872] Avg episode reward: [(0, '16.800')]
[2025-03-07 02:14:41,455][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth...
[2025-03-07 02:14:41,643][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth
[2025-03-07 02:14:46,438][01872] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2240512. Throughput: 0: 1029.2. Samples: 558636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:14:46,440][01872] Avg episode reward: [(0, '16.587')]
[2025-03-07 02:14:48,575][06323] Updated weights for policy 0, policy_version 550 (0.0037)
[2025-03-07 02:14:51,437][01872] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2265088. Throughput: 0: 1026.1. Samples: 565552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:14:51,439][01872] Avg episode reward: [(0, '15.477')]
[2025-03-07 02:14:56,437][01872] Fps is (10 sec: 4096.1, 60 sec: 4096.1, 300 sec: 3901.6). Total num frames: 2281472. Throughput: 0: 1017.0. Samples: 570694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:14:56,439][01872] Avg episode reward: [(0, '15.327')]
[2025-03-07 02:14:59,074][06323] Updated weights for policy 0, policy_version 560 (0.0026)
[2025-03-07 02:15:01,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2301952. Throughput: 0: 1030.0. Samples: 574094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:15:01,442][01872] Avg episode reward: [(0, '16.907')]
[2025-03-07 02:15:06,437][01872] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2326528. Throughput: 0: 1036.2. Samples: 581160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:15:06,441][01872] Avg episode reward: [(0, '17.255')]
[2025-03-07 02:15:08,340][06323] Updated weights for policy 0, policy_version 570 (0.0018)
[2025-03-07 02:15:11,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2342912. Throughput: 0: 1027.4. Samples: 586146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:15:11,440][01872] Avg episode reward: [(0, '17.450')]
[2025-03-07 02:15:16,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2363392. Throughput: 0: 1037.2. Samples: 589634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:15:16,442][01872] Avg episode reward: [(0, '18.211')]
[2025-03-07 02:15:16,490][06307] Saving new best policy, reward=18.211!
[2025-03-07 02:15:18,675][06323] Updated weights for policy 0, policy_version 580 (0.0029)
[2025-03-07 02:15:21,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2379776. Throughput: 0: 1006.4. Samples: 595392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:15:21,441][01872] Avg episode reward: [(0, '18.127')]
[2025-03-07 02:15:26,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2396160. Throughput: 0: 979.1. Samples: 599438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:15:26,442][01872] Avg episode reward: [(0, '17.620')]
[2025-03-07 02:15:30,529][06323] Updated weights for policy 0, policy_version 590 (0.0035)
[2025-03-07 02:15:31,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3915.5). Total num frames: 2420736. Throughput: 0: 984.7. Samples: 602946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:15:31,439][01872] Avg episode reward: [(0, '16.498')]
[2025-03-07 02:15:36,439][01872] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 3915.5). Total num frames: 2441216. Throughput: 0: 990.4. Samples: 610122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-07 02:15:36,441][01872] Avg episode reward: [(0, '16.106')]
[2025-03-07 02:15:41,019][06323] Updated weights for policy 0, policy_version 600 (0.0029)
[2025-03-07 02:15:41,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2457600. Throughput: 0: 985.9. Samples: 615058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:15:41,444][01872] Avg episode reward: [(0, '16.428')]
[2025-03-07 02:15:46,437][01872] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2482176. Throughput: 0: 990.4. Samples: 618664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:15:46,441][01872] Avg episode reward: [(0, '16.871')]
[2025-03-07 02:15:49,662][06323] Updated weights for policy 0, policy_version 610 (0.0020)
[2025-03-07 02:15:51,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2502656. Throughput: 0: 987.8. Samples: 625610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:15:51,439][01872] Avg episode reward: [(0, '18.193')]
[2025-03-07 02:15:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2519040. Throughput: 0: 982.8. Samples: 630372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:15:56,441][01872] Avg episode reward: [(0, '18.253')]
[2025-03-07 02:15:56,446][06307] Saving new best policy, reward=18.253!
[2025-03-07 02:16:00,453][06323] Updated weights for policy 0, policy_version 620 (0.0027)
[2025-03-07 02:16:01,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2543616. Throughput: 0: 981.0. Samples: 633778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:16:01,440][01872] Avg episode reward: [(0, '18.894')]
[2025-03-07 02:16:01,449][06307] Saving new best policy, reward=18.894!
[2025-03-07 02:16:06,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2564096. Throughput: 0: 1004.7. Samples: 640602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:16:06,441][01872] Avg episode reward: [(0, '18.355')]
[2025-03-07 02:16:11,209][06323] Updated weights for policy 0, policy_version 630 (0.0023)
[2025-03-07 02:16:11,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2580480. Throughput: 0: 1021.0. Samples: 645382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:16:11,443][01872] Avg episode reward: [(0, '18.029')]
[2025-03-07 02:16:16,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2600960. Throughput: 0: 1018.3. Samples: 648768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-07 02:16:16,441][01872] Avg episode reward: [(0, '17.022')]
[2025-03-07 02:16:20,731][06323] Updated weights for policy 0, policy_version 640 (0.0024)
[2025-03-07 02:16:21,441][01872] Fps is (10 sec: 4094.6, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 2621440. Throughput: 0: 1004.7. Samples: 655336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:16:21,442][01872] Avg episode reward: [(0, '16.984')]
[2025-03-07 02:16:26,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2637824. Throughput: 0: 995.9. Samples: 659872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:16:26,439][01872] Avg episode reward: [(0, '17.241')]
[2025-03-07 02:16:31,437][01872] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2658304. Throughput: 0: 987.4. Samples: 663096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:16:31,439][01872] Avg episode reward: [(0, '17.059')]
[2025-03-07 02:16:31,758][06323] Updated weights for policy 0, policy_version 650 (0.0022)
[2025-03-07 02:16:36,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 2678784. Throughput: 0: 981.3. Samples: 669768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:16:36,445][01872] Avg episode reward: [(0, '16.332')]
[2025-03-07 02:16:41,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2695168. Throughput: 0: 975.4. Samples: 674264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:16:41,443][01872] Avg episode reward: [(0, '17.730')]
[2025-03-07 02:16:41,452][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth...
[2025-03-07 02:16:41,589][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth
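(Editor's note: each periodic "Saving .../checkpoint_..." is paired with a "Removing ..." of the oldest checkpoint, i.e. a keep-the-last-N rotation; in this log two rolling checkpoints are retained. A minimal sketch under that assumption; the retention count and function name are guesses, only the file-naming pattern is taken from the log.)

```python
import glob
import os

import torch


def save_with_rotation(model, checkpoint_dir, policy_version, env_steps, keep_last=2):
    """Save checkpoint_<version>_<steps>.pth, then prune all but the newest `keep_last`."""
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(model.state_dict(), os.path.join(checkpoint_dir, name))

    # Zero-padded version numbers make lexicographic order chronological.
    checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_last]:
        os.remove(old)  # matches the "Removing .../checkpoint_..." lines above
```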
[2025-03-07 02:16:42,988][06323] Updated weights for policy 0, policy_version 660 (0.0026)
[2025-03-07 02:16:46,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2715648. Throughput: 0: 972.9. Samples: 677558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:16:46,439][01872] Avg episode reward: [(0, '17.535')]
[2025-03-07 02:16:51,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 2736128. Throughput: 0: 965.4. Samples: 684046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:16:51,439][01872] Avg episode reward: [(0, '17.839')]
[2025-03-07 02:16:53,893][06323] Updated weights for policy 0, policy_version 670 (0.0026)
[2025-03-07 02:16:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2752512. Throughput: 0: 965.8. Samples: 688844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:16:56,439][01872] Avg episode reward: [(0, '18.309')]
[2025-03-07 02:17:01,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2777088. Throughput: 0: 967.6. Samples: 692308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:01,440][01872] Avg episode reward: [(0, '19.786')]
[2025-03-07 02:17:01,446][06307] Saving new best policy, reward=19.786!
[2025-03-07 02:17:03,208][06323] Updated weights for policy 0, policy_version 680 (0.0026)
[2025-03-07 02:17:06,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 2793472. Throughput: 0: 962.1. Samples: 698628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:17:06,443][01872] Avg episode reward: [(0, '18.904')]
[2025-03-07 02:17:11,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.1). Total num frames: 2809856. Throughput: 0: 971.6. Samples: 703594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:11,440][01872] Avg episode reward: [(0, '19.364')]
[2025-03-07 02:17:14,169][06323] Updated weights for policy 0, policy_version 690 (0.0013)
[2025-03-07 02:17:16,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2834432. Throughput: 0: 974.3. Samples: 706940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:16,440][01872] Avg episode reward: [(0, '18.781')]
[2025-03-07 02:17:21,438][01872] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3998.8). Total num frames: 2850816. Throughput: 0: 963.3. Samples: 713116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:17:21,439][01872] Avg episode reward: [(0, '19.895')]
[2025-03-07 02:17:21,444][06307] Saving new best policy, reward=19.895!
[2025-03-07 02:17:25,468][06323] Updated weights for policy 0, policy_version 700 (0.0014)
[2025-03-07 02:17:26,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2871296. Throughput: 0: 972.4. Samples: 718020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:26,439][01872] Avg episode reward: [(0, '18.685')]
[2025-03-07 02:17:31,437][01872] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2891776. Throughput: 0: 974.5. Samples: 721412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:31,439][01872] Avg episode reward: [(0, '20.637')]
[2025-03-07 02:17:31,447][06307] Saving new best policy, reward=20.637!
[2025-03-07 02:17:34,709][06323] Updated weights for policy 0, policy_version 710 (0.0027)
[2025-03-07 02:17:36,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2912256. Throughput: 0: 970.5. Samples: 727720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:17:36,439][01872] Avg episode reward: [(0, '20.865')]
[2025-03-07 02:17:36,444][06307] Saving new best policy, reward=20.865!
[2025-03-07 02:17:41,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2928640. Throughput: 0: 982.8. Samples: 733072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:41,447][01872] Avg episode reward: [(0, '20.512')]
[2025-03-07 02:17:45,173][06323] Updated weights for policy 0, policy_version 720 (0.0017)
[2025-03-07 02:17:46,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2953216. Throughput: 0: 982.4. Samples: 736516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:46,442][01872] Avg episode reward: [(0, '21.774')]
[2025-03-07 02:17:46,447][06307] Saving new best policy, reward=21.774!
[2025-03-07 02:17:51,438][01872] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2969600. Throughput: 0: 974.3. Samples: 742470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:17:51,439][01872] Avg episode reward: [(0, '21.218')]
[2025-03-07 02:17:56,012][06323] Updated weights for policy 0, policy_version 730 (0.0019)
[2025-03-07 02:17:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2990080. Throughput: 0: 986.0. Samples: 747966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:17:56,442][01872] Avg episode reward: [(0, '19.889')]
[2025-03-07 02:18:01,437][01872] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3014656. Throughput: 0: 987.9. Samples: 751396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:18:01,439][01872] Avg episode reward: [(0, '20.091')]
[2025-03-07 02:18:06,307][06323] Updated weights for policy 0, policy_version 740 (0.0032)
[2025-03-07 02:18:06,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3031040. Throughput: 0: 985.6. Samples: 757470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:18:06,440][01872] Avg episode reward: [(0, '18.795')]
[2025-03-07 02:18:11,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3051520. Throughput: 0: 1002.8. Samples: 763148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:18:11,439][01872] Avg episode reward: [(0, '18.436')]
[2025-03-07 02:18:15,684][06323] Updated weights for policy 0, policy_version 750 (0.0028)
[2025-03-07 02:18:16,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3072000. Throughput: 0: 1003.5. Samples: 766568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:18:16,439][01872] Avg episode reward: [(0, '19.787')]
[2025-03-07 02:18:21,438][01872] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3088384. Throughput: 0: 983.7. Samples: 771988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:18:21,439][01872] Avg episode reward: [(0, '19.936')]
[2025-03-07 02:18:26,437][01872] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3100672. Throughput: 0: 955.2. Samples: 776054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:18:26,439][01872] Avg episode reward: [(0, '19.151')]
[2025-03-07 02:18:28,262][06323] Updated weights for policy 0, policy_version 760 (0.0026)
[2025-03-07 02:18:31,437][01872] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3125248. Throughput: 0: 957.9. Samples: 779620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:18:31,439][01872] Avg episode reward: [(0, '19.299')]
[2025-03-07 02:18:36,439][01872] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3971.0). Total num frames: 3145728. Throughput: 0: 979.7. Samples: 786556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:18:36,444][01872] Avg episode reward: [(0, '19.344')]
[2025-03-07 02:18:38,547][06323] Updated weights for policy 0, policy_version 770 (0.0024)
[2025-03-07 02:18:41,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3162112. Throughput: 0: 966.4. Samples: 791452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:18:41,442][01872] Avg episode reward: [(0, '20.669')]
[2025-03-07 02:18:41,450][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000772_3162112.pth...
[2025-03-07 02:18:41,584][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth
[2025-03-07 02:18:46,439][01872] Fps is (10 sec: 4096.1, 60 sec: 3891.1, 300 sec: 3957.1). Total num frames: 3186688. Throughput: 0: 967.4. Samples: 794932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:18:46,446][01872] Avg episode reward: [(0, '20.986')]
[2025-03-07 02:18:47,777][06323] Updated weights for policy 0, policy_version 780 (0.0019)
[2025-03-07 02:18:51,438][01872] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 3207168. Throughput: 0: 980.6. Samples: 801596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:18:51,443][01872] Avg episode reward: [(0, '22.010')]
[2025-03-07 02:18:51,450][06307] Saving new best policy, reward=22.010!
[2025-03-07 02:18:56,437][01872] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3223552. Throughput: 0: 967.3. Samples: 806676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:18:56,443][01872] Avg episode reward: [(0, '22.611')]
[2025-03-07 02:18:56,448][06307] Saving new best policy, reward=22.611!
[2025-03-07 02:18:58,513][06323] Updated weights for policy 0, policy_version 790 (0.0013)
[2025-03-07 02:19:01,437][01872] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3248128. Throughput: 0: 967.5. Samples: 810104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:01,444][01872] Avg episode reward: [(0, '23.261')]
[2025-03-07 02:19:01,451][06307] Saving new best policy, reward=23.261!
[2025-03-07 02:19:06,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3264512. Throughput: 0: 992.9. Samples: 816668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:06,440][01872] Avg episode reward: [(0, '21.336')]
[2025-03-07 02:19:09,206][06323] Updated weights for policy 0, policy_version 800 (0.0027)
[2025-03-07 02:19:11,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3284992. Throughput: 0: 1017.0. Samples: 821818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:11,442][01872] Avg episode reward: [(0, '21.320')]
[2025-03-07 02:19:16,438][01872] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3309568. Throughput: 0: 1013.8. Samples: 825240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:16,443][01872] Avg episode reward: [(0, '22.015')]
[2025-03-07 02:19:18,210][06323] Updated weights for policy 0, policy_version 810 (0.0019)
[2025-03-07 02:19:21,439][01872] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3325952. Throughput: 0: 996.3. Samples: 831388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:19:21,443][01872] Avg episode reward: [(0, '22.642')]
[2025-03-07 02:19:26,437][01872] Fps is (10 sec: 3276.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3342336. Throughput: 0: 1004.8. Samples: 836670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:19:26,443][01872] Avg episode reward: [(0, '21.670')]
[2025-03-07 02:19:29,121][06323] Updated weights for policy 0, policy_version 820 (0.0036)
[2025-03-07 02:19:31,437][01872] Fps is (10 sec: 4096.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3366912. Throughput: 0: 1004.1. Samples: 840116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:19:31,443][01872] Avg episode reward: [(0, '22.798')]
[2025-03-07 02:19:36,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 3383296. Throughput: 0: 993.7. Samples: 846310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:19:36,441][01872] Avg episode reward: [(0, '22.851')]
[2025-03-07 02:19:39,950][06323] Updated weights for policy 0, policy_version 830 (0.0023)
[2025-03-07 02:19:41,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3403776. Throughput: 0: 1001.0. Samples: 851720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:19:41,443][01872] Avg episode reward: [(0, '23.049')]
[2025-03-07 02:19:46,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3428352. Throughput: 0: 1000.5. Samples: 855128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:46,439][01872] Avg episode reward: [(0, '21.827')]
[2025-03-07 02:19:49,335][06323] Updated weights for policy 0, policy_version 840 (0.0022)
[2025-03-07 02:19:51,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3444736. Throughput: 0: 988.2. Samples: 861138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:19:51,439][01872] Avg episode reward: [(0, '22.493')]
[2025-03-07 02:19:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3465216. Throughput: 0: 994.9. Samples: 866588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:19:56,442][01872] Avg episode reward: [(0, '21.036')]
[2025-03-07 02:19:59,860][06323] Updated weights for policy 0, policy_version 850 (0.0021)
[2025-03-07 02:20:01,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3485696. Throughput: 0: 994.0. Samples: 869972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:01,443][01872] Avg episode reward: [(0, '22.672')]
[2025-03-07 02:20:06,441][01872] Fps is (10 sec: 4094.6, 60 sec: 4027.5, 300 sec: 3943.2). Total num frames: 3506176. Throughput: 0: 994.8. Samples: 876156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:20:06,442][01872] Avg episode reward: [(0, '22.255')]
[2025-03-07 02:20:10,599][06323] Updated weights for policy 0, policy_version 860 (0.0025)
[2025-03-07 02:20:11,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3522560. Throughput: 0: 997.2. Samples: 881546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:20:11,442][01872] Avg episode reward: [(0, '23.232')]
[2025-03-07 02:20:16,437][01872] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3547136. Throughput: 0: 997.4. Samples: 885000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:16,440][01872] Avg episode reward: [(0, '23.643')]
[2025-03-07 02:20:16,443][06307] Saving new best policy, reward=23.643!
[2025-03-07 02:20:20,879][06323] Updated weights for policy 0, policy_version 870 (0.0029)
[2025-03-07 02:20:21,440][01872] Fps is (10 sec: 4095.0, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3563520. Throughput: 0: 989.6. Samples: 890846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:20:21,443][01872] Avg episode reward: [(0, '24.750')]
[2025-03-07 02:20:21,457][06307] Saving new best policy, reward=24.750!
[2025-03-07 02:20:26,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3584000. Throughput: 0: 994.8. Samples: 896488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:26,439][01872] Avg episode reward: [(0, '25.947')]
[2025-03-07 02:20:26,445][06307] Saving new best policy, reward=25.947!
[2025-03-07 02:20:30,594][06323] Updated weights for policy 0, policy_version 880 (0.0015)
[2025-03-07 02:20:31,437][01872] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3604480. Throughput: 0: 993.5. Samples: 899836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:31,439][01872] Avg episode reward: [(0, '27.042')]
[2025-03-07 02:20:31,457][06307] Saving new best policy, reward=27.042!
[2025-03-07 02:20:36,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3620864. Throughput: 0: 990.5. Samples: 905712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:36,439][01872] Avg episode reward: [(0, '25.963')]
[2025-03-07 02:20:41,152][06323] Updated weights for policy 0, policy_version 890 (0.0022)
[2025-03-07 02:20:41,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3645440. Throughput: 0: 1003.5. Samples: 911744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:20:41,439][01872] Avg episode reward: [(0, '25.881')]
[2025-03-07 02:20:41,445][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000890_3645440.pth...
[2025-03-07 02:20:41,581][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth
[2025-03-07 02:20:46,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3665920. Throughput: 0: 1001.3. Samples: 915032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:20:46,446][01872] Avg episode reward: [(0, '26.743')]
[2025-03-07 02:20:51,439][01872] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3943.2). Total num frames: 3682304. Throughput: 0: 987.3. Samples: 920584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:20:51,440][01872] Avg episode reward: [(0, '25.015')]
[2025-03-07 02:20:52,164][06323] Updated weights for policy 0, policy_version 900 (0.0020)
[2025-03-07 02:20:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3702784. Throughput: 0: 1003.7. Samples: 926712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:20:56,442][01872] Avg episode reward: [(0, '23.703')]
[2025-03-07 02:21:00,900][06323] Updated weights for policy 0, policy_version 910 (0.0025)
[2025-03-07 02:21:01,437][01872] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3727360. Throughput: 0: 1004.8. Samples: 930218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:21:01,444][01872] Avg episode reward: [(0, '23.943')]
[2025-03-07 02:21:06,437][01872] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3943.3). Total num frames: 3743744. Throughput: 0: 1000.4. Samples: 935862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:21:06,439][01872] Avg episode reward: [(0, '24.268')]
[2025-03-07 02:21:11,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3764224. Throughput: 0: 1013.8. Samples: 942110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:21:11,438][01872] Avg episode reward: [(0, '22.636')]
[2025-03-07 02:21:11,497][06323] Updated weights for policy 0, policy_version 920 (0.0018)
[2025-03-07 02:21:16,437][01872] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3788800. Throughput: 0: 1018.0. Samples: 945646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:21:16,439][01872] Avg episode reward: [(0, '22.637')]
[2025-03-07 02:21:21,437][01872] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3957.2). Total num frames: 3805184. Throughput: 0: 1009.6. Samples: 951146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:21:21,439][01872] Avg episode reward: [(0, '23.350')]
[2025-03-07 02:21:22,030][06323] Updated weights for policy 0, policy_version 930 (0.0019)
[2025-03-07 02:21:26,437][01872] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3825664. Throughput: 0: 1014.0. Samples: 957376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-07 02:21:26,439][01872] Avg episode reward: [(0, '23.673')]
[2025-03-07 02:21:31,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3842048. Throughput: 0: 989.1. Samples: 959542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-07 02:21:31,443][01872] Avg episode reward: [(0, '23.163')]
[2025-03-07 02:21:33,499][06323] Updated weights for policy 0, policy_version 940 (0.0043)
[2025-03-07 02:21:36,437][01872] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3858432. Throughput: 0: 967.3. Samples: 964110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-07 02:21:36,440][01872] Avg episode reward: [(0, '22.874')]
[2025-03-07 02:21:41,440][01872] Fps is (10 sec: 3685.5, 60 sec: 3891.0, 300 sec: 3943.2). Total num frames: 3878912. Throughput: 0: 975.8. Samples: 970624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:21:41,441][01872] Avg episode reward: [(0, '24.126')]
[2025-03-07 02:21:43,373][06323] Updated weights for policy 0, policy_version 950 (0.0033)
[2025-03-07 02:21:46,437][01872] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3903488. Throughput: 0: 974.0. Samples: 974050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:21:46,440][01872] Avg episode reward: [(0, '24.361')]
[2025-03-07 02:21:51,437][01872] Fps is (10 sec: 3687.3, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 3915776. Throughput: 0: 964.0. Samples: 979244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:21:51,439][01872] Avg episode reward: [(0, '24.356')]
[2025-03-07 02:21:54,134][06323] Updated weights for policy 0, policy_version 960 (0.0019)
[2025-03-07 02:21:56,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3940352. Throughput: 0: 972.2. Samples: 985860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-07 02:21:56,442][01872] Avg episode reward: [(0, '23.160')]
[2025-03-07 02:22:01,437][01872] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3964928. Throughput: 0: 971.0. Samples: 989340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:22:01,440][01872] Avg episode reward: [(0, '24.279')]
[2025-03-07 02:22:04,026][06323] Updated weights for policy 0, policy_version 970 (0.0022)
[2025-03-07 02:22:06,437][01872] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3977216. Throughput: 0: 962.5. Samples: 994460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-07 02:22:06,440][01872] Avg episode reward: [(0, '24.201')]
[2025-03-07 02:22:11,438][01872] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 4001792. Throughput: 0: 976.1. Samples: 1001300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-07 02:22:11,444][01872] Avg episode reward: [(0, '22.846')]
[2025-03-07 02:22:11,673][06307] Stopping Batcher_0...
[2025-03-07 02:22:11,674][01872] Component Batcher_0 stopped!
[2025-03-07 02:22:11,675][06307] Loop batcher_evt_loop terminating...
[2025-03-07 02:22:11,675][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-07 02:22:11,743][06323] Weights refcount: 2 0
[2025-03-07 02:22:11,747][06323] Stopping InferenceWorker_p0-w0...
[2025-03-07 02:22:11,751][06323] Loop inference_proc0-0_evt_loop terminating...
[2025-03-07 02:22:11,747][01872] Component InferenceWorker_p0-w0 stopped!
[2025-03-07 02:22:11,794][06307] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000772_3162112.pth
[2025-03-07 02:22:11,812][06307] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-07 02:22:11,991][06307] Stopping LearnerWorker_p0...
[2025-03-07 02:22:11,992][06307] Loop learner_proc0_evt_loop terminating...
[2025-03-07 02:22:11,991][01872] Component LearnerWorker_p0 stopped!
[2025-03-07 02:22:12,112][01872] Component RolloutWorker_w1 stopped!
[2025-03-07 02:22:12,118][06320] Stopping RolloutWorker_w1...
[2025-03-07 02:22:12,118][06320] Loop rollout_proc1_evt_loop terminating...
[2025-03-07 02:22:12,129][01872] Component RolloutWorker_w3 stopped!
[2025-03-07 02:22:12,135][06329] Stopping RolloutWorker_w3...
[2025-03-07 02:22:12,135][06329] Loop rollout_proc3_evt_loop terminating...
[2025-03-07 02:22:12,154][01872] Component RolloutWorker_w7 stopped!
[2025-03-07 02:22:12,158][06331] Stopping RolloutWorker_w7...
[2025-03-07 02:22:12,159][06331] Loop rollout_proc7_evt_loop terminating...
[2025-03-07 02:22:12,168][01872] Component RolloutWorker_w5 stopped!
[2025-03-07 02:22:12,172][06332] Stopping RolloutWorker_w5...
[2025-03-07 02:22:12,173][06332] Loop rollout_proc5_evt_loop terminating...
[2025-03-07 02:22:12,192][01872] Component RolloutWorker_w0 stopped!
[2025-03-07 02:22:12,196][01872] Component RolloutWorker_w4 stopped!
[2025-03-07 02:22:12,197][06328] Stopping RolloutWorker_w4...
[2025-03-07 02:22:12,197][06328] Loop rollout_proc4_evt_loop terminating...
[2025-03-07 02:22:12,193][06321] Stopping RolloutWorker_w0...
[2025-03-07 02:22:12,198][06321] Loop rollout_proc0_evt_loop terminating...
[2025-03-07 02:22:12,224][01872] Component RolloutWorker_w6 stopped!
[2025-03-07 02:22:12,225][06330] Stopping RolloutWorker_w6...
[2025-03-07 02:22:12,226][06330] Loop rollout_proc6_evt_loop terminating...
[2025-03-07 02:22:12,252][01872] Component RolloutWorker_w2 stopped!
[2025-03-07 02:22:12,253][01872] Waiting for process learner_proc0 to stop...
[2025-03-07 02:22:12,254][06322] Stopping RolloutWorker_w2...
[2025-03-07 02:22:12,255][06322] Loop rollout_proc2_evt_loop terminating...
[2025-03-07 02:22:13,871][01872] Waiting for process inference_proc0-0 to join...
[2025-03-07 02:22:13,872][01872] Waiting for process rollout_proc0 to join...
[2025-03-07 02:22:16,318][01872] Waiting for process rollout_proc1 to join...
[2025-03-07 02:22:16,319][01872] Waiting for process rollout_proc2 to join...
[2025-03-07 02:22:16,320][01872] Waiting for process rollout_proc3 to join...
[2025-03-07 02:22:16,321][01872] Waiting for process rollout_proc4 to join...
[2025-03-07 02:22:16,322][01872] Waiting for process rollout_proc5 to join...
[2025-03-07 02:22:16,331][01872] Waiting for process rollout_proc6 to join...
[2025-03-07 02:22:16,332][01872] Waiting for process rollout_proc7 to join...
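(Editor's note: the shutdown sequence above follows the usual two-phase pattern: signal every component's event loop to stop, then join the worker processes one by one. A generic sketch of that pattern, not Sample Factory's actual code; all names here are hypothetical.)

```python
import multiprocessing as mp


def stop_and_join(workers):
    """workers: dict mapping name -> (mp.Process, mp.Event used as its stop signal)."""
    # Phase 1: ask every event loop to terminate ("Stopping ..." / "Loop ... terminating").
    for _, (_, stop_event) in workers.items():
        stop_event.set()
    # Phase 2: wait for each process to exit before moving on.
    for name, (proc, _) in workers.items():
        print(f"Waiting for process {name} to join...")
        proc.join()
```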
[2025-03-07 02:22:16,333][01872] Batcher 0 profile tree view:
batching: 25.6767, releasing_batches: 0.0254
[2025-03-07 02:22:16,333][01872] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 403.5980
update_model: 8.4069
weight_update: 0.0028
one_step: 0.0029
handle_policy_step: 573.3370
deserialize: 14.0923, stack: 3.1925, obs_to_device_normalize: 120.3982, forward: 296.5736, send_messages: 28.0486
prepare_outputs: 86.6529
to_cpu: 52.9743
[2025-03-07 02:22:16,334][01872] Learner 0 profile tree view:
misc: 0.0044, prepare_batch: 13.1813
train: 72.2140
epoch_init: 0.0083, minibatch_init: 0.0117, losses_postprocess: 0.6433, kl_divergence: 0.5886, after_optimizer: 33.4039
calculate_losses: 25.5714
losses_init: 0.0035, forward_head: 1.3389, bptt_initial: 16.7712, tail: 1.0934, advantages_returns: 0.2788, losses: 3.8426
bptt: 1.9650
bptt_forward_core: 1.8917
update: 11.3895
clip: 0.8564
[2025-03-07 02:22:16,341][01872] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2936, enqueue_policy_requests: 101.0515, env_step: 809.8159, overhead: 11.8186, complete_rollouts: 7.4814
save_policy_outputs: 18.2375
split_output_tensors: 7.0758
[2025-03-07 02:22:16,342][01872] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2588, enqueue_policy_requests: 103.3507, env_step: 805.8754, overhead: 12.0878, complete_rollouts: 7.1244
save_policy_outputs: 18.5261
split_output_tensors: 7.1554
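(Editor's note: the profile tree views above come from hierarchical timers: each named scope accumulates wall time under its parent scope, which is why child entries are indented beneath their totals. A toy sketch of such a timer; this is illustrative only, not the library's implementation.)

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)
_stack = []


@contextmanager
def timed(name):
    """Accumulate wall time under a dotted path, e.g. 'train.calculate_losses'."""
    _stack.append(name)
    path = ".".join(_stack)
    start = time.time()
    try:
        yield
    finally:
        timings[path] += time.time() - start
        _stack.pop()
```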
[2025-03-07 02:22:16,343][01872] Loop Runner_EvtLoop terminating...
[2025-03-07 02:22:16,344][01872] Runner profile tree view:
main_loop: 1049.6537
[2025-03-07 02:22:16,344][01872] Collected {0: 4005888}, FPS: 3816.4
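(Editor's note: the final summary is self-consistent: the overall FPS is total collected frames divided by the main-loop wall time reported in the runner profile. A quick check of the numbers above.)

```python
total_frames = 4_005_888        # from "Collected {0: 4005888}"
main_loop_seconds = 1049.6537   # from "Runner profile tree view: main_loop"
print(total_frames / main_loop_seconds)  # ~3816.4, matching the reported FPS
```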
[2025-03-07 02:22:51,171][01872] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-03-07 02:22:51,172][01872] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-07 02:22:51,175][01872] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-07 02:22:51,176][01872] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-07 02:22:51,177][01872] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-07 02:22:51,177][01872] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-07 02:22:51,178][01872] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-07 02:22:51,182][01872] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-07 02:22:51,183][01872] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-07 02:22:51,184][01872] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-07 02:22:51,184][01872] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-07 02:22:51,185][01872] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-07 02:22:51,186][01872] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-07 02:22:51,186][01872] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-07 02:22:51,187][01872] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-07 02:22:51,241][01872] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-07 02:22:51,244][01872] RunningMeanStd input shape: (3, 72, 128)
[2025-03-07 02:22:51,245][01872] RunningMeanStd input shape: (1,)
[2025-03-07 02:22:51,265][01872] ConvEncoder: input_channels=3
[2025-03-07 02:22:51,421][01872] Conv encoder output size: 512
[2025-03-07 02:22:51,422][01872] Policy head output size: 512
[2025-03-07 02:22:51,655][01872] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
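(Editor's note: the evaluation loop below prints, after each episode, the shaped reward and the "true" objective averaged over all episodes completed so far. A minimal sketch of that running-average bookkeeping; the function name is hypothetical.)

```python
episode_rewards, true_rewards = [], []


def on_episode_end(reward, true_objective):
    episode_rewards.append(reward)
    true_rewards.append(true_objective)
    n = len(episode_rewards)
    print(f"Avg episode rewards: #0: {sum(episode_rewards) / n:.3f}, "
          f"true rewards: #0: {sum(true_rewards) / n:.3f}")
```

For example, from the first two episodes below: the average is 16.960 after episode 1 and 24.860 after episode 2, which implies episode 2 alone scored 2 * 24.860 - 16.960 = 32.760.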
[2025-03-07 02:22:52,407][01872] Num frames 100...
[2025-03-07 02:22:52,538][01872] Num frames 200...
[2025-03-07 02:22:52,667][01872] Num frames 300...
[2025-03-07 02:22:52,795][01872] Num frames 400...
[2025-03-07 02:22:52,931][01872] Num frames 500...
[2025-03-07 02:22:53,057][01872] Num frames 600...
[2025-03-07 02:22:53,185][01872] Num frames 700...
[2025-03-07 02:22:53,325][01872] Num frames 800...
[2025-03-07 02:22:53,501][01872] Avg episode rewards: #0: 16.960, true rewards: #0: 8.960
[2025-03-07 02:22:53,502][01872] Avg episode reward: 16.960, avg true_objective: 8.960
[2025-03-07 02:22:53,509][01872] Num frames 900...
[2025-03-07 02:22:53,641][01872] Num frames 1000...
[2025-03-07 02:22:53,766][01872] Num frames 1100...
[2025-03-07 02:22:53,889][01872] Num frames 1200...
[2025-03-07 02:22:54,025][01872] Num frames 1300...
[2025-03-07 02:22:54,152][01872] Num frames 1400...
[2025-03-07 02:22:54,280][01872] Num frames 1500...
[2025-03-07 02:22:54,408][01872] Num frames 1600...
[2025-03-07 02:22:54,542][01872] Num frames 1700...
[2025-03-07 02:22:54,668][01872] Num frames 1800...
[2025-03-07 02:22:54,795][01872] Num frames 1900...
[2025-03-07 02:22:54,921][01872] Num frames 2000...
[2025-03-07 02:22:55,058][01872] Num frames 2100...
[2025-03-07 02:22:55,186][01872] Num frames 2200...
[2025-03-07 02:22:55,338][01872] Avg episode rewards: #0: 24.860, true rewards: #0: 11.360
[2025-03-07 02:22:55,339][01872] Avg episode reward: 24.860, avg true_objective: 11.360
[2025-03-07 02:22:55,378][01872] Num frames 2300...
[2025-03-07 02:22:55,509][01872] Num frames 2400...
[2025-03-07 02:22:55,636][01872] Num frames 2500...
[2025-03-07 02:22:55,766][01872] Num frames 2600...
[2025-03-07 02:22:55,891][01872] Num frames 2700...
[2025-03-07 02:22:56,027][01872] Num frames 2800...
[2025-03-07 02:22:56,155][01872] Num frames 2900...
[2025-03-07 02:22:56,284][01872] Num frames 3000...
[2025-03-07 02:22:56,411][01872] Num frames 3100...
[2025-03-07 02:22:56,541][01872] Num frames 3200...
[2025-03-07 02:22:56,671][01872] Num frames 3300...
[2025-03-07 02:22:56,801][01872] Avg episode rewards: #0: 25.200, true rewards: #0: 11.200
[2025-03-07 02:22:56,802][01872] Avg episode reward: 25.200, avg true_objective: 11.200
[2025-03-07 02:22:56,855][01872] Num frames 3400...
[2025-03-07 02:22:56,985][01872] Num frames 3500...
[2025-03-07 02:22:57,121][01872] Num frames 3600...
[2025-03-07 02:22:57,249][01872] Num frames 3700...
[2025-03-07 02:22:57,378][01872] Num frames 3800...
[2025-03-07 02:22:57,513][01872] Num frames 3900...
[2025-03-07 02:22:57,641][01872] Num frames 4000...
[2025-03-07 02:22:57,767][01872] Num frames 4100...
[2025-03-07 02:22:57,899][01872] Num frames 4200...
[2025-03-07 02:22:58,035][01872] Num frames 4300...
[2025-03-07 02:22:58,162][01872] Num frames 4400...
[2025-03-07 02:22:58,289][01872] Num frames 4500...
[2025-03-07 02:22:58,415][01872] Num frames 4600...
[2025-03-07 02:22:58,546][01872] Num frames 4700...
[2025-03-07 02:22:58,648][01872] Avg episode rewards: #0: 27.590, true rewards: #0: 11.840
[2025-03-07 02:22:58,649][01872] Avg episode reward: 27.590, avg true_objective: 11.840
[2025-03-07 02:22:58,734][01872] Num frames 4800...
[2025-03-07 02:22:58,860][01872] Num frames 4900...
[2025-03-07 02:22:58,991][01872] Num frames 5000...
[2025-03-07 02:22:59,135][01872] Num frames 5100...
[2025-03-07 02:22:59,266][01872] Num frames 5200...
[2025-03-07 02:22:59,398][01872] Num frames 5300...
[2025-03-07 02:22:59,530][01872] Num frames 5400...
[2025-03-07 02:22:59,661][01872] Num frames 5500...
[2025-03-07 02:22:59,789][01872] Num frames 5600...
[2025-03-07 02:22:59,921][01872] Num frames 5700...
[2025-03-07 02:23:00,092][01872] Avg episode rewards: #0: 27.566, true rewards: #0: 11.566
[2025-03-07 02:23:00,094][01872] Avg episode reward: 27.566, avg true_objective: 11.566
[2025-03-07 02:23:00,118][01872] Num frames 5800...
[2025-03-07 02:23:00,247][01872] Num frames 5900...
[2025-03-07 02:23:00,379][01872] Num frames 6000...
[2025-03-07 02:23:00,517][01872] Num frames 6100...
[2025-03-07 02:23:00,651][01872] Num frames 6200...
[2025-03-07 02:23:00,779][01872] Num frames 6300...
[2025-03-07 02:23:00,906][01872] Num frames 6400...
[2025-03-07 02:23:01,034][01872] Num frames 6500...
[2025-03-07 02:23:01,165][01872] Num frames 6600...
[2025-03-07 02:23:01,292][01872] Num frames 6700...
[2025-03-07 02:23:01,442][01872] Avg episode rewards: #0: 26.792, true rewards: #0: 11.292
[2025-03-07 02:23:01,443][01872] Avg episode reward: 26.792, avg true_objective: 11.292
[2025-03-07 02:23:01,481][01872] Num frames 6800...
[2025-03-07 02:23:01,605][01872] Num frames 6900...
[2025-03-07 02:23:01,777][01872] Num frames 7000...
[2025-03-07 02:23:01,951][01872] Num frames 7100...
[2025-03-07 02:23:02,127][01872] Num frames 7200...
[2025-03-07 02:23:02,297][01872] Num frames 7300...
[2025-03-07 02:23:02,464][01872] Num frames 7400...
[2025-03-07 02:23:02,639][01872] Num frames 7500...
[2025-03-07 02:23:02,841][01872] Avg episode rewards: #0: 25.557, true rewards: #0: 10.843
[2025-03-07 02:23:02,844][01872] Avg episode reward: 25.557, avg true_objective: 10.843
[2025-03-07 02:23:02,863][01872] Num frames 7600...
[2025-03-07 02:23:03,039][01872] Num frames 7700...
[2025-03-07 02:23:03,223][01872] Num frames 7800...
[2025-03-07 02:23:03,400][01872] Num frames 7900...
[2025-03-07 02:23:03,585][01872] Num frames 8000...
[2025-03-07 02:23:03,762][01872] Num frames 8100...
[2025-03-07 02:23:03,889][01872] Num frames 8200...
[2025-03-07 02:23:04,020][01872] Num frames 8300...
[2025-03-07 02:23:04,148][01872] Num frames 8400...
[2025-03-07 02:23:04,286][01872] Num frames 8500...
[2025-03-07 02:23:04,417][01872] Num frames 8600...
[2025-03-07 02:23:04,548][01872] Num frames 8700...
[2025-03-07 02:23:04,675][01872] Num frames 8800...
[2025-03-07 02:23:04,802][01872] Num frames 8900...
[2025-03-07 02:23:04,981][01872] Avg episode rewards: #0: 26.248, true rewards: #0: 11.247
[2025-03-07 02:23:04,982][01872] Avg episode reward: 26.248, avg true_objective: 11.247
[2025-03-07 02:23:04,987][01872] Num frames 9000...
[2025-03-07 02:23:05,115][01872] Num frames 9100...
[2025-03-07 02:23:05,253][01872] Num frames 9200...
[2025-03-07 02:23:05,378][01872] Num frames 9300...
[2025-03-07 02:23:05,507][01872] Num frames 9400...
[2025-03-07 02:23:05,637][01872] Num frames 9500...
[2025-03-07 02:23:05,764][01872] Num frames 9600...
[2025-03-07 02:23:05,891][01872] Num frames 9700...
[2025-03-07 02:23:06,021][01872] Num frames 9800...
[2025-03-07 02:23:06,149][01872] Num frames 9900...
[2025-03-07 02:23:06,284][01872] Num frames 10000...
[2025-03-07 02:23:06,410][01872] Num frames 10100...
[2025-03-07 02:23:06,539][01872] Num frames 10200...
[2025-03-07 02:23:06,670][01872] Num frames 10300...
[2025-03-07 02:23:06,799][01872] Num frames 10400...
[2025-03-07 02:23:06,929][01872] Num frames 10500...
[2025-03-07 02:23:07,061][01872] Num frames 10600...
[2025-03-07 02:23:07,191][01872] Num frames 10700...
[2025-03-07 02:23:07,330][01872] Num frames 10800...
[2025-03-07 02:23:07,463][01872] Num frames 10900...
[2025-03-07 02:23:07,593][01872] Num frames 11000...
[2025-03-07 02:23:07,772][01872] Avg episode rewards: #0: 29.664, true rewards: #0: 12.331
[2025-03-07 02:23:07,773][01872] Avg episode reward: 29.664, avg true_objective: 12.331
[2025-03-07 02:23:07,777][01872] Num frames 11100...
[2025-03-07 02:23:07,901][01872] Num frames 11200...
[2025-03-07 02:23:08,030][01872] Num frames 11300...
[2025-03-07 02:23:08,157][01872] Num frames 11400...
[2025-03-07 02:23:08,284][01872] Num frames 11500...
[2025-03-07 02:23:08,418][01872] Num frames 11600...
[2025-03-07 02:23:08,546][01872] Num frames 11700...
[2025-03-07 02:23:08,676][01872] Num frames 11800...
[2025-03-07 02:23:08,802][01872] Num frames 11900...
[2025-03-07 02:23:08,932][01872] Num frames 12000...
[2025-03-07 02:23:09,059][01872] Num frames 12100...
[2025-03-07 02:23:09,189][01872] Num frames 12200...
[2025-03-07 02:23:09,320][01872] Num frames 12300...
[2025-03-07 02:23:09,459][01872] Num frames 12400...
[2025-03-07 02:23:09,592][01872] Num frames 12500...
[2025-03-07 02:23:09,736][01872] Avg episode rewards: #0: 30.070, true rewards: #0: 12.570
[2025-03-07 02:23:09,737][01872] Avg episode reward: 30.070, avg true_objective: 12.570
[2025-03-07 02:24:27,157][01872] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-03-07 02:25:57,682][01872] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-03-07 02:25:57,683][01872] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-07 02:25:57,684][01872] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-07 02:25:57,685][01872] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-07 02:25:57,686][01872] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-07 02:25:57,687][01872] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-07 02:25:57,688][01872] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-03-07 02:25:57,689][01872] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-07 02:25:57,690][01872] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-03-07 02:25:57,692][01872] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-03-07 02:25:57,693][01872] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-07 02:25:57,694][01872] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-07 02:25:57,695][01872] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-07 02:25:57,696][01872] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-07 02:25:57,697][01872] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-07 02:25:57,721][01872] RunningMeanStd input shape: (3, 72, 128)
[2025-03-07 02:25:57,723][01872] RunningMeanStd input shape: (1,)
[2025-03-07 02:25:57,735][01872] ConvEncoder: input_channels=3
[2025-03-07 02:25:57,768][01872] Conv encoder output size: 512
[2025-03-07 02:25:57,769][01872] Policy head output size: 512
[2025-03-07 02:25:57,787][01872] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-07 02:25:58,228][01872] Num frames 100...
[2025-03-07 02:25:58,359][01872] Num frames 200...
[2025-03-07 02:25:58,497][01872] Num frames 300...
[2025-03-07 02:25:58,626][01872] Num frames 400...
[2025-03-07 02:25:58,758][01872] Num frames 500...
[2025-03-07 02:25:58,934][01872] Avg episode rewards: #0: 9.940, true rewards: #0: 5.940
[2025-03-07 02:25:58,935][01872] Avg episode reward: 9.940, avg true_objective: 5.940
[2025-03-07 02:25:58,944][01872] Num frames 600...
[2025-03-07 02:25:59,072][01872] Num frames 700...
[2025-03-07 02:25:59,196][01872] Num frames 800...
[2025-03-07 02:25:59,320][01872] Num frames 900...
[2025-03-07 02:25:59,446][01872] Num frames 1000...
[2025-03-07 02:25:59,575][01872] Num frames 1100...
[2025-03-07 02:25:59,679][01872] Avg episode rewards: #0: 8.690, true rewards: #0: 5.690
[2025-03-07 02:25:59,680][01872] Avg episode reward: 8.690, avg true_objective: 5.690
[2025-03-07 02:25:59,761][01872] Num frames 1200...
[2025-03-07 02:25:59,886][01872] Num frames 1300...
[2025-03-07 02:26:00,025][01872] Num frames 1400...
[2025-03-07 02:26:00,153][01872] Num frames 1500...
[2025-03-07 02:26:00,280][01872] Num frames 1600...
[2025-03-07 02:26:00,407][01872] Num frames 1700...
[2025-03-07 02:26:00,539][01872] Num frames 1800...
[2025-03-07 02:26:00,666][01872] Num frames 1900...
[2025-03-07 02:26:00,794][01872] Num frames 2000...
[2025-03-07 02:26:00,922][01872] Num frames 2100...
[2025-03-07 02:26:01,053][01872] Num frames 2200...
[2025-03-07 02:26:01,183][01872] Num frames 2300...
[2025-03-07 02:26:01,309][01872] Num frames 2400...
[2025-03-07 02:26:01,440][01872] Num frames 2500...
[2025-03-07 02:26:01,611][01872] Num frames 2600...
[2025-03-07 02:26:01,791][01872] Num frames 2700...
[2025-03-07 02:26:01,969][01872] Num frames 2800...
[2025-03-07 02:26:02,086][01872] Avg episode rewards: #0: 17.113, true rewards: #0: 9.447
[2025-03-07 02:26:02,088][01872] Avg episode reward: 17.113, avg true_objective: 9.447
[2025-03-07 02:26:10,516][01872] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-03-07 02:26:10,516][01872] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-07 02:26:10,517][01872] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-07 02:26:10,519][01872] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-07 02:26:10,519][01872] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-07 02:26:10,520][01872] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-07 02:26:10,521][01872] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-03-07 02:26:10,522][01872] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-07 02:26:10,523][01872] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-03-07 02:26:10,524][01872] Adding new argument 'hf_repository'='Subarashi/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-03-07 02:26:10,525][01872] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-07 02:26:10,526][01872] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-07 02:26:10,527][01872] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-07 02:26:10,528][01872] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-07 02:26:10,529][01872] Using frameskip 1 and render_action_repeat=4 for evaluation
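(Editor's note: with push_to_hub=True and hf_repository set, this run ends by uploading the experiment folder, checkpoint, config, and replay video, to the Hub. Sample Factory drives that step internally; an equivalent manual upload with huggingface_hub would look roughly like this, using the repo id and path from the lines above.)

```python
from huggingface_hub import upload_folder

upload_folder(
    repo_id="Subarashi/rl_course_vizdoom_health_gathering_supreme",
    folder_path="/content/train_dir/default_experiment",
    repo_type="model",
)
```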
[2025-03-07 02:26:10,557][01872] RunningMeanStd input shape: (3, 72, 128)
[2025-03-07 02:26:10,559][01872] RunningMeanStd input shape: (1,)
[2025-03-07 02:26:10,570][01872] ConvEncoder: input_channels=3
[2025-03-07 02:26:10,603][01872] Conv encoder output size: 512
[2025-03-07 02:26:10,604][01872] Policy head output size: 512
[2025-03-07 02:26:10,621][01872] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-07 02:26:11,051][01872] Num frames 100...
[2025-03-07 02:26:11,179][01872] Num frames 200...
[2025-03-07 02:26:11,310][01872] Num frames 300...
[2025-03-07 02:26:11,433][01872] Num frames 400...
[2025-03-07 02:26:11,569][01872] Num frames 500...
[2025-03-07 02:26:11,702][01872] Num frames 600...
[2025-03-07 02:26:11,848][01872] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720
[2025-03-07 02:26:11,849][01872] Avg episode reward: 11.720, avg true_objective: 6.720
[2025-03-07 02:26:11,888][01872] Num frames 700...
[2025-03-07 02:26:12,014][01872] Num frames 800...
[2025-03-07 02:26:12,143][01872] Num frames 900...
[2025-03-07 02:26:12,280][01872] Num frames 1000...
[2025-03-07 02:26:12,408][01872] Num frames 1100...
[2025-03-07 02:26:12,538][01872] Num frames 1200...
[2025-03-07 02:26:12,663][01872] Num frames 1300...
[2025-03-07 02:26:12,790][01872] Num frames 1400...
[2025-03-07 02:26:12,917][01872] Num frames 1500...
[2025-03-07 02:26:13,042][01872] Num frames 1600...
[2025-03-07 02:26:13,219][01872] Num frames 1700...
[2025-03-07 02:26:13,331][01872] Avg episode rewards: #0: 20.140, true rewards: #0: 8.640
[2025-03-07 02:26:13,332][01872] Avg episode reward: 20.140, avg true_objective: 8.640
[2025-03-07 02:26:13,452][01872] Num frames 1800...
[2025-03-07 02:26:13,621][01872] Num frames 1900...
[2025-03-07 02:26:13,786][01872] Num frames 2000...
[2025-03-07 02:26:13,956][01872] Num frames 2100...
[2025-03-07 02:26:14,121][01872] Num frames 2200...
[2025-03-07 02:26:14,285][01872] Num frames 2300...
[2025-03-07 02:26:14,467][01872] Num frames 2400...
[2025-03-07 02:26:14,648][01872] Num frames 2500...
[2025-03-07 02:26:14,820][01872] Num frames 2600...
[2025-03-07 02:26:14,978][01872] Avg episode rewards: #0: 18.853, true rewards: #0: 8.853
[2025-03-07 02:26:14,979][01872] Avg episode reward: 18.853, avg true_objective: 8.853
[2025-03-07 02:26:15,060][01872] Num frames 2700...
[2025-03-07 02:26:15,209][01872] Num frames 2800...
[2025-03-07 02:26:15,343][01872] Num frames 2900...
[2025-03-07 02:26:15,470][01872] Num frames 3000...
[2025-03-07 02:26:15,597][01872] Num frames 3100...
[2025-03-07 02:26:15,723][01872] Num frames 3200...
[2025-03-07 02:26:15,843][01872] Avg episode rewards: #0: 17.128, true rewards: #0: 8.127
[2025-03-07 02:26:15,843][01872] Avg episode reward: 17.128, avg true_objective: 8.127
[2025-03-07 02:26:15,908][01872] Num frames 3300...
[2025-03-07 02:26:16,034][01872] Num frames 3400...
[2025-03-07 02:26:16,163][01872] Num frames 3500...
[2025-03-07 02:26:16,291][01872] Num frames 3600...
[2025-03-07 02:26:16,425][01872] Num frames 3700...
[2025-03-07 02:26:16,555][01872] Num frames 3800...
[2025-03-07 02:26:16,680][01872] Num frames 3900...
[2025-03-07 02:26:16,807][01872] Num frames 4000...
[2025-03-07 02:26:16,963][01872] Avg episode rewards: #0: 16.762, true rewards: #0: 8.162
[2025-03-07 02:26:16,964][01872] Avg episode reward: 16.762, avg true_objective: 8.162
[2025-03-07 02:26:16,989][01872] Num frames 4100...
[2025-03-07 02:26:17,115][01872] Num frames 4200...
[2025-03-07 02:26:17,238][01872] Num frames 4300...
[2025-03-07 02:26:17,361][01872] Num frames 4400...
[2025-03-07 02:26:17,504][01872] Num frames 4500...
[2025-03-07 02:26:17,596][01872] Avg episode rewards: #0: 14.882, true rewards: #0: 7.548
[2025-03-07 02:26:17,596][01872] Avg episode reward: 14.882, avg true_objective: 7.548
[2025-03-07 02:26:17,683][01872] Num frames 4600...
[2025-03-07 02:26:17,807][01872] Num frames 4700...
[2025-03-07 02:26:17,933][01872] Num frames 4800...
[2025-03-07 02:26:18,061][01872] Num frames 4900...
[2025-03-07 02:26:18,187][01872] Num frames 5000...
[2025-03-07 02:26:18,316][01872] Num frames 5100...
[2025-03-07 02:26:18,456][01872] Num frames 5200...
[2025-03-07 02:26:18,594][01872] Num frames 5300...
[2025-03-07 02:26:18,720][01872] Num frames 5400...
[2025-03-07 02:26:18,848][01872] Num frames 5500...
[2025-03-07 02:26:18,973][01872] Num frames 5600...
[2025-03-07 02:26:19,100][01872] Num frames 5700...
[2025-03-07 02:26:19,226][01872] Num frames 5800...
[2025-03-07 02:26:19,352][01872] Num frames 5900...
[2025-03-07 02:26:19,489][01872] Num frames 6000...
[2025-03-07 02:26:19,616][01872] Num frames 6100...
[2025-03-07 02:26:19,749][01872] Avg episode rewards: #0: 18.516, true rewards: #0: 8.801
[2025-03-07 02:26:19,750][01872] Avg episode reward: 18.516, avg true_objective: 8.801
[2025-03-07 02:26:19,800][01872] Num frames 6200...
[2025-03-07 02:26:19,925][01872] Num frames 6300...
[2025-03-07 02:26:20,049][01872] Num frames 6400...
[2025-03-07 02:26:20,174][01872] Num frames 6500...
[2025-03-07 02:26:20,304][01872] Num frames 6600...
[2025-03-07 02:26:20,431][01872] Num frames 6700...
[2025-03-07 02:26:20,572][01872] Num frames 6800...
[2025-03-07 02:26:20,699][01872] Num frames 6900...
[2025-03-07 02:26:20,824][01872] Num frames 7000...
[2025-03-07 02:26:20,951][01872] Num frames 7100...
[2025-03-07 02:26:21,078][01872] Num frames 7200...
[2025-03-07 02:26:21,198][01872] Avg episode rewards: #0: 18.686, true rewards: #0: 9.061
[2025-03-07 02:26:21,200][01872] Avg episode reward: 18.686, avg true_objective: 9.061
[2025-03-07 02:26:21,275][01872] Num frames 7300...
[2025-03-07 02:26:21,399][01872] Num frames 7400...
[2025-03-07 02:26:21,537][01872] Num frames 7500...
[2025-03-07 02:26:21,662][01872] Num frames 7600...
[2025-03-07 02:26:21,787][01872] Num frames 7700...
[2025-03-07 02:26:21,912][01872] Num frames 7800...
[2025-03-07 02:26:22,066][01872] Avg episode rewards: #0: 18.199, true rewards: #0: 8.754
[2025-03-07 02:26:22,067][01872] Avg episode reward: 18.199, avg true_objective: 8.754
[2025-03-07 02:26:22,097][01872] Num frames 7900...
[2025-03-07 02:26:22,223][01872] Num frames 8000...
[2025-03-07 02:26:22,350][01872] Num frames 8100...
[2025-03-07 02:26:22,480][01872] Num frames 8200...
[2025-03-07 02:26:22,617][01872] Num frames 8300...
[2025-03-07 02:26:22,743][01872] Num frames 8400...
[2025-03-07 02:26:22,876][01872] Num frames 8500...
[2025-03-07 02:26:23,019][01872] Num frames 8600...
[2025-03-07 02:26:23,147][01872] Num frames 8700...
[2025-03-07 02:26:23,276][01872] Num frames 8800...
[2025-03-07 02:26:23,410][01872] Num frames 8900...
[2025-03-07 02:26:23,544][01872] Num frames 9000...
[2025-03-07 02:26:23,677][01872] Num frames 9100...
[2025-03-07 02:26:23,805][01872] Num frames 9200...
[2025-03-07 02:26:23,932][01872] Num frames 9300...
[2025-03-07 02:26:24,060][01872] Num frames 9400...
[2025-03-07 02:26:24,187][01872] Num frames 9500...
[2025-03-07 02:26:24,252][01872] Avg episode rewards: #0: 20.408, true rewards: #0: 9.508
[2025-03-07 02:26:24,253][01872] Avg episode reward: 20.408, avg true_objective: 9.508
[2025-03-07 02:27:21,219][01872] Replay video saved to /content/train_dir/default_experiment/replay.mp4!