2023-10-13 18:40:22,179 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,181 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 18:40:22,181 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,181 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-13 18:40:22,181 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,181 Train:  14465 sentences
2023-10-13 18:40:22,181         (train_with_dev=False, train_with_test=False)
2023-10-13 18:40:22,182 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,182 Training Params:
2023-10-13 18:40:22,182  - learning_rate: "0.00016"
2023-10-13 18:40:22,182  - mini_batch_size: "4"
2023-10-13 18:40:22,182  - max_epochs: "10"
2023-10-13 18:40:22,182  - shuffle: "True"
2023-10-13 18:40:22,182 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,182 Plugins:
2023-10-13 18:40:22,182  - TensorboardLogger
2023-10-13 18:40:22,182  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 18:40:22,182 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,182 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 18:40:22,182  - metric: "('micro avg', 'f1-score')"
2023-10-13 18:40:22,182 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,182 Computation:
2023-10-13 18:40:22,183  - compute on device: cuda:0
2023-10-13 18:40:22,183  - embedding storage: none
2023-10-13 18:40:22,183 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,183 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-13 18:40:22,183 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,183 ----------------------------------------------------------------------------------------------------
2023-10-13 18:40:22,183 Logging anything other than scalars to TensorBoard is currently not supported.
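For reference, a minimal reproduction sketch of the configuration logged above, written against the public Flair API. This is not the original hmbench training script: the NER_HIPE_2022 loader arguments and the hidden_size value are assumptions, while the learning rate, batch size, epoch count, pooling, layer selection, and "no CRF" choice are taken from the log (the LinearScheduler with warmup_fraction 0.1 and AdamW are the defaults of ModelTrainer.fine_tune).

```python
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# HIPE-2022 "letemps" (French); exact loader arguments are assumptions, the logged
# dataset path suggests the document-separator variant of this corpus.
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

# ByT5 character-level encoder; "poolingfirst-layers-1" in the base path maps to
# subtoken_pooling="first" and using only the last transformer layer.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classifier on top (no CRF, no RNN), matching the architecture dump:
# Linear(1472 -> 13) over the 13-tag dictionary. hidden_size is unused without an RNN.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
)

# fine_tune() attaches a linear warmup/decay schedule (warmup_fraction 0.1 by default),
# matching the LinearScheduler plugin recorded in the log.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)
```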
2023-10-13 18:41:58,767 epoch 1 - iter 361/3617 - loss 2.52124674 - time (sec): 96.58 - samples/sec: 386.79 - lr: 0.000016 - momentum: 0.000000
2023-10-13 18:43:34,508 epoch 1 - iter 722/3617 - loss 2.11644111 - time (sec): 192.32 - samples/sec: 388.65 - lr: 0.000032 - momentum: 0.000000
2023-10-13 18:45:09,650 epoch 1 - iter 1083/3617 - loss 1.64036489 - time (sec): 287.46 - samples/sec: 393.68 - lr: 0.000048 - momentum: 0.000000
2023-10-13 18:46:45,788 epoch 1 - iter 1444/3617 - loss 1.30596637 - time (sec): 383.60 - samples/sec: 393.95 - lr: 0.000064 - momentum: 0.000000
2023-10-13 18:48:22,571 epoch 1 - iter 1805/3617 - loss 1.08312946 - time (sec): 480.39 - samples/sec: 393.97 - lr: 0.000080 - momentum: 0.000000
2023-10-13 18:49:59,710 epoch 1 - iter 2166/3617 - loss 0.93558591 - time (sec): 577.53 - samples/sec: 392.82 - lr: 0.000096 - momentum: 0.000000
2023-10-13 18:51:37,931 epoch 1 - iter 2527/3617 - loss 0.82548930 - time (sec): 675.75 - samples/sec: 392.25 - lr: 0.000112 - momentum: 0.000000
2023-10-13 18:53:15,367 epoch 1 - iter 2888/3617 - loss 0.73875966 - time (sec): 773.18 - samples/sec: 391.69 - lr: 0.000128 - momentum: 0.000000
2023-10-13 18:54:56,085 epoch 1 - iter 3249/3617 - loss 0.67039148 - time (sec): 873.90 - samples/sec: 391.02 - lr: 0.000144 - momentum: 0.000000
2023-10-13 18:56:40,552 epoch 1 - iter 3610/3617 - loss 0.61660340 - time (sec): 978.37 - samples/sec: 387.65 - lr: 0.000160 - momentum: 0.000000
2023-10-13 18:56:42,400 ----------------------------------------------------------------------------------------------------
2023-10-13 18:56:42,400 EPOCH 1 done: loss 0.6157 - lr: 0.000160
2023-10-13 18:57:20,914 DEV : loss 0.13260270655155182 - f1-score (micro avg) 0.5508
2023-10-13 18:57:20,972 saving best model
2023-10-13 18:57:21,837 ----------------------------------------------------------------------------------------------------
2023-10-13 18:59:02,582 epoch 2 - iter 361/3617 - loss 0.11003363 - time (sec): 100.74 - samples/sec: 364.24 - lr: 0.000158 - momentum: 0.000000
2023-10-13 19:00:41,098 epoch 2 - iter 722/3617 - loss 0.10629388 - time (sec): 199.26 - samples/sec: 374.92 - lr: 0.000156 - momentum: 0.000000
2023-10-13 19:02:20,551 epoch 2 - iter 1083/3617 - loss 0.10240780 - time (sec): 298.71 - samples/sec: 378.36 - lr: 0.000155 - momentum: 0.000000
2023-10-13 19:03:57,204 epoch 2 - iter 1444/3617 - loss 0.10038512 - time (sec): 395.36 - samples/sec: 381.86 - lr: 0.000153 - momentum: 0.000000
2023-10-13 19:05:34,866 epoch 2 - iter 1805/3617 - loss 0.09845307 - time (sec): 493.03 - samples/sec: 385.47 - lr: 0.000151 - momentum: 0.000000
2023-10-13 19:07:10,754 epoch 2 - iter 2166/3617 - loss 0.09905672 - time (sec): 588.91 - samples/sec: 384.71 - lr: 0.000149 - momentum: 0.000000
2023-10-13 19:08:47,888 epoch 2 - iter 2527/3617 - loss 0.09733769 - time (sec): 686.05 - samples/sec: 385.00 - lr: 0.000148 - momentum: 0.000000
2023-10-13 19:10:26,147 epoch 2 - iter 2888/3617 - loss 0.09555977 - time (sec): 784.31 - samples/sec: 386.99 - lr: 0.000146 - momentum: 0.000000
2023-10-13 19:12:03,836 epoch 2 - iter 3249/3617 - loss 0.09404990 - time (sec): 882.00 - samples/sec: 387.21 - lr: 0.000144 - momentum: 0.000000
2023-10-13 19:13:40,645 epoch 2 - iter 3610/3617 - loss 0.09376699 - time (sec): 978.81 - samples/sec: 387.37 - lr: 0.000142 - momentum: 0.000000
2023-10-13 19:13:42,408 ----------------------------------------------------------------------------------------------------
2023-10-13 19:13:42,408 EPOCH 2 done: loss 0.0938 - lr: 0.000142
2023-10-13 19:14:21,070 DEV : loss 0.1215815320611 - f1-score (micro avg) 0.5926
2023-10-13 19:14:21,126 saving best model
2023-10-13 19:14:23,664 ----------------------------------------------------------------------------------------------------
2023-10-13 19:16:01,864 epoch 3 - iter 361/3617 - loss 0.05640694 - time (sec): 98.20 - samples/sec: 399.92 - lr: 0.000140 - momentum: 0.000000
2023-10-13 19:17:37,015 epoch 3 - iter 722/3617 - loss 0.05950424 - time (sec): 193.35 - samples/sec: 392.78 - lr: 0.000139 - momentum: 0.000000
2023-10-13 19:19:13,647 epoch 3 - iter 1083/3617 - loss 0.06066695 - time (sec): 289.98 - samples/sec: 391.53 - lr: 0.000137 - momentum: 0.000000
2023-10-13 19:20:51,628 epoch 3 - iter 1444/3617 - loss 0.06135792 - time (sec): 387.96 - samples/sec: 389.69 - lr: 0.000135 - momentum: 0.000000
2023-10-13 19:22:29,614 epoch 3 - iter 1805/3617 - loss 0.06119400 - time (sec): 485.95 - samples/sec: 390.11 - lr: 0.000133 - momentum: 0.000000
2023-10-13 19:24:06,053 epoch 3 - iter 2166/3617 - loss 0.06300933 - time (sec): 582.39 - samples/sec: 388.28 - lr: 0.000132 - momentum: 0.000000
2023-10-13 19:25:44,104 epoch 3 - iter 2527/3617 - loss 0.06310662 - time (sec): 680.44 - samples/sec: 390.67 - lr: 0.000130 - momentum: 0.000000
2023-10-13 19:27:20,850 epoch 3 - iter 2888/3617 - loss 0.06456942 - time (sec): 777.18 - samples/sec: 388.75 - lr: 0.000128 - momentum: 0.000000
2023-10-13 19:29:01,123 epoch 3 - iter 3249/3617 - loss 0.06429657 - time (sec): 877.46 - samples/sec: 388.25 - lr: 0.000126 - momentum: 0.000000
2023-10-13 19:30:43,178 epoch 3 - iter 3610/3617 - loss 0.06441539 - time (sec): 979.51 - samples/sec: 387.28 - lr: 0.000124 - momentum: 0.000000
2023-10-13 19:30:44,906 ----------------------------------------------------------------------------------------------------
2023-10-13 19:30:44,907 EPOCH 3 done: loss 0.0644 - lr: 0.000124
2023-10-13 19:31:24,062 DEV : loss 0.1586698442697525 - f1-score (micro avg) 0.6321
2023-10-13 19:31:24,118 saving best model
2023-10-13 19:31:26,657 ----------------------------------------------------------------------------------------------------
2023-10-13 19:33:04,849 epoch 4 - iter 361/3617 - loss 0.04452930 - time (sec): 98.19 - samples/sec: 378.11 - lr: 0.000123 - momentum: 0.000000
2023-10-13 19:34:44,327 epoch 4 - iter 722/3617 - loss 0.04066016 - time (sec): 197.67 - samples/sec: 385.17 - lr: 0.000121 - momentum: 0.000000
2023-10-13 19:36:23,824 epoch 4 - iter 1083/3617 - loss 0.04578705 - time (sec): 297.16 - samples/sec: 382.49 - lr: 0.000119 - momentum: 0.000000
2023-10-13 19:38:02,132 epoch 4 - iter 1444/3617 - loss 0.04562518 - time (sec): 395.47 - samples/sec: 381.71 - lr: 0.000117 - momentum: 0.000000
2023-10-13 19:39:39,551 epoch 4 - iter 1805/3617 - loss 0.04646266 - time (sec): 492.89 - samples/sec: 383.13 - lr: 0.000116 - momentum: 0.000000
2023-10-13 19:41:15,820 epoch 4 - iter 2166/3617 - loss 0.04519717 - time (sec): 589.16 - samples/sec: 384.26 - lr: 0.000114 - momentum: 0.000000
2023-10-13 19:42:55,082 epoch 4 - iter 2527/3617 - loss 0.04533348 - time (sec): 688.42 - samples/sec: 383.51 - lr: 0.000112 - momentum: 0.000000
2023-10-13 19:44:36,264 epoch 4 - iter 2888/3617 - loss 0.04473135 - time (sec): 789.60 - samples/sec: 382.69 - lr: 0.000110 - momentum: 0.000000
2023-10-13 19:46:16,493 epoch 4 - iter 3249/3617 - loss 0.04510623 - time (sec): 889.83 - samples/sec: 383.59 - lr: 0.000108 - momentum: 0.000000
2023-10-13 19:47:53,562 epoch 4 - iter 3610/3617 - loss 0.04637355 - time (sec): 986.90 - samples/sec: 384.36 - lr: 0.000107 - momentum: 0.000000
2023-10-13 19:47:55,182 ----------------------------------------------------------------------------------------------------
2023-10-13 19:47:55,182 EPOCH 4 done: loss 0.0464 - lr: 0.000107
2023-10-13 19:48:34,460 DEV : loss 0.2130165547132492 - f1-score (micro avg) 0.6471
2023-10-13 19:48:34,518 saving best model
2023-10-13 19:48:37,082 ----------------------------------------------------------------------------------------------------
2023-10-13 19:50:14,518 epoch 5 - iter 361/3617 - loss 0.02758511 - time (sec): 97.43 - samples/sec: 397.38 - lr: 0.000105 - momentum: 0.000000
2023-10-13 19:51:50,077 epoch 5 - iter 722/3617 - loss 0.02910352 - time (sec): 192.99 - samples/sec: 401.98 - lr: 0.000103 - momentum: 0.000000
2023-10-13 19:53:24,280 epoch 5 - iter 1083/3617 - loss 0.02918864 - time (sec): 287.19 - samples/sec: 398.11 - lr: 0.000101 - momentum: 0.000000
2023-10-13 19:55:00,737 epoch 5 - iter 1444/3617 - loss 0.03179865 - time (sec): 383.65 - samples/sec: 400.35 - lr: 0.000100 - momentum: 0.000000
2023-10-13 19:56:40,217 epoch 5 - iter 1805/3617 - loss 0.03047882 - time (sec): 483.13 - samples/sec: 397.62 - lr: 0.000098 - momentum: 0.000000
2023-10-13 19:58:17,353 epoch 5 - iter 2166/3617 - loss 0.03136089 - time (sec): 580.27 - samples/sec: 393.32 - lr: 0.000096 - momentum: 0.000000
2023-10-13 19:59:58,091 epoch 5 - iter 2527/3617 - loss 0.03107949 - time (sec): 681.01 - samples/sec: 391.65 - lr: 0.000094 - momentum: 0.000000
2023-10-13 20:01:34,403 epoch 5 - iter 2888/3617 - loss 0.03125495 - time (sec): 777.32 - samples/sec: 392.23 - lr: 0.000092 - momentum: 0.000000
2023-10-13 20:03:15,510 epoch 5 - iter 3249/3617 - loss 0.03139656 - time (sec): 878.42 - samples/sec: 388.36 - lr: 0.000091 - momentum: 0.000000
2023-10-13 20:04:55,226 epoch 5 - iter 3610/3617 - loss 0.03151360 - time (sec): 978.14 - samples/sec: 387.71 - lr: 0.000089 - momentum: 0.000000
2023-10-13 20:04:56,955 ----------------------------------------------------------------------------------------------------
2023-10-13 20:04:56,955 EPOCH 5 done: loss 0.0316 - lr: 0.000089
2023-10-13 20:05:37,673 DEV : loss 0.2452983260154724 - f1-score (micro avg) 0.6203
2023-10-13 20:05:37,733 ----------------------------------------------------------------------------------------------------
2023-10-13 20:07:19,848 epoch 6 - iter 361/3617 - loss 0.01651558 - time (sec): 102.11 - samples/sec: 372.95 - lr: 0.000087 - momentum: 0.000000
2023-10-13 20:08:58,158 epoch 6 - iter 722/3617 - loss 0.01869225 - time (sec): 200.42 - samples/sec: 375.10 - lr: 0.000085 - momentum: 0.000000
2023-10-13 20:10:36,227 epoch 6 - iter 1083/3617 - loss 0.01996453 - time (sec): 298.49 - samples/sec: 375.91 - lr: 0.000084 - momentum: 0.000000
2023-10-13 20:12:14,888 epoch 6 - iter 1444/3617 - loss 0.02211179 - time (sec): 397.15 - samples/sec: 379.66 - lr: 0.000082 - momentum: 0.000000
2023-10-13 20:13:51,084 epoch 6 - iter 1805/3617 - loss 0.02255716 - time (sec): 493.35 - samples/sec: 381.30 - lr: 0.000080 - momentum: 0.000000
2023-10-13 20:15:26,711 epoch 6 - iter 2166/3617 - loss 0.02243773 - time (sec): 588.98 - samples/sec: 383.43 - lr: 0.000078 - momentum: 0.000000
2023-10-13 20:17:02,468 epoch 6 - iter 2527/3617 - loss 0.02283358 - time (sec): 684.73 - samples/sec: 385.02 - lr: 0.000076 - momentum: 0.000000
2023-10-13 20:18:38,763 epoch 6 - iter 2888/3617 - loss 0.02312108 - time (sec): 781.03 - samples/sec: 387.56 - lr: 0.000075 - momentum: 0.000000
2023-10-13 20:20:14,206 epoch 6 - iter 3249/3617 - loss 0.02327124 - time (sec): 876.47 - samples/sec: 388.97 - lr: 0.000073 - momentum: 0.000000
2023-10-13 20:21:49,847 epoch 6 - iter 3610/3617 - loss 0.02419111 - time (sec): 972.11 - samples/sec: 390.08 - lr: 0.000071 - momentum: 0.000000
2023-10-13 20:21:51,535 ----------------------------------------------------------------------------------------------------
2023-10-13 20:21:51,535 EPOCH 6 done: loss 0.0242 - lr: 0.000071
2023-10-13 20:22:30,924 DEV : loss 0.27563825249671936 - f1-score (micro avg) 0.6231
2023-10-13 20:22:30,982 ----------------------------------------------------------------------------------------------------
2023-10-13 20:24:08,378 epoch 7 - iter 361/3617 - loss 0.01521082 - time (sec): 97.39 - samples/sec: 396.60 - lr: 0.000069 - momentum: 0.000000
2023-10-13 20:25:46,638 epoch 7 - iter 722/3617 - loss 0.01323902 - time (sec): 195.65 - samples/sec: 389.93 - lr: 0.000068 - momentum: 0.000000
2023-10-13 20:27:24,850 epoch 7 - iter 1083/3617 - loss 0.01408018 - time (sec): 293.87 - samples/sec: 392.34 - lr: 0.000066 - momentum: 0.000000
2023-10-13 20:29:00,668 epoch 7 - iter 1444/3617 - loss 0.01314577 - time (sec): 389.68 - samples/sec: 389.85 - lr: 0.000064 - momentum: 0.000000
2023-10-13 20:30:36,682 epoch 7 - iter 1805/3617 - loss 0.01419142 - time (sec): 485.70 - samples/sec: 391.03 - lr: 0.000062 - momentum: 0.000000
2023-10-13 20:32:12,581 epoch 7 - iter 2166/3617 - loss 0.01554244 - time (sec): 581.60 - samples/sec: 394.58 - lr: 0.000060 - momentum: 0.000000
2023-10-13 20:33:47,873 epoch 7 - iter 2527/3617 - loss 0.01578059 - time (sec): 676.89 - samples/sec: 394.74 - lr: 0.000059 - momentum: 0.000000
2023-10-13 20:35:23,614 epoch 7 - iter 2888/3617 - loss 0.01553333 - time (sec): 772.63 - samples/sec: 393.38 - lr: 0.000057 - momentum: 0.000000
2023-10-13 20:36:59,404 epoch 7 - iter 3249/3617 - loss 0.01558103 - time (sec): 868.42 - samples/sec: 392.48 - lr: 0.000055 - momentum: 0.000000
2023-10-13 20:38:37,239 epoch 7 - iter 3610/3617 - loss 0.01537530 - time (sec): 966.25 - samples/sec: 392.34 - lr: 0.000053 - momentum: 0.000000
2023-10-13 20:38:39,132 ----------------------------------------------------------------------------------------------------
2023-10-13 20:38:39,133 EPOCH 7 done: loss 0.0153 - lr: 0.000053
2023-10-13 20:39:17,634 DEV : loss 0.33923789858818054 - f1-score (micro avg) 0.6456
2023-10-13 20:39:17,691 ----------------------------------------------------------------------------------------------------
2023-10-13 20:40:58,364 epoch 8 - iter 361/3617 - loss 0.01277912 - time (sec): 100.67 - samples/sec: 378.20 - lr: 0.000052 - momentum: 0.000000
2023-10-13 20:42:39,601 epoch 8 - iter 722/3617 - loss 0.01152385 - time (sec): 201.91 - samples/sec: 384.45 - lr: 0.000050 - momentum: 0.000000
2023-10-13 20:44:18,870 epoch 8 - iter 1083/3617 - loss 0.01084500 - time (sec): 301.18 - samples/sec: 385.62 - lr: 0.000048 - momentum: 0.000000
2023-10-13 20:45:57,550 epoch 8 - iter 1444/3617 - loss 0.00969400 - time (sec): 399.86 - samples/sec: 386.70 - lr: 0.000046 - momentum: 0.000000
2023-10-13 20:47:33,176 epoch 8 - iter 1805/3617 - loss 0.00998289 - time (sec): 495.48 - samples/sec: 385.30 - lr: 0.000044 - momentum: 0.000000
2023-10-13 20:49:09,788 epoch 8 - iter 2166/3617 - loss 0.01065515 - time (sec): 592.09 - samples/sec: 388.20 - lr: 0.000043 - momentum: 0.000000
2023-10-13 20:50:44,987 epoch 8 - iter 2527/3617 - loss 0.01060654 - time (sec): 687.29 - samples/sec: 387.95 - lr: 0.000041 - momentum: 0.000000
2023-10-13 20:52:20,864 epoch 8 - iter 2888/3617 - loss 0.01058643 - time (sec): 783.17 - samples/sec: 388.32 - lr: 0.000039 - momentum: 0.000000
2023-10-13 20:53:59,752 epoch 8 - iter 3249/3617 - loss 0.01058392 - time (sec): 882.06 - samples/sec: 388.07 - lr: 0.000037 - momentum: 0.000000
2023-10-13 20:55:42,153 epoch 8 - iter 3610/3617 - loss 0.01028093 - time (sec): 984.46 - samples/sec: 385.49 - lr: 0.000036 - momentum: 0.000000
2023-10-13 20:55:43,713 ----------------------------------------------------------------------------------------------------
2023-10-13 20:55:43,713 EPOCH 8 done: loss 0.0103 - lr: 0.000036
2023-10-13 20:56:23,186 DEV : loss 0.33330851793289185 - f1-score (micro avg) 0.6595
2023-10-13 20:56:23,252 saving best model
2023-10-13 20:56:25,832 ----------------------------------------------------------------------------------------------------
2023-10-13 20:58:02,028 epoch 9 - iter 361/3617 - loss 0.00470043 - time (sec): 96.19 - samples/sec: 378.73 - lr: 0.000034 - momentum: 0.000000
2023-10-13 20:59:40,260 epoch 9 - iter 722/3617 - loss 0.00531921 - time (sec): 194.42 - samples/sec: 384.79 - lr: 0.000032 - momentum: 0.000000
2023-10-13 21:01:18,063 epoch 9 - iter 1083/3617 - loss 0.00654361 - time (sec): 292.22 - samples/sec: 387.48 - lr: 0.000030 - momentum: 0.000000
2023-10-13 21:02:57,618 epoch 9 - iter 1444/3617 - loss 0.00676990 - time (sec): 391.78 - samples/sec: 385.89 - lr: 0.000028 - momentum: 0.000000
2023-10-13 21:04:40,159 epoch 9 - iter 1805/3617 - loss 0.00686632 - time (sec): 494.32 - samples/sec: 383.67 - lr: 0.000027 - momentum: 0.000000
2023-10-13 21:06:17,098 epoch 9 - iter 2166/3617 - loss 0.00685262 - time (sec): 591.26 - samples/sec: 385.86 - lr: 0.000025 - momentum: 0.000000
2023-10-13 21:07:52,623 epoch 9 - iter 2527/3617 - loss 0.00658488 - time (sec): 686.78 - samples/sec: 386.29 - lr: 0.000023 - momentum: 0.000000
2023-10-13 21:09:28,784 epoch 9 - iter 2888/3617 - loss 0.00668225 - time (sec): 782.95 - samples/sec: 384.96 - lr: 0.000021 - momentum: 0.000000
2023-10-13 21:11:07,550 epoch 9 - iter 3249/3617 - loss 0.00659644 - time (sec): 881.71 - samples/sec: 385.57 - lr: 0.000020 - momentum: 0.000000
2023-10-13 21:12:45,880 epoch 9 - iter 3610/3617 - loss 0.00676788 - time (sec): 980.04 - samples/sec: 386.92 - lr: 0.000018 - momentum: 0.000000
2023-10-13 21:12:47,696 ----------------------------------------------------------------------------------------------------
2023-10-13 21:12:47,696 EPOCH 9 done: loss 0.0068 - lr: 0.000018
2023-10-13 21:13:27,417 DEV : loss 0.3747117519378662 - f1-score (micro avg) 0.6531
2023-10-13 21:13:27,476 ----------------------------------------------------------------------------------------------------
2023-10-13 21:15:06,935 epoch 10 - iter 361/3617 - loss 0.00214926 - time (sec): 99.46 - samples/sec: 384.82 - lr: 0.000016 - momentum: 0.000000
2023-10-13 21:16:45,731 epoch 10 - iter 722/3617 - loss 0.00207497 - time (sec): 198.25 - samples/sec: 383.67 - lr: 0.000014 - momentum: 0.000000
2023-10-13 21:18:22,667 epoch 10 - iter 1083/3617 - loss 0.00303010 - time (sec): 295.19 - samples/sec: 384.39 - lr: 0.000012 - momentum: 0.000000
2023-10-13 21:20:05,149 epoch 10 - iter 1444/3617 - loss 0.00354649 - time (sec): 397.67 - samples/sec: 382.40 - lr: 0.000011 - momentum: 0.000000
2023-10-13 21:21:42,479 epoch 10 - iter 1805/3617 - loss 0.00366430 - time (sec): 495.00 - samples/sec: 382.22 - lr: 0.000009 - momentum: 0.000000
2023-10-13 21:23:21,257 epoch 10 - iter 2166/3617 - loss 0.00439229 - time (sec): 593.78 - samples/sec: 382.12 - lr: 0.000007 - momentum: 0.000000
2023-10-13 21:25:03,186 epoch 10 - iter 2527/3617 - loss 0.00430784 - time (sec): 695.71 - samples/sec: 381.88 - lr: 0.000005 - momentum: 0.000000
2023-10-13 21:26:45,484 epoch 10 - iter 2888/3617 - loss 0.00395290 - time (sec): 798.01 - samples/sec: 381.84 - lr: 0.000004 - momentum: 0.000000
2023-10-13 21:28:24,586 epoch 10 - iter 3249/3617 - loss 0.00388602 - time (sec): 897.11 - samples/sec: 379.67 - lr: 0.000002 - momentum: 0.000000
2023-10-13 21:30:08,585 epoch 10 - iter 3610/3617 - loss 0.00392291 - time (sec): 1001.11 - samples/sec: 378.90 - lr: 0.000000 - momentum: 0.000000
2023-10-13 21:30:10,346 ----------------------------------------------------------------------------------------------------
2023-10-13 21:30:10,347 EPOCH 10 done: loss 0.0039 - lr: 0.000000
2023-10-13 21:30:52,851 DEV : loss 0.3852955400943756 - f1-score (micro avg) 0.6562
2023-10-13 21:30:53,782 ----------------------------------------------------------------------------------------------------
2023-10-13 21:30:53,784 Loading model from best epoch ...
2023-10-13 21:30:57,709 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-13 21:31:56,313
Results:
- F-score (micro) 0.6292
- F-score (macro) 0.4868
- Accuracy 0.4702

By class:
              precision    recall  f1-score   support

         loc     0.6219    0.7597    0.6839       591
        pers     0.5708    0.7003    0.6289       357
         org     0.1571    0.1392    0.1477        79

   micro avg     0.5772    0.6913    0.6292      1027
   macro avg     0.4499    0.5331    0.4868      1027
weighted avg     0.5684    0.6913    0.6236      1027

2023-10-13 21:31:56,313 ----------------------------------------------------------------------------------------------------
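For reference, a minimal usage sketch (not part of the original run) for loading the saved best-model.pt and tagging text with the 13-tag dictionary listed above; the example sentence is illustrative only.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Checkpoint path = training base path recorded above + "/best-model.pt".
tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2/best-model.pt"
)

# Illustrative French sentence; the model tags loc/pers/org spans (BIOES scheme).
sentence = Sentence("Le Temps est un journal publié à Genève .")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span)
```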