SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 64 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
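
As the architecture shows, sentence embeddings come from mean pooling over the token embeddings of an XLM-RoBERTa encoder, with inputs truncated at 64 tokens. As a rough cross-check, the sketch below reproduces that encoding with plain transformers; it assumes the repository's weights and tokenizer load directly with AutoModel/AutoTokenizer (typical for Sentence Transformers checkpoints), and the input sentences are illustrative.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("disi-unibo-nlp/foodex-baseterm-retriever")
model = AutoModel.from_pretrained("disi-unibo-nlp/foodex-baseterm-retriever")

sentences = ["porridge made with milk", "oat flour"]  # illustrative inputs
# max_length=64 mirrors the max_seq_length of the Transformer module above
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)
embeddings = mean_pooling(output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit vectors, ready for cosine similarity
print(embeddings.shape)  # torch.Size([2, 1024])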

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("disi-unibo-nlp/foodex-baseterm-retriever")
# Run inference
sentences = [
    'sauce cream salad dressing based facets desc food production commercial brandname productname known',
    'The group includes any type of Salad dressing. The part consumed/analysed is by default the whole marketed unit or a homogeneous representative portion.',
    'The group includes any type of Seasonings and extracts. The part consumed/analysed is by default the whole marketed unit or a representative portion of it.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
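
Because the model is trained for FoodEx2 base-term retrieval, a common pattern is to embed a catalogue of candidate term descriptions once and rank them against a free-text food description. Below is a minimal sketch using the library's util.semantic_search helper; the corpus strings are shortened, illustrative descriptions rather than the real FoodEx2 catalogue.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("disi-unibo-nlp/foodex-baseterm-retriever")

# Illustrative candidate term descriptions (placeholders, not the actual catalogue)
corpus = [
    "The group includes any type of Salad dressing.",
    "The group includes any type of Seasonings and extracts.",
    "Fruiting vegetables commonly known as Sweet peppers or Bell peppers.",
]
query = "sauce cream salad dressing based facets desc food production commercial brandname productname known"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query and keep the best 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])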

Evaluation

Metrics

Device Aware Information Retrieval

  • Evaluated with src.utils.eval_functions.DeviceAwareInformationRetrievalEvaluator
Metric Value
cosine_accuracy@1 0.9617
cosine_accuracy@3 0.9974
cosine_accuracy@5 0.9993
cosine_accuracy@10 1.0
cosine_precision@1 0.9617
cosine_precision@3 0.3347
cosine_precision@5 0.2012
cosine_precision@10 0.1008
cosine_recall@1 0.9586
cosine_recall@3 0.997
cosine_recall@5 0.9988
cosine_recall@10 1.0
cosine_ndcg@10 0.9846
cosine_mrr@10 0.9793
cosine_map@100 0.9791
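
The DeviceAwareInformationRetrievalEvaluator referenced above lives in the authors' own code base (src.utils.eval_functions) and is not part of the Sentence Transformers library. Comparable metrics can be produced with the library's standard InformationRetrievalEvaluator; the sketch below uses placeholder queries, corpus entries, and relevance judgments purely to show the interface.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("disi-unibo-nlp/foodex-baseterm-retriever")

# Placeholder evaluation data: query id -> text, doc id -> text, query id -> relevant doc ids
queries = {"q1": "pepper sweet red raw unprocessed fresh"}
corpus = {
    "d1": "Fruiting vegetables commonly known as Sweet peppers or Bell peppers.",
    "d2": "Spices commonly known as West African pepper fruit.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="foodex-dev",
    accuracy_at_k=[1, 3, 5, 10],
    precision_recall_at_k=[1, 3, 5, 10],
    mrr_at_k=[10],
    ndcg_at_k=[10],
    map_at_k=[100],
)
results = evaluator(model)
print(results)  # dict of metrics, e.g. foodex-dev_cosine_accuracy@1, foodex-dev_cosine_ndcg@10, ...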

Training Details

Training Dataset

Unnamed Dataset

  • Size: 553,000 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 3 tokens, mean 16.88 tokens, max 57 tokens
    • sentence_1 (string): min 11 tokens, mean 55.65 tokens, max 64 tokens
    • sentence_2 (string): min 12 tokens, mean 53.82 tokens, max 64 tokens
  • Samples:
    • sentence_0: porridge made with ssmilk water
      sentence_1: The group includes any type of Oat porridge in dry form to be diluted with milk or water. The term refers to e.g. ground oatmeal or rolled oat intended to be used for making oat porridge. The part consumed/analysed is by default the whole or a portion of it representing the observed heterogeneity.
      sentence_2: The group includes any type of Oat flour (finely milled grains with particles not easy to distinguish). Different grades of refinement and types are all included in this group. The part consumed/analysed is by default the whole or a portion of it representing the observed heterogeneity.
    • sentence_0: pepper sweet red raw unprocessed fresh no treatment brand product name with skin peel crust bought chilled fresh department 2 8 c
      sentence_1: Fruiting vegetables from the plant classified under the species Capsicum annuum var. grossum (L.) Sendtner or Capsicum annuum var. longum Bailey, commonly known as Sweet peppers or Bell peppers or Paprika or PeppersLong or Pimento or Pimiento. The part consumed/analysed is not specified. When relevant, information on the part consumed/analysed has to be reported with additional facet descriptors. In case of data collections related to legislations, the default part consumed/analysed is the one defined in the applicable legislation.
      sentence_2: Spices from the fruits of the plant classified under the species Piper guineense Thonn., commonly known as West African pepper fruit. The part consumed/analysed is not specified. When relevant, information on the part consumed/analysed has to be reported with additional facet descriptors. In case of data collections related to legislations, the default part consumed/analysed is the one defined in the applicable legislation.
    • sentence_0: yeasted wheat bread with sourmilk sliced
      sentence_1: The group includes any type of bread and rolls made with wheat flour containing high proportion of bran or wholemeal (brown or wholemeal wheat flour). The part consumed/analysed is by default the whole or a portion of it representing the observed heterogeneity.
      sentence_2: The group includes any type of bread and rolls made with wheat flour containing moderate amounts of bran (semi-brown wheat flour). The part consumed/analysed is by default the whole or a portion of it representing the observed heterogeneity.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
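
With an (anchor, positive, negative) column layout like the one above, MultipleNegativesRankingLoss scores each anchor against its positive, its hard negative, and all other in-batch candidates. A minimal sketch instantiating the loss with these parameters, assuming training starts from the BAAI/bge-m3 base model:

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import cos_sim

model = SentenceTransformer("BAAI/bge-m3")

# scale (temperature) and similarity function mirror the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model=model, scale=20.0, similarity_fct=cos_sim)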
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
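
These values map directly onto the library's SentenceTransformerTrainingArguments. The sketch below is a plausible reconstruction of the training setup rather than the authors' actual script; the output path and the one-row triplet dataset are placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("BAAI/bge-m3")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Placeholder triplet data with the same column layout as the training dataset above
train_dataset = Dataset.from_dict({
    "sentence_0": ["porridge made with milk water"],
    "sentence_1": ["The group includes any type of Oat porridge in dry form."],
    "sentence_2": ["The group includes any type of Oat flour."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="foodex-baseterm-retriever",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    fp16=True,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # reused here only so eval_strategy="steps" has data; use a real held-out split
    loss=loss,
)
trainer.train()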

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss cosine_ndcg@10
0 0 - 0.2918
0.0579 500 1.2336 0.7926
0.1157 1000 0.6594 0.8605
0.1736 1500 0.5586 0.8827
0.2315 2000 0.5041 0.8998
0.2893 2500 0.4777 0.9075
0.3472 3000 0.4616 0.9172
0.4050 3500 0.4416 0.9309
0.4629 4000 0.4292 0.9279
0.5208 4500 0.4105 0.9375
0.5786 5000 0.4011 0.9384
0.6365 5500 0.4064 0.9491
0.6944 6000 0.3864 0.9504
0.7522 6500 0.3864 0.9502
0.8101 7000 0.3803 0.9546
0.8680 7500 0.3837 0.9571
0.9258 8000 0.3775 0.9606
0.9837 8500 0.3728 0.9629
1.0 8641 - 0.9621
1.0415 9000 0.3612 0.9612
1.0994 9500 0.3657 0.9650
1.1573 10000 0.3613 0.9659
1.2151 10500 0.3556 0.9630
1.2730 11000 0.3657 0.9655
1.3309 11500 0.3645 0.9697
1.3887 12000 0.351 0.9702
1.4466 12500 0.3533 0.9702
1.5045 13000 0.3505 0.9723
1.5623 13500 0.3444 0.9713
1.6202 14000 0.3517 0.9725
1.6780 14500 0.3535 0.9735
1.7359 15000 0.353 0.9726
1.7938 15500 0.3444 0.9740
1.8516 16000 0.3455 0.9785
1.9095 16500 0.3459 0.9763
1.9674 17000 0.3494 0.9787
2.0 17282 - 0.9790
2.0252 17500 0.3487 0.9794
2.0831 18000 0.3371 0.9761
2.1410 18500 0.3315 0.9788
2.1988 19000 0.3352 0.9785
2.2567 19500 0.3396 0.9763
2.3145 20000 0.3356 0.9776
2.3724 20500 0.3382 0.9811
2.4303 21000 0.34 0.9805
2.4881 21500 0.3309 0.9802
2.5460 22000 0.3353 0.9797
2.6039 22500 0.3423 0.9798
2.6617 23000 0.3289 0.9809
2.7196 23500 0.3333 0.9803
2.7775 24000 0.338 0.9815
2.8353 24500 0.336 0.9816
2.8932 25000 0.3346 0.9813
2.9510 25500 0.3311 0.9807
3.0 25923 - 0.9819
3.0089 26000 0.3302 0.9824
3.0668 26500 0.3275 0.9833
3.1246 27000 0.3331 0.9840
3.1825 27500 0.3231 0.9839
3.2404 28000 0.3308 0.9839
3.2982 28500 0.3259 0.9836
3.3561 29000 0.3252 0.9838
3.4140 29500 0.315 0.9847
3.4718 30000 0.322 0.9829
3.5297 30500 0.3323 0.9837
3.5875 31000 0.3318 0.9833
3.6454 31500 0.3307 0.9842
3.7033 32000 0.331 0.9841
3.7611 32500 0.3209 0.9849
3.8190 33000 0.3267 0.9841
3.8769 33500 0.3214 0.9846
3.9347 34000 0.3232 0.9847
3.9926 34500 0.3291 0.9848
4.0 34564 - 0.9844
4.0505 35000 0.3257 0.9843
4.1083 35500 0.3237 0.9841
4.1662 36000 0.3177 0.9837
4.2240 36500 0.3314 0.9838
4.2819 37000 0.3266 0.9842
4.3398 37500 0.3184 0.9841
4.3976 38000 0.3162 0.9844
4.4555 38500 0.3164 0.9850
4.5134 39000 0.3209 0.9849
4.5712 39500 0.3292 0.9850
4.6291 40000 0.3194 0.9850
4.6870 40500 0.3212 0.9850
4.7448 41000 0.3359 0.9849
4.8027 41500 0.3199 0.9848
4.8605 42000 0.3257 0.9847
4.9184 42500 0.3172 0.9846
4.9763 43000 0.324 0.9846
5.0 43205 - 0.9846

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.4.0
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}