This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 96, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
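The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence vector is the average of its token vectors. A minimal NumPy sketch of that averaging, assuming padded positions are excluded through the attention mask (the helper name `mean_pool` and the toy shapes are illustrative, not part of the library):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against empty sequences
    return summed / counts

# Toy input: 2 sequences, 3 token slots, 4 dimensions (the real model uses 1024).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],   # last slot of the first sequence is padding
                 [1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

The first sequence is averaged over two tokens and the second over three, so padding does not dilute the result.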
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("disi-unibo-nlp/foodex-facet-descriptors-retriever")
# Run inference
sentences = [
'tome des bauges raw milk aoc in plastic container brand product name </s> This facet allows recording whether the food list code was chosen because of lack of information on the food item or because the proper entry in the food list was missing. Only one descriptor from this facet can be added to each entry.',
'The food list item has been chosen because none of the more detailed items corresponded to the available information. Please consider the eventual addition of a new term in the list',
'Deprecated term that must NOT be used for any purpose. Its original scopenote was: The group includes any type of Other fruiting vegetables (exposure). The part consumed/analysed is by default unspecified. When relevant, information on the part consumed/analysed has to be reported with additional facet descriptors.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
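For retrieval, the usual pattern is to embed one query and a list of candidate descriptors, then sort the candidates by cosine similarity. The sketch below shows only the ranking step with stand-in vectors, so it runs without downloading the model. `build_query` and `rank_by_cosine` are hypothetical helpers, and the `</s>` separator is an assumption copied from the layout of the first sample sentence above, not a documented requirement.

```python
import numpy as np

def build_query(food_description: str, facet_scope_note: str) -> str:
    # Assumption: mirror the "<food description> </s> <facet description>" layout
    # seen in the first example sentence of this card.
    return f"{food_description} </s> {facet_scope_note}"

def rank_by_cosine(query_emb: np.ndarray, candidate_embs: np.ndarray):
    """Return candidate indices sorted by descending cosine similarity, with their scores."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)
    return order, scores[order]

# Stand-in vectors; in practice these would come from model.encode(...).
query_emb = np.array([1.0, 0.0, 0.0])
candidate_embs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 0.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
order, scores = rank_by_cosine(query_emb, candidate_embs)
print(order, scores)
```

With the real model, `query_emb` would be `model.encode([build_query(...)])[0]` and `candidate_embs` would be `model.encode(descriptor_texts)`.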
Information retrieval metrics, evaluated with src.utils.eval_functions.DeviceAwareInformationRetrievalEvaluator:
Metric | Value |
---|---|
cosine_accuracy@1 | 0.985 |
cosine_accuracy@3 | 0.999 |
cosine_accuracy@5 | 0.9998 |
cosine_accuracy@10 | 1.0 |
cosine_precision@1 | 0.985 |
cosine_precision@3 | 0.4171 |
cosine_precision@5 | 0.2537 |
cosine_precision@10 | 0.1275 |
cosine_recall@1 | 0.8691 |
cosine_recall@3 | 0.9939 |
cosine_recall@5 | 0.9985 |
cosine_recall@10 | 0.9999 |
cosine_ndcg@10 | 0.9936 |
cosine_mrr@10 | 0.9919 |
cosine_map@100 | 0.9909 |
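The gap between accuracy@k and precision@k in the table follows from how these metrics are commonly defined. The sketch below uses the standard definitions, assuming the custom evaluator named above follows them (its source is not shown here); `ir_metrics_at_k` and the document ids are hypothetical.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute accuracy, precision, recall and reciprocal rank over the top-k results."""
    top_k = ranked_ids[:k]
    hits = [doc for doc in top_k if doc in relevant_ids]
    # 1-based rank of the first relevant document inside the top-k, if any.
    first_hit_rank = next(
        (rank for rank, doc in enumerate(top_k, start=1) if doc in relevant_ids), None
    )
    return {
        "accuracy": 1.0 if hits else 0.0,            # at least one relevant result in top-k
        "precision": len(hits) / k,                  # share of the top-k that is relevant
        "recall": len(hits) / len(relevant_ids),     # share of relevant documents retrieved
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Toy query with a single relevant descriptor, ranked first.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d7"}
metrics = ir_metrics_at_k(ranked, relevant, k=3)
print(metrics)
```

When a query has only one or two relevant descriptors, precision@k cannot exceed that count divided by k, which is consistent with precision falling as k grows while accuracy and recall approach 1.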
Dataset columns: sentence_0, sentence_1, and sentence_2, each of type string.
sentence_0 | sentence_1 | sentence_2 |
---|---|---|
peach fresh flesh baked with skin This facet allows recording different characteristics of the food: preservation treatments a food item underwent, technological steps or treatments applied while producing a food item, the way a food item has been heat treated before consumption and the way a food item has been prepared for final consumption (particularly needed for consumption surveys and includes preparation (like battering or breading) as well as heat treatment steps). More (none contradicting) descriptors can be applied to each entry. | Cooking by dry heat in or as if in an oven | Previously cooked or heat-treated fodd, heated again in order to raise its temperature (all different techniques) |
turkey breast with bones frozen barbecued without skin This facet allows recording different characteristics of the food: preservation treatments a food item underwent, technological steps or treatments applied while producing a food item, the way a food item has been heat treated before consumption and the way a food item has been prepared for final consumption (particularly needed for consumption surveys and includes preparation (like battering or breading) as well as heat treatment steps). More (none contradicting) descriptors can be applied to each entry. | Preserving by freezing sufficiently rapidly to avoid spoilage and microbial growth | Drying to a water content low enough to guarantee microbiological stability, but still keeping a relatively soft structure (often used for fruit) |
yoghurt flavoured cow blueberry sweetened with sugar sucrose whole in glass commercial supermarket shop organic shop brand product name This facet provides some principal claims related to important nutrients-ingredients, like fat, sugar etc. It is not intended to include health claims or similar. The present guidance provides a limited list, to be eventually improved during the evolution of the system. More than one descriptor can be applied to each entry, provided they are not contradicting each other. | The food item has all the natural (or average expected )fat content (for milk, at least the value defined in legislation, when available). In the case of cheese, the fat on the dry matter is 45-60% | The food item has an almost completely reduced amount of fat, with respect to the expected natural fat content (for milk, at least the value defined in legislation, when available). For meat, this is the entry for what is commercially intended as 'lean' meat, where fat is not visible.In the case of cheese, the fat on the dry matter is 10-25% |
Loss: MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
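With `scale` 20.0 and `cos_sim`, this loss treats every other candidate in the batch as a negative for a given anchor and applies cross-entropy over the scaled cosine similarities. The NumPy sketch below is an illustration of that idea, not the library's implementation; `mnr_loss` is a hypothetical name, and appending explicit negatives (such as the sentence_2 column) to the candidate pool is an assumption about how triplets are handled.

```python
import numpy as np

def mnr_loss(anchors, positives, negatives=None, scale=20.0):
    """In-batch softmax loss over scaled cosine similarities.

    Row i of `positives` is the correct match for row i of `anchors`;
    every other candidate row acts as a negative.
    """
    candidates = positives if negatives is None else np.concatenate([positives, negatives])
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = scale * (a @ c.T)                            # (batch, num_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())             # correct candidate sits at index i

anchors = np.eye(2)
aligned = mnr_loss(anchors, np.eye(2))          # each anchor matches its own positive
swapped = mnr_loss(anchors, np.eye(2)[::-1])    # positives paired with the wrong anchors
print(aligned, swapped)
```

The aligned batch gives a loss close to zero, while the swapped batch gives a loss close to the scale value, which shows how the scale sharpens the softmax over cosine scores.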
Non-default hyperparameters:
eval_strategy: steps
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
fp16: True
multi_dataset_batch_sampler: round_robin

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

Epoch | Step | Training Loss | cosine_ndcg@10 |
---|---|---|---|
0 | 0 | - | 0.0266 |
0.0196 | 500 | 1.5739 | - |
0.0392 | 1000 | 0.9043 | - |
0.0587 | 1500 | 0.8234 | - |
0.0783 | 2000 | 0.7861 | - |
0.0979 | 2500 | 0.7628 | - |
0.1175 | 3000 | 0.7348 | - |
0.1371 | 3500 | 0.7184 | - |
0.1566 | 4000 | 0.7167 | - |
0.1762 | 4500 | 0.7002 | - |
0.1958 | 5000 | 0.6791 | 0.9264 |
0.2154 | 5500 | 0.6533 | - |
0.2350 | 6000 | 0.6628 | - |
0.2545 | 6500 | 0.6637 | - |
0.2741 | 7000 | 0.639 | - |
0.2937 | 7500 | 0.6395 | - |
0.3133 | 8000 | 0.6358 | - |
0.3329 | 8500 | 0.617 | - |
0.3524 | 9000 | 0.6312 | - |
0.3720 | 9500 | 0.6107 | - |
0.3916 | 10000 | 0.6083 | 0.9518 |
0.4112 | 10500 | 0.6073 | - |
0.4307 | 11000 | 0.601 | - |
0.4503 | 11500 | 0.6047 | - |
0.4699 | 12000 | 0.5986 | - |
0.4895 | 12500 | 0.5913 | - |
0.5091 | 13000 | 0.5992 | - |
0.5286 | 13500 | 0.5911 | - |
0.5482 | 14000 | 0.5923 | - |
0.5678 | 14500 | 0.5816 | - |
0.5874 | 15000 | 0.582 | 0.9628 |
0.6070 | 15500 | 0.5815 | - |
0.6265 | 16000 | 0.5827 | - |
0.6461 | 16500 | 0.5885 | - |
0.6657 | 17000 | 0.5737 | - |
0.6853 | 17500 | 0.577 | - |
0.7049 | 18000 | 0.5687 | - |
0.7244 | 18500 | 0.5744 | - |
0.7440 | 19000 | 0.5774 | - |
0.7636 | 19500 | 0.5792 | - |
0.7832 | 20000 | 0.5645 | 0.9739 |
0.8028 | 20500 | 0.5769 | - |
0.8223 | 21000 | 0.5659 | - |
0.8419 | 21500 | 0.5635 | - |
0.8615 | 22000 | 0.5677 | - |
0.8811 | 22500 | 0.5693 | - |
0.9007 | 23000 | 0.5666 | - |
0.9202 | 23500 | 0.5526 | - |
0.9398 | 24000 | 0.5591 | - |
0.9594 | 24500 | 0.563 | - |
0.9790 | 25000 | 0.555 | 0.9808 |
0.9986 | 25500 | 0.5585 | - |
1.0 | 25537 | - | 0.9811 |
1.0181 | 26000 | 0.5595 | - |
1.0377 | 26500 | 0.5507 | - |
1.0573 | 27000 | 0.5582 | - |
1.0769 | 27500 | 0.5543 | - |
1.0964 | 28000 | 0.5598 | - |
1.1160 | 28500 | 0.5613 | - |
1.1356 | 29000 | 0.5457 | - |
1.1552 | 29500 | 0.5524 | - |
1.1748 | 30000 | 0.5324 | 0.9836 |
1.1943 | 30500 | 0.5531 | - |
1.2139 | 31000 | 0.5505 | - |
1.2335 | 31500 | 0.5623 | - |
1.2531 | 32000 | 0.5505 | - |
1.2727 | 32500 | 0.5583 | - |
1.2922 | 33000 | 0.548 | - |
1.3118 | 33500 | 0.5485 | - |
1.3314 | 34000 | 0.5509 | - |
1.3510 | 34500 | 0.54 | - |
1.3706 | 35000 | 0.5478 | 0.9835 |
1.3901 | 35500 | 0.5416 | - |
1.4097 | 36000 | 0.5438 | - |
1.4293 | 36500 | 0.543 | - |
1.4489 | 37000 | 0.547 | - |
1.4685 | 37500 | 0.5362 | - |
1.4880 | 38000 | 0.5536 | - |
1.5076 | 38500 | 0.5356 | - |
1.5272 | 39000 | 0.5382 | - |
1.5468 | 39500 | 0.5481 | - |
1.5664 | 40000 | 0.5302 | 0.9880 |
1.5859 | 40500 | 0.5275 | - |
1.6055 | 41000 | 0.5327 | - |
1.6251 | 41500 | 0.5414 | - |
1.6447 | 42000 | 0.5354 | - |
1.6643 | 42500 | 0.536 | - |
1.6838 | 43000 | 0.5364 | - |
1.7034 | 43500 | 0.5391 | - |
1.7230 | 44000 | 0.5342 | - |
1.7426 | 44500 | 0.5369 | - |
1.7621 | 45000 | 0.5387 | 0.9894 |
1.7817 | 45500 | 0.5312 | - |
1.8013 | 46000 | 0.5297 | - |
1.8209 | 46500 | 0.5222 | - |
1.8405 | 47000 | 0.5255 | - |
1.8600 | 47500 | 0.5379 | - |
1.8796 | 48000 | 0.5317 | - |
1.8992 | 48500 | 0.5312 | - |
1.9188 | 49000 | 0.5307 | - |
1.9384 | 49500 | 0.5375 | - |
1.9579 | 50000 | 0.527 | 0.9908 |
1.9775 | 50500 | 0.538 | - |
1.9971 | 51000 | 0.5312 | - |
2.0 | 51074 | - | 0.9911 |
2.0167 | 51500 | 0.5346 | - |
2.0363 | 52000 | 0.5279 | - |
2.0558 | 52500 | 0.517 | - |
2.0754 | 53000 | 0.5193 | - |
2.0950 | 53500 | 0.5286 | - |
2.1146 | 54000 | 0.5229 | - |
2.1342 | 54500 | 0.5183 | - |
2.1537 | 55000 | 0.5194 | 0.9915 |
2.1733 | 55500 | 0.5362 | - |
2.1929 | 56000 | 0.5186 | - |
2.2125 | 56500 | 0.5202 | - |
2.2321 | 57000 | 0.5276 | - |
2.2516 | 57500 | 0.5266 | - |
2.2712 | 58000 | 0.5334 | - |
2.2908 | 58500 | 0.5206 | - |
2.3104 | 59000 | 0.5229 | - |
2.3300 | 59500 | 0.5111 | - |
2.3495 | 60000 | 0.5175 | 0.9928 |
2.3691 | 60500 | 0.5235 | - |
2.3887 | 61000 | 0.5127 | - |
2.4083 | 61500 | 0.5291 | - |
2.4278 | 62000 | 0.5122 | - |
2.4474 | 62500 | 0.5196 | - |
2.4670 | 63000 | 0.5159 | - |
2.4866 | 63500 | 0.5207 | - |
2.5062 | 64000 | 0.5157 | - |
2.5257 | 64500 | 0.5094 | - |
2.5453 | 65000 | 0.5283 | 0.9937 |
2.5649 | 65500 | 0.5256 | - |
2.5845 | 66000 | 0.524 | - |
2.6041 | 66500 | 0.5324 | - |
2.6236 | 67000 | 0.5132 | - |
2.6432 | 67500 | 0.5203 | - |
2.6628 | 68000 | 0.5224 | - |
2.6824 | 68500 | 0.5255 | - |
2.7020 | 69000 | 0.5132 | - |
2.7215 | 69500 | 0.525 | - |
2.7411 | 70000 | 0.5257 | 0.9936 |
2.7607 | 70500 | 0.5206 | - |
2.7803 | 71000 | 0.514 | - |
2.7999 | 71500 | 0.5175 | - |
2.8194 | 72000 | 0.5245 | - |
2.8390 | 72500 | 0.5144 | - |
2.8586 | 73000 | 0.5246 | - |
2.8782 | 73500 | 0.5227 | - |
2.8978 | 74000 | 0.5199 | - |
2.9173 | 74500 | 0.5216 | - |
2.9369 | 75000 | 0.5253 | 0.9936 |
2.9565 | 75500 | 0.5303 | - |
2.9761 | 76000 | 0.5148 | - |
2.9957 | 76500 | 0.5248 | - |
3.0 | 76611 | - | 0.9936 |
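The figures in the training log can be cross-checked against the hyperparameters. The sketch below reproduces the Epoch column from the step count, estimates the training set size, and models the linear learning-rate decay with no warmup. It assumes a single training device and no gradient accumulation beyond the listed value of 1; the constant and function names are illustrative.

```python
TOTAL_STEPS = 76611       # final step in the log (epoch 3.0)
STEPS_PER_EPOCH = 25537   # step logged at epoch 1.0
BATCH_SIZE = 48           # per_device_train_batch_size
BASE_LR = 5e-05           # learning_rate

def epoch_fraction(step: int) -> float:
    """Fraction of epochs completed at a given optimizer step."""
    return step / STEPS_PER_EPOCH

def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR to 0 over TOTAL_STEPS, with warmup_steps = 0."""
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)

# Upper bound on training examples per epoch: the last batch may be partial,
# and the figure scales with the device count if more than one device was used.
approx_samples = STEPS_PER_EPOCH * BATCH_SIZE

print(round(epoch_fraction(500), 4))   # matches the first logged row
print(approx_samples)
print(linear_lr(TOTAL_STEPS // 2))
```

Under these assumptions the training set holds roughly 1.2 million examples, and the learning rate has dropped to about half its initial value midway through the second epoch.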
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}