
SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 1024 dimensions
  • Model Size: 568M parameters (F32, Safetensors)
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
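
For reference, the same stack can be assembled by hand with the sentence-transformers modules API. A minimal sketch (CLS-token pooling and L2 normalization, matching the printout above):

from sentence_transformers import SentenceTransformer, models

# Transformer backbone with the 128-token limit listed above
transformer = models.Transformer("BAAI/bge-m3", max_seq_length=128)
# CLS-token pooling over the 1024-dimensional word embeddings
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)
# Final L2 normalization, so dot product equals cosine similarity
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])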

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("alperctnkaya/bge-m3-distilled-en-tr")
# Run inference
sentences = [
    # English source sentence
    'Nippon Paint Garden Furniture Maintenance Oil is a high quality maintenance oil produced with a mixture of specially selected natural oils, specially developed for the care of hard woods such as teak, and can be applied to other wood types.',
    # Its Turkish translation
    'Nippon Paint Bahçe Mobilyası Bakım Yağı, özel olarak seçilen doğal yağların karışımı ile üretilen, özellikle teak gibi sert ahşapların bakımı için özel olarak geliştirilmiş, diğer ahşap türlerine de uygulanabilen üstün nitelikli bakım yağıdır.',
    # An unrelated Turkish sentence (about a volunteer-service award), a natural negative example
    'Stover Gönüllü Faaliyet Ödülüne layık görülenlerin her biri, kâr amacı gütmeyen kendi seçtiği bir kuruluşa ödenmek üzere verilen 5000 Amerikan doları değerinde bir çeki içeren bir hatıra ödülünün yanı sıra, resmi bir törende genel başkan ve CEO tarafından verilen özel bir takdirnameye hak kazanacaktır.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Knowledge Distillation

Metric        Value
negative_mse  -0.039
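
Here negative_mse is the negated mean squared error between the student's embeddings and the teacher's (BAAI/bge-m3) embeddings, so values closer to zero are better. Assuming the usual sentence-transformers MSEEvaluator convention of scaling the MSE by 100, -0.039 corresponds to a raw MSE of roughly 0.00039. In spirit:

import numpy as np

def negative_mse(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    # Mean squared error between matching rows, scaled by 100 and negated
    mse = float(((student_emb - teacher_emb) ** 2).mean())
    return -mse * 100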

Translation

Metric            Value
src2trg_accuracy  0.8951
trg2src_accuracy  0.8837
mean_accuracy     0.8894
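
src2trg_accuracy is the fraction of source (English) sentences whose nearest neighbor among all target (Turkish) embeddings is the correct translation; trg2src_accuracy is the reverse direction, and mean_accuracy averages the two. A sketch of one direction, assuming row i of each matrix embeds the i-th sentence of a translation pair:

import numpy as np

def src2trg_accuracy(src_emb: np.ndarray, trg_emb: np.ndarray) -> float:
    # Embeddings are L2-normalized, so the dot product is cosine similarity
    sims = src_emb @ trg_emb.T                          # (n, n) similarity matrix
    hits = sims.argmax(axis=1) == np.arange(len(src_emb))
    return float(hits.mean())                           # correct nearest neighbors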

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • warmup_ratio: 0.1
  • fp16: True
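
The MSELoss citation below and the distillation metric above point to the standard multilingual knowledge-distillation recipe (Reimers & Gurevych, 2020): the teacher encodes the English side of a parallel corpus, and the student is trained so that its embeddings of both the English sentence and its Turkish translation match the teacher's. A hedged sketch of that setup; the parallel data here is a placeholder, not the actual training corpus:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MSELoss

teacher = SentenceTransformer("BAAI/bge-m3")
student = SentenceTransformer("BAAI/bge-m3")   # student starts from the base model
student.max_seq_length = 128

# Parallel sentences; the label is the teacher's embedding of the English side
english = ["Hello world"]
turkish = ["Merhaba dünya"]
labels = teacher.encode(english)
train_dataset = Dataset.from_dict({
    "english": english,
    "turkish": turkish,
    "label": [emb.tolist() for emb in labels],
})

# MSELoss pulls the student's embeddings of every text column toward the label
trainer = SentenceTransformerTrainer(
    model=student,
    train_dataset=train_dataset,
    loss=MSELoss(student),
)
trainer.train()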

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss  eval_mean_accuracy  eval_negative_mse
0.02 100 0.0019 - - -
0.04 200 0.0013 - - -
0.06 300 0.0008 - - -
0.08 400 0.0008 - - -
0.1 500 0.0008 - - -
0.12 600 0.0007 - - -
0.14 700 0.0007 - - -
0.16 800 0.0007 - - -
0.18 900 0.0007 - - -
0.2 1000 0.0007 - - -
0.22 1100 0.0007 - - -
0.24 1200 0.0006 - - -
0.26 1300 0.0006 - - -
0.28 1400 0.0006 - - -
0.3 1500 0.0006 - - -
0.32 1600 0.0006 - - -
0.34 1700 0.0006 - - -
0.36 1800 0.0006 - - -
0.38 1900 0.0006 - - -
0.4 2000 0.0006 - - -
0.42 2100 0.0006 - - -
0.44 2200 0.0006 - - -
0.46 2300 0.0005 - - -
0.48 2400 0.0005 - - -
0.5 2500 0.0005 - - -
0.52 2600 0.0005 - - -
0.54 2700 0.0005 - - -
0.56 2800 0.0005 - - -
0.58 2900 0.0005 - - -
0.6 3000 0.0005 - - -
0.62 3100 0.0005 - - -
0.64 3200 0.0005 - - -
0.66 3300 0.0005 - - -
0.68 3400 0.0005 - - -
0.7 3500 0.0005 - - -
0.72 3600 0.0005 - - -
0.74 3700 0.0005 - - -
0.76 3800 0.0005 - - -
0.78 3900 0.0005 - - -
0.8 4000 0.0005 - - -
0.82 4100 0.0005 - - -
0.84 4200 0.0005 - - -
0.86 4300 0.0005 - - -
0.88 4400 0.0005 - - -
0.9 4500 0.0005 - - -
0.92 4600 0.0005 - - -
0.94 4700 0.0005 - - -
0.96 4800 0.0005 - - -
0.98 4900 0.0005 - - -
1.0 5000 0.0005 0.0004 0.8591 -0.0453
1.02 5100 0.0005 - - -
1.04 5200 0.0005 - - -
1.06 5300 0.0004 - - -
1.08 5400 0.0004 - - -
1.1 5500 0.0004 - - -
1.12 5600 0.0004 - - -
1.14 5700 0.0004 - - -
1.16 5800 0.0004 - - -
1.18 5900 0.0004 - - -
1.2 6000 0.0004 - - -
1.22 6100 0.0004 - - -
1.24 6200 0.0004 - - -
1.26 6300 0.0004 - - -
1.28 6400 0.0004 - - -
1.3 6500 0.0004 - - -
1.32 6600 0.0004 - - -
1.34 6700 0.0004 - - -
1.36 6800 0.0004 - - -
1.38 6900 0.0004 - - -
1.4 7000 0.0004 - - -
1.42 7100 0.0004 - - -
1.44 7200 0.0004 - - -
1.46 7300 0.0004 - - -
1.48 7400 0.0004 - - -
1.5 7500 0.0004 - - -
1.52 7600 0.0004 - - -
1.54 7700 0.0004 - - -
1.56 7800 0.0004 - - -
1.58 7900 0.0004 - - -
1.6 8000 0.0004 - - -
1.62 8100 0.0004 - - -
1.64 8200 0.0004 - - -
1.66 8300 0.0004 - - -
1.68 8400 0.0004 - - -
1.7 8500 0.0004 - - -
1.72 8600 0.0004 - - -
1.74 8700 0.0004 - - -
1.76 8800 0.0004 - - -
1.78 8900 0.0004 - - -
1.8 9000 0.0004 - - -
1.82 9100 0.0004 - - -
1.84 9200 0.0004 - - -
1.86 9300 0.0004 - - -
1.88 9400 0.0004 - - -
1.9 9500 0.0004 - - -
1.92 9600 0.0004 - - -
1.94 9700 0.0004 - - -
1.96 9800 0.0004 - - -
1.98 9900 0.0004 - - -
2.0 10000 0.0004 0.0004 0.8837 -0.0405
2.02 10100 0.0004 - - -
2.04 10200 0.0004 - - -
2.06 10300 0.0004 - - -
2.08 10400 0.0004 - - -
2.1 10500 0.0004 - - -
2.12 10600 0.0004 - - -
2.14 10700 0.0004 - - -
2.16 10800 0.0004 - - -
2.18 10900 0.0004 - - -
2.2 11000 0.0004 - - -
2.22 11100 0.0004 - - -
2.24 11200 0.0004 - - -
2.26 11300 0.0004 - - -
2.28 11400 0.0004 - - -
2.3 11500 0.0004 - - -
2.32 11600 0.0004 - - -
2.34 11700 0.0004 - - -
2.36 11800 0.0004 - - -
2.38 11900 0.0004 - - -
2.4 12000 0.0004 - - -
2.42 12100 0.0004 - - -
2.44 12200 0.0004 - - -
2.46 12300 0.0004 - - -
2.48 12400 0.0004 - - -
2.5 12500 0.0004 - - -
2.52 12600 0.0004 - - -
2.54 12700 0.0004 - - -
2.56 12800 0.0004 - - -
2.58 12900 0.0004 - - -
2.6 13000 0.0004 - - -
2.62 13100 0.0004 - - -
2.64 13200 0.0004 - - -
2.66 13300 0.0004 - - -
2.68 13400 0.0004 - - -
2.7 13500 0.0004 - - -
2.72 13600 0.0004 - - -
2.74 13700 0.0004 - - -
2.76 13800 0.0004 - - -
2.78 13900 0.0004 - - -
2.8 14000 0.0004 - - -
2.82 14100 0.0004 - - -
2.84 14200 0.0004 - - -
2.86 14300 0.0004 - - -
2.88 14400 0.0004 - - -
2.9 14500 0.0004 - - -
2.92 14600 0.0004 - - -
2.94 14700 0.0004 - - -
2.96 14800 0.0004 - - -
2.98 14900 0.0004 - - -
3.0 15000 0.0004 0.0004 0.8894 -0.0390

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}