
This is a version of the BGE-M3 model converted to ONNX weights with Hugging Face Optimum for compatibility with ONNX Runtime.

It is based on the conversion scripts and the documentation of the bge-m3-onnx model by Aapo Tanskanen.

This ONNX model outputs dense and ColBERT embedding representations in a single forward pass. The output is a list of NumPy arrays in that order: the dense embeddings first, followed by the ColBERT embeddings.

Note: both dense and ColBERT embeddings are normalized, matching the default behavior of the original FlagEmbedding library.
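
Concretely, after running the usage example below, the two arrays can be unpacked and the normalization checked directly. This is a minimal sketch; the variable names mirror that example and the near-unit norms assume L2 normalization as in FlagEmbedding's defaults:

import numpy as np

# outputs comes from ort_session.run(None, inputs_onnx) in the usage example below
dense_embeddings, colbert_embeddings = outputs  # shapes: (batch, 1024) and (batch, tokens, 1024)

# Both representations are normalized, so each vector's L2 norm should be close to 1.0
print(np.linalg.norm(dense_embeddings[0]))       # ~1.0
print(np.linalg.norm(colbert_embeddings[0][0]))  # ~1.0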

Usage with ONNX Runtime (Python)

Install the necessary packages:

pip install huggingface-hub onnxruntime transformers

You can then use the model to compute embeddings, as follows:

from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer

hf_hub_download(
    repo_id="ddmitov/bge_m3_dense_colbert_onnx",
    filename="model.onnx",
    local_dir="/tmp",
    repo_type="model"
)

hf_hub_download(
    repo_id="ddmitov/bge_m3_dense_colbert_onnx",
    filename="model.onnx_data",
    local_dir="/tmp",
    repo_type="model"
)

tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")

ort_session = ort.InferenceSession("/tmp/model.onnx")

inputs = tokenizer(
    "BGE M3 is an embedding model supporting dense retrieval and lexical matching.",
    padding="longest",
    return_tensors="np"
)

inputs_onnx = {
    key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()
}

outputs = ort_session.run(None, inputs_onnx)

print(f"Number of Dense Vectors: {len(outputs[0])}")
print(f"Dense Vector Length: {len(outputs[0][0])}")
print("")
print(f"Number of ColBERT Vectors: {len(outputs[1][0])}")
print(f"ColBERT vector length: {len(outputs[1][0][0])}")

# Expected output:

# Number of Dense Vectors: 1
# Dense Vector Length: 1024

# Number of ColBERT Vectors: 24
# ColBERT Vector Length: 1024
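
Because the model returns both representations, you can score a query against a passage with a dense dot product and with ColBERT-style MaxSim late interaction. The sketch below reuses the tokenizer and ort_session from the example above; the embed helper and the example texts are purely illustrative and not part of this repository:

import numpy as np

def embed(text):
    # Tokenize and run the ONNX model, reusing the tokenizer and session from above
    encoded = tokenizer(text, padding="longest", return_tensors="np")
    dense, colbert = ort_session.run(
        None,
        {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in encoded.items()}
    )
    # Strip the batch dimension: dense -> (1024,), colbert -> (tokens, 1024)
    return dense[0], colbert[0]

query_dense, query_colbert = embed("What is BGE M3?")
passage_dense, passage_colbert = embed(
    "BGE M3 is an embedding model supporting dense retrieval and lexical matching."
)

# The embeddings are normalized, so a dot product acts as a cosine similarity
dense_score = float(np.dot(query_dense, passage_dense))

# ColBERT late interaction: for every query token, take its best-matching
# passage token (MaxSim) and average the resulting scores
colbert_score = float((query_colbert @ passage_colbert.T).max(axis=1).mean())

print(f"Dense score: {dense_score:.4f}")
print(f"ColBERT score: {colbert_score:.4f}")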