
This is a version of the BGE-M3 model converted to ONNX weights with Hugging Face Optimum for compatibility with ONNX Runtime.

It is based on the conversion scripts and the documentation of the bge-m3-onnx model by Aapo Tanskanen.

This ONNX model outputs dense and ColBERT embedding representations in a single forward pass. The output is a list of NumPy arrays in that order: the dense embeddings first, followed by the ColBERT embeddings.

Note: both dense and ColBERT embeddings are normalized, matching the default behavior of the original FlagEmbedding library.
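
Concretely, after running the usage example below, the two arrays can be unpacked and the normalization checked directly. This is a minimal sketch; the variable names mirror that example and the near-unit norms assume L2 normalization as in FlagEmbedding's defaults:

import numpy as np

# outputs comes from ort_session.run(None, inputs_onnx) in the usage example below
dense_embeddings, colbert_embeddings = outputs  # shapes: (batch, 1024) and (batch, tokens, 1024)

# Both representations are normalized, so each vector's L2 norm should be close to 1.0
print(np.linalg.norm(dense_embeddings[0]))       # ~1.0
print(np.linalg.norm(colbert_embeddings[0][0]))  # ~1.0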

Usage with ONNX Runtime (Python)

Install the necessary packages:

pip install huggingface-hub onnxruntime transformers

You can then use the model to compute embeddings, as follows:

from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer

hf_hub_download(
    repo_id="ddmitov/bge_m3_dense_colbert_onnx",
    filename="model.onnx",
    local_dir="/tmp",
    repo_type="model"
)

hf_hub_download(
    repo_id="ddmitov/bge_m3_dense_colbert_onnx",
    filename="model.onnx_data",
    local_dir="/tmp",
    repo_type="model"
)

tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")

ort_session = ort.InferenceSession("/tmp/model.onnx")

inputs = tokenizer(
    "BGE M3 is an embedding model supporting dense retrieval and lexical matching.",
    padding="longest",
    return_tensors="np"
)

inputs_onnx = {
    key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()
}

outputs = ort_session.run(None, inputs_onnx)

print(f"Number of Dense Vectors: {len(outputs[0])}")
print(f"Dense Vector Length: {len(outputs[0][0])}")
print("")
print(f"Number of ColBERT Vectors: {len(outputs[1][0])}")
print(f"ColBERT vector length: {len(outputs[1][0][0])}")

# Expected output:

# Number of Dense Vectors: 1
# Dense Vector Length: 1024

# Number of ColBERT Vectors: 24
# ColBERT Vector Length: 1024
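
Because the model returns both representations, you can score a query against a passage with a dense dot product and with ColBERT-style MaxSim late interaction. The sketch below reuses the tokenizer and ort_session from the example above; the embed helper and the example texts are purely illustrative and not part of this repository:

import numpy as np

def embed(text):
    # Tokenize and run the ONNX model, reusing the tokenizer and session from above
    encoded = tokenizer(text, padding="longest", return_tensors="np")
    dense, colbert = ort_session.run(
        None,
        {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in encoded.items()}
    )
    # Strip the batch dimension: dense -> (1024,), colbert -> (tokens, 1024)
    return dense[0], colbert[0]

query_dense, query_colbert = embed("What is BGE M3?")
passage_dense, passage_colbert = embed(
    "BGE M3 is an embedding model supporting dense retrieval and lexical matching."
)

# The embeddings are normalized, so a dot product acts as a cosine similarity
dense_score = float(np.dot(query_dense, passage_dense))

# ColBERT late interaction: for every query token, take its best-matching
# passage token (MaxSim) and average the resulting scores
colbert_score = float((query_colbert @ passage_colbert.T).max(axis=1).mean())

print(f"Dense score: {dense_score:.4f}")
print(f"ColBERT score: {colbert_score:.4f}")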