SeamlessM4T-v2 Bahnar-Vietnamese S2TT

This model is a fine-tuned version of facebook/seamless-m4t-v2-large for Speech-to-Text Translation (S2TT) from Bahnar to Vietnamese.

Model Details

  • Base model: facebook/seamless-m4t-v2-large
  • Task: Speech-to-Text Translation (S2TT)
  • Source language: Bahnar (bdq)
  • Target language: Vietnamese (vie)

Note: This model supports only the Speech-to-Text Translation (S2TT) task.

Dataset

This model was trained on the Bahnar Speech Translation Dataset.

The dataset was curated from internet sources and processed using automatic alignment techniques. It contains Bahnar speech audio paired with Vietnamese translations.

For more details on the data creation process, please refer to the dataset README and repository below.

Usage with Transformers

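import torch
import soundfile as sf
from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText

model_id = "cuong06/seamlessm4t-v2-Bahnar-Vietnamese"

# Load the processor (feature extractor + tokenizer) and the fine-tuned S2TT model.
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2ForSpeechToText.from_pretrained(model_id)

# Read the Bahnar recording. SeamlessM4T expects 16 kHz audio, so
# resample the file first if it was recorded at a different rate.
audio, sampling_rate = sf.read("sample.wav")

# Convert the waveform into model input features.
inputs = processor(
    audio=audio,
    sampling_rate=sampling_rate,
    return_tensors="pt"
)

# Generate Vietnamese token IDs without tracking gradients.
with torch.no_grad():
    predicted_ids = model.generate(
        **inputs,
        tgt_lang="vie"
    )

# Decode the token IDs into text and keep the first (only) item in the batch.
translation = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)[0]

print(translation)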
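
Recordings collected in the field are often stereo or sampled at rates other than 16 kHz. The helper below is a minimal sketch of one way to prepare such audio before it is passed to the processor. The name `to_mono_16k` is hypothetical and not part of this repository, and it resamples by plain linear interpolation with no anti-aliasing filter, so a dedicated resampler is preferable when quality matters.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate expected by SeamlessM4T


def to_mono_16k(audio: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Down-mix to mono and resample to 16 kHz (hypothetical helper).

    Uses linear interpolation only; this is a rough sketch, not a
    high-quality resampler.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns arrays shaped (frames, channels); average the channels.
        audio = audio.mean(axis=1)
    if sampling_rate == TARGET_SR or audio.size == 0:
        return audio
    # Number of output samples that covers the same duration at 16 kHz.
    n_out = int(round(audio.shape[0] / sampling_rate * TARGET_SR))
    src_times = np.arange(audio.shape[0]) / sampling_rate
    dst_times = np.arange(n_out) / TARGET_SR
    return np.interp(dst_times, src_times, audio).astype(np.float32)
```

With this helper, the call would become `audio = to_mono_16k(audio, sampling_rate)` followed by `processor(audio=audio, sampling_rate=16000, return_tensors="pt")`.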

Evaluation

Results on the test set using beam search (beam_size=5):

Metric    Score
BLEU      24.58
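
To decode with a comparable setup, beam search can be requested at generation time. The sketch below assumes that the reported `beam_size=5` corresponds to the `num_beams` generation argument in Transformers; the function name `translate_bahnar` is hypothetical, and the evaluation script itself may differ.

```python
def translate_bahnar(model, processor, audio, sampling_rate=16_000, num_beams=5):
    """Translate one Bahnar utterance into Vietnamese using beam search.

    `model` and `processor` are the objects loaded in the usage section.
    Assumption: beam_size=5 in the evaluation maps to num_beams=5 here.
    """
    inputs = processor(
        audio=audio,
        sampling_rate=sampling_rate,
        return_tensors="pt",
    )
    predicted_ids = model.generate(
        **inputs,
        tgt_lang="vie",
        num_beams=num_beams,  # beam width; 1 would fall back to greedy decoding
    )
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Wider beams generally cost more time and memory per utterance, so a smaller `num_beams` may be a reasonable trade-off for interactive use.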

sacreBLEU Signature

nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.6.

Limitations

  • The dataset was automatically collected and aligned from internet sources, so some noisy samples may remain.
  • Performance may degrade on unseen dialects, noisy audio, or long-form speech.
  • This model is intended only for Bahnar → Vietnamese speech translation.

Citation

If you use this model or the dataset, please cite the repository and dataset.

Repository

@misc{bahnar_vietnamese_s2tt,
  author = {Dam Cuong},
  title = {Bahnar-Vietnamese Speech-to-Text Translation},
  year = {2026},
  howpublished = {\url{https://github.com/damcuong8/Bahnar-Vietnamese-S2TT}}
}

Links
