SeamlessM4T-v2 Bahnar-Vietnamese S2TT

This model is a fine-tuned version of facebook/seamless-m4t-v2-large for Speech-to-Text Translation (S2TT) from Bahnar to Vietnamese.

Model Details

Base model: facebook/seamless-m4t-v2-large
Task: Speech-to-Text Translation (S2TT)
Source language: Bahnar (bdq)
Target language: Vietnamese (vie)

Note: This model supports only the Speech-to-Text Translation (S2TT) task.

Dataset

This model was trained on the Bahnar Speech Translation Dataset.

The dataset was curated from internet sources and processed using automatic alignment techniques. It contains Bahnar speech audio paired with Vietnamese translations.

For more details on the data creation process, please refer to the dataset README and repository below.

Usage with Transformers

import torch
import soundfile as sf
from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText

model_id = "cuong06/seamlessm4t-v2-Bahnar-Vietnamese"

processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2ForSpeechToText.from_pretrained(model_id)

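# SeamlessM4T-v2 expects 16 kHz mono audio; resample and downmix beforehand if sample.wav differs.
audio, sampling_rate = sf.read("sample.wav")

# Convert the waveform into model input features.
inputs = processor(
    audio=audio,
    sampling_rate=sampling_rate,
    return_tensors="pt"
)

# Generate Vietnamese token IDs; pass num_beams=5 to match the evaluation setting.
with torch.no_grad():
    predicted_ids = model.generate(
        **inputs,
        tgt_lang="vie"
    )

# Decode the token IDs into text, dropping special tokens.
translation = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)[0]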

print(translation)

Evaluation

Results on the test set using beam search (beam_size=5):

Metric	Score
BLEU	24.58

sacreBLEU Signature

nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.6.

Limitations

The dataset was automatically collected and aligned from internet sources, so some noisy samples may remain.
Performance may degrade on unseen dialects, noisy audio, or long-form speech.
This model is intended only for Bahnar → Vietnamese speech translation.
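
One possible workaround for long-form speech is to split a recording into shorter windows and translate each window separately. The sketch below only computes the window boundaries. The helper name `chunk_bounds` and the 20-second default are illustrative assumptions, not settings used or validated by the author.

```python
def chunk_bounds(num_samples, sampling_rate=16_000, chunk_seconds=20.0, overlap_seconds=0.0):
    """Return (start, end) sample-index pairs that cover the whole recording.

    The window length and overlap are hypothetical defaults; tune them for your data.
    """
    chunk = int(chunk_seconds * sampling_rate)
    step = chunk - int(overlap_seconds * sampling_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_seconds must be positive and larger than overlap_seconds")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


if __name__ == "__main__":
    # A 45-second recording at 16 kHz, split into windows of at most 20 seconds.
    for start, end in chunk_bounds(45 * 16_000):
        print(start, end)
```

Each slice `audio[start:end]` can then be passed through the processor and `model.generate` as in the usage example. Joining the per-window translations is a naive strategy: cuts may fall mid-sentence, and a non-zero overlap can duplicate words at the boundaries.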

Citation

If you use this model or the dataset, please cite the repository and dataset.

Repository

@misc{bahnar_vietnamese_s2tt,
  author = {Dam Cuong},
  title = {Bahnar-Vietnamese Speech-to-Text Translation},
  year = {2026},
  howpublished = {\url{https://github.com/damcuong8/Bahnar-Vietnamese-S2TT}}
}
