MTEB(por) — Random Baseline Encoder

⚠️ This is NOT a trained model. It is the chance-level floor reference for the MTEB(por, v2) Brazilian-Portuguese embedding benchmark.

It maps each input text to a deterministic, L2-normalized random vector seeded by a hash of the text. The vector carries no semantic signal: two sentences with different wording but similar meaning get unrelated vectors, so the encoder scores at chance level on every task family (STS, retrieval, classification, clustering, reranking, regression).

Why a random baseline?

Interpretability: it anchors every number. Is 0.30 on a retrieval task good or near-random? Only the floor answers that.
Task discrimination: if a real model scores near the floor on a task, that task does not separate models from chance, so the floor doubles as an empirical sanity check on the benchmark itself.
Convention: mirrors mteb/baseline-random-encoder from the upstream MTEB leaderboard.
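
One way to read a score against its floor is to rescale it so that the floor maps to 0 and a perfect score maps to 1. The sketch below is illustrative only: `margin_over_floor` is a hypothetical helper, not part of this repo or of MTEB, and the 0.60 clustering score is an invented example value. The floors are taken from the tables in this card.

```python
def margin_over_floor(score: float, floor: float, ceiling: float = 1.0) -> float:
    """Rescale a score so that floor -> 0.0 and ceiling -> 1.0 (hypothetical helper)."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    return (score - floor) / (ceiling - floor)


# 0.30 nDCG@10 on FaQuADIR (floor 0.0235) sits well above chance.
retrieval_margin = margin_over_floor(0.30, floor=0.0235)

# An invented 0.60 V-measure on MedPTClustering (floor 0.5289) is much closer to chance
# than the raw number suggests.
clustering_margin = margin_over_floor(0.60, floor=0.5289)

print(f"retrieval margin:  {retrieval_margin:.3f}")
print(f"clustering margin: {clustering_margin:.3f}")
```

The same raw score can mean very different things depending on the task's floor, which is why the floor is reported per task rather than as a single number.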

Design

Each text t → seed = int(sha256("42|" + t), 16) mod 2^32 → rng = numpy.random.default_rng(seed) → v = rng.standard_normal(768) → v / ‖v‖.
Deterministic per text (fully reproducible), dim 768, seed prefix 42.
No weights, no GPU, no training.
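
The design above implies that the cosine similarity between any two vectors is close to zero: for independent random unit vectors in d dimensions, the cosine has mean 0 and standard deviation of about 1/sqrt(d), roughly 0.036 for d = 768. The following is a minimal sketch that checks this empirically. It draws vectors from a plain generator instead of the per-text hash, and the seed and number of pairs are arbitrary choices for illustration.

```python
import numpy as np

DIM = 768
N_PAIRS = 2000  # arbitrary sample size for the illustration

rng = np.random.default_rng(0)  # illustrative seed, unrelated to the encoder's seed
a = rng.standard_normal((N_PAIRS, DIM))
b = rng.standard_normal((N_PAIRS, DIM))

# L2-normalize each row so the dot product is the cosine similarity.
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)
cos = np.sum(a * b, axis=1)

print(f"mean cosine: {cos.mean():+.4f}")
print(f"std cosine:  {cos.std():.4f}")
print(f"1/sqrt(DIM): {DIM ** -0.5:.4f}")
```

Because similarities cluster this tightly around zero regardless of content, rank correlations with human similarity labels (STS) and with regression targets hover around zero as well.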

Reproduce

import hashlib
import numpy as np

DIM, SEED = 768, 42

def encode(texts: list[str]) -> np.ndarray:
    """Deterministic per-text L2-normalized random vectors (chance-level floor)."""
    out = np.empty((len(texts), DIM), dtype=np.float32)
    for i, t in enumerate(texts):
        h = int(hashlib.sha256((str(SEED) + "|" + (t or "")).encode()).hexdigest(), 16) % (2**32)
        v = np.random.default_rng(h).standard_normal(DIM).astype(np.float32)
        out[i] = v / (np.linalg.norm(v) + 1e-9)
    return out
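
A quick sanity check of the properties claimed above can be sketched as follows. The `encode` function is repeated from the snippet above so that the block runs on its own; the example sentences are made up for illustration.

```python
import hashlib

import numpy as np

DIM, SEED = 768, 42


def encode(texts):
    """Same construction as the snippet above, repeated so this runs standalone."""
    out = np.empty((len(texts), DIM), dtype=np.float32)
    for i, t in enumerate(texts):
        digest = hashlib.sha256(f"{SEED}|{t or ''}".encode()).hexdigest()
        seed = int(digest, 16) % (2**32)
        v = np.random.default_rng(seed).standard_normal(DIM).astype(np.float32)
        out[i] = v / (np.linalg.norm(v) + 1e-9)
    return out


first = encode(["O gato dorme no sofá.", "A taxa Selic subiu."])
second = encode(["A taxa Selic subiu.", "O gato dorme no sofá."])

# A text's vector depends only on the text, not on its position in the batch.
print("order-independent:", np.array_equal(first[0], second[1]))

# Every row is (numerically) unit length.
print("norms:", np.linalg.norm(first, axis=1))

# Two paraphrases get unrelated vectors: the cosine is near zero, not near one.
para = encode(["O gato dorme no sofá.", "O felino está dormindo no sofá."])
print("paraphrase cosine:", float(para[0] @ para[1]))
```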

The full evaluation script (run_random_baseline.py, using the same pinned-revision MTEB(por) tasks as the benchmarked models) is included in this repo.

Floor scores — MTEB(por, v2)

Retrieval (nDCG@10)

Task	Floor
MedPTRetrieval	0.0083
FaQuADIR	0.0235
Quati	0.0
FaqBacenRetrieval	0.0027
JurisTCU	0.0
BRTaxQAR	0.0129

Reranking (MAP)

Task	Floor
QuatiReranking	0.1804
JurisTCUReranking	0.1434
PortuLexRRIP	0.1415

STS (Spearman)

Task	Floor
AssinSTS	0.005
Assin2STS	-0.0288

Pair classification (AP)

Task	Floor
AssinRTE	0.2328
InferBR	0.3556

Classification (acc/AP)

Task	Floor
HateBR	0.5016
ToxSynPT	0.495
FactckBrClassification	0.322
OlidBrMultilabelClassification	0.2035
BrighterEmotionMultilabelClassification	0.2027

Clustering (V-measure)

Task	Floor
MedPTClustering	0.5289
WikipediaPTCategoriesClusteringP2P	0.3248
JurisTCUClusteringP2P	0.1225
SciELOClusteringP2P	0.0859
StackoverflowPtClustering	0.3353
CamaraProposicoesClustering	0.4912

Regression (Spearman)

Task	Floor
BrighterEmotionIntensityRegression	0.0223
EnemEssayRegression	-0.0783
NarrativeEssaysBRRegression	-0.0526

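The floor is non-zero for clustering (the V-measure of a random partition is not 0), for classification (chance ≈ 1/num-classes when classes are balanced), and for reranking and pair classification, where a random ordering still earns partial credit under MAP and AP. Real models score well above the floor on every task.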
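
To see why a random partition does not score 0 on V-measure, the sketch below computes V-measure from scratch for two unrelated random labelings. It is a simplified illustration, not the benchmark's evaluation code: labels are drawn uniformly at random rather than produced by clustering the random embeddings, and the cluster count and sample sizes are arbitrary.

```python
import numpy as np


def _entropy(labels) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def _conditional_entropy(a, b) -> float:
    """H(a | b), computed from the contingency table of the two labelings."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1.0)
    joint = table / table.sum()
    p_b = np.broadcast_to(joint.sum(axis=0, keepdims=True), joint.shape)
    nz = joint > 0
    return float(-np.sum(joint[nz] * np.log(joint[nz] / p_b[nz])))


def v_measure(truth, pred) -> float:
    """Harmonic mean of homogeneity and completeness."""
    h_c, h_k = _entropy(truth), _entropy(pred)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(truth, pred) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(pred, truth) / h_k
    total = homogeneity + completeness
    return 0.0 if total == 0 else 2.0 * homogeneity * completeness / total


rng = np.random.default_rng(0)  # illustrative seed
K = 20  # arbitrary number of classes and clusters
for n in (100, 10_000):
    truth = rng.integers(0, K, size=n)
    pred = rng.integers(0, K, size=n)  # independent of truth
    print(f"n={n:>6}  V-measure of random labels: {v_measure(truth, pred):.4f}")
```

With few samples per cluster, chance agreement between the two labelings inflates the score; with many samples per cluster it shrinks toward 0. This is consistent with clustering floors in the table varying widely from task to task.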

Citation

Part of the MTEB(por) benchmark by the mteb-pt project. The floor is computed with the identical pinned-SHA tasks used for every benchmarked model.
