BERT Fine-Tuned for Named Entity Recognition (CoNLL-2003)

This model recognizes four types of named entities in English text: persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).

Model Details

Base model: bert-base-cased
Dataset: CoNLL-2003 (14,041 training sentences from Reuters news)
Task: Named Entity Recognition (token classification)
Framework: PyTorch + HuggingFace Transformers

Entity Types

Label	Meaning	Example
PER	Person names	Barack Obama, Elon Musk
ORG	Organizations	Apple Inc., United Nations
LOC	Locations	New York, Mount Everest
MISC	Miscellaneous (nationalities, events, and other names)	English, FIFA World Cup

Performance (CoNLL-2003 Test Set)

Metric	Score
F1 Score	0.9116
Precision	0.9041
Recall	0.9192
Accuracy	0.9827
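
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; it uses only the numbers in the table and does not re-run the evaluation.

```python
# Reported entity-level scores from the table above.
precision = 0.9041
recall = 0.9192

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.9116
```

The recomputed value matches the reported F1 Score to four decimal places.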

How to Use

from transformers import pipeline

# Load the model
ner = pipeline(
    "token-classification",
    model="samandar1105/named_entity-recognition",
    aggregation_strategy="simple"
)

# Run inference
result = ner("Elon Musk founded SpaceX in Hawthorne, California.")
print(result)
# Example output (abridged; each entry also includes 'start' and 'end' character offsets):
# [
#   {'entity_group': 'PER', 'word': 'Elon Musk', 'score': 0.998},
#   {'entity_group': 'ORG', 'word': 'SpaceX', 'score': 0.997},
#   {'entity_group': 'LOC', 'word': 'Hawthorne', 'score': 0.995},
#   {'entity_group': 'LOC', 'word': 'California', 'score': 0.994},
# ]
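
The pipeline returns a flat list of dictionaries, so a common next step is to group the detected entities by type. The following is a minimal sketch of one way to do that. The helper name group_entities and the min_score threshold are hypothetical and not part of the model or the Transformers library, and the sample list simply copies the example output above so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group pipeline results by entity type (hypothetical helper).

    `entities` is a list of dicts with 'entity_group', 'word', and 'score'
    keys, as produced with aggregation_strategy="simple".
    """
    grouped = defaultdict(list)
    for ent in entities:
        # Skip low-confidence predictions.
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Sample data copied from the example output above (not a fresh model run).
sample = [
    {"entity_group": "PER", "word": "Elon Musk", "score": 0.998},
    {"entity_group": "ORG", "word": "SpaceX", "score": 0.997},
    {"entity_group": "LOC", "word": "Hawthorne", "score": 0.995},
    {"entity_group": "LOC", "word": "California", "score": 0.994},
]

by_type = group_entities(sample, min_score=0.9)
print(by_type)
# {'PER': ['Elon Musk'], 'ORG': ['SpaceX'], 'LOC': ['Hawthorne', 'California']}
```

In real use, pass the list returned by ner(...) in place of sample. The 0.9 threshold is only illustrative and should be tuned for the application.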

Training Details

Learning rate: 2e-5
Epochs: 4
Batch size: 16
Max sequence length: 128
Warmup ratio: 0.1
Weight decay: 0.01
Label alignment: First-subword strategy (each word's label is assigned to its first subword; continuation subwords get -100 so the loss ignores them)
Evaluation: seqeval (entity-level strict span matching)
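
The first-subword label alignment listed above can be sketched as follows. This is an illustration under stated assumptions, not the author's actual training code. The function name align_labels is hypothetical, the word_ids list is written by hand in the shape that a fast tokenizer's word_ids() method returns (None for special tokens, otherwise the index of the source word), and the integer label IDs are an assumed mapping (0 = O, 1 = B-PER, 2 = I-PER, 3 = B-ORG).

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions carrying it do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Assign each word's label to its first subword only (hypothetical helper)."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens such as [CLS] and [SEP] are ignored.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First subword of a word keeps the word-level label.
            aligned.append(word_labels[wid])
        else:
            # Continuation subwords are masked out.
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


# Hand-written example: four words, the last one split into two subwords.
# The split is assumed for illustration, not taken from the real tokenizer.
word_ids = [None, 0, 1, 2, 3, 3, None]
word_labels = [1, 2, 0, 3]  # assumed IDs: B-PER, I-PER, O, B-ORG

aligned = align_labels(word_ids, word_labels)
print(aligned)
# [-100, 1, 2, 0, 3, -100, -100]
```

Because only the first subword of each word carries a label, each word contributes exactly one prediction to the loss, which keeps training consistent with word-level evaluation in seqeval.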

Model size: 0.1B params (Safetensors, tensor type F32)
