ModernBERT Medical Safety Classifier

The ModernBERT Medical Safety Classifier is a transformer-based language model fine-tuned to assess whether medical texts meet safety and ethical standards across diverse medical domains. It is built on the ModernBERT architecture and distills the safety and ethics evaluations of Llama 3.1 (70B) into a significantly smaller and faster model. Specifically, it was trained on a newly curated, balanced subset of The Blue Scrubs dataset (83,636 documents in total). Each document was annotated by Llama 3.1 (70B) for safety and ethical adherence. Transferring these large-model evaluations into ModernBERT yields a classifier that retains strong predictive accuracy while staying lightweight enough for real-time or resource-constrained inference.

Model Details

Developed by: TheBlueScrubs
Model Type: Transformer-based language model
Language: English
License: Apache-2.0
Base Model: answerdotai/ModernBERT-base

ModernBERT is an advanced encoder-only model that incorporates recent innovations such as Rotary Positional Embeddings, local–global alternating attention, and Flash Attention, enabling efficient inference and an extended context window of up to 8,192 tokens.

Intended Uses & Limitations

Intended Uses

This model is designed to classify medical texts against safety and ethical standards, with a particular focus on cancer-related content. It can be used to assess the safety of medical documents and to help check their compliance with established ethical guidelines.

Limitations

While the model has been trained on a substantial corpus of cancer-specific texts, its performance on medical domains outside of oncology has not been evaluated. Users should exercise caution when applying the model to non-cancer-related medical content.

How to Use

To use this model for safety classification, you can employ the Hugging Face Transformers library as follows:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
model_id = "TheBlueScrubs/ModernBERT-base-TBS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Example text
text = "Your medical text here."

# Tokenize input (4,096 tokens matches the training sequence length)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# The model has a single regression output, so logits holds one value
safety_score = outputs.logits.item()
print(f"Safety Score: {safety_score:.2f}")
```
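The raw output is a continuous score on the 1 to 5 scale used as the training target (1 = least safe, 5 = most safe). The helper below is a hypothetical post-processing sketch and is not part of the released model. It clamps the score to that range, because a regression head is not guaranteed to stay inside it. It then applies the same ≤ 2 "unsafe" threshold used for the accuracy metric in the Evaluation section. The function name `interpret_safety_score` is an illustrative assumption.

```python
def interpret_safety_score(score: float, threshold: float = 2.0) -> tuple[float, str]:
    """Clamp a raw regression score to [1, 5] and attach a binary label.

    Hypothetical helper: the clamp range mirrors the 1-5 training targets and
    the <= 2 "unsafe" cutoff mirrors the binarization used for accuracy.
    """
    clamped = min(max(score, 1.0), 5.0)
    label = "unsafe" if clamped <= threshold else "safe"
    return clamped, label


if __name__ == "__main__":
    # Example raw outputs, including values slightly outside the 1-5 range
    for raw in (0.7, 1.9, 2.0, 3.4, 5.6):
        value, label = interpret_safety_score(raw)
        print(f"raw={raw:.1f} -> clamped={value:.1f}, label={label}")
```

With the usage snippet above, `interpret_safety_score(safety_score)` would return the clamped score together with its "safe"/"unsafe" label.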

Training Data

The model was re-trained on a new, balanced subset drawn from The Blue Scrubs dataset to address the overrepresentation of high-safety texts. Specifically:

We scanned a total of 11,500,608 rows across all files. We removed 112,330 rows that could not be parsed or whose safety score was missing (NaN), zero, or out of range, which left 11,388,278 valid rows.

Of these valid rows, 41,818 had a safety score ≤ 2, while 11,346,460 had a safety score > 2.

To balance the dataset, we randomly sampled documents so that unsafe (≤ 2) and safer (> 2) texts were equally represented. The result was all 41,818 unsafe rows plus 41,818 randomly sampled safer rows, for a final balanced set of 83,636 rows.

Each row retained its original continuous safety score from Llama 3.1 (70B), ranging from 1 (least safe) to 5 (most safe). As in the previous training run, these scores served as regression targets.
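The cleaning and balancing steps above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' script. It assumes rows are dicts with a `score` field and that valid scores lie in [1, 5], and the function names `parse_score` and `balance_rows` are hypothetical. The sketch keeps every unsafe row and draws an equal-sized random sample of safer rows, which matches the reported arithmetic (2 × 41,818 = 83,636).

```python
import math
import random
from typing import Optional


def parse_score(raw) -> Optional[float]:
    """Return a usable score, or None for parse/NaN/zero/out-of-range failures.

    Assumption: valid scores lie in [1, 5], matching the Llama 3.1 (70B) scale.
    """
    try:
        score = float(raw)
    except (TypeError, ValueError):
        return None  # parse failure (e.g. None or non-numeric text)
    if math.isnan(score) or score == 0 or not 1.0 <= score <= 5.0:
        return None  # NaN, zero, or out-of-range score
    return score


def balance_rows(rows: list[dict], threshold: float = 2.0, seed: int = 0) -> list[dict]:
    """Clean rows, then keep all unsafe rows plus an equal random sample of safer rows."""
    valid = []
    for row in rows:
        score = parse_score(row.get("score"))
        if score is not None:
            valid.append({**row, "score": score})

    unsafe = [r for r in valid if r["score"] <= threshold]
    safer = [r for r in valid if r["score"] > threshold]

    rng = random.Random(seed)  # fixed seed for reproducible sampling
    # Assumes safer rows outnumber unsafe ones, as in the reported counts.
    sampled_safer = rng.sample(safer, k=min(len(unsafe), len(safer)))
    balanced = unsafe + sampled_safer
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy rows mixing valid scores with parse failures, NaN, zero, and out-of-range values
    raw_scores = ["1.5", "2", "4.0", "3", "5", "nan", "0", "7", "oops", None, "4.5"]
    rows = [{"id": i, "score": s} for i, s in enumerate(raw_scores)]
    balanced = balance_rows(rows)
    n_unsafe = sum(1 for r in balanced if r["score"] <= 2.0)
    print(f"{len(balanced)} rows: {n_unsafe} unsafe, {len(balanced) - n_unsafe} safer")
```

On the toy rows, six scores survive cleaning (two unsafe, four safer), so the balanced output holds 4 rows: 2 unsafe and 2 safer.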

Training Procedure

Preprocessing

Texts were tokenized using the ModernBERT tokenizer with a maximum sequence length of 4,096 tokens. No additional filtering was applied, as the data was considered trustworthy.

Training Hyperparameters

Learning Rate: 2e-5
Number of Epochs: 5
Batch Size: 20 (per device)
Gradient Accumulation Steps: 8
Optimizer: AdamW
Weight Decay: 0.01
FP16 Training: Enabled
Total Training Steps: approximately 5 epochs over the final balanced set

All other hyperparameter settings (e.g., batch size, optimizer choice) remained the same as in the previous training run. Only the learning rate, the number of epochs, and the training data (now balanced) changed.
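A per-device batch of 20 with 8 gradient-accumulation steps means each optimizer update sees 20 × 8 = 160 examples per device. The sketch below estimates the number of optimizer updates for a given training-set size and device count. This card does not state the number of devices or the exact size of the training split, so both are parameters, and the 80,000-example figure in the usage line is purely hypothetical. The rounding is an approximation and may differ slightly from a specific trainer's implementation.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update across all devices."""
    return per_device_batch * grad_accum_steps * num_devices


def approx_optimizer_steps(num_examples: int, epochs: int, per_device_batch: int = 20,
                           grad_accum_steps: int = 8, num_devices: int = 1) -> int:
    """Approximate optimizer updates; real trainers may round partial batches differently."""
    per_update = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    steps_per_epoch = math.ceil(num_examples / per_update)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Hypothetical training-split size; the real split is not stated in this card
    print(effective_batch_size(20, 8))               # examples per update on one device
    print(approx_optimizer_steps(80_000, epochs=5))  # approximate total updates
```

With one device, 80,000 examples give ceil(80,000 / 160) = 500 updates per epoch, or 2,500 over 5 epochs. Doubling the device count halves the step count.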

Evaluation

Testing Data

The model's performance was evaluated on an out-of-sample test set comprising cancer-related documents from The Blue Scrubs dataset that were not included in the training set.

Metrics

Mean Squared Error (MSE): Measures the average squared difference between the predicted and actual safety scores.
Accuracy: Computed after binarizing both predicted and reference scores (unsafe ≤ 2 vs. safe > 2).
ROC Analysis: Assesses the model's ability to distinguish between safe and unsafe content.
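The three metrics can be computed from paired lists of reference and predicted scores. The following is a minimal pure-Python sketch under these assumptions: both lists are on the 1 to 5 scale, both are binarized at the ≤ 2 threshold for accuracy, and ROC AUC treats "safe" (> 2) as the positive class, ranking documents by their continuous predicted score. The AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve. The card does not state the authors' actual tooling, and the function names are hypothetical.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """MSE and RMSE between reference and predicted safety scores."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return {"mse": mse, "rmse": math.sqrt(mse)}


def binary_accuracy(y_true: list[float], y_pred: list[float], threshold: float = 2.0) -> float:
    """Accuracy after labeling each score unsafe (<= threshold) or safe (> threshold)."""
    hits = sum((t <= threshold) == (p <= threshold) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def roc_auc(y_true: list[float], y_pred: list[float], threshold: float = 2.0) -> float:
    """Rank-based ROC AUC with "safe" (> threshold) as the positive class.

    Counts, over all (safe, unsafe) pairs, how often the safe document receives the
    higher predicted score; ties count as half. O(n_pos * n_neg), fine for a sketch.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t > threshold]
    neg = [p for t, p in zip(y_true, y_pred) if t <= threshold]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

For example, with `y_true = [1.0, 2.0, 4.0, 5.0]` and `y_pred = [1.5, 2.5, 3.5, 4.5]`, these functions give MSE 0.25, RMSE 0.5, accuracy 0.75 (the second document crosses the threshold), and AUC 1.0, because every safe document still outranks every unsafe one.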

Results

MSE: 0.489
RMSE: 0.699
Accuracy: 0.9642
ROC Analysis: Demonstrated robust classification capability with high True Positive Rates and low False Positive Rates.

Bias, Risks, and Limitations

This model was trained on a curated subset of The Blue Scrubs dataset encompassing various medical domains, yet some areas may remain underrepresented. As with any model, there is a risk of bias stemming from data composition, and users should exercise caution when applying the classifier, especially in highly specialized contexts. Outputs should always be corroborated with expert opinion and current clinical guidelines to ensure safe, accurate medical usage.

Recommendations

Users should validate the model's performance on their specific datasets and consider fine-tuning the model on domain-specific data if necessary. Continuous monitoring and evaluation are recommended to ensure the model's predictions align with current medical standards and ethical guidelines.

Citation

If you utilize this model in your research or applications, please cite it as follows:

@misc{thebluescrubs2025modernbert,
  author = {TheBlueScrubs},
  title = {ModernBERT Medical Safety Classifier},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TheBlueScrubs/ModernBERT-base-TBS}
}

Model Card Authors

TheBlueScrubs Team
