Fine-Tuned BERT for IMDB Sentiment Analysis
Author: Harsh Maniya
Model Type: Text Classification (Sentiment Analysis)
Language: English
Overview
This repository hosts a BERT-based model fine-tuned on the IMDB movie reviews dataset. The goal is to classify movie reviews as either positive or negative with high accuracy.
- Base Model: bert-base-uncased
- Dataset: IMDB (25,000 training samples, 25,000 testing samples)
- Task: Binary Sentiment Classification
If you want to quickly gauge whether a movie review is glowing or scathing, this model is for you!
Model Architecture
- Backbone: BERT (Bidirectional Encoder Representations from Transformers)
- Classification Head: A single linear layer on top of the pooled [CLS] token output for binary classification.
Why BERT? BERT’s bidirectional training helps it capture context from both directions in a sentence, making it especially powerful for understanding nuances in text like movie reviews.
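To make the classification head concrete, here is a minimal pure-Python sketch of a single linear layer that maps a pooled [CLS] vector to two logits. The hidden size of 768 matches bert-base-uncased; the function name `linear_head` and the random values are illustrative stand-ins, not the weights or code of this model.

```python
import random

HIDDEN_SIZE = 768  # hidden width of bert-base-uncased
NUM_LABELS = 2     # 0 = negative, 1 = positive


def linear_head(pooled, weights, bias):
    """Map one pooled [CLS] vector to one logit per label: W @ pooled + b."""
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)
# Random stand-ins (assumptions) for the pooled output and learned parameters.
pooled = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
weights = [
    [rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
    for _ in range(NUM_LABELS)
]
bias = [0.0] * NUM_LABELS

logits = linear_head(pooled, weights, bias)
print(len(logits))  # one logit per class
```

The head adds only 768 x 2 weights plus 2 biases on top of the backbone, which is why nearly all of the model's capacity comes from BERT itself.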
Training Procedure
- Data Loading:
- The IMDB dataset was loaded (from Hugging Face Datasets or another source) with an even split of positive and negative reviews.
- Preprocessing:
- Tokenization using the BERT tokenizer (bert-base-uncased), truncating/padding to a fixed length (e.g., 128 tokens).
- Hyperparameters (Example):
- Learning Rate: 5e-5
- Batch Size: 8
- Epochs: 3
- Optimizer: Adam
- Loss Function: Sparse Categorical Cross-entropy
- Hardware:
- Fine-tuned on a GPU (e.g., Google Colab or local machine with CUDA).
- Validation:
- Periodic evaluation on the validation set to monitor accuracy and loss.
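The truncating/padding step from the Preprocessing item can be sketched in plain Python. This is a simplified illustration, not the tokenizer's actual implementation: the real BERT tokenizer also keeps the closing [SEP] token when it truncates. The helper name `pad_or_truncate` is hypothetical, and the word ids in the example are made up; only 101 ([CLS]), 102 ([SEP]), and 0 ([PAD]) are the real special-token ids of bert-base-uncased.

```python
MAX_LENGTH = 128  # the example length mentioned in the text
PAD_ID = 0        # [PAD] id in the bert-base-uncased vocabulary


def pad_or_truncate(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    kept = token_ids[:max_length]            # drop anything past the limit
    padding = max_length - len(kept)         # 0 when the input was long enough
    input_ids = kept + [pad_id] * padding
    attention_mask = [1] * len(kept) + [0] * padding  # 0 marks padding
    return input_ids, attention_mask


# Illustrative ids: 101 = [CLS], 102 = [SEP]; the middle values are made up.
ids, mask = pad_or_truncate([101, 7, 8, 9, 102], max_length=8)
print(ids)
print(mask)
```

The attention mask tells the model which positions hold real tokens, so padded positions do not influence the prediction.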
Notebook: The entire fine-tuning process is documented in the notebook included in this repository, so you can see exactly how training was performed.
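For readers unfamiliar with the loss listed under Hyperparameters, the following sketch computes sparse categorical cross-entropy for one example from raw logits and an integer label. It assumes the loss is applied to logits rather than probabilities, which the text does not state; the function name is illustrative and the notebook remains the reference for what was actually run.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# A confident, correct prediction gives a small loss ...
low = sparse_categorical_crossentropy([-2.0, 3.0], label=1)
# ... while the same logits with the opposite label give a large one.
high = sparse_categorical_crossentropy([-2.0, 3.0], label=0)
print(low < high)
```

"Sparse" here only means that labels are given as integers (0 or 1) instead of one-hot vectors; the quantity being minimized is ordinary cross-entropy.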
Evaluation and Performance
- Accuracy: ~93% on the IMDB test set.
This performance indicates that the model handles most typical movie reviews well. However, it might still struggle with highly sarcastic or context-dependent reviews.
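As a reminder of what the accuracy figure measures, here is a small sketch that computes accuracy from predicted and true labels. The function name and the toy labels are illustrative only. For scale, 93% of the 25,000 test reviews corresponds to roughly 23,250 correct predictions.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example: three of four predictions are right.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```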
How to Use
In Python:
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
# Replace with your repository
model_name = "harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
# Example text
review_text = "I absolutely loved this movie! The plot was gripping and the acting was top-notch."
# Prepare input
inputs = tokenizer(review_text, return_tensors="tf", truncation=True, padding=True)
# Perform inference
outputs = model(inputs)
logits = outputs.logits
# Convert logits to probabilities (softmax)
probs = tf.nn.softmax(logits, axis=-1)
# Take the most likely class for the first (and only) review as a plain int
pred_class = int(tf.argmax(probs, axis=-1).numpy()[0])
# Interpret results
label_map = {0: "Negative", 1: "Positive"}
print(f"Review Sentiment: {label_map[pred_class]}")
- Positive: Model predicts a favorable sentiment.
- Negative: Model predicts an unfavorable sentiment.
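The softmax and argmax steps above can also be written without TensorFlow, which may help when you want a confidence score next to the label. The sketch below is a hypothetical helper, not part of the repository; it assumes the same label mapping as the example (0 = Negative, 1 = Positive) and takes a plain list of two logits, for instance `outputs.logits[0].numpy().tolist()`.

```python
import math

LABEL_MAP = {0: "Negative", 1: "Positive"}  # same mapping as the example above


def interpret(logits):
    """Return (label, probability) for one review's raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return LABEL_MAP[pred], probs[pred]


# Made-up logits, purely for illustration.
label, confidence = interpret([-1.0, 2.0])
print(label, round(confidence, 3))
```

A probability close to 0.5 suggests the model is unsure, which is worth surfacing for the sarcastic or ambiguous reviews mentioned earlier.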
Intended Use
- Primary Use Case: Classifying sentiment of English-language movie reviews.
- Extended Use Cases: General sentiment analysis tasks for product reviews, social media comments, or other short English texts (though performance may vary).
Limitations and Biases
- Domain Specificity: Trained primarily on movie reviews. May not generalize to other domains (e.g., financial or medical text) without further fine-tuning.
- Language Support: English only. Non-English text or text containing heavy slang/emojis may reduce performance.
- Bias in Data: IMDB reviews often contain colloquial language and potential biases from user-generated content. The model might inadvertently learn these biases.
- Sarcasm and Nuance: Subtle sarcasm or culturally specific references may be misclassified.
Ethical Considerations
- User-Generated Content: The IMDB dataset contains user-submitted reviews. Some reviews may contain explicit or biased language.
- Misuse: The model is intended for sentiment classification. Using it to make decisions about individuals or high-stakes scenarios without additional checks is not recommended.
Model Card Author
- Name: Harsh Maniya
- Contact: For questions or feedback, please open an issue or reach out directly via your preferred channel.
- GitHub: Github
- LinkedIn: Linkedin
Citation
If you use this model or reference the code in your research or project, please cite it as follows (adjust for your specific citation style):
@misc{Maniya2025IMDBBERT,
title = {Fine-Tuned BERT for IMDB Sentiment Analysis},
author = {Harsh Maniya},
year = {2025},
url = {https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis}
}
Thank You for Visiting!
We hope this model helps you classify movie reviews quickly and accurately. For more details, check out the training notebook, experiment with the model, and share your feedback!