---
license: mit
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
---
# Fine-Tuned BERT for IMDB Sentiment Analysis

> **Author:** [Harsh Maniya](https://huggingface.co/harshhmaniya)  
> **Model Type:** Text Classification (Sentiment Analysis)  
> **Language:** English  

---

## Overview

This repository hosts a **BERT-based model** fine-tuned on the [IMDB movie reviews dataset](https://huggingface.co/datasets/stanfordnlp/imdb). It classifies movie reviews as either **positive** or **negative**.

- **Base Model:** `bert-base-uncased`  
- **Dataset:** IMDB (25,000 training samples, 25,000 testing samples)  
- **Task:** Binary Sentiment Classification  

If you want to quickly gauge whether a movie review is glowing or scathing, this model is for you!

---

## Model Architecture

- **Backbone:** [BERT](https://arxiv.org/abs/1810.04805) (Bidirectional Encoder Representations from Transformers)  
- **Classification Head:** A single linear layer on top of the pooled `[CLS]` token output for binary classification.
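
To make the classification head concrete, here is a minimal pure-Python sketch of a single linear layer followed by a softmax. The 4-dimensional pooled vector, the weights, and the bias are toy values assumed for illustration (the real pooled output of `bert-base-uncased` has 768 dimensions), and the function names are hypothetical rather than taken from the training notebook.

```python
import math

def linear_head(pooled, weights, bias):
    """Single linear layer: logits[j] = sum_i pooled[i] * weights[i][j] + bias[j]."""
    num_labels = len(bias)
    return [
        sum(p * row[j] for p, row in zip(pooled, weights)) + bias[j]
        for j in range(num_labels)
    ]

def softmax(logits):
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy values (assumed): a 4-dimensional pooled vector and a (4, 2) weight matrix.
pooled = [0.5, -1.0, 0.25, 2.0]
weights = [[0.1, -0.1], [0.0, 0.3], [0.2, 0.2], [-0.4, 0.5]]
bias = [0.0, 0.1]

logits = linear_head(pooled, weights, bias)  # one logit per class: [negative, positive]
probs = softmax(logits)
print(logits, probs)
```

With two output units, index 0 corresponds to the negative class and index 1 to the positive class, matching the label mapping used in the usage example.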

> **Why BERT?** BERT’s bidirectional training helps it capture context from both directions in a sentence, making it especially powerful for understanding nuances in text like movie reviews.

---

## Training Procedure

1. **Data Loading:**  
   - The IMDB dataset was loaded (from Hugging Face Datasets or another source) with an even split of positive and negative reviews.
2. **Preprocessing:**  
   - Tokenization using the BERT tokenizer (`bert-base-uncased`), truncating/padding to a fixed length (e.g., 128 tokens).
3. **Hyperparameters (Example):**  
   - **Learning Rate:** 5e-5  
   - **Batch Size:** 8  
   - **Epochs:** 3  
   - **Optimizer:** Adam 
   - **Loss Function:** Sparse Categorical Cross-entropy
4. **Hardware:**  
   - Fine-tuned on a GPU (e.g., Google Colab or local machine with CUDA).
5. **Validation:**  
   - Periodic evaluation on the validation set to monitor accuracy and loss.
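
Two of the steps above, fixed-length padding/truncation and the sparse categorical cross-entropy loss, can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's code: the helper names are hypothetical, the token IDs are made up apart from the `[CLS]` (101), `[SEP]` (102), and padding (0) IDs of `bert-base-uncased`, and the loss is assumed to be computed from raw logits. A real tokenizer also keeps the trailing `[SEP]` token when truncating, which this sketch ignores.

```python
import math

def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    ids = token_ids[:max_length]          # drop anything past max_length
    mask = [1] * len(ids)                 # 1 marks real tokens
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example, given raw logits and an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# A short sequence is padded up to the fixed length.
input_ids, attention_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
print(input_ids)       # [101, 7, 8, 102, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0]

# An undecided model (equal logits) has a loss of ln(2), roughly 0.693.
loss = sparse_categorical_crossentropy([0.0, 0.0], label=1)
print(round(loss, 3))  # 0.693
```

The "sparse" variant takes integer labels (0 or 1) directly instead of one-hot vectors, which suits IMDB's integer sentiment labels.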

> **Notebook:** The entire fine-tuning process is documented in the [notebook](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/blob/main/imdb_reviews_bert.ipynb) included in this repository, so you can see exactly how training was performed.

---

## Evaluation and Performance

- **Accuracy:** ~**93%** on the IMDB test set. 
  
This performance indicates that the model handles most typical movie reviews well. However, it might still struggle with highly sarcastic or context-dependent reviews.
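
For reference, accuracy is the fraction of reviews whose predicted label matches the true label. The sketch below uses a hypothetical `accuracy` helper and made-up predictions; it is not the evaluation code from the notebook.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up example: 4 of 5 predictions match the true labels.
preds = [1, 0, 1, 1, 0]
labels = [1, 0, 0, 1, 0]
print(accuracy(preds, labels))  # 0.8
```

At this scale, ~93% on the 25,000 test reviews corresponds to roughly 23,250 correctly classified reviews.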

---

## How to Use

**In Python:**

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Hugging Face Hub ID of this model
model_name = "harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Example text
review_text = "I absolutely loved this movie! The plot was gripping and the acting was top-notch."

# Tokenize; truncation keeps long reviews within the model's maximum input length
inputs = tokenizer(review_text, return_tensors="tf", truncation=True, padding=True)

# Perform inference
outputs = model(inputs)
logits = outputs.logits

# Convert logits to probabilities (softmax) and pick the most likely class
probs = tf.nn.softmax(logits, axis=-1)
pred_class = int(tf.argmax(probs, axis=-1).numpy()[0])

# Interpret results
label_map = {0: "Negative", 1: "Positive"}
print(f"Review Sentiment: {label_map[pred_class]}")
```

- **Positive:** Model predicts a favorable sentiment.
- **Negative:** Model predicts an unfavorable sentiment.
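
When scoring several reviews at once, the model returns one row of logits per review. The following sketch shows how such a batch could be mapped to labels; the `labels_from_logits` helper and the logit values are made up for illustration. Because softmax preserves the ordering of the logits, taking the argmax of the raw logits gives the same class as taking it over the probabilities.

```python
LABEL_MAP = {0: "Negative", 1: "Positive"}

def labels_from_logits(batch_logits):
    """Map each row of logits to a sentiment label via argmax."""
    results = []
    for row in batch_logits:
        pred = max(range(len(row)), key=row.__getitem__)  # index of the largest logit
        results.append(LABEL_MAP[pred])
    return results

# Made-up logits for two reviews, in [negative, positive] order.
batch_logits = [[-1.2, 2.3], [1.7, -0.4]]
print(labels_from_logits(batch_logits))  # ['Positive', 'Negative']
```

With real model output, `logits.numpy().tolist()` would give the nested list this helper expects.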

---

## Intended Use

- **Primary Use Case:** Classifying sentiment of English-language movie reviews.
- **Extended Use Cases:** General sentiment analysis tasks for product reviews, social media comments, or other short English texts (though performance may vary).

---

## Limitations and Biases

1. **Domain Specificity:** Trained primarily on movie reviews. May not generalize to other domains (e.g., financial or medical text) without further fine-tuning.  
2. **Language Support:** English only. Performance may drop on non-English text or on text with heavy slang or emojis.  
3. **Bias in Data:** IMDB reviews often contain colloquial language and potential biases from user-generated content. The model might inadvertently learn these biases.  
4. **Sarcasm and Nuance:** Subtle sarcasm or culturally specific references may be misclassified.

---

## Ethical Considerations

- **User-Generated Content:** The IMDB dataset contains user-submitted reviews. Some reviews may contain explicit or biased language.  
- **Misuse:** The model is intended for sentiment classification. Using it to make decisions about individuals or high-stakes scenarios without additional checks is **not recommended**.

---

## Model Card Author

- **Name:** Harsh Maniya  
- **Contact:** For questions or feedback, please open a [discussion](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/discussions) or reach out through one of the profiles below.
- **GitHub:** [GitHub](https://github.com/harshhmaniya)  
- **LinkedIn:** [LinkedIn](https://www.linkedin.com/in/harsh-maniya/)


---

## Citation

If you use this model or reference the code in your research or project, please cite it as follows (adjust for your specific citation style):

```bibtex
@misc{Maniya2025IMDBBERT,
  title  = {Fine-Tuned BERT for IMDB Sentiment Analysis},
  author = {Harsh Maniya},
  year   = {2025},
  url    = {https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis}
}
```
---

### Thank You for Visiting!

We hope this model helps you classify movie reviews quickly and accurately. For more details, check out the [training notebook](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/blob/main/imdb_reviews_bert.ipynb), experiment with the model, and share your feedback!