---
license: mit
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
---

# Fine-Tuned BERT for IMDB Sentiment Analysis

> **Author:** [Harsh Maniya](https://huggingface.co/harshhmaniya)  
> **Model Type:** Text Classification (Sentiment Analysis)  
> **Language:** English

---

## Overview

This repository hosts a **BERT-based model** fine-tuned on the [IMDB movie reviews dataset](https://www.imdb.com/). The goal is to classify movie reviews as either **positive** or **negative** with high accuracy.

- **Base Model:** `bert-base-uncased`
- **Dataset:** IMDB (25,000 training samples, 25,000 testing samples)
- **Task:** Binary Sentiment Classification

If you want to quickly gauge whether a movie review is glowing or scathing, this model is for you!

---

## Model Architecture

- **Backbone:** [BERT](https://arxiv.org/abs/1810.04805) (Bidirectional Encoder Representations from Transformers)
- **Classification Head:** A single linear layer on top of the pooled `[CLS]` token output for binary classification.

> **Why BERT?**
> BERT’s bidirectional training helps it capture context from both directions in a sentence, making it especially powerful for understanding nuances in text like movie reviews.

---

## Training Procedure

1. **Data Loading:**
   - The IMDB dataset was loaded (from Hugging Face Datasets or another source) with an even split of positive and negative reviews.
2. **Preprocessing:**
   - Tokenization using the BERT tokenizer (`bert-base-uncased`), truncating/padding to a fixed length (e.g., 128 tokens).
3. **Hyperparameters (Example):**
   - **Learning Rate:** 5e-5
   - **Batch Size:** 8
   - **Epochs:** 3
   - **Optimizer:** Adam
   - **Loss Function:** Sparse Categorical Cross-entropy
4. **Hardware:**
   - Fine-tuned on a GPU (e.g., Google Colab or local machine with CUDA).
5. **Validation:**
   - Periodic evaluation on the validation set to monitor accuracy and loss.

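To make the preprocessing and loss steps above concrete, here is a minimal, dependency-free sketch. It is not the code from the training notebook: the helper names `encode_fixed_length` and `sparse_categorical_cross_entropy` are hypothetical, and the example token ids and logits are made up for illustration. It assumes the usual `bert-base-uncased` special-token ids (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102) and shows what truncating/padding to a fixed length and a sparse categorical cross-entropy on raw logits amount to.

```python
import math

# Special-token ids assumed from the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def encode_fixed_length(content_ids, max_len=128):
    """Truncate/pad WordPiece ids to exactly max_len positions.

    Returns (input_ids, attention_mask). Two positions are reserved
    for the [CLS] and [SEP] tokens that wrap the review text.
    """
    kept = list(content_ids[: max_len - 2])
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    pad = max_len - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad


def sparse_categorical_cross_entropy(logits, label):
    """Loss for one example: -log softmax(logits)[label].

    "Sparse" means the label is an integer class index (0 or 1 here),
    not a one-hot vector. Uses log-sum-exp for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical token ids and logits, for illustration only.
ids, mask = encode_fixed_length([2023, 3185, 2001, 2307], max_len=8)
print(ids)   # [101, 2023, 3185, 2001, 2307, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]

# An untrained head that outputs equal logits pays log(2) per example.
print(round(sparse_categorical_cross_entropy([0.0, 0.0], 1), 4))  # 0.6931
```

In practice the tokenizer and the Keras loss handle both steps; the sketch only shows why every input ends up with the same length and why the labels can stay as plain integers.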
> **Notebook:** The entire fine-tuning process is documented in the [notebook](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/blob/main/imdb_reviews_bert.ipynb) included in this repository, so you can see exactly how training was performed.

---

## Evaluation and Performance

- **Accuracy:** ~**93%** on the IMDB test set.

This performance indicates that the model handles most typical movie reviews well. However, it might still struggle with highly sarcastic or context-dependent reviews.

---

## How to Use

**In Python:**

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Replace with your repository
model_name = "harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Example text
review_text = "I absolutely loved this movie! The plot was gripping and the acting was top-notch."

# Prepare input
inputs = tokenizer(review_text, return_tensors="tf", truncation=True, padding=True)

# Perform inference
outputs = model(inputs)
logits = outputs.logits

# Convert logits to probabilities (softmax)
probs = tf.nn.softmax(logits, axis=-1)
pred_class = tf.argmax(probs, axis=-1).numpy()[0]

# Interpret results
label_map = {0: "Negative", 1: "Positive"}
print(f"Review Sentiment: {label_map[pred_class]}")
```

- **Positive:** Model predicts a favorable sentiment.
- **Negative:** Model predicts an unfavorable sentiment.

---

## Intended Use

- **Primary Use Case:** Classifying sentiment of English-language movie reviews.
- **Extended Use Cases:** General sentiment analysis tasks for product reviews, social media comments, or other short English texts (though performance may vary).

---

## Limitations and Biases

1. **Domain Specificity:** Trained primarily on movie reviews.
   May not generalize to other domains (e.g., financial or medical text) without further fine-tuning.
2. **Language Support:** English only. Performance may degrade on non-English text or on text with heavy slang or emojis.
3. **Bias in Data:** IMDB reviews often contain colloquial language and potential biases from user-generated content. The model might inadvertently learn these biases.
4. **Sarcasm and Nuance:** Subtle sarcasm or culturally specific references may be misclassified.

---

## Ethical Considerations

- **User-Generated Content:** The IMDB dataset contains user-submitted reviews. Some reviews may contain explicit or biased language.
- **Misuse:** The model is intended for sentiment classification. Using it to make decisions about individuals or high-stakes scenarios without additional checks is **not recommended**.

---

## Model Card Author

- **Name:** Harsh Maniya
- **Contact:** For questions or feedback, please open a [discussion](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/discussions) or reach out directly via your preferred channel.
- **GitHub:** [GitHub](https://github.com/harshhmaniya)
- **LinkedIn:** [LinkedIn](https://www.linkedin.com/in/harsh-maniya/)

---

## Citation

If you use this model or reference the code in your research or project, please cite it as follows (adjust for your specific citation style):

```
@misc{Maniya2025IMDBBERT,
  title  = {Fine-Tuned BERT for IMDB Sentiment Analysis},
  author = {Harsh Maniya},
  year   = {2025},
  url    = {https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis}
}
```

---

### Thank You for Visiting!

We hope this model helps you classify movie reviews quickly and accurately. For more details, check out the [training notebook](https://huggingface.co/harshhmaniya/fine-tuned-bert-imdb-sentiment-analysis/blob/main/imdb_reviews_bert.ipynb), experiment with the model, and share your feedback!