ayushsinha committed · Commit c1d9dbf · verified · 1 Parent(s): 3bd7afe

Create README.md

  1. README.md +97 -0
README.md ADDED
@@ -0,0 +1,97 @@
# Movie Recommendation System with BERT

## 📌 Overview

This repository hosts a quantized version of the bert-base-cased model, fine-tuned for movie recommendation. The model was trained on the wykonos/movies dataset from Hugging Face and quantized to Float16 (FP16) to improve inference speed and efficiency while aiming to preserve accuracy.

## 🏗 Model Details

- **Model Architecture:** bert-base-cased
- **Task:** Movie recommendation
- **Dataset:** Hugging Face's `wykonos/movies`
- **Quantization:** Float16 (FP16) for optimized inference
- **Fine-tuning Framework:** Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch
```

### Movie Recommendation Example

```python
# Assumption: the original snippet never defines `df`. Here it is loaded from
# the Hub with the `datasets` library (pip install datasets pandas); the split
# name "train" is assumed.
from datasets import load_dataset

model_name = "AventIQ-AI/bert-movie-recommendation-system"
model = BertForSequenceClassification.from_pretrained(model_name)
tokenizer = BertTokenizerFast.from_pretrained(model_name)

df = load_dataset("wykonos/movies", split="train").to_pandas()

genre_to_label = {
    "Action": 0, "Adventure": 1, "Animation": 2, "Comedy": 3, "Crime": 4,
    "Documentary": 5, "Drama": 6, "Family": 7, "Fantasy": 8, "History": 9,
    "Horror": 10, "Music": 11, "Mystery": 12, "Romance": 13, "Science Fiction": 14,
    "TV Movie": 15, "Thriller": 16, "War": 17, "Western": 18
}

def recommend_movies(genre, top_n=10):
    """Return up to `top_n` movie titles for a given genre."""
    if genre not in genre_to_label:
        return "Unknown Genre"
    # Filter dataset for movies in the requested genre
    # (this lookup uses the dataset only; it does not call the model)
    genre_movies = df[df["genres"].str.contains(genre, case=False, na=False)]["title"].tolist()

    # Return top N movies (or all if fewer exist)
    return genre_movies[:top_n]

genres_to_test = ["Horror", "Comedy", "Drama"]
for genre in genres_to_test:
    recommended_movies = recommend_movies(genre)
    print(f"Genre: {genre} -> Recommended Movies: {recommended_movies}")
```

## ⚡ Quantization Details

Post-training quantization was applied using PyTorch's built-in quantization framework. The model weights were converted to Float16 (FP16), which reduces model size and improves inference efficiency at the cost of a small potential loss in accuracy.
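
To make the size and accuracy trade-off concrete, here is a minimal sketch using only Python's standard library. It is not the conversion code used for this model (in PyTorch, casting a module to half precision is usually done with `torch.nn.Module.half()`); the weight values below are hypothetical and serve only to show that FP16 storage takes half the bytes of FP32 and introduces a small rounding error.

```python
import struct

def to_fp16_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 0.0004]

# Bytes needed to store the same values as 32-bit and as 16-bit floats.
fp32_size = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_size = len(struct.pack(f"<{len(weights)}e", *weights))

# Largest absolute rounding error introduced by the FP16 round trip.
max_error = max(abs(w - to_fp16_and_back(w)) for w in weights)

print(f"FP32 bytes: {fp32_size}, FP16 bytes: {fp16_size}")
print(f"Max absolute rounding error: {max_error:.6f}")
```

Values that are exactly representable in half precision (such as -1.5) survive the round trip unchanged, while others are rounded to the nearest representable value.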

## Evaluation Metric: NDCG

NDCG (Normalized Discounted Cumulative Gain) measures ranking quality: a score close to 1 means the ranking matches the expected relevance order. The model's NDCG score is 0.84.
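
For readers unfamiliar with the metric, the following is a small sketch of how NDCG can be computed for a single ranked list. It uses the linear-gain variant (relevance divided by a log2 rank discount); the relevance lists are hypothetical, and this is not necessarily the exact procedure behind the score reported above.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each relevance is discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    relevances = list(relevances)
    k = len(relevances) if k is None else k
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    # With no relevant items at all, define NDCG as 0.0 to avoid dividing by zero.
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevance judgments, listed in the order the system ranked the items.
perfect_ranking = [3, 2, 1, 0]
reversed_ranking = [0, 1, 2, 3]

print(f"Perfect ranking:  {ndcg(perfect_ranking):.3f}")
print(f"Reversed ranking: {ndcg(reversed_ranking):.3f}")
```

A ranking that already lists items in descending relevance scores 1.0, and any other ordering of the same items scores lower.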

## 🔧 Fine-Tuning Details

### Dataset
The **wykonos/movies** dataset was used for training and evaluation. The dataset consists of text fields, including the movie `title` and `genres` columns used in the usage example.

### Training Configuration
- **Number of epochs**: 5
- **Batch size**: 8
- **Evaluation strategy**: epoch
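
As a rough illustration of what these settings imply, the sketch below stores them in a plain dictionary and estimates the number of optimizer steps. The key names mirror Hugging Face `TrainingArguments` fields (`num_train_epochs`, `per_device_train_batch_size`, and `eval_strategy`, which older Transformers releases call `evaluation_strategy`), but the dictionary itself, the helper function, and the dataset size are hypothetical. The estimate assumes a single device and no gradient accumulation.

```python
import math

# Values taken from the configuration listed above.
training_config = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 8,
    "eval_strategy": "epoch",  # evaluate once at the end of every epoch
}

def total_optimizer_steps(num_examples: int, config: dict) -> int:
    """Estimate total optimizer steps (single device, no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / config["per_device_train_batch_size"])
    return steps_per_epoch * config["num_train_epochs"]

# Hypothetical training-set size, used only to show the arithmetic.
num_examples = 1000
print(total_optimizer_steps(num_examples, training_config))
```

With evaluation run once per epoch, this configuration produces five evaluation passes regardless of dataset size.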

## 📂 Repository Structure

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Quantized model
└── README.md            # Model documentation
```

## ⚠️ Limitations

- The model may struggle with out-of-scope tasks.
- Quantization may lead to slight degradation in accuracy compared to full-precision models.
- Performance may vary across different writing styles and sentence structures.

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.