## Uploaded model

- Developed by: suriya7
- License: apache-2.0
- Finetuned from model: AquilaX-AI/QnA-1.5B

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
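For reference, a minimal sketch of the kind of Unsloth + TRL SFT setup behind that speed-up is shown below. The dataset, LoRA settings, and hyperparameters are illustrative assumptions, not the exact recipe used to train this model:

```python
# Illustrative sketch only -- not the exact recipe used to train this model.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit for memory-efficient finetuning (assumed settings).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AquilaX-AI/QnA-1.5B",  # base model listed on this card
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; ranks and target modules here are common defaults, not the author's.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical training data with a plain-text "text" column.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```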
## Requirements

```bash
pip install gguf
pip install transformers
```
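Loading GGUF weights through transformers requires a fairly recent release of the library; an optional, purely illustrative sanity check before running the inference script:

```python
# Optional sanity check -- confirms the GGUF-capable stack is in place (illustrative).
from importlib.metadata import version
import torch

print("transformers:", version("transformers"))  # GGUF loading needs a recent release
print("gguf:", version("gguf"))
print("CUDA available:", torch.cuda.is_available())  # the inference example moves tensors to "cuda"
```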
## Inference

```python
import time

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = "suriya7/qwen-1.5b-quantized"
filename = "unsloth.Q5_K_M.gguf"

# The GGUF file is dequantized on load; pass token="<your_hf_token>" only if authentication is required.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
model.to("cuda")

# Define the input messages
messages = [
    [
        {
            "role": "system",
            "content": "You are Securitron, a helpful AI assistant specialized in providing accurate and professional responses. Always prioritize clarity and precision in your answers.",
        },
        {
            "role": "user",
            "content": "what is ai?",
        },
    ],
]

# Tokenize the input messages
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

# Initialize the TextStreamer to print tokens as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Measure the generation time
start_time = time.time()

# Generate text with streaming
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=256, streamer=streamer, do_sample=False)

# Calculate total generation time
end_time = time.time()
total_time = end_time - start_time
print(f"\nTotal Generation Time: {total_time:.2f} seconds")
```
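If you want to reuse the loaded model for multiple questions, the generation step can be wrapped in a small helper. The function name and defaults below are illustrative and not part of the model card:

```python
# Hypothetical convenience wrapper around the snippet above (names and defaults are assumptions).
def ask(question: str, max_new_tokens: int = 256) -> None:
    conversation = [
        {"role": "system", "content": "You are Securitron, a helpful AI assistant."},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens, streamer=streamer, do_sample=False)

ask("What is a SQL injection attack?")
```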