X-Boundary
Collection
The models of the paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability".
•
4 items
•
Updated
•
2
X_Boundary_DeepSeek_R1_Distill_Qwen_7B_adapter is an LoRA adapter of DeepSeek_R1_Distill_Qwen_7B trained by X-Boundary.
X-Boundary is a method to strike a balance between robust defense against multi-turn jailbreak attacks and the usability of Large Language Model (LLM) by establishing exact distinction boundary between safe and harmful representations.
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_name = 'deepseek-ai/DeepSeek-R1-Distill-Qwen-7B'
adapter_name = 'Ursulalala/X_Boundary_DeepSeek_R1_Distill_Qwen_7B_adapter'
model = AutoModelForCausalLM.from_pretrained(
base_model_name,
torch_dtype='auto',
device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model.load_adapter(adapter_name)
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B