metadata
license: apache-2.0
language:
  - ko
base_model:
  - beomi/kcbert-base
pipeline_tag: token-classification
tags:
  - Korean
  - PII
  - KoreanPII
  - PIIMasking
  - Anonymization
  - Privacy

Korean-PII-Masking-BERT

GitHub Repository: alphagyuu/Korean-PII-Masking-BERT

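Korean-PII-Masking-BERT is a token classification model for masking personally identifiable information (PII) in Korean text. It was obtained by fine-tuning KcBERT (beomi/kcbert-base) with a token classification head on a processed version of the "Korean SNS" dataset from AI-Hub.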

🖥️ Python Implementation

  • Tokenizer:

    from transformers import BertTokenizer
    tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)
    
  • Model:

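    # tag2idx (a tag -> index dict) is not defined in this card; the LabelMap below lists 17 tags
    model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification', num_labels=len(tag2idx))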
    
  • LabelMap:

    LabelMAP = {
      'O': 'LABEL0',        # not PII
      'B-URL': 'LABEL1',    # URL
      'I-URL': 'LABEL2',
      'B-계정': 'LABEL3',    # account
      'I-계정': 'LABEL4',
      'B-금융': 'LABEL5',    # financial
      'I-금융': 'LABEL6',
      'B-번호': 'LABEL7',    # number
      'I-번호': 'LABEL8',
      'B-소속': 'LABEL9',    # affiliation
      'I-소속': 'LABEL10',
      'B-신원': 'LABEL11',   # identity
      'I-신원': 'LABEL12',
      'B-이름': 'LABEL13',   # name
      'I-이름': 'LABEL14',
      'B-주소': 'LABEL15',   # address
      'I-주소': 'LABEL16'
    }
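
The label map above can be used to turn per-token predictions back into masked text. The following is a minimal sketch, not the repository's own post-processing: the helper `mask_tokens`, the `[category]` placeholder format, and the example tokens and labels are all hypothetical. It also assumes the labels have already been produced by the model and aligned one-to-one with the tokens; real WordPiece output would need subword merging first.

```python
# Label map copied from this model card: BIO tag -> label name.
LABEL_MAP = {
    'O': 'LABEL0',
    'B-URL': 'LABEL1', 'I-URL': 'LABEL2',
    'B-계정': 'LABEL3', 'I-계정': 'LABEL4',
    'B-금융': 'LABEL5', 'I-금융': 'LABEL6',
    'B-번호': 'LABEL7', 'I-번호': 'LABEL8',
    'B-소속': 'LABEL9', 'I-소속': 'LABEL10',
    'B-신원': 'LABEL11', 'I-신원': 'LABEL12',
    'B-이름': 'LABEL13', 'I-이름': 'LABEL14',
    'B-주소': 'LABEL15', 'I-주소': 'LABEL16',
}

# Inverse lookup: label name -> BIO tag.
LABEL_TO_TAG = {label: tag for tag, label in LABEL_MAP.items()}


def mask_tokens(tokens, labels, mask_format="[{}]"):
    """Replace each labeled PII span with one placeholder such as [이름].

    Hypothetical helper: `tokens` and `labels` are parallel lists, and
    `labels` uses the label names from LABEL_MAP.
    """
    masked = []
    current_category = None  # category of the span being masked, if any
    for token, label in zip(tokens, labels):
        tag = LABEL_TO_TAG[label]
        if tag == 'O':
            masked.append(token)
            current_category = None
            continue
        prefix, category = tag.split('-', 1)
        # Emit a placeholder on B-, or on an I- that does not continue
        # the span currently being masked.
        if prefix == 'B' or category != current_category:
            masked.append(mask_format.format(category))
        current_category = category
    return masked


# Made-up word-level example, not real model output.
tokens = ['제', '이름은', '홍길동', '입니다']
labels = ['LABEL0', 'LABEL0', 'LABEL13', 'LABEL0']
print(mask_tokens(tokens, labels))  # ['제', '이름은', '[이름]', '입니다']
```

Collapsing each B-/I- run into a single placeholder keeps the length of the original value from leaking; emitting one placeholder per token would be an equally valid design if token alignment needs to be preserved.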