Update README.md
README.md
CHANGED
@@ -93,6 +93,58 @@ The output will be a list of recognized entities with their entity type, score,
]
```

In some cases the pipeline returns several consecutive entities that belong to the same entity group. To join them into a single entity, use the code below:

```python
from transformers import pipeline


def merge_consecutive_entities(entities):
    """Merge touching or overlapping entities that share an entity group."""
    # Sort by start offset; copy each dict so the pipeline output is not mutated.
    entities = sorted((dict(e) for e in entities), key=lambda x: x['start'])
    merged_entities = []
    current_entity = None

    for entity in entities:
        if current_entity is None:
            current_entity = entity
        elif (
            entity['entity_group'] == current_entity['entity_group'] and
            entity['start'] <= current_entity['end']
        ):
            # Same group and the spans touch or overlap: extend the current entity.
            new_word = entity['word']
            # Skip the append when the current word already ends with the new piece.
            if not current_entity['word'].endswith(new_word):
                current_entity['word'] += " " + new_word
            current_entity['end'] = max(current_entity['end'], entity['end'])
            # Pairwise running average of the current and new scores.
            current_entity['score'] = (current_entity['score'] + entity['score']) / 2
        else:
            merged_entities.append(current_entity)
            current_entity = entity
    if current_entity is not None:
        merged_entities.append(current_entity)

    return merged_entities


# Load the model
model_path = "Helios9/BIOMed_NER"
pipe = pipeline(
    task="token-classification",
    model=model_path,
    tokenizer=model_path,
    aggregation_strategy="simple"
)

# Test the pipeline
text = ("A 48-year-old female presented with vaginal bleeding and abnormal Pap smears. "
        "Upon diagnosis of invasive non-keratinizing SCC of the cervix, she underwent a radical "
        "hysterectomy with salpingo-oophorectomy which demonstrated positive spread to the pelvic "
        "lymph nodes and the parametrium.")
result = pipe(text)
final_result = merge_consecutive_entities(result)
print(final_result)
```
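
The merge step can also be tried without downloading the model. The sketch below is a variant, not the function from this README: `merge_adjacent_entities` and its `max_gap` parameter are hypothetical names, the merged word is sliced from the source text instead of being concatenated, and the score is the plain mean of the merged pieces. The entity labels and scores in the sample input are made up for illustration.

```python
def merge_adjacent_entities(entities, text, max_gap=1):
    """Merge same-group entities whose spans overlap or lie within max_gap characters.

    Hypothetical helper: assumes each entity has 'entity_group', 'start',
    'end' and 'score' keys, as in the aggregated pipeline output.
    """
    merged = []
    for entity in sorted(entities, key=lambda e: e["start"]):
        last = merged[-1] if merged else None
        if (
            last is not None
            and entity["entity_group"] == last["entity_group"]
            and entity["start"] - last["end"] <= max_gap
        ):
            # Extend the previous span and remember the score for averaging.
            last["end"] = max(last["end"], entity["end"])
            last["scores"].append(entity["score"])
        else:
            merged.append({
                "entity_group": entity["entity_group"],
                "start": entity["start"],
                "end": entity["end"],
                "scores": [entity["score"]],
            })
    # Rebuild each word from the original text and average the collected scores.
    return [
        {
            "entity_group": m["entity_group"],
            "word": text[m["start"]:m["end"]],
            "start": m["start"],
            "end": m["end"],
            "score": sum(m["scores"]) / len(m["scores"]),
        }
        for m in merged
    ]


# Hand-written entities in the shape the pipeline returns (labels and scores are made up).
text = "radical hysterectomy of the cervix"
entities = [
    {"entity_group": "Therapeutic_procedure", "word": "radical", "start": 0, "end": 7, "score": 0.9},
    {"entity_group": "Therapeutic_procedure", "word": "hysterectomy", "start": 8, "end": 20, "score": 0.7},
    {"entity_group": "Biological_structure", "word": "cervix", "start": 28, "end": 34, "score": 0.8},
]
merged = merge_adjacent_entities(entities, text)
print(merged)
```

With `max_gap=1`, two same-group words separated by a single space are joined, while the third entity stays separate because its group differs. Slicing from `text` keeps the original spacing and punctuation, at the cost of requiring the source string as an extra argument.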

**Use Cases:**
- Extracting clinical information from unstructured text in medical records.
- Structuring data for downstream biomedical research or applications.