chilly-magician committed on
Commit
24ce340
·
1 Parent(s): effad5d

[add]: about us / more information model card description

Files changed (1)
  1. README.md +45 -20
README.md CHANGED
@@ -5,9 +5,7 @@ base_model: tiiuae/falcon-7b-instruct
5
 
6
  # Model Card for the Query Parser LLM using Falcon-7B-Instruct
7
 
8
- [![version](https://img.shields.io/badge/version-0.0.1-red.svg)]()
9
- [![Python 3.9](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
10
- ![CUDA 11.7.1](https://img.shields.io/badge/CUDA-11.7.1-green.svg)
11
 
12
  EmbeddingStudio is an [open-source framework](https://github.com/EulerSearch/embedding_studio/tree/main) that allows you to transform a joint "Embedding Model + Vector DB" into
13
  a full-cycle search engine: collect clickstream -> improve search experience -> adapt embedding model and repeat, out of the box.
@@ -45,9 +43,9 @@ This is only [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instru
45
  **Important:** Additionally, we are trying to fine-tune the Large Language Model (LLM) to not only parse unstructured search queries but also to correct spelling.
46
 
47
  - **Developed by EmbeddingStudio team:**
48
- * Aleksandr Iudaev | [LinkedIn](https://www.linkedin.com/in/alexanderyudaev/) | [Email](mailto:[email protected]) |
49
- * Andrei Kostin | [LinkedIn](https://www.linkedin.com/in/andrey-kostin/) | [Email](mailto:[email protected]) |
50
- * ML Doom | `AI Assistant`
51
  - **Funded by EmbeddingStudio team**
52
  - **Model type:** Instruct Fine-Tuned Large Language Model
53
  - **Model task:** Zero-shot search query parsing
@@ -169,7 +167,6 @@ INSTRUCTION_TEMPLATE = """
169
  """
170
 
171
 
172
-
173
  def parse(
174
  query: str,
175
  company_category: str,
@@ -201,15 +198,15 @@ def parse(
201
 
202
  ## Bias, Risks, and Limitations
203
 
204
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
205
 
206
- [More Information Needed]
 
207
 
208
  ### Recommendations
209
 
210
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
211
 
212
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
213
 
214
  ## How to Get Started with the Model
215
 
@@ -310,23 +307,51 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
310
 
311
  #### Software
312
 
313
- [More Information Needed]
 
 
 
314
 
315
- [More Information Needed]
316
 
317
- ## More Information [optional]
 
 
 
318
 
319
- [More Information Needed]
320
 
321
- ## Model Card Authors [optional]
322
 
323
- [More Information Needed]
 
 
 
 
 
324
 
325
- ## Model Card Contact
326
 
327
- [More Information Needed]
 
 
 
 
 
 
 
 
 
328
 
 
 
 
329
 
330
  ### Framework versions
331
 
332
- - PEFT 0.7.1
 
 
 
 
 
 
5
 
6
  # Model Card for the Query Parser LLM using Falcon-7B-Instruct
7
 
8
+ [![version](https://img.shields.io/badge/version-0.0.1-red.svg)]() [![Python 3.9](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/) ![CUDA 11.7.1](https://img.shields.io/badge/CUDA-11.7.1-green.svg)
 
 
9
 
10
  EmbeddingStudio is an [open-source framework](https://github.com/EulerSearch/embedding_studio/tree/main) that allows you to transform a joint "Embedding Model + Vector DB" into
11
  a full-cycle search engine: collect clickstream -> improve search experience -> adapt embedding model and repeat, out of the box.
 
43
  **Important:** Additionally, we are trying to fine-tune the Large Language Model (LLM) to not only parse unstructured search queries but also to correct spelling.
44
 
45
  - **Developed by EmbeddingStudio team:**
46
+ * Aleksandr Iudaev [[LinkedIn](https://www.linkedin.com/in/alexanderyudaev/)] [[Email](mailto:[email protected])]
47
+ * Andrei Kostin [[LinkedIn](https://www.linkedin.com/in/andrey-kostin/)] [[Email](mailto:[email protected])]
48
+ * ML Doom [AI Assistant]
49
  - **Funded by EmbeddingStudio team**
50
  - **Model type:** Instruct Fine-Tuned Large Language Model
51
  - **Model task:** Zero-shot search query parsing
 
167
  """
168
 
169
 
 
170
  def parse(
171
  query: str,
172
  company_category: str,
 
198
 
199
  ## Bias, Risks, and Limitations
200
 
201
+ ### Bias
202
+
203
 
204
+
205
+ ### Risks
206
 
207
  ### Recommendations
208
 
 
209
 
 
210
 
211
  ## How to Get Started with the Model
212
 
 
307
 
308
  #### Software
309
 
310
+ * Python 3.9+
311
+ * CUDA 11.7.1
312
+ * NVIDIA [Compatible Drivers](https://www.nvidia.com/download/find.aspx)
313
+ * Torch 2.0.0
314
 
315
+ ## More Information / About us
316
 
317
+ EmbeddingStudio is an innovative open-source framework designed to seamlessly convert a combined
318
+ "Embedding Model + Vector DB" into a comprehensive search engine. With built-in functionalities for
319
+ clickstream collection, continuous improvement of search experiences, and automatic adaptation of
320
+ the embedding model, it offers an out-of-the-box solution for a full-cycle search engine.
321
 
322
+ ![Embedding Studio Chart](https://github.com/EulerSearch/embedding_studio/blob/main/assets/embedding_studio_chart.png?raw=true)
323
 
324
+ ### Features
325
 
326
+ 1. 🔄 Turn your vector database into a full-cycle search engine
327
+ 2. 🖱️ Collect user feedback such as clickstream
328
+ 3. 🚀 (*) Improve search experience on-the-fly without frustrating wait times
329
+ 4. 📊 (*) Monitor your search quality
330
+ 5. 🎯 Improve your embedding model through an iterative metric fine-tuning procedure
331
+ 6. 🆕 (*) Use the new version of the embedding model for inference
332
 
333
+ (*) - features in development
334
 
335
+ EmbeddingStudio is highly customizable, so you can bring your own:
336
+
337
+ 1. Data source
338
+ 2. Vector database
339
+ 3. Clickstream database
340
+ 4. Embedding model
341
+
342
+ For more details visit [GitHub Repo](https://github.com/EulerSearch/embedding_studio/tree/main).
343
+
344
+ ## Model Card Authors and Contact
345
 
346
+ * Aleksandr Iudaev [[LinkedIn](https://www.linkedin.com/in/alexanderyudaev/)] [[Email](mailto:[email protected])]
347
+ * Andrei Kostin [[LinkedIn](https://www.linkedin.com/in/andrey-kostin/)] [[Email](mailto:[email protected])]
348
+ * ML Doom [AI Assistant]
349
 
350
  ### Framework versions
351
 
352
+ - PEFT 0.5.0
353
+ - Datasets 2.16.1
354
+ - BitsAndBytes 0.41.0
355
+ - PyTorch 2.0.0
356
+ - Transformers 4.36.2
357
+ - TRL 0.7.7
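
The framework versions added in this commit can be compared against a local environment before loading the model. The following is a minimal sketch, not part of the README: `PINNED` and `check_versions` are hypothetical names, and the PyPI distribution names (for example `torch` for PyTorch) are assumed to correspond to the frameworks listed above.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list in the diff.
# The lowercase distribution names are an assumption of this sketch.
PINNED = {
    "peft": "0.5.0",
    "datasets": "2.16.1",
    "bitsandbytes": "0.41.0",
    "torch": "2.0.0",
    "transformers": "4.36.2",
    "trl": "0.7.7",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            # Drop local build tags such as "+cu117" before comparing.
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

The comparison is an exact string match, which is stricter than necessary if nearby patch releases are also acceptable; the `get_version` parameter exists only so the lookup can be swapped out in tests.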