EmbeddingStudio/query-parser-falcon-7b-instruct

@@ -5,9 +5,7 @@ base_model: tiiuae/falcon-7b-instruct
 # Model Card for the Query Parser LLM using Falcon-7B-Instruct
-[![version](https://img.shields.io/badge/version-0.0.1-red.svg)]()
-[![Python 3.9](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
-![CUDA 11.7.1](https://img.shields.io/badge/CUDA-11.7.1-green.svg)
 EmbeddingStudio is an [open-source framework](https://github.com/EulerSearch/embedding_studio/tree/main) that allows you to transform a joint "Embedding Model + Vector DB" into
 a full-cycle search engine: collect clickstream -> improve search experience -> adapt embedding model, and repeat, all out of the box.
@@ -45,9 +43,9 @@ This is only [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instru
 **Important:** Additionally, we are trying to fine-tune the Large Language Model (LLM) to not only parse unstructured search queries but also to correct spelling.
 - **Developed by EmbeddingStudio team:**
-  * Aleksandr Iudaev | [LinkedIn](https://www.linkedin.com/in/alexanderyudaev/) | [Email](mailto:[email protected]) |
-  * Andrei Kostin | [LinkedIn](https://www.linkedin.com/in/andrey-kostin/) | [Email](mailto:[email protected]) |
-  * ML Doom | `AI Assistant`
 - **Funded by EmbeddingStudio team**
 - **Model type:** Instruct Fine-Tuned Large Language Model
 - **Model task:** Zero-shot search query parsing
@@ -169,7 +167,6 @@ INSTRUCTION_TEMPLATE = """
 """
 def parse(
         query: str,
         company_category: str,
@@ -201,15 +198,15 @@ def parse(
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
 ### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
@@ -310,23 +307,51 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 #### Software
-[More Information Needed]
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
 ### Framework versions
-- PEFT 0.7.1

 # Model Card for the Query Parser LLM using Falcon-7B-Instruct
+[![version](https://img.shields.io/badge/version-0.0.1-red.svg)]() [![Python 3.9](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/) ![CUDA 11.7.1](https://img.shields.io/badge/CUDA-11.7.1-green.svg)
 EmbeddingStudio is an [open-source framework](https://github.com/EulerSearch/embedding_studio/tree/main) that allows you to transform a joint "Embedding Model + Vector DB" into
 a full-cycle search engine: collect clickstream -> improve search experience -> adapt embedding model, and repeat, all out of the box.
 **Important:** Additionally, we are trying to fine-tune the Large Language Model (LLM) to not only parse unstructured search queries but also to correct spelling.
 - **Developed by EmbeddingStudio team:**
+  * Aleksandr Iudaev [[LinkedIn](https://www.linkedin.com/in/alexanderyudaev/)] [[Email](mailto:[email protected])]
+  * Andrei Kostin [[LinkedIn](https://www.linkedin.com/in/andrey-kostin/)] [[Email](mailto:[email protected])]
+  * ML Doom [AI Assistant]
 - **Funded by EmbeddingStudio team**
 - **Model type:** Instruct Fine-Tuned Large Language Model
 - **Model task:** Zero-shot search query parsing
 """
 def parse(
         query: str,
         company_category: str,
 ## Bias, Risks, and Limitations
+### Bias
+### Risks
 ### Recommendations
 ## How to Get Started with the Model
 #### Software
+* Python 3.9+
+* CUDA 11.7.1
+* NVIDIA [Compatible Drivers](https://www.nvidia.com/download/find.aspx)
+* Torch 2.0.0
+## More Information / About us
+EmbeddingStudio is an innovative open-source framework designed to seamlessly convert a combined
+"Embedding Model + Vector DB" into a comprehensive search engine. With built-in functionalities for
+clickstream collection, continuous improvement of search experiences, and automatic adaptation of
+the embedding model, it offers an out-of-the-box solution for a full-cycle search engine.
+![Embedding Studio Chart](https://github.com/EulerSearch/embedding_studio/blob/main/assets/embedding_studio_chart.png?raw=true)
+### Features
+1. 🔄 Turn your vector database into a full-cycle search engine
+2. 🖱️ Collect user feedback such as clickstream data
+3. 🚀 (*) Improve search experience on-the-fly without frustrating wait times
+4. 📊 (*) Monitor your search quality
+5. 🎯 Improve your embedding model through an iterative metric fine-tuning procedure
+6. 🆕 (*) Use the new version of the embedding model for inference
+(*) - features in development
+EmbeddingStudio is highly customizable, so you can bring your own:
+1. Data source
+2. Vector database
+3. Clickstream database
+4. Embedding model
+For more details, visit the [GitHub repo](https://github.com/EulerSearch/embedding_studio/tree/main).
+## Model Card Authors and Contact
+* Aleksandr Iudaev [[LinkedIn](https://www.linkedin.com/in/alexanderyudaev/)] [[Email](mailto:[email protected])]
+* Andrei Kostin [[LinkedIn](https://www.linkedin.com/in/andrey-kostin/)] [[Email](mailto:[email protected])]
+* ML Doom [AI Assistant]
 ### Framework versions
+- PEFT 0.5.0
+- Datasets 2.16.1
+- BitsAndBytes 0.41.0
+- PyTorch 2.0.0
+- Transformers 4.36.2
+- TRL 0.7.7
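
The version pins added under "Framework versions" can be checked against a local environment before loading the model. The snippet below is a minimal sketch, not part of the model card: the helper name `check_versions` is hypothetical, and the PyPI distribution names (`peft`, `datasets`, `bitsandbytes`, `torch`, `transformers`, `trl`) are assumed to correspond to the frameworks listed. It uses only the standard library, so it runs even when none of the packages are installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the "Framework versions" list in the diff.
# The distribution names (dict keys) are assumptions, not taken from the card.
PINNED = {
    "peft": "0.5.0",
    "datasets": "2.16.1",
    "bitsandbytes": "0.41.0",
    "torch": "2.0.0",
    "transformers": "4.36.2",
    "trl": "0.7.7",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Treat local build tags such as "2.0.0+cu117" as matching "2.0.0".
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        shown = installed or "not installed"
        print(f"{name:14} expected {expected:8} installed {shown:14} {status}")
```

A mismatch does not necessarily mean the model will fail to load; the card only states which versions were used, not which ranges are supported.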