legolasyiu committed · Commit aa21037 · verified · 1 Parent(s): 1521bc3

Update README.md

  1. README.md +216 -0
README.md CHANGED
@@ -11,6 +11,222 @@ tags:
- trl
---

# Agent LLama

An experimental fine-tuning technique that makes Llama 3.1 8B agentic. It has some built-in agent features:
- search
- calculator
- ReAct: [Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
- fine-tuned ReAct for better responses

Other noticeable features:
- Self-learning using Unsloth (in progress)
- Can be used in RAG applications
- Memory: [**please use LangChain memory, section "Message persistence"**](https://python.langchain.com/docs/tutorials/chatbot/)

It works well with LangChain, LlamaIndex, and Ollama.

- Check out the Ollama Llama Agent on our GitHub page.

Context Window: 128K

# Agent LLama series
- For the coding agent, please go to [EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code](https://huggingface.co/EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code)


### Installation
```bash
pip install --upgrade "transformers>=4.43.2"
```

Developers can easily integrate EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K into their projects using popular libraries such as Transformers and vLLM. The following sections illustrate the usage with simple hands-on examples.

Optional: to use the built-in tools, add the following to the system prompt: "Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 27 August 2024\n"

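The tool-enabling system prompt above embeds a "Today Date" that goes stale if it is hard-coded. The following is a minimal sketch of building that line at run time; the helper name `build_tool_system_prompt` and the `MONTHS` tuple are illustrative assumptions, not part of the model's API.

```python
from datetime import date

# English month names are spelled out so the result does not depend on the locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def build_tool_system_prompt(today, tools=("brave_search", "wolfram_alpha")):
    """Return the tool-enabling system prompt line for the given date (hypothetical helper)."""
    today_text = f"{today.day} {MONTHS[today.month - 1]} {today.year}"
    return (
        f"Environment: ipython. Tools: {', '.join(tools)}. "
        f"Cutting Knowledge Date: December 2023. Today Date: {today_text}\n"
    )


# Prepend this line to whatever other system instructions you use.
print(build_tool_system_prompt(date.today()))
```

The returned string can be placed at the start of the system message, followed by any task-specific instructions.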
#### ToT - Tree of Thought
- Use system prompt:
```python
"""Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is..."""
```
#### ReAct
Example from the LangChain agent: [LangChain ReAct agent](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/react/agent.py)
- Use system prompt:
```python
"""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""
```

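The ReAct prompt above leaves two jobs to the calling code: filling the `{tools}` and `{tool_names}` placeholders, and reading the `Action` / `Action Input` / `Final Answer` lines out of each completion. Below is a minimal sketch of both steps in plain Python. The helper names `render_tools` and `parse_step` are hypothetical, and LangChain's own agent classes handle this differently.

```python
import re


def render_tools(tools):
    """Build the values for the {tools} and {tool_names} placeholders.

    `tools` maps a tool name to a one-line description (an assumed shape).
    """
    tools_text = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    tool_names = ", ".join(tools)
    return tools_text, tool_names


# Matches an "Action:" line immediately followed by an "Action Input:" line.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")


def parse_step(completion):
    """Return ("final", answer) or ("action", tool_name, tool_input)."""
    if "Final Answer:" in completion:
        return ("final", completion.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(completion)
    if match is None:
        raise ValueError("completion contains neither an Action nor a Final Answer")
    return ("action", match.group(1).strip(), match.group(2).strip())
```

In an agent loop, an `("action", ...)` result would be dispatched to the named tool and its output appended to the scratchpad as an `Observation:` line before the model is called again.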
### Conversational Use-case
#### Use with [Transformers](https://github.com/huggingface/transformers)
##### Using the `transformers.pipeline()` API (4-bit quantization recommended for fast responses)
```python
import transformers
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization to reduce memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_id = "EpistemeAI2/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code-math"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    # 4-bit for fast responses; remove this argument for full 16-bit inference
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
# The question itself is passed as the user message, so the system prompt needs no placeholder
messages = [
    {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant with expertise in everything\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code. Do not print anything else.\n
Debug the code if an error occurs.\n
"""},
    {"role": "user", "content": "Create a bar plot showing the market capitalization of the top 7 publicly listed companies using matplotlib"}
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
# The last entry of the returned conversation is the assistant's reply
print(outputs[0]["generated_text"][-1])
```

# Example:
For a code sample using LangChain, see this [Colab](https://colab.research.google.com/drive/129SEHVRxlr24r73yf34BKnIHOlD3as09?authuser=1)

# Unsloth Fast

```python
%%capture
# Run the install lines in their own Colab cell (%%capture hides that cell's output)
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install unsloth
# Get latest Unsloth
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install langchain_experimental

from unsloth import FastLanguageModel
from transformers import TextStreamer
from google.colab import userdata


# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "EpistemeAI2/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code-math",
    max_seq_length = 128000,
    load_in_4bit = True,
    token = userdata.get('HF_TOKEN'),
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

def chatbot(query):
    # The chat template expects "role"/"content" keys; the question goes in the user message
    messages = [
        {"role": "system", "content":
            """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant with expertise in everything\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code. Do not print anything else.\n
Use ipython for the search tool.\n
Debug the code if an error occurs.\n
"""
        },
        {"role": "user", "content": query},
    ]
    inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = "pt").to("cuda")

    # Stream tokens to stdout as they are generated
    text_streamer = TextStreamer(tokenizer)
    _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048, use_cache = True)
```

# Execute code (make sure to use a virtual environment)
```bash
python3 -m venv env
source env/bin/activate
```

## Executing code responses from Llama
#### For local use, use the `execute_Python_code` function below. For LangChain, use `PythonREPL()` to execute code

Function to execute code locally in Python:
```python
import contextlib
import io

def execute_Python_code(code):
    # A string stream to capture the outputs of exec
    output = io.StringIO()
    try:
        # Redirect stdout to the StringIO object
        with contextlib.redirect_stdout(output):
            # Run in the module's global namespace so imports inside the code work
            exec(code, globals())
    except Exception as e:
        # If an error occurs, capture it as part of the output
        print(f"Error: {e}", file=output)
    return output.getvalue()
```

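Model replies often wrap code in Markdown fences, which `exec` cannot run directly. The sketch below pulls the code out of a reply before it is passed to `execute_Python_code`. It assumes the reply uses triple-backtick fences; `extract_python_code` is a hypothetical helper, not part of any library.

```python
import re

# Build the fence marker programmatically so this example stays easy to embed in Markdown.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_code(response):
    """Return the code inside fenced blocks, or the whole reply if no fence is found."""
    blocks = FENCE_RE.findall(response)
    if blocks:
        # Join multiple blocks so they run as one script, in order.
        return "\n".join(block.strip() for block in blocks)
    return response.strip()
```

The result can then be run inside the virtual environment, for example `execute_Python_code(extract_python_code(reply_text))`, where `reply_text` is the assistant's message content.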
LangChain Python REPL
- Install

```bash
pip install langchain_experimental
```

Code:
```python
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()

# You can create the tool to pass to an agent
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)
# `outputs` comes from the transformers pipeline example; the last message is a dict, so pass its text content
repl_tool.run(outputs[0]["generated_text"][-1]["content"])
```

# Safety procedures for inputs and outputs
For all inputs, please use Llama Guard (meta-llama/Llama-Guard-3-8B) for safety classification.
See the [Llama-Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B) model card.

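Once Llama Guard has classified an input, the calling code still has to turn its text verdict into a decision. The sketch below assumes the classifier's generated text starts with `safe` or `unsafe`, with any violated category codes (such as `S1`) on the following line, as described on the Llama Guard model card. `parse_guard_verdict` is a hypothetical helper, and running the classifier itself is not shown.

```python
def parse_guard_verdict(generated_text):
    """Return (is_safe, categories) from a Llama Guard style verdict string."""
    lines = [line.strip() for line in generated_text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        categories = []
        if len(lines) > 1:
            # Category codes are assumed to be comma-separated on the second line.
            categories = [code.strip() for code in lines[1].split(",") if code.strip()]
        return False, categories
    raise ValueError(f"unexpected verdict: {lines[0]!r}")
```

A request would only be forwarded to the agent model when the first element of the result is `True`.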

# Uploaded model

- **Developed by:** EpistemeAI