Update README.md
README.md
CHANGED
@@ -11,6 +11,222 @@ tags:
- trl
---

# Agent LLama

Experimental and revolutionary fine-tuning technique that allows Llama 3.1 8B to be agentic. It has some built-in agent features:
- search
- calculator
- ReAct: [Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
- fine-tuned ReAct for better responses

Other noticeable features:
- Self-learning using Unsloth (in progress)
- Can be used in RAG applications
- Memory: [**please use LangChain memory, section "Message persistence"**](https://python.langchain.com/docs/tutorials/chatbot/)

It works well with LangChain, LlamaIndex, and Ollama.

- Check out the Ollama Llama Agent on our GitHub page.

Context window: 128K

# Agent LLama series
- For a coding agent, please go to [EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code](https://huggingface.co/EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code)

### Installation
```bash
pip install --upgrade "transformers>=4.43.2"
```

Developers can easily integrate EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K into their projects using popular libraries like Transformers and vLLM. The following sections illustrate the usage with simple hands-on examples.

Optional: to use the built-in tools, add the following to the system prompt: "Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 27 August 2024\n"

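The date in the tool line of the system prompt is hard-coded. A minimal sketch of generating that line with the current date, assuming the exact wording quoted above; the helper name `build_tool_system_prompt` is hypothetical and not part of any library:

```python
from datetime import date


def build_tool_system_prompt(today=None, tools=("brave_search", "wolfram_alpha")):
    """Return the built-in tool header for the system prompt (hypothetical helper)."""
    today = today or date.today()
    # Format the date like "27 August 2024" (day without zero padding).
    today_str = f"{today.day} {today.strftime('%B %Y')}"
    return (
        "Environment: ipython. "
        f"Tools: {', '.join(tools)}. "
        "Cutting Knowledge Date: December 2023. "
        f"Today Date: {today_str}\n"
    )


# Prepend the header to whatever system prompt you already use.
print(build_tool_system_prompt(date(2024, 8, 27)))
```

Passing `today` explicitly keeps the output reproducible; leaving it out uses the current date.
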
#### ToT - Tree of Thought
- Use system prompt:
```python
"""Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is..."""
```

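A minimal sketch of how the Tree of Thought prompt might be combined with a user question, assuming the `role`/`content` chat message format used by the `transformers` pipeline example in this README. The names `TOT_SYSTEM_PROMPT` and `build_tot_messages` are illustrative only:

```python
# Tree of Thought system prompt, copied from the section above.
TOT_SYSTEM_PROMPT = (
    "Imagine three different experts are answering this question.\n"
    "All experts will write down 1 step of their thinking,\n"
    "then share it with the group.\n"
    "Then all experts will go on to the next step, etc.\n"
    "If any expert realises they're wrong at any point then they leave.\n"
    "The question is..."
)


def build_tot_messages(question):
    """Return chat messages: the ToT system prompt followed by the user question."""
    return [
        {"role": "system", "content": TOT_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]


messages = build_tot_messages("How many prime numbers are there below 30?")
print(messages[1]["content"])
```

The resulting list can be passed to a chat pipeline or to `tokenizer.apply_chat_template`.
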
#### ReAct
Example from the LangChain agent: [LangChain ReAct agent](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/react/agent.py)
- Use system prompt:
```python
"""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""
```

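The ReAct template has four placeholders (`{tools}`, `{tool_names}`, `{input}`, `{agent_scratchpad}`), and the model's reply then has to be split into an action or a final answer. The sketch below shows one way to do both without LangChain. It is an illustration under assumptions, not the LangChain implementation: `render_react_prompt` and `parse_react_step` are hypothetical helpers, and the tool description format (`name: description`) is a guess at a reasonable layout.

```python
import re

REACT_TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def render_react_prompt(tools, question, scratchpad=""):
    """Fill the template from a {tool_name: description} mapping."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    return REACT_TEMPLATE.format(
        tools=tool_lines,
        tool_names=", ".join(tools),
        input=question,
        agent_scratchpad=scratchpad,
    )


def parse_react_step(text):
    """Split one model reply into a tool call or a final answer."""
    if "Final Answer:" in text:
        return {"type": "final", "answer": text.split("Final Answer:", 1)[1].strip()}
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"could not parse ReAct step: {text!r}")
    # Drop anything the model wrote after its own (hallucinated) Observation.
    tool_input = match.group(2).split("\nObservation:", 1)[0].strip()
    return {"type": "action", "tool": match.group(1).strip(), "tool_input": tool_input}


prompt = render_react_prompt({"calculator": "evaluates arithmetic"}, "What is 2 + 2?")
print(parse_react_step(" I should compute this.\nAction: calculator\nAction Input: 2 + 2\n"))
```

In an agent loop, the tool result would be appended to the scratchpad as an `Observation:` line before the prompt is rendered again.
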
### Conversational Use-case
#### Use with [Transformers](https://github.com/huggingface/transformers)
##### Using the `transformers.pipeline()` API (4-bit quantization is recommended for fast responses)
```python
import torch
import transformers
from transformers import BitsAndBytesConfig

# Optional LangChain wrappers; not used in this snippet.
# from langchain_community.llms import HuggingFaceEndpoint
# from langchain_community.chat_models.huggingface import ChatHuggingFace

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_id = "EpistemeAI2/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code-math"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    # 4-bit quantization for fast responses. For full 16-bit inference, remove this line.
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
# The user question is passed as a separate "user" message, so the system
# prompt does not need a {question} placeholder.
messages = [
    {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024
You are a coding assistant with expertise in everything.
Ensure any code you provide can be executed
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.
Write only the code. Do not print anything else.
Debug the code if an error occurs.
"""},
    {"role": "user", "content": "Create a bar plot showing the market capitalization of the top 7 publicly listed companies using matplotlib"},
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
# The last entry of "generated_text" is the assistant's reply as a {"role", "content"} dict.
print(outputs[0]["generated_text"][-1])
```

# Example:
Please see the Colab notebook for sample code using LangChain: [Colab](https://colab.research.google.com/drive/129SEHVRxlr24r73yf34BKnIHOlD3as09?authuser=1)

# Unsloth Fast

```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install unsloth
# Get latest Unsloth
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install langchain_experimental

from unsloth import FastLanguageModel
from transformers import TextStreamer  # needed for streaming the generated text
from google.colab import userdata


# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "EpistemeAI2/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code-math",
    max_seq_length = 128000,
    load_in_4bit = True,
    token = userdata.get('HF_TOKEN'),
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode

def chatbot(query):
    # The chat template expects "role"/"content" keys; the user question is
    # passed as its own message instead of a {question} placeholder.
    messages = [
        {"role": "system", "content":
"""
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024
You are a coding assistant with expertise in everything.
Ensure any code you provide can be executed
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.
Write only the code. Do not print anything else.
Use ipython for the search tool.
Debug the code if an error occurs.
"""
        },
        {"role": "user", "content": query},
    ]
    inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = "pt").to("cuda")

    text_streamer = TextStreamer(tokenizer)
    _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048, use_cache = True)
```

# Execute code (make sure to use a virtual environment)
```bash
python3 -m venv env
source env/bin/activate
```

## Executing code responses from Llama
#### For local use, run the code with the `execute_Python_code` function below. With LangChain, use `PythonREPL()` to execute code.

Function to execute code locally in Python:
```python
import contextlib
import io


def execute_Python_code(code):
    # A string stream to capture the outputs of exec
    output = io.StringIO()
    try:
        # Redirect stdout to the StringIO object
        with contextlib.redirect_stdout(output):
            # Allow imports
            exec(code, globals())
    except Exception as e:
        # If an error occurs, capture it as part of the output
        print(f"Error: {e}", file=output)
    return output.getvalue()
```

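Model replies often wrap code in a markdown fence, which `exec` cannot run directly. The sketch below strips the fence before executing, using the same capture approach as the function above. It rests on assumptions: `extract_code` and `run_reply` are hypothetical helpers, the reply is assumed to contain at most one relevant fenced block, and the code runs in a fresh namespace rather than `globals()`.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply):
    """Return the first fenced code block in `reply`, or the whole reply if none."""
    match = FENCE_RE.search(reply)
    return match.group(1) if match else reply


def run_reply(reply):
    """Execute the code found in a model reply and return what it printed."""
    code = extract_code(reply)
    output = io.StringIO()
    try:
        with contextlib.redirect_stdout(output):
            # Fresh namespace so generated code cannot overwrite our own names.
            exec(code, {"__name__": "__main__"})
    except Exception as e:
        # Report errors as part of the captured output.
        print(f"Error: {e}", file=output)
    return output.getvalue()


reply = "Here is the code:\n" + FENCE + "python\nprint(6 * 7)\n" + FENCE + "\n"
print(run_reply(reply))  # prints 42
```

Returning errors as text makes it possible to feed them back to the model for a debugging round.
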
LangChain Python REPL
- Install

```bash
pip install langchain_experimental
```

Code:
```python
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()

# You can create the tool to pass to an agent
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)
# `outputs` comes from the `transformers.pipeline()` example above.
# Pass the reply text (not the whole message dict) to the tool.
repl_tool.invoke(outputs[0]["generated_text"][-1]["content"])
```

# Safety procedures for inputs/outputs
For all inputs, please use Llama Guard (meta-llama/Llama-Guard-3-8B) for safety classification.
See the [Llama-Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B) model card.

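Llama Guard returns its classification as generated text. Below is a minimal sketch of turning that text into a gate decision. It assumes the reply is either `safe`, or `unsafe` followed by a line of comma-separated category codes, as described in the Llama Guard 3 model card. `parse_guard_verdict` is a hypothetical helper, and obtaining the reply from the model is left out.

```python
def parse_guard_verdict(reply):
    """Return (is_safe, categories) from a Llama Guard style reply (assumed format)."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty Llama Guard reply")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        # The second line, when present, lists violated categories such as "S1,S10".
        categories = lines[1].split(",") if len(lines) > 1 else []
        return False, [c.strip() for c in categories if c.strip()]
    raise ValueError(f"unexpected Llama Guard reply: {reply!r}")


is_safe, categories = parse_guard_verdict("unsafe\nS1,S10")
if not is_safe:
    print("Blocked input, categories:", categories)
```

Raising on an unrecognized reply keeps the gate closed by default instead of silently treating unknown output as safe.
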
# Uploaded model

- **Developed by:** EpistemeAI