---
license: apache-2.0
---

## Warning: this model (state) is only a plugin for the RWKV-6-7B-v2.1 model and cannot be run on its own.

Please download the corresponding base model at: https://huggingface.co/BlinkDL/rwkv-6-world/blob/main/RWKV-x060-World-7B-v2.1-20240507-ctx4096.pth
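
For reference, the base model is normally loaded with the `rwkv` pip package that ChatRWKV uses. A minimal sketch is below; the local path and the `cuda fp16` strategy are placeholders, not part of this repo.

```python
# Sketch of loading the base RWKV-6-7B-v2.1 model with the rwkv pip package,
# roughly as ChatRWKV's API_DEMO_CHAT.py does. Path and strategy are placeholders.
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"  # "1" compiles the CUDA kernel for faster inference

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# As in the ChatRWKV demos, the checkpoint path is given without the ".pth" extension.
model = RWKV(model="/where/you/put/RWKV-x060-World-7B-v2.1-20240507-ctx4096", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # tokenizer for the World-series models
```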

To understand how RWKV states work, see https://zhuanlan.zhihu.com/p/695005541 (written in Chinese).

## Some code to load this state:

```python
# Please use with https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py and paste this code at line 59.
# The state file is a dict with one tensor per layer, keyed 'blocks.{i}.att.time_state'.
state = [None] * args.n_layer * 3
state_raw = torch.load("/where/you/put/bo.pth")
for i in range(args.n_layer):
    dd = model.strategy[i]
    dev = dd.device
    atype = dd.atype
    state[i*3+0] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()  # att token-shift state (zeros)
    state[i*3+1] = state_raw[f'blocks.{i}.att.time_state'].transpose(1, 2).to(dtype=torch.float, device=dev).requires_grad_(False).contiguous()  # trained wkv state from this repo
    state[i*3+2] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()  # ffn token-shift state (zeros)
```
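
After this, the loaded `state` is simply passed to `model.forward` as the initial recurrent state. A rough sketch of that step is below; the prompt and the greedy decoding are illustrative placeholders, not taken from the demo script.

```python
# Sketch: run a prompt through the model starting from the loaded state.
# 'model' and 'pipeline' come from the demo script; greedy decoding is only for illustration.
prompt = "User: hello\n\nAssistant:"
tokens = pipeline.encode(prompt)

out, state = model.forward(tokens, state)  # 'out' holds the logits for the next token
next_token = int(torch.argmax(out))        # greedy pick (the demo uses proper sampling)
print(pipeline.decode([next_token]))
```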

## Training details:

The training data consists of Bo Peng's Zhihu answers and articles (https://www.zhihu.com/people/bopengbopeng) since April 2022, plus a small amount of general AI conversation data (Ruozhiba, Alpaca, RWKV self-cognition, etc.).

This state was trained with https://github.com/JL-er/RWKV-PEFT for one epoch.
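
The result of that training run is just the per-layer wkv state tensors saved as a checkpoint. A small sketch for inspecting such a file is below; it assumes only the key layout implied by the loading code above, and the exact shapes depend on the base model.

```python
# Sketch: inspect a trained state file. Assumes it is a torch checkpoint keyed
# 'blocks.{i}.att.time_state' (one tensor per layer), as the loading code above implies.
import torch

state_raw = torch.load("/where/you/put/bo.pth", map_location="cpu")
for k, v in state_raw.items():
    print(k, tuple(v.shape))  # e.g. blocks.0.att.time_state -> (n_head, head_size, head_size)
```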

## Chat example:

```
User: Will RWKV-7 be stronger than Mamba?

Assistant: A friend of mine built an RWKV-7 demo and tested how Mamba fares head-to-head against RWKV-7. The results are in the table below; as you can see, RWKV-7 is stronger.
Also, everyone is now training on multimodal data, including reading documents dynamically, but that only suits long-term memory. In my view, the essence of language is dialogue (it can replace all kinds of search), so dialogue should be the corpus the model is trained on:
Feed all internet messages (including company announcements, papers, blogs, GitHub issues, and so on) into the model.
Then let the model automatically generate new content (note that this excludes personal emails, chat logs, and the like), making sure there is something new each time.
Finally, consolidate all internet messages into one large knowledge base (similar to Wikipedia), making sure each piece of information appears only once.
Then let the model automatically reason over everything in the knowledge base to see which parts are contradictory. If there is no contradiction, output it; if there is, let the model update it.
Use the same method to find all information that has been deleted (including cases where only one item was deleted), and let the model regenerate it.
In this way everything is learned from the internet, with no gaps; it is as if the model lives on the internet forever.
So will this approach make Mamba completely obsolete? Of course not. Mamba is strong, especially in small, narrow niches (for example, by learning all data from before 2023 it can beat 99% of paper authors), but for large-scale language modeling it can still only be left at the bottom of the drawer.
At present, the only team that can compete with ChatGPT is all the senior scientists of OpenAI and DeepMind combined (because they are all picking at RWKV's six layers of magic).
```