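Getting Started with Rasa, Part 1: A First Look at Rasa

By 52nlp

August 27, 2019 #bot, #Chatbot, #Rasa, #Rasa Core, #Rasa NLU, #multi-turn dialogue management, #dialogue management, #dialogue systems, #intent recognition, #text classification, #short text classification, #chatbots, #natural language processing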

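I have recently become very interested in Rasa and plan to use it to build and polish a chatbot, so I did some research and study and decided to write it down. This is the first post in the series; readers who are interested may find it a useful reference.

Rasa is an open-source machine learning framework for building contextual AI assistants and chatbots. It has two main modules: Rasa NLU, which handles semantic understanding of user messages, and Rasa Core, which handles dialogue management. The Rasa team also provides an interactive tool, Rasa X, that helps users improve and deploy the AI assistants and chatbots built with the framework.

The best way to learn a tool is to start with its official documentation. Rasa's documentation is very thoughtful, so let's start from the Rasa User Guide.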

I. Installing Rasa and Rasa X

I tested the installation on Ubuntu 16.04, inside a Python 3 virtualenv:

virtualenv -p python3 venv
source venv/bin/activate
pip install rasa-x --extra-index-url https://pypi.rasa.com/simple

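If everything goes well, both rasa and Rasa X are installed. If you do not want Rasa X, just run "pip install rasa" instead. You can also go on to install the extra dependencies that Rasa NLU needs for text analysis, but we skip that for now.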
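
To confirm what actually landed in the virtualenv, a small check like the following can help. It is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata) and that the pip package names are rasa and rasa-x, as in the install command above.

```python
import shutil
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("rasa", "rasa-x")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
    # The rasa command-line entry point should also be on PATH inside the virtualenv.
    print("rasa CLI:", shutil.which("rasa") or "not found on PATH")
```

Run it with the virtualenv activated; a "not installed" line for rasa-x is expected if you chose the plain "pip install rasa" route.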

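II. Running the Official Example

The official Rasa tutorial is very thoughtful: even without Rasa installed, you can run the example code in the browser on the tutorial page. If you have installed it, you can follow the whole flow from the command line on your own machine.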

1. Create the default starter project

Run in a terminal:

rasa init --no-prompt

This shows a quick training run of the Rasa models and finally prints:

...
NLU model training completed.
Your Rasa model is trained and saved at '/home/textminer/rasa/default/models/20190821-205211.tar.gz'.
If you want to speak to the assistant, run 'rasa shell' at any time inside the project directory.

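Without --no-prompt, the command asks a few questions first. You can also run this step in the browser by clicking the "run" button on the official tutorial page.

This command creates the following files in the current directory: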

`__init__.py`	An empty file
`actions.py`	Code for your custom actions
`config.yml` ‘*’	Configuration for Rasa NLU and Rasa Core
`credentials.yml`	Details for connecting to messaging channels, e.g. Facebook Messenger
`data/nlu.md` ‘*’	Rasa NLU training data
`data/stories.md` ‘*’	Rasa stories
`domain.yml` ‘*’	The Rasa domain file
`endpoints.yml`	Details for the endpoints the assistant connects to, e.g. a custom action server
`models/<timestamp>.tar.gz`	The initial trained model

The files marked with ‘*’ are the important ones; we look at each of them in detail below.
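
If you want to double-check that "rasa init" produced everything in the table, a few lines of Python are enough. This is just a convenience sketch: the file list is copied from the table above (minus the timestamped model archive), and the function name is my own.

```python
from pathlib import Path

# Files listed in the table above, without the timestamped model archive.
EXPECTED_FILES = [
    "__init__.py",
    "actions.py",
    "config.yml",
    "credentials.yml",
    "data/nlu.md",
    "data/stories.md",
    "domain.yml",
    "endpoints.yml",
]


def missing_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected project files that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from inside the project directory created by "rasa init".
    print("missing:", missing_files("."))
```

An empty list means the project skeleton is complete.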

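2. NLU training data

Rasa NLU is one of the core modules. NLU stands for Natural Language Understanding: this module interprets the content of user messages and converts the result into structured data. Rasa needs a set of training data for this. Rasa NLU trains a model on the data and then uses the model to understand user messages, mainly through intent recognition and slot-value (entity) extraction. Here is what the sample NLU training data looks like: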

## intent:greet
- hey
- hello
- hi
- good morning
- good evening
- hey there

## intent:goodbye
- bye
- goodbye
- see you around
- see you later

## intent:affirm
- yes
- indeed
- of course
- that sounds good
- correct
...

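You can also explore the sample training data directly on the official tutorial page.

Lines starting with ## define the user's intents, and each is followed by a group of messages that share that intent. Rasa NLU's job is to correctly predict the intent of a new message when the user sends one, so that the AI assistant can act on it.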
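
To make the file format concrete, here is a minimal sketch that reads this Markdown layout into a dictionary. It is not Rasa's own loader, only an illustration of the structure, and it handles just the "## intent:" sections and "- " items shown above.

```python
def parse_nlu_markdown(text):
    """Group example messages under their '## intent:<name>' heading."""
    examples = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## intent:"):
            current = line[len("## intent:"):].strip()
            examples.setdefault(current, [])
        elif line.startswith("##"):
            current = None  # some other section type; skip its items
        elif line.startswith("- ") and current is not None:
            examples[current].append(line[2:].strip())
    return examples


SAMPLE = """\
## intent:greet
- hey
- hello

## intent:goodbye
- bye
- see you later
"""

print(parse_nlu_markdown(SAMPLE))
# → {'greet': ['hey', 'hello'], 'goodbye': ['bye', 'see you later']}
```

Each key is an intent and each value is the list of example messages the model learns from.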

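3. Configuration file

This file mainly defines the Rasa NLU and Rasa Core components that the model uses. Here is the configuration file from the official example, in which the NLU model uses the supervised_embeddings pipeline. We will look at these components in detail in a later post: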

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline: supervised_embeddings

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

4. Stories

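Dialogue management is the core of a dialogue system or chatbot. In Rasa it is handled by Rasa Core, and its training data is provided as stories. A story can be understood as the flow of a conversation scenario: it represents a real conversation between a user and the AI assistant, containing the intents and entities that reflect the user's input and the actions the assistant should take in response. Below is a simple example in which the user says "hello" and the assistant also replies "hello". As a Rasa story it looks like this: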

## story1
* greet
   - utter_greet

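Lines starting with ‘*’ are the user's messages, written as intents, and lines starting with ‘-’ are the assistant's actions. Here the action, utter_greet, is a message sent back to the user, but in general an action can do anything, such as calling an API or interacting with the outside world.

The default example also generates its own stories file, data/stories.md, which is worth reading through.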
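
The story format is simple enough to read with a few lines of Python. The sketch below is my own illustration, not Rasa's parser: it only understands story headings, "* intent" lines, and "- action" lines, and it keeps anything after the marker (such as entity annotations) as part of the name.

```python
def parse_stories(text):
    """Return {story_name: [(kind, name), ...]} with kind 'intent' or 'action'."""
    stories = {}
    steps = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            steps = stories.setdefault(line[3:].strip(), [])
        elif steps is None or not line:
            continue  # blank line, or content before the first story heading
        elif line.startswith("* "):
            steps.append(("intent", line[2:].strip()))
        elif line.startswith("- "):
            steps.append(("action", line[2:].strip()))
    return stories


SAMPLE = """\
## story1
* greet
   - utter_greet
"""

print(parse_stories(SAMPLE))
# → {'story1': [('intent', 'greet'), ('action', 'utter_greet')]}
```

Seen this way, a story is just an ordered list of alternating user intents and assistant actions.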

5. Domain

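The domain can be thought of as the assistant's knowledge base. It defines the intents, the actions, and the response templates for those actions: that is, the user intents the assistant can predict, the actions it can perform, and the content those actions respond with. The assistant's domain is stored in domain.yml. Here is the sample data: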

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy

actions:
- utter_greet
- utter_cheer_up
- utter_did_that_help
- utter_happy
- utter_goodbye

templates:
  utter_greet:
  - text: "Hey! How are you?"

  utter_cheer_up:
  - text: "Here is something to cheer you up:"
    image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
  - text: "Did that help you?"

  utter_happy:
  - text: "Great carry on!"

  utter_goodbye:
  - text: "Bye"

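All of this is managed by Rasa Core, whose central job is to choose the right action to execute at each step of the conversation. In this example the actions simply send a message to the user. They are defined in the domain and start with utter_, and the AI assistant replies with the content of the matching template.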
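
To get a feel for what "choosing the right action at each step" means, here is a toy lookup that memorizes stories and replays them. It is only loosely inspired by the idea behind a memoization-style policy and is not how Rasa Core is implemented. The story below is hand-written for illustration using intent and action names from the domain above, and action_listen is used as the name for "wait for the next user message".

```python
ACTION_LISTEN = "action_listen"  # "wait for the next user message"

# A hand-written story (an assumption for illustration), as (kind, name) steps.
STORY = [
    ("intent", "greet"),
    ("action", "utter_greet"),
    ("intent", "mood_unhappy"),
    ("action", "utter_cheer_up"),
    ("action", "utter_did_that_help"),
    ("intent", "affirm"),
    ("action", "utter_happy"),
]


def build_lookup(stories):
    """Memorize, for every prefix of every story, which action comes next."""
    lookup = {}
    for steps in stories:
        for i in range(1, len(steps) + 1):
            history = tuple(steps[:i])
            upcoming = steps[i] if i < len(steps) else None
            if upcoming is not None and upcoming[0] == "action":
                lookup[history] = upcoming[1]
            else:
                # Next step is a user message (or the story ended): listen.
                lookup[history] = ACTION_LISTEN
    return lookup


def next_action(lookup, history):
    """Return the memorized next action, or None for an unseen conversation."""
    return lookup.get(tuple(history))


lookup = build_lookup([STORY])
print(next_action(lookup, STORY[:1]))  # → utter_greet
print(next_action(lookup, STORY[:3]))  # → utter_cheer_up
print(next_action(lookup, STORY[:5]))  # → action_listen
print(next_action(lookup, [("intent", "affirm")]))  # → None
```

The last call returns None because no story starts with affirm; a pure lookup cannot generalize, which is why the config above lists a learned policy alongside the memoization one.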

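6. Train a model

If you add NLU or Core data, or modify the domain or the configuration file, you need to retrain the model by running "rasa train". The command calls the Rasa Core and/or NLU training functions and stores the trained model under the models/ directory. Nothing was changed here, so there is nothing new to train.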
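
Since every training run stores a new timestamped archive in models/, it is handy to be able to find the most recent one. The helper below is a small sketch of my own; it assumes the archives follow the <timestamp>.tar.gz naming shown earlier, so that sorting by file name is the same as sorting by time.

```python
from pathlib import Path


def latest_model(models_dir="models"):
    """Return the newest '<timestamp>.tar.gz' in models_dir, or None if there is none."""
    root = Path(models_dir)
    if not root.is_dir():
        return None
    # Names like 20190821-205211.tar.gz sort chronologically as plain strings.
    archives = sorted(root.glob("*.tar.gz"), key=lambda path: path.name)
    return archives[-1] if archives else None


if __name__ == "__main__":
    print(latest_model("models"))
```

Run it from the project directory; it prints None if no model has been trained yet.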

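7. Talk to the assistant

Assuming all the steps above completed normally, you can chat with the assistant trained in the Rasa example using the following command: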

rasa shell

The result looks like this:

2019-08-25 10:31:32 INFO     root  - Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
2019-08-25 10:31:32 INFO     root  - Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
2019-08-25 10:31:33 INFO     root  - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2019-08-25 10:31:33 INFO     root  - Starting Rasa server on http://localhost:5005
Bot loaded. Type a message and press enter (use '/stop' to exit): 
Your input ->  hi                                                                             
Hey! How are you?
Your input ->  not good                                                                       
Here is something to cheer you up:
Image: https://i.imgur.com/nGF1K8f.jpg
Did that help you?
Your input ->  yes                                                                            
Great carry on!
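
The log above shows a Rasa server starting on http://localhost:5005, so the same conversation can in principle be driven from code over HTTP. The sketch below assumes a running server with the REST channel enabled and the /webhooks/rest/webhook path with a JSON body of "sender" and "message"; treat the path, the payload shape, and the sender id test_user as assumptions to verify against your Rasa version.

```python
import json
import urllib.request


def build_request(message, sender="test_user", base_url="http://localhost:5005"):
    """Build the POST request for one user message (no network access here)."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(message, **kwargs):
    """Send one message and return the decoded JSON reply (needs a running server)."""
    request = build_request(message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_request("hi")
    print(request.get_method(), request.full_url)
    # With a server running, try: print(send_message("hi"))
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before a server is available.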

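III. Building a Minimal Runnable Chinese Dialogue Example

With the flow above understood, we can now build a minimal, runnable Chinese dialogue example on top of Rasa. It is mainly based on the example in "Rasa使用指南01" (Rasa Usage Guide 01), with slightly different steps.

In the project, __init__.py is an empty file, and config.yml can be copied straight from the official example. Because this example is so minimal, we do not even need to change its language: en setting. Next, let's look at the data in nlu.md, stories.md, and domain.yml in turn:

The NLU data defines three intents: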

cat data/nlu.md 
## intent:greet
- 你好
- 早上好
- 中午好
- 晚上好

## intent:mood_happy
- 很好
- 不错
- 我很好


## intent:mood_unhappy
- 很难过
- 糟糕极了

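The stories describe the conversation scenario: the user greets the bot -> the bot asks how the user's day is going -> the user reports a mood -> the bot replies according to the mood. There are two flows, one for a positive mood and one for a negative mood, so we need two stories. The stories data is as follows: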

cat data/stories.md 
## story_happy
* greet
  - utter_greet
* mood_happy
  - utter_happy

## story_unhappy
* greet
  - utter_greet
* mood_unhappy
  - utter_unhappy

The domain contains the intents and actions for the whole scenario, together with the response templates for those actions:

  
cat domain.yml                                  
intents:
  - greet
  - mood_happy
  - mood_unhappy

actions:
  - utter_greet
  - utter_happy
  - utter_unhappy

templates:
  utter_greet:
  - text: "你好，今天过得如何"

  utter_happy:
  - text: "那很不错"

  utter_unhappy:
  - text: "发生了什么事，可以说给我吗？"
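
With three hand-written files it is easy for a name to drift out of sync, for example a story that uses an action the domain never declares. The check below is a rough sketch of my own, with the stories and domain above retyped as plain Python data so that it needs no YAML library; it is not a Rasa feature.

```python
# The stories and domain from this section, retyped as Python data.
STORIES = {
    "story_happy": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_happy"), ("action", "utter_happy"),
    ],
    "story_unhappy": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_unhappy"), ("action", "utter_unhappy"),
    ],
}

DOMAIN = {
    "intents": ["greet", "mood_happy", "mood_unhappy"],
    "actions": ["utter_greet", "utter_happy", "utter_unhappy"],
    "templates": {
        "utter_greet": [{"text": "你好，今天过得如何"}],
        "utter_happy": [{"text": "那很不错"}],
        "utter_unhappy": [{"text": "发生了什么事，可以说给我吗？"}],
    },
}


def check_consistency(stories, domain):
    """Return human-readable problems; an empty list means the names line up."""
    problems = []
    for story, steps in stories.items():
        for kind, name in steps:
            if kind == "intent" and name not in domain["intents"]:
                problems.append(f"{story}: intent '{name}' is not in the domain")
            if kind == "action" and name not in domain["actions"]:
                problems.append(f"{story}: action '{name}' is not in the domain")
    for action in domain["actions"]:
        if action.startswith("utter_") and action not in domain["templates"]:
            problems.append(f"action '{action}' has no response template")
    return problems


print(check_consistency(STORIES, DOMAIN))  # → []
```

An empty list means every intent and action used in the stories is declared in the domain and every utter_ action has a template.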

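Now train the model with the command "rasa train"; when training finishes, the model file is stored in the models directory. Next, use the "rasa shell nlu" command to look at the structured data that the NLU produces: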

NLU model loaded. Type a message and press enter to parse it.
Next message:
你好
{
  "intent": {
    "name": "greet",
    "confidence": 0.9552139043807983
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.9552139043807983
    },
    {
      "name": "mood_unhappy",
      "confidence": 0.09797228127717972
    },
    {
      "name": "mood_happy",
      "confidence": 0.0
    }
  ],
  "text": "你好"
}
Next message:
糟糕极了
{
  "intent": {
    "name": "mood_unhappy",
    "confidence": 0.9557749032974243
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "mood_unhappy",
      "confidence": 0.9557749032974243
    },
    {
      "name": "mood_happy",
      "confidence": 0.1225115954875946
    },
    {
      "name": "greet",
      "confidence": 0.0
    }
  ],
  "text": "糟糕极了"
}
Next message:
感觉不好
{
  "intent": {
    "name": null,
    "confidence": 0.0
  },
  "entities": [],
  "intent_ranking": [],
  "text": "感觉不好"
}
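
The structured output is easy to consume in code. As a sketch, the helper below picks the predicted intent from a parse result and treats anything below a confidence threshold as "not understood". The results are trimmed copies of the output above; the threshold of 0.5 and the function name are my own choices for illustration.

```python
def top_intent(parse_result, threshold=0.5):
    """Return the predicted intent name, or None if it is missing or too uncertain."""
    intent = parse_result.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence") or 0.0
    if name is None or confidence < threshold:
        return None
    return name


# Trimmed copies of the three parse results shown above.
PARSED = [
    {"text": "你好", "intent": {"name": "greet", "confidence": 0.9552139043807983}},
    {"text": "糟糕极了", "intent": {"name": "mood_unhappy", "confidence": 0.9557749032974243}},
    {"text": "感觉不好", "intent": {"name": None, "confidence": 0.0}},
]

for result in PARSED:
    print(result["text"], "->", top_intent(result))
# 你好 -> greet
# 糟糕极了 -> mood_unhappy
# 感觉不好 -> None
```

A None result is the natural place to hook in a fallback reply such as asking the user to rephrase.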

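As the last case (感觉不好) shows, this example does no Chinese-specific preprocessing and has very little data, so apart from memorizing the examples already in nlu.md the model has almost no ability to handle new input. In the following rasa shell conversation we can therefore only test with fixed cases, one for the positive-mood flow and one for the negative-mood flow.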
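
One plausible way to see why 感觉不好 gets no intent at all: if the text is only split on whitespace (an assumption about the pipeline that this post does not verify), every unsegmented Chinese message is a single token, and an unseen message shares nothing with the training data. The sketch below compares that with a naive per-character split; both tokenizers are stand-ins written for illustration, not Rasa components.

```python
# The training examples from data/nlu.md above.
TRAINING_EXAMPLES = [
    "你好", "早上好", "中午好", "晚上好",
    "很好", "不错", "我很好",
    "很难过", "糟糕极了",
]


def whitespace_tokens(text):
    """Split on whitespace only; unsegmented Chinese text stays one token."""
    return text.split()


def char_tokens(text):
    """Treat every non-space character as its own token."""
    return [ch for ch in text if not ch.isspace()]


def known_tokens(message, tokenize):
    """Tokens of message that also occur somewhere in the training examples."""
    vocabulary = {tok for example in TRAINING_EXAMPLES for tok in tokenize(example)}
    return [tok for tok in tokenize(message) if tok in vocabulary]


print(known_tokens("你好", whitespace_tokens))      # → ['你好']
print(known_tokens("感觉不好", whitespace_tokens))  # → []
print(known_tokens("感觉不好", char_tokens))        # → ['不', '好']
```

Overlap alone does not make a prediction correct (here 不 and 好 come from greetings and happy examples), so a proper Chinese word segmenter and more training data are still needed.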

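That wraps up this first look at Rasa. I still have some open questions, but with them in mind we will next explore Rasa in depth and build a Chinese dialogue system on top of it. Finally, here are some recommended references to keep on hand.

References:
Rasa Tutorial
Rasa介绍对话系统、产品与技术 (An Introduction to Rasa: Dialogue Systems, Products, and Technology)
基于RASA的task-orient对话系统解析（一） (An Analysis of RASA-Based Task-Oriented Dialogue Systems, Part 1)
基于RASA的task-orient对话系统解析（二）——对话管理核心模块 (An Analysis of RASA-Based Task-Oriented Dialogue Systems, Part 2: The Core Dialogue Management Module)
Rasa使用指南01 (Rasa Usage Guide 01)
Rasa使用指南02 (Rasa Usage Guide 02)
rasa对话系统踩坑记系列 (Rasa Dialogue System Pitfalls series)
用Rasa NLU构建自己的中文NLU系统 (Building Your Own Chinese NLU System with Rasa NLU)
基于rasa的对话系统搭建（上） (Building a Dialogue System with Rasa, Part 1)
rasa 中文聊天机器人 (Rasa Chinese Chatbot)
使用 Rasa NLU 构建一个中文 ChatBot (Building a Chinese ChatBot with Rasa NLU)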

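Note: This is an original article. When reposting, please credit the source and keep the link to "我爱自然语言处理" (I Love Natural Language Processing): https://www.52nlp.cn

Permalink: Getting Started with Rasa, Part 1: A First Look at Rasa
https://www.52nlp.cn/?p=12150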


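6 comments on "Getting Started with Rasa, Part 1: A First Look at Rasa"

邵冰迪 said:

September 13, 2019, 15:19

Hello, I'm just getting started and my teacher asked me to set this up. Could you share the source code? Just send it to my email, thanks!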

52nlp replied:
September 15, 2019 at 09:06
Uh, Rasa itself is open source...

魏 said:

January 8, 2020, 14:52

Hi, after running rasa shell, whatever I type and send keeps producing this error:
Exception occurred while handling uri: 'http://localhost:5005/webhooks/rest/webhook?stream=true&token='
Traceback (most recent call last):
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\sanic\app.py", line 946, in handle_request
request, request_name=name
TypeError: _run_request_middleware() got an unexpected keyword argument 'request_name'
Exception occurred in one of response middleware handlers
Traceback (most recent call last):
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\sanic\app.py", line 1017, in handle_request
request, response, request_name=name
TypeError: _run_response_middleware() got an unexpected keyword argument 'request_name'
2020-01-08 14:52:07 ERROR asyncio - Task exception was never retrieved
future: <Task finished coro=<configure_app..run_cmdline_io() done, defined at c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\rasa\core\run.py:124> exception=ClientResponseError(RequestInfo(url=URL('http://localhost:5005/webhooks/rest/webhook?stream=true&token='), method='POST', headers=, real_url=URL('http://localhost:5005/webhooks/rest/webhook?stream=true&token=')), (), status=500, message='Internal Server Error', headers=)>
Traceback (most recent call last):
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\rasa\core\run.py", line 128, in run_cmdline_io
server_url=constants.DEFAULT_SERVER_FORMAT.format("http", port)
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\rasa\core\channels\console.py", line 140, in record_messages
async for response in bot_responses:
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\rasa\core\channels\console.py", line 104, in send_message_receive_stream
async with session.post(url, json=payload, raise_for_status=True) as resp:
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\aiohttp\client.py", line 1012, in __aenter__
self._resp = await self._coro
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\aiohttp\client.py", line 588, in _request
resp.raise_for_status()
File "c:\users\40896\appdata\local\programs\python\python36\lib\site-packages\aiohttp\client_reqrep.py", line 946, in raise_for_status
headers=self.headers)
aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('http://localhost:5005/webhooks/rest/webhook?stream=true&token=')

52nlp replied:
January 12, 2020 at 19:00
I'm not sure what the problem is. Do all the other steps work?

leejm said:

April 10, 2020, 15:06

Traceback (most recent call last):
File "d:\anaconda3.7\envs\rasa\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\anaconda3.7\envs\rasa\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\anaconda3.7\envs\rasa\Scripts\rasa.exe\__main__.py", line 7, in
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\__main__.py", line 70, in main
cmdline_arguments.func(cmdline_arguments)
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\cli\train.py", line 69, in train
kwargs=extract_additional_arguments(args),
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\train.py", line 48, in train
kwargs=kwargs,
File "d:\anaconda3.7\envs\rasa\lib\asyncio\base_events.py", line 587, in run_until_complete
return future.result()
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\train.py", line 91, in train_async
training_files, skill_imports
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\data.py", line 67, in get_core_nlu_directories
story_files, nlu_data_files = get_core_nlu_files(paths, skill_imports)
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\data.py", line 114, in get_core_nlu_files
path, skill_imports
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\data.py", line 139, in _find_core_nlu_files_in_directory
if _is_nlu_file(full_path):
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\data.py", line 162, in _is_nlu_file
is_nlu_file = any(_contains_nlu_pattern(l) for l in f)
File "d:\anaconda3.7\envs\rasa\lib\site-packages\rasa\data.py", line 162, in
is_nlu_file = any(_contains_nlu_pattern(l) for l in f)
File "d:\anaconda3.7\envs\rasa\lib\codecs.py", line 323, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 18: invalid continuation byte
Has the author ever run into this? It looks like an encoding problem.

52nlp replied:
April 10, 2020 at 15:09
No, I haven't.
