Solution 1: Use "Xiaoai Open Platform - Custom Skills" to forward speech text to OpenClaw
(This is the closest to truly "accessing the speaker".)
Target effect:
You say something to the Xiaomi speaker → Xiao Ai sends the recognized text to your server → the server calls OpenClaw's /v1/chat/completions → the reply goes back to Xiao Ai → the speaker reads it aloud.
Architecture
Xiaomi Speaker (ASR) → Xiaoai Skill Cloud Callback (HTTPS) → your webhook service → OpenClaw HTTP → return text → Xiaoai (TTS) broadcast
Prerequisites
- A Xiaoai Open Platform account and a custom skill ("Skill Development → Custom Skill" in the console)
- An HTTPS service reachable from the public internet (domain name + certificate; the platform callback generally requires HTTPS)
- OpenClaw with its OpenAI-compatible HTTP endpoint enabled:
POST /v1/chat/completions
Key Implementation Points (Minimal Integration Steps)
- Create a "custom skill" on the Xiaoai platform and configure:
- Wake word / skill name
- Intent/slot (simplest: treat the user's whole utterance as the query text)
- Server-side callback URL (yours), e.g.:
https://xxx.com/xiaoai/webhook
- When your webhook service receives a request from Xiaoai:
- Verify the signature/token (per the Xiaoai platform documentation)
- Extract the user's utterance (the query)
- Call OpenClaw:
POST http://<openclaw-gateway-host>:<port>/v1/chat/completions
- Put the content returned by OpenClaw into the JSON response format Xiao Ai expects (so it gets spoken via TTS)
A minimal "bridge service" example (Node/Express)
The example below only demonstrates forwarding text to OpenClaw and returning the reply; you still need to fill in the signature verification and the response-format fields on Xiao Ai's side according to the platform documentation.
import express from "express";

// Requires Node 18+ (built-in fetch).
const app = express();
app.use(express.json());

app.post("/xiaoai/webhook", async (req, res) => {
  // 1) TODO: verify the Xiaoai signature/auth (per the platform docs)
  const userText =
    req.body?.request?.intent?.query ||
    req.body?.query ||
    req.body?.request?.query ||
    "";

  // 2) Call OpenClaw's OpenAI-compatible endpoint
  const r = await fetch("http://127.0.0.1:18789/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "auto",
      messages: [
        // System prompt (Chinese): "You are a concise Chinese voice assistant; keep answers short."
        { role: "system", content: "你是简洁的中文语音助手,回答尽量短。" },
        { role: "user", content: userText },
      ],
    }),
  });
  const data = await r.json();
  // Fallback text "我没听清。" = "I didn't catch that."
  const answer = data?.choices?.[0]?.message?.content?.trim() || "我没听清。";

  // 3) TODO: shape the response JSON as the Xiaoai custom-skill spec requires (placeholder below)
  return res.json({
    // Replace with the fields the Xiaoai platform specifies
    reply: answer,
  });
});

app.listen(3000, () => console.log("xiaoai bridge on :3000"));
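The field that carries the user's utterance varies across payload shapes, which is why the bridge above tries several paths. That fallback can be isolated and tested on its own; here is a minimal sketch (the field names are assumptions to be confirmed against the Xiaoai platform docs, not a confirmed spec):

```javascript
// Hypothetical helper isolating the query-extraction fallback from the bridge.
// The exact payload field names must be verified against the Xiaoai docs.
function extractUserText(body) {
  return (
    body?.request?.intent?.query || // one possible nesting
    body?.query ||                  // flat variant
    body?.request?.query ||         // another variant
    ""                              // nothing recognizable
  );
}

// A few payload shapes it tolerates:
console.log(extractUserText({ request: { intent: { query: "开灯" } } })); // "开灯"
console.log(extractUserText({ query: "今天天气" }));                       // "今天天气"
console.log(extractUserText({}));                                          // ""
```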
Solution 2: Use Home Assistant to connect the Xiaomi ecosystem and have automations hand text to OpenClaw
(Better suited for voice-triggered automation.)
Target effect: you say "turn on the living room light / run some scene" to the speaker → the Xiaomi device/scene state flows into Home Assistant → an HA automation triggers → it calls OpenClaw (for a summary, announcement, log entry, or extra actions).
Prerequisites
- Home Assistant
- The official Xiaomi Home integration (supports OAuth2 login)
- Optionally, HA's webhook / REST command to forward events to OpenClaw
Advantage of this route: you can build plenty of automations without going through the Xiaoai skill-review process. Disadvantage: it is more "smart-home automation" than true free-form Q&A chat.
Selection recommendations (the short version)
- If what you want is "use the speaker as a voice entry point and chat freely with OpenClaw" → choose Solution 1 (Xiaoai custom skill).
- If what you want is "bring Xiaomi devices/scenes into a hub and let OpenClaw participate in automation" → choose Solution 2 (Home Assistant).
Yes, the Home Assistant idea is essentially: connect the Xiaomi ecosystem to HA and use it as the trigger source/actuator; OpenClaw acts as the "brain"; the speaker handles "listening/speaking" or announcements. (Note, however, that many Xiaomi speakers cannot be driven directly as a standard media_player in HA for TTS broadcast; this depends on the model/protocol.)
Two typical Home Assistant setups
Setup A: Xiao Ai handles voice (cloud), HA handles automation, OpenClaw handles decisions/text generation
You say to the speaker: "Turn on the living room light and tell me what's on today"
- Xiao Ai → triggers the Mijia scene/device
- HA listens for device state changes / scene triggers
- HA calls OpenClaw to generate a line of announcement text / a summary
- HA then "casts" the text to a device that can play it (Chromecast / Google speaker / ESPHome speaker, etc.)
Suitable if: you already have an HA hub and want OpenClaw to act as an interpreter/summarizer/logger.
Setup B: HA's local Assist as the voice entry point (more like a "local voice assistant")
- STT/TTS via HA Assist (a local/semi-local voice pipeline)
- OpenClaw as the conversation backend (you write a middleware that forwards text to OpenClaw)
- HA announces the reply via TTS (to an HA voice satellite or a media_player)
Suitable if: you want the whole pipeline as local as possible, at the cost of more work.
2) Connecting Xiaomi devices (HA side)
The mainstream approach today is the Xiaomi Home integration (the official/officially-partnered route; OAuth2 login, no password stored).
Instructions (very short):
- HA: Settings → Devices & services → Add Integration → Xiaomi Home (you will be redirected to the OAuth login)
- After logging in, devices appear as entities (lights, sockets, sensors, air conditioners, etc.)
3) How to call OpenClaw from HA
The core: an HA automation sends an HTTP request to OpenClaw. The common practice is rest_command in HA (with an automation action that calls it); the community often uses it to fire webhooks / HTTP requests.
For example, in configuration.yaml:
rest_command:
  openclaw_chat:
    url: "http://127.0.0.1:18789/v1/chat/completions"
    method: POST
    headers:
      content-type: "application/json"
    # System prompt (Chinese): "You are a concise Chinese home assistant."
    payload: >
      {
        "model": "auto",
        "messages": [
          {"role": "system", "content": "你是简洁的中文家庭助手。"},
          {"role": "user", "content": "{{ prompt }}"}
        ]
      }
Then in the automation:
- Trigger: a state change on some Xiaomi device (e.g. a door sensor opening)
- Action: call rest_command.openclaw_chat and fill in prompt
- Then broadcast/notify with the returned content (see the next section)
For the basic concepts of automation triggers, see the HA documentation: state changes, events, and more can all act as triggers.
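Putting those steps together, an automation could look roughly like this. A sketch only: the entity ids and notify service are hypothetical, and reading the REST response via response_variable requires a reasonably recent HA release:

```yaml
automation:
  - alias: "Front door opened -> OpenClaw announcement"
    trigger:
      - platform: state
        entity_id: binary_sensor.front_door   # hypothetical Xiaomi door sensor
        to: "on"
    action:
      - service: rest_command.openclaw_chat
        data:
          # Prompt (Chinese): "The front door just opened; write one short announcement line."
          prompt: "门刚被打开了,请生成一句简短的播报文本。"
        response_variable: openclaw_resp
      - service: notify.mobile_app_my_phone   # hypothetical notify target
        data:
          message: "{{ openclaw_resp.content.choices[0].message.content }}"
```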
4) How to announce OpenClaw's reply
This is the step people most often get stuck on: driving a Xiaomi speaker as an HA media_player for TTS broadcast is often unstable or unsupported (there is plenty of community discussion around the Xiaomi Smart Speaker / network speakers, but no single official method that works across all models).
So there are generally three pragmatic routes:
Route 1: Broadcast to a device that already works reliably in HA
- Chromecast / Google Nest / Sonos / ESPHome speaker / HA Voice satellite, etc.
(The TTS/playback pipeline for these is mature in HA)
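As a sketch of Route 1, the reply could be spoken on an already-supported player via HA's tts.speak action. The engine and player entity ids below are hypothetical; adjust them to whatever TTS integration you actually have installed:

```yaml
action:
  - service: tts.speak
    target:
      entity_id: tts.google_translate_zh_cn   # hypothetical TTS engine entity
    data:
      media_player_entity_id: media_player.living_room_nest  # hypothetical player
      message: "{{ answer }}"   # e.g. text extracted from OpenClaw's reply
```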
Route 2: Skip TTS broadcast and send notifications instead
- HA mobile app push
- Telegram / Feishu (you already have a bot setup)
Route 3: Keep broadcasting through the Xiao Ai cloud
- Have HA trigger a Mijia scene / Xiaoai skill that does the announcement (depends on whether you can wrap a "broadcast action" into a scene/service)
5) How to choose
(given your current OpenClaw goal)
- If what you want is "speaker = free-chat entry point": prioritize "Xiaoai custom skill → OpenClaw" (no HA required)
- If what you want is "whole-home automation + OpenClaw participating in decisions/summaries": use HA (device access, triggering, and automation all live in HA)