
How to link a Xiaomi speaker to OpenClaw

Solution 1: Use "Xiaoai Open Platform - Custom Skills" to forward speech text to OpenClaw

(Closest to truly "connecting the speaker")

Target effect:
You speak to the Xiaomi speaker → Xiao Ai sends the recognized text to your server → the server calls OpenClaw's /v1/chat/completions → the reply goes back to Xiao Ai → the speaker reads it out.

Architecture

Xiaomi Speaker (ASR) → Xiaoai Skill cloud callback (HTTPS) → your webhook service → OpenClaw HTTP → text reply → Xiao Ai (TTS) playback

Prerequisites

  1. A Xiaoai Open Platform account and a custom skill (console: "Skill Development → Custom Skill")
  2. An HTTPS service reachable from the public internet (domain name + certificate; the platform callback generally requires HTTPS)
  3. OpenClaw with its OpenAI-compatible HTTP endpoint enabled: POST /v1/chat/completions

Key Implementation Points (Minimal Integration Steps)

  1. Create a "custom skill" on the Xiaoai platform and configure:
    • Wake word / skill name
    • Intent/slot (simplest: treat the user's whole utterance as the query text)
    • Your server-side callback URL, e.g. https://xxx.com/xiaoai/webhook
  2. When your webhook service receives a request from Xiaoai:
    • Verify the signature/token (per the Xiaoai platform documentation)
    • Extract what the user said (the query)
    • Call OpenClaw:
      • POST http://<openclaw-gateway-host>:<port>/v1/chat/completions
    • Pack the content OpenClaw returns into the JSON response format Xiao Ai expects (so it gets spoken via TTS)

A minimal example of a "bridge service" (Node/Express)

The following example only demonstrates "forwarding the text to OpenClaw and returning the reply"; you still need to fill in the signature verification and the exact response fields on Xiao Ai's side according to the platform documentation.

import express from "express";

const app = express();
app.use(express.json());

app.post("/xiaoai/webhook", async (req, res) => {
  // 1) TODO: verify the Xiaoai signature/auth (per the platform docs)
  const userText =
    req.body?.request?.intent?.query ||
    req.body?.query ||
    req.body?.request?.query ||
    "";

  // 2) Call OpenClaw's OpenAI-compatible endpoint
  const r = await fetch("http://127.0.0.1:18789/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "auto",
      messages: [
        // System prompt (Chinese): "You are a concise Chinese voice assistant; keep answers as short as possible."
        { role: "system", content: "你是简洁的中文语音助手,回答尽量短。" },
        { role: "user", content: userText },
      ],
    }),
  });

  const data = await r.json();
  const answer = data?.choices?.[0]?.message?.content?.trim() || "我没听清。"; // fallback: "I didn't catch that."

  // 3) TODO: shape the response JSON per Xiaoai's custom-skill spec (placeholder below)
  return res.json({
    // Replace with the field names required by the Xiaoai platform
    reply: answer,
  });
});

app.listen(3000, () => console.log("xiaoai bridge on :3000"));
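The query-extraction fallback chain in the handler above can be pulled out into a small pure function, which makes it easy to unit-test against sample payloads. Note that the payload shapes here are assumptions; verify them against a real callback body from the Xiaoai platform and adjust the property paths:

```javascript
// Mirrors the fallback chain used in the webhook handler above.
// NOTE: all three payload shapes are assumptions -- check them against an
// actual Xiaoai callback and adjust the property paths as needed.
function extractQuery(body) {
  return (
    body?.request?.intent?.query || // assumed intent-style payload
    body?.query ||                  // assumed flat payload
    body?.request?.query ||         // assumed request-level payload
    ""                              // fallback: empty query
  );
}

// Example:
// extractQuery({ request: { intent: { query: "开灯" } } }) → "开灯"
```

Keeping this logic in one function also gives you a single place to update once you see what the platform actually sends.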

Solution 2: Connect the Xiaomi ecosystem through Home Assistant and let automations pass text to OpenClaw

(Better suited to voice-triggered automation)

Target effect: you say "turn on the living room light / run a scene" to the speaker → the Xiaomi device/scene state flows into Home Assistant → an HA automation triggers → it calls OpenClaw (to summarize, announce, log, or perform extra actions).

Prerequisites

  • Home Assistant
  • Xiaomi's official Xiaomi Home integration (supports OAuth2 login)
  • Optionally, HA's webhook/REST command to forward events to OpenClaw

Advantage of this route: you skip the Xiao Ai skill review process and can still build plenty of automations. Disadvantage: it behaves more like smart-home automation than true free-form Q&A chat.

Selection Recommendations (Bottom Line)

  • If you want "the speaker as a voice entry point for free-form chat with OpenClaw" → choose Solution 1 (Xiaoai custom skill).
  • If you want "Xiaomi devices/scenes flowing into a hub, with OpenClaw participating in automations" → choose Solution 2 (Home Assistant).

Yes — the Home Assistant idea is roughly: connect the Xiaomi ecosystem to HA and use it as the trigger source/actuator; OpenClaw acts as the "brain"; the speaker handles "listening/speaking" or announcements. (Caveat: many Xiaomi speakers cannot be driven as a standard media_player in HA for TTS playback; this depends on model and protocol.)

Two typical ways to play with Home Assistant

Pattern A: Xiao Ai handles voice (cloud), HA handles automation, OpenClaw handles decisions/text generation

You say to the speaker: "Turn on the living room light and tell me what to do today"

  • Xiao Ai → triggers the Mijia scene/device
  • HA listens for device state changes / scene triggers
  • HA calls OpenClaw to generate a sentence of announcement text / a summary
  • HA then "casts" the text to a device that can play it (Chromecast / Google speaker / ESPHome speaker, etc.)

Suitable for: You already have an HA hub and want to turn OpenClaw into an "interpreter/summarizer/logger".

Pattern B: HA's local Assist as the voice entry point (closer to a "local voice assistant")

  • STT/TTS via HA Assist (local/semi-local voice pipeline)
  • OpenClaw as the conversation backend (you write a small middleware that forwards text to OpenClaw)
  • HA plays the TTS reply (on an HA voice satellite or a media_player)

Suitable for: you want the whole chain to be as local as possible, at the cost of more work.

2) Connecting Xiaomi devices (HA side)

The mainstream approach now is the Xiaomi Home integration (the official / officially-partnered route: OAuth2 login, no password stored).

Instructions (very short):

  • HA: Settings → Devices & services → Add Integration → Xiaomi Home (you will be redirected to OAuth login)
  • After logging in, devices appear as entities (lights, sockets, sensors, air conditioners, etc.)

3) How to "call OpenClaw" in HA

At its core: an HA automation sends an HTTP request to OpenClaw. The common practice is rest_command in HA (called from an automation action); the community also commonly uses it to "fire webhooks/HTTP requests".

For example, in configuration.yaml (the Chinese system prompt below says "You are a concise Chinese home assistant"):

rest_command:
  openclaw_chat:
    url: "http://127.0.0.1:18789/v1/chat/completions"
    method: POST
    headers:
      content-type: "application/json"
    payload: >
      {
        "model":"auto",
        "messages":[
          {"role":"system","content":"你是简洁的中文家庭助手。"},
          {"role":"user","content":"{{ prompt }}"}
        ]
      }

Then in the automation:

  • Trigger: a state change on some Xiaomi device (e.g. a door sensor opens)
  • Action: call rest_command.openclaw_chat with the prompt filled in
  • Then announce/notify the returned content (see next section)
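Put together, such an automation might look like the sketch below. All entity IDs are placeholders, response_variable for rest_command requires a reasonably recent HA release, and the response structure (content.choices[...]) is an assumption based on the OpenAI-compatible format used above:

```yaml
automation:
  - alias: "Front door opened -> OpenClaw announcement"
    trigger:
      - platform: state
        entity_id: binary_sensor.front_door   # placeholder entity
        to: "on"
    action:
      # Call the rest_command defined above; response_variable needs a
      # recent HA version with rest_command response support.
      - service: rest_command.openclaw_chat
        data:
          prompt: "The front door just opened. Remind me in one short sentence."
        response_variable: openclaw_result
      # Forward OpenClaw's reply as a notification (placeholder notifier).
      - service: notify.mobile_app_my_phone
        data:
          message: "{{ openclaw_result['content']['choices'][0]['message']['content'] }}"
```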

For the basics of automation triggers, see the HA documentation: state changes, events, and more can all act as triggers.

4) How to get OpenClaw's reply spoken aloud

This is the step where things most often break: whether HA can drive a Xiaomi speaker as a media_player for TTS playback is often unstable or unsupported (there is plenty of community discussion about Xiaomi Smart Speakers / network speakers, but no unified, officially supported method that covers all models).

So there are generally three "pragmatic" routes:

Route 1: Play the reply on a device that already works reliably in HA

  • Chromecast / Google Nest / Sonos / ESPHome speakers / an HA Voice Assistant satellite, etc.
    (the TTS/playback chain for these is mature in HA)
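As a sketch, a TTS action targeting such a device might look like this. Entity IDs are placeholders; tts.speak requires a configured TTS entity in recent HA versions, while older setups use service calls such as tts.google_translate_say:

```yaml
# Sketch: speak a piece of text on a media_player HA can already drive.
action:
  - service: tts.speak
    target:
      entity_id: tts.google_translate_en_com   # placeholder TTS entity
    data:
      media_player_entity_id: media_player.living_room_cast  # placeholder
      message: "{{ answer }}"   # the text returned by OpenClaw
```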

Route 2: Skip TTS playback and send notifications instead

  • HA mobile app push
  • Telegram/Feishu (you already have a bot system)

Route 3: Keep announcing through Xiao Ai's cloud on the speaker

  • Have HA trigger a Mijia scene / Xiao Ai skill that does the announcement (depends on whether you can wrap a "speak" action into a scene/service)

5) How to choose

(given your current OpenClaw goal)

  • If you want "speaker = free-chat entry point": prioritize "Xiaoai custom skill → OpenClaw" (no HA needed)
  • If you want "whole-home automation with OpenClaw in the decision/summary loop": use HA (device access, triggers, and linkage all live in HA)