Solution 1: Use "Xiaoai Open Platform - Custom Skills" to forward speech text to OpenClaw
(This is the closest to truly "accessing the speaker".)
Target effect:
You say something to the Xiaomi speaker → Xiao Ai sends the recognized text to your server → the server calls OpenClaw's /v1/chat/completions → the reply goes back to Xiao Ai → the speaker reads it aloud.
Architecture
Xiaomi Speaker (ASR) → Xiaoai Skill Cloud Callback (HTTPS) → your webhook service → OpenClaw HTTP → return text → Xiaoai (TTS) broadcast
Prerequisites
- A Xiaoai Open Platform account and a custom skill ("Skill Development → Custom Skill" in the console)
- An HTTPS service reachable from the public internet (domain name + certificate; the platform callback generally requires HTTPS)
- OpenClaw with its OpenAI-compatible HTTP endpoint enabled:
POST /v1/chat/completions
Key Implementation Points (Minimal Integration Steps)
- Create a "custom skill" on the Xiaoai platform and configure:
- Wake word / skill name
- Intent/slot (simplest: treat the user's whole utterance as the query text)
- Server-side callback URL (yours), e.g.:
https://xxx.com/xiaoai/webhook
- When your webhook service receives a request from Xiaoai:
- Verify the signature/token (per the Xiaoai platform documentation)
- Extract the user's utterance (the query)
- Call OpenClaw:
POST http://<openclaw-gateway-host>:<port>/v1/chat/completions
- Put the content returned by OpenClaw into the JSON response format Xiao Ai expects (so it gets spoken via TTS)
A minimal "bridge service" example (Node/Express)
The example below only demonstrates forwarding text to OpenClaw and returning the reply; you still need to fill in the signature verification and the response-format fields on Xiao Ai's side according to the platform documentation.
import express from "express";

// Requires Node 18+ (built-in fetch).
const app = express();
app.use(express.json());

app.post("/xiaoai/webhook", async (req, res) => {
  // 1) TODO: verify the Xiaoai signature/auth (per the platform docs)
  const userText =
    req.body?.request?.intent?.query ||
    req.body?.query ||
    req.body?.request?.query ||
    "";

  // 2) Call OpenClaw's OpenAI-compatible endpoint
  const r = await fetch("http://127.0.0.1:18789/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "auto",
      messages: [
        // System prompt (Chinese): "You are a concise Chinese voice assistant; keep answers short."
        { role: "system", content: "你是简洁的中文语音助手,回答尽量短。" },
        { role: "user", content: userText },
      ],
    }),
  });
  const data = await r.json();
  // Fallback text "我没听清。" = "I didn't catch that."
  const answer = data?.choices?.[0]?.message?.content?.trim() || "我没听清。";

  // 3) TODO: shape the response JSON as the Xiaoai custom-skill spec requires (placeholder below)
  return res.json({
    // Replace with the fields the Xiaoai platform specifies
    reply: answer,
  });
});

app.listen(3000, () => console.log("xiaoai bridge on :3000"));
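The field that carries the user's utterance varies across payload shapes, which is why the bridge above tries several paths. That fallback can be isolated and tested on its own; here is a minimal sketch (the field names are assumptions to be confirmed against the Xiaoai platform docs, not a confirmed spec):

```javascript
// Hypothetical helper isolating the query-extraction fallback from the bridge.
// The exact payload field names must be verified against the Xiaoai docs.
function extractUserText(body) {
  return (
    body?.request?.intent?.query || // one possible nesting
    body?.query ||                  // flat variant
    body?.request?.query ||         // another variant
    ""                              // nothing recognizable
  );
}

// A few payload shapes it tolerates:
console.log(extractUserText({ request: { intent: { query: "开灯" } } })); // "开灯"
console.log(extractUserText({ query: "今天天气" }));                       // "今天天气"
console.log(extractUserText({}));                                          // ""
```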
Solution 2: Use Home Assistant to connect the Xiaomi ecosystem and have automations hand text to OpenClaw
(Better suited for voice-triggered automation.)
Target effect: you say "turn on the living room light / run some scene" to the speaker → the Xiaomi device/scene state flows into Home Assistant → an HA automation triggers → it calls OpenClaw (for a summary, announcement, log entry, or extra actions).
Prerequisites
- Home Assistant
- The official Xiaomi Home integration (supports OAuth2 login)
- Optionally, HA's webhook / REST command to forward events to OpenClaw
Advantage of this route: you can build plenty of automations without going through the Xiaoai skill-review process. Disadvantage: it is more "smart-home automation" than true free-form Q&A chat.
Selection recommendations (the short version)
- If what you want is "use the speaker as a voice entry point and chat freely with OpenClaw" → choose Solution 1 (Xiaoai custom skill).
- If what you want is "bring Xiaomi devices/scenes into a hub and let OpenClaw participate in automation" → choose Solution 2 (Home Assistant).
Yes, the Home Assistant idea is essentially: connect the Xiaomi ecosystem to HA and use it as the trigger source/actuator; OpenClaw acts as the "brain"; the speaker handles "listening/speaking" or announcements. (Note, however, that many Xiaomi speakers cannot be driven directly as a standard media_player in HA for TTS broadcast; this depends on the model/protocol.)
Two typical Home Assistant setups
Setup A: Xiao Ai handles voice (cloud), HA handles automation, OpenClaw handles decisions/text generation
You say to the speaker: "Turn on the living room light and tell me what's on today"
- Xiao Ai → triggers the Mijia scene/device
- HA listens for device state changes / scene triggers
- HA calls OpenClaw to generate a line of announcement text / a summary
- HA then "casts" the text to a device that can play it (Chromecast / Google speaker / ESPHome speaker, etc.)
Suitable if: you already have an HA hub and want OpenClaw to act as an interpreter/summarizer/logger.
Setup B: HA's local Assist as the voice entry point (more like a "local voice assistant")
- STT/TTS via HA Assist (a local/semi-local voice pipeline)
- OpenClaw as the conversation backend (you write a middleware that forwards text to OpenClaw)
- HA announces the reply via TTS (to an HA voice satellite or a media_player)
Suitable if: you want the whole pipeline as local as possible, at the cost of more work.
2) Connecting Xiaomi devices (HA side)
The mainstream approach today is the Xiaomi Home integration (the official/officially-partnered route; OAuth2 login, no password stored).
Instructions (very short):
- HA: Settings → Devices & services → Add Integration → Xiaomi Home (you will be redirected to the OAuth login)
- After logging in, devices appear as entities (lights, sockets, sensors, air conditioners, etc.)
3) How to call OpenClaw from HA
The core: an HA automation sends an HTTP request to OpenClaw. The common practice is rest_command in HA (with an automation action that calls it); the community often uses it to fire webhooks / HTTP requests.
For example, in configuration.yaml:
rest_command:
  openclaw_chat:
    url: "http://127.0.0.1:18789/v1/chat/completions"
    method: POST
    headers:
      content-type: "application/json"
    # System prompt (Chinese): "You are a concise Chinese home assistant."
    payload: >
      {
        "model": "auto",
        "messages": [
          {"role": "system", "content": "你是简洁的中文家庭助手。"},
          {"role": "user", "content": "{{ prompt }}"}
        ]
      }
Then in the automation:
- Trigger: a state change on some Xiaomi device (e.g. a door sensor opening)
- Action: call rest_command.openclaw_chat and fill in prompt
- Then broadcast/notify with the returned content (see the next section)
For the basic concepts of automation triggers, see the HA documentation: state changes, events, and more can all act as triggers.
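Putting those steps together, an automation could look roughly like this. A sketch only: the entity ids and notify service are hypothetical, and reading the REST response via response_variable requires a reasonably recent HA release:

```yaml
automation:
  - alias: "Front door opened -> OpenClaw announcement"
    trigger:
      - platform: state
        entity_id: binary_sensor.front_door   # hypothetical Xiaomi door sensor
        to: "on"
    action:
      - service: rest_command.openclaw_chat
        data:
          # Prompt (Chinese): "The front door just opened; write one short announcement line."
          prompt: "门刚被打开了,请生成一句简短的播报文本。"
        response_variable: openclaw_resp
      - service: notify.mobile_app_my_phone   # hypothetical notify target
        data:
          message: "{{ openclaw_resp.content.choices[0].message.content }}"
```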
4) How to announce OpenClaw's reply
This is the step people most often get stuck on: driving a Xiaomi speaker as an HA media_player for TTS broadcast is often unstable or unsupported (there is plenty of community discussion around the Xiaomi Smart Speaker / network speakers, but no single official method that works across all models).
So there are generally three pragmatic routes:
Route 1: Broadcast to a device that already works reliably in HA
- Chromecast / Google Nest / Sonos / ESPHome speaker / HA Voice satellite, etc.
(The TTS/playback pipeline for these is mature in HA)
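As a sketch of Route 1, the reply could be spoken on an already-supported player via HA's tts.speak action. The engine and player entity ids below are hypothetical; adjust them to whatever TTS integration you actually have installed:

```yaml
action:
  - service: tts.speak
    target:
      entity_id: tts.google_translate_zh_cn   # hypothetical TTS engine entity
    data:
      media_player_entity_id: media_player.living_room_nest  # hypothetical player
      message: "{{ answer }}"   # e.g. text extracted from OpenClaw's reply
```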
Route 2: Skip TTS broadcast and send notifications instead
- HA mobile app push
- Telegram / Feishu (you already have a bot setup)
Route 3: Keep broadcasting through the Xiao Ai cloud
- Have HA trigger a Mijia scene / Xiaoai skill that does the announcement (depends on whether you can wrap a "broadcast action" into a scene/service)
5) How to choose
(given your current OpenClaw goal)
- If what you want is "speaker = free-chat entry point": prioritize "Xiaoai custom skill → OpenClaw" (no HA required)
- If what you want is "whole-home automation + OpenClaw participating in decisions/summaries": use HA (device access, triggering, and automation all live in HA)