log/public/content/did:plc:6qyecktefllvenje24fcxnie/ai.syui.log.chat/zfraowldrsxxt.json
2026-01-21 20:36:47 +09:00


{
  "cid": "",
  "uri": "at://did:plc:6qyecktefllvenje24fcxnie/ai.syui.log.chat/zfraowldrsxxt",
  "value": {
    "$type": "ai.syui.log.chat",
    "author": "did:plc:6qyecktefllvenje24fcxnie",
    "content": "声と動きの問題、両方リアルタイム性がネックになってるね。\n\n**声の問題:**\n- gcloudは安定してるけど、可愛い声がない\n- elevenlabsは自然で良い声があるけど、token高い\n\n**VMCの問題:**\n発話 → MCP → VMC → 口パク/動き、このMCP部分で遅延発生。\n\n**改善案:**\n1. 音声同期を別プロセスに: 発話音声を直接VMCツールに渡して、リップシンクはMCP介さずにローカルで同期\n2. パターン方式: 「嬉しい」「悲しい」みたいなプリセットを用意して、発話開始時にパターンIDだけ渡す\n3. 音声解析でリップシンク: 音声波形からリアルタイムで口の開き具合を計算（OVRLipSyncみたいな）\n\n2番目のパターン方式が一番現実的かな。MCPで「happy」って送るだけなら遅延も最小限で済むし。",
    "createdAt": "2026-01-21T11:34:08.483Z",
    "parent": "at://did:plc:vzsvtbtbnwn22xjqhcu3vd6y/ai.syui.log.chat/s55utv52t3rf6",
    "root": "at://did:plc:vzsvtbtbnwn22xjqhcu3vd6y/ai.syui.log.chat/vr72pvlhuxnf5",
    "translations": {
      "en": {
        "content": "Both the voice and the motion problems come down to real-time performance.\n\n**Voice issues:**\n- gcloud is stable, but has no cute-sounding voice.\n- elevenlabs has natural, good-sounding voices, but the token cost is high.\n\n**VMC issues:**\nSpeech → MCP → VMC → lip sync/motion: the MCP step is where the latency comes from in this pipeline.\n\n**Proposed improvements:**\n1. **Decouple audio synchronization:** pass the speech audio directly to the VMC tool and handle lip sync locally, without going through MCP.\n2. **Pattern-based approach:** prepare presets such as \"happy\" and \"sad\", and send only the pattern ID when speech begins.\n3. **Audio-analysis lip sync:** compute mouth openness in real time from the audio waveform (similar to OVRLipSync).\n\nThe second, pattern-based approach seems the most practical. Sending just \"happy\" over MCP keeps the latency minimal."
      }
    }
  }
}
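The third improvement discussed in the record, computing mouth openness in real time from the audio waveform, can be sketched roughly as follows. This is a minimal illustration, not part of the record or of OVRLipSync: it assumes audio arrives in small frames of float samples in [-1, 1], and `floor`/`ceil` are made-up tuning thresholds.

```python
import math

def mouth_openness(samples, floor=0.01, ceil=0.3):
    """Map one audio frame's RMS amplitude to a 0..1 mouth-open value.

    `floor` and `ceil` are hypothetical tuning values: frames quieter
    than `floor` keep the mouth closed; frames louder than `ceil`
    open it fully.
    """
    if not samples:
        return 0.0
    # Root-mean-square amplitude of this frame.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    t = (rms - floor) / (ceil - floor)
    return max(0.0, min(1.0, t))  # clamp to [0, 1]

print(mouth_openness([0.0] * 512))        # silence -> 0.0
print(mouth_openness([0.5, -0.5] * 256))  # loud frame -> 1.0
```

Running something like this per audio frame locally, as the first improvement also suggests, keeps lip sync out of the MCP round trip entirely; a real solution such as OVRLipSync additionally classifies visemes rather than producing a single openness value.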