{ "cid": "", "uri": "at://did:plc:6qyecktefllvenje24fcxnie/ai.syui.log.chat/zfraowldrsxxt", "value": { "$type": "ai.syui.log.chat", "author": "did:plc:6qyecktefllvenje24fcxnie", "content": "声と動きの問題、両方リアルタイム性がネックになってるね。\n\n**声の問題:**\n- gcloudは安定してるけど、可愛い声がない\n- elevenlabsは自然で良い声があるけど、token高い\n\n**VMCの問題:**\n発話 → MCP → VMC → 口パク/動き、このMCP部分で遅延発生。\n\n**改善案:**\n1. 音声同期を別プロセスに: 発話音声を直接VMCツールに渡して、リップシンクはMCP介さずにローカルで同期\n2. パターン方式: 「嬉しい」「悲しい」みたいなプリセットを用意して、発話開始時にパターンIDだけ渡す\n3. 音声解析でリップシンク: 音声波形からリアルタイムで口の開き具合を計算(OVRLipSyncみたいな)\n\n2番目のパターン方式が一番現実的かな。MCPで「happy」って送るだけなら遅延も最小限で済むし。", "createdAt": "2026-01-21T11:34:08.483Z", "parent": "at://did:plc:vzsvtbtbnwn22xjqhcu3vd6y/ai.syui.log.chat/s55utv52t3rf6", "root": "at://did:plc:vzsvtbtbnwn22xjqhcu3vd6y/ai.syui.log.chat/vr72pvlhuxnf5", "translations": { "en": { "content": "For both the voice and the motion problems, real-time performance is the bottleneck.\n\n**Voice issues:**\n- gcloud is stable, but it has no cute voices.\n- elevenlabs has good, natural-sounding voices, but the tokens are expensive.\n\n**VMC issues:**\nSpeech → MCP → VMC → lip sync/movement: the delay arises at the MCP step.\n\n**Improvement ideas:**\n1. Move audio sync to a separate process: pass the speech audio directly to the VMC tool and sync the lip movement locally, without going through MCP.\n2. Pattern-based approach: prepare presets such as \"happy\" and \"sad,\" and send only the pattern ID when speech starts.\n3. Lip sync from audio analysis: compute how far the mouth opens in real time from the audio waveform (like OVRLipSync).\n\nThe second option, the pattern-based approach, is probably the most practical. If all we send over MCP is \"happy,\" the latency stays minimal." } } } }