The slide presents a four-stage workflow for designing a reward function for unmanned boat control: defining the reward components (trajectory error, collision penalty, speed reward, energy-consumption penalty), formulating a weighted linear combination with a piecewise term (navigation +100, collision -1000), tuning the weights w1 to w4 through preliminary experiments, and validating in simulation that training converges, obstacle avoidance takes priority, and energy use is optimized.
Reward Function Design
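{ "headers": [ "Stage", "Task", "Key details" ], "rows": [ [ "1. Define reward components", "Identify the key factors in unmanned boat control", "Trajectory error, collision penalty, speed reward, energy-consumption penalty" ], [ "2. Design the reward formula", "Build a weighted linear combination plus a piecewise mechanism", "r = w1(-trajectory error) + w2(-collision) + w3(speed) + w4(-energy consumption); navigation +100, collision -1000" ], [ "3. Parameter tuning", "Adjust the weight coefficients w1 to w4", "Balance the influence of each component through preliminary experiments" ], [ "4. Validation and iteration", "Test the reward function in simulation", "Ensure training converges, obstacle avoidance takes priority, and energy use is optimized" ] ] }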
Source: r = w1 (-trajectory error) + w2 (-collision penalty) + w3 (speed reward) + w4 (-energy consumption). Piecewise design: navigation +100, collision -1000.
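The formula above can be sketched in Python as a rough illustration. The slide does not say how the piecewise values combine with the weighted sum, so this sketch assumes they are simply added on top of it: -1000 on a step where a collision occurs, +100 otherwise. The names `RewardWeights`, `linear_reward`, `piecewise_term`, and `total_reward` are hypothetical, and the default weights of 1.0 are placeholders, since the slide leaves w1 to w4 to experimental tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder values only: the slide tunes w1..w4 experimentally.
    w1: float = 1.0  # weight on trajectory error
    w2: float = 1.0  # weight on collision penalty
    w3: float = 1.0  # weight on speed reward
    w4: float = 1.0  # weight on energy consumption


NAVIGATION_BONUS = 100.0     # "navigation +100" from the slide
COLLISION_PENALTY = -1000.0  # "collision -1000" from the slide


def linear_reward(w: RewardWeights, trajectory_error: float,
                  collision_penalty: float, speed_reward: float,
                  energy: float) -> float:
    """Weighted linear combination: cost terms enter with a minus sign."""
    return (w.w1 * (-trajectory_error)
            + w.w2 * (-collision_penalty)
            + w.w3 * speed_reward
            + w.w4 * (-energy))


def piecewise_term(collided: bool) -> float:
    """Assumed reading of the piecewise design: -1000 on collision, else +100."""
    return COLLISION_PENALTY if collided else NAVIGATION_BONUS


def total_reward(w: RewardWeights, trajectory_error: float,
                 collision_penalty: float, speed_reward: float,
                 energy: float, collided: bool) -> float:
    """Assumption: the piecewise term is added to the linear combination."""
    linear = linear_reward(w, trajectory_error, collision_penalty,
                           speed_reward, energy)
    return linear + piecewise_term(collided)


if __name__ == "__main__":
    weights = RewardWeights()
    # A collision-free step versus a step that ends in a collision.
    print(total_reward(weights, 2.0, 0.0, 1.5, 0.5, collided=False))
    print(total_reward(weights, 2.0, 1.0, 1.5, 0.5, collided=True))
```

Keeping the weights in a single frozen dataclass makes the stage 3 tuning step a matter of swapping one object between experiments. The large gap between the two piecewise values is what lets obstacle avoidance dominate the other terms, consistent with the slide's "obstacle avoidance takes priority" goal.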