MOPI-HFRS · A2C extension

Multi-objective RL for healthier food recommendations

Joint work with Dr. Chuxu Zhang. The original MOPI-HFRS paper reports A2C-style improvements on a static recommender, but it (a) used a high-variance REINFORCE policy gradient, (b) had three compounding evaluation bugs that misreported accuracy by over 50%, and (c) lacked any warm start.

I rebuilt the system with an A2C critic baseline, added behavioral-cloning pretraining, and designed a per-step multi-objective reward combining relevance and health-tag overlap. The corrected baseline matters as much as the result.
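To make the per-step reward concrete, here is a minimal sketch of one way relevance and health-tag overlap could be combined. The Jaccard overlap, the linear weighting, the `health_weight` parameter, and the tag names are all assumptions for illustration, not the project's actual reward definition.

```python
from typing import Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two tag sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def step_reward(
    relevance: float,
    item_tags: Iterable[str],
    user_health_tags: Iterable[str],
    health_weight: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Blend a relevance score with health-tag overlap for one recommendation step.

    Assumes relevance is already scaled to [0, 1], so both terms share a range.
    """
    overlap = jaccard(set(item_tags), set(user_health_tags))
    return (1.0 - health_weight) * relevance + health_weight * overlap


# One step: a fully relevant item that matches one of its two tags to the user.
reward = step_reward(1.0, ["low_sodium", "high_fiber"], ["low_sodium"], health_weight=0.5)
```

Raising `health_weight` in a sketch like this pushes the policy toward health alignment at the expense of preference alignment, which is the kind of tradeoff the highlights report.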

Highlights

Won 1st place in the UConn School of Computing, College of Engineering Senior Design Competition (1 of 72).
Discovered and corrected 3 evaluation bugs in the original codebase.
Replaced REINFORCE with A2C (critic baseline) plus a behavioral-cloning warm start.
Per-step multi-objective reward: relevance + health-tag overlap.
Health alignment: 0.39 → 0.73 (slight tradeoff on preference alignment).
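
The difference between REINFORCE and a critic baseline can be sketched in a few lines. REINFORCE scales each log-probability gradient by the raw discounted return, while an advantage estimate subtracts the critic's value prediction first, which typically lowers variance. This sketch uses Monte Carlo returns for simplicity; A2C implementations often use n-step or bootstrapped targets instead. The function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(
    rewards: Sequence[float],
    values: Sequence[float],  # critic predictions V(s_t), one per step
    gamma: float = 0.99,
) -> List[float]:
    """Advantage A_t = G_t - V(s_t): the return minus the critic's baseline."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


# Toy episode with made-up numbers: REINFORCE would weight gradients by `raw`,
# a baseline-corrected update would weight them by `adv`.
raw = discounted_returns([1.0, 1.0, 1.0], gamma=1.0)
adv = advantages([1.0, 1.0, 1.0], [2.5, 2.0, 0.5], gamma=1.0)
```

In this toy episode the raw returns are all large and positive, whereas the advantages are centered closer to zero, so the policy update reflects only how much better each step was than the critic expected.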
