Junhao Fu (付俊豪) / John

Agent RL Researcher · LLM Post-training · RLVR
Building long-horizon agents that train themselves.

Algorithm Researcher @ Sorcara US · 8 years in AI

✉ Email · ⌨ GitHub · 📄 Latest Paper

Open to AI Agent / LLM Algorithm Expert roles

About

I research algorithms for AI agents end-to-end — from RLVR and Agentic-RL closed-loop training, long-horizon credit assignment, and reward design, to the agent harness engineering that makes those algorithms ship in production: tool use, memory & reflection, RAG / GraphRAG, multi-agent orchestration.

Currently leading three Agent product lines: a Shopify coding agent that cut store delivery from 3-5 weeks to under 1 day, a shopping guide agent built on Graph + Vector hybrid RAG, and an MCP-based smart customer service agent. Earlier: 3 years on multi-modal product governance and multi-agent inspection at e-commerce scale.

Most recent work: Compiler-as-Reward — using process feedback from compilers as RL training signal for coding agents.

Research & Patents

Compiler-as-Reward: Process Feedback for Coding Agent RL Training

2026 · Junhao Fu

Proposes Compiler-OPD and Error-Branch, two process-reward mechanisms that turn compiler diagnostics into dense RL training signals. On a 22-task Shopify Horizon coding-agent benchmark, the approach is the only one that maintains a non-zero task success rate at convergence.
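
The summary does not spell out how Compiler-OPD or Error-Branch work, so the sketch below does not reproduce either mechanism. It only illustrates the general idea of converting compiler diagnostics into a dense per-step reward, under illustrative assumptions: each diagnostic is a hypothetical `Diagnostic(severity, message)` record, and the reward is the normalized change in error count between two consecutive compile attempts, plus a bonus for a clean build.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    # Hypothetical diagnostic record; real compilers emit richer structures.
    severity: str  # "error" or "warning"
    message: str


def count_errors(diags):
    """Count only error-level diagnostics; warnings are ignored here."""
    return sum(1 for d in diags if d.severity == "error")


def step_reward(prev_diags, curr_diags, success_bonus=1.0):
    """Illustrative dense process reward for one edit step.

    Rewards the relative reduction in compiler errors between two
    consecutive compile attempts; a clean compile earns an extra bonus.
    """
    prev_err = count_errors(prev_diags)
    curr_err = count_errors(curr_diags)
    # Dividing by the larger count keeps the shaped term within [-1, 1].
    denom = max(prev_err, curr_err, 1)
    shaped = (prev_err - curr_err) / denom
    bonus = success_bonus if curr_err == 0 else 0.0
    return shaped + bonus


prev = [Diagnostic("error", "a"), Diagnostic("error", "b"), Diagnostic("warning", "c")]
curr = [Diagnostic("error", "b")]
print(step_reward(prev, curr))  # 0.5: one of two errors fixed
```

Unlike a single pass/fail reward at the end of an episode, a signal of this shape gives every edit step feedback even when the task ultimately fails.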

code · paper forthcoming

Sparse Black-Box Multimodal Attack for Vision-Language Adversary Generation

EMNLP 2023 · Main Conference

A sparse black-box attack method against vision-language models, generating adversarial samples with limited model access while preserving semantic plausibility.
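
The summary does not describe the attack's algorithm. As a generic illustration of what "sparse" and "black-box" mean together (not the paper's method), the sketch below runs a random search that only queries a scoring function, with no gradients, and perturbs at most `k` coordinates by at most `eps`. The flat feature vector, `score_fn`, and all parameters are assumptions; an attack on a vision-language model would perturb image and/or text inputs and score them with the target model.

```python
import random


def sparse_random_search(x, score_fn, k=3, eps=0.5, queries=200, seed=0):
    """Black-box sparse attack sketch.

    Minimizes score_fn(x_adv) (e.g. the model's confidence in the true label)
    while changing at most k coordinates of x, each by at most eps.
    Only forward queries to score_fn are used.
    """
    rng = random.Random(seed)
    best = list(x)
    best_score = score_fn(best)
    for _ in range(queries):
        # Every candidate starts from the clean input, so at most k entries differ.
        cand = list(x)
        for i in rng.sample(range(len(x)), k):
            cand[i] = x[i] + rng.uniform(-eps, eps)
        s = score_fn(cand)
        if s < best_score:  # keep the candidate that lowers the score most
            best, best_score = cand, s
    return best, best_score


x = [0.0] * 10
adv, score = sparse_random_search(x, lambda v: sum(v), k=3, eps=0.5, queries=100)
print(score, sum(1 for a, b in zip(adv, x) if a != b))
```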

Knowledge-Guided Adversarial Mutation Learning for Psoriasis-Like Listing Detection

Patent · Alibaba Group · Core Patent

Multimodal product recognition robust to adversarial evasion patterns on e-commerce platforms. Deployed in production for Taobao/Tmall risk control.

Open Source

mlx-agent-rl

Author · Maintainer

Native multi-turn Agent RL training framework for Apple Silicon, built on the MLX ecosystem. Implements four critic-free algorithms from the GRPO family: GRPO, Dr. GRPO, DAPO, and GiGPO (including GiGPO's two-level advantage estimation, NeurIPS 2025). Multi-turn rollouts combine a Policy, an Environment, and SlidingMemory, and the policy is updated with PPO-clip. Ships with four built-in tool environments: calculator, search, sql, and web shopping. Supports 4/8-bit quantization and LoRA.
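
The GRPO-family estimators differ mainly in how rewards within one prompt's group of rollouts are normalized. The plain-Python sketch below is not code from mlx-agent-rl (which works on MLX arrays); it shows GRPO's mean/std normalization, Dr. GRPO's removal of the std division (Dr. GRPO also drops per-response length normalization, which is omitted here), and the standard PPO-clip surrogate for a single token. Function names are hypothetical, and using the population standard deviation is an assumption; implementations differ on this detail.

```python
import math


def grpo_advantages(rewards, eps=1e-6):
    """GRPO: advantage_i = (r_i - mean) / (std + eps) within one prompt's group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)  # population std
    return [(r - mean) / (std + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO: keep the group-mean baseline but skip std normalization."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate for one token, to be maximized:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


group = [1.0, 0.0, 0.0, 1.0]  # e.g. binary task success for 4 rollouts
print(grpo_advantages(group))
print(dr_grpo_advantages(group))  # [0.5, -0.5, -0.5, 0.5]
print(ppo_clip_objective(1.5, 1.0))  # ratio clipped to 1.2
```

When every rollout in a group receives the same reward, the GRPO advantages are all zero and the group contributes no gradient; DAPO's dynamic sampling addresses this by filtering out such groups.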

MLX · Apple Silicon · GRPO · Multi-turn RL · LoRA · Python

GitHub →

Selected Projects

Shopify Store Coding Agent

Architect & Lead · 2024.07–now

End-to-end coding agent that autonomously generates production-grade Shopify themes (Liquid/HTML/CSS/JS). It runs on a three-layer Agent Harness: a deterministic layer (Aider-style RepoMap with Tree-Sitter AST parsing and PageRank symbol ranking, plus an Auto Harness self-evolving rule base), an LLM layer (multi-expert agents for requirement clarification, code generation, and theme QA), and a runtime-control layer (file-system Plan & Memory persistence, 4-breakpoint Prompt Caching, JSONPatch incremental edits, and a 50-round iteration loop with error recovery). On top of the harness sits an RLVR closed loop: GRPO trains the agent against four binary rewards (compilation / schema / image_config / lint), with signals coming from a compiler API, the Auto Harness rule engine, and a live Theme Dev Server.
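
Aider's RepoMap ranks code by building a graph in which a file that references an identifier points to the file that defines it, then running PageRank so that heavily referenced definitions rise to the top. The sketch below shows only that ranking step, with a hand-written power iteration; the Liquid file names and edges are hypothetical, and Aider itself derives edges from Tree-Sitter tags, weights them, and uses NetworkX's PageRank with personalization.

```python
def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank.

    edges: dict mapping a node to the list of nodes it points to
    (here: referencing file -> defining file).
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            out = edges.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for t in out:
                    new[t] += share
            else:
                # Dangling node (defines symbols, references none): spread rank uniformly.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical theme files: both sections render the shared price snippet.
graph = {
    "templates/product.json": ["sections/product.liquid", "sections/cart.liquid"],
    "sections/product.liquid": ["snippets/price.liquid"],
    "sections/cart.liquid": ["snippets/price.liquid"],
}
ranks = pagerank(graph)
for path in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{ranks[path]:.3f}  {path}")
```

The highest-ranked files are the ones worth putting into the model's limited context first.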

Liquid executability: 60% → 95%+

Cost per store: $2k-5k → $30

Build time: 3-5 weeks → 1 day

Industries covered: 10+ verticals

Aider · RepoMap · Tree-Sitter · PageRank · GRPO · Prompt Caching · LLM-as-Judge

Shopping Guide Agent

Architect & Lead · 2024.07–2025.03

ReAct-driven product recommendation agent on a hybrid Graph + Vector RAG stack. The knowledge graph is built automatically: an LLM extracts 30+ relation types from product descriptions, reviews, and Q&A, and the graph is stored across three tiers: NebulaGraph (scale), Neo4j (Cypher querying), and NetworkX (in-memory graph algorithms). Retrieval runs three paths in parallel (Graph RAG with LLM-generated Cypher for multi-hop traversal, Vector RAG over 768-d rerank embeddings in Milvus, and keyword search); the results are deduplicated and then re-ranked by an LLM. A seven-intent ReAct controller dynamically orchestrates ProductRecall, CategoryTool, ComplianceRAG, SpecificationTool, and ProductRecommender. The agent keeps 20 turns of dialog context and outputs an explainable match_score.
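
The three retrieval paths are merged by deduplicating and then letting an LLM re-rank. As a simpler, deterministic stand-in for the merge step (not the LLM re-ranker described here), the sketch below uses reciprocal rank fusion: each hit contributes 1 / (k + rank) from every list it appears in, so items found by several paths rise. The SKU IDs are hypothetical, and k = 60 is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Merge several ranked lists of item IDs into one deduplicated ranking.

    Each appearance at 1-based position `rank` adds 1 / (k + rank) to the
    item's score; ties are broken by item ID for a stable order.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=lambda item: (-scores[item], item))
    return fused if top_n is None else fused[:top_n]


graph_hits = ["sku_42", "sku_7", "sku_13"]   # Graph RAG (multi-hop Cypher)
vector_hits = ["sku_7", "sku_42", "sku_99"]  # Vector RAG
keyword_hits = ["sku_7", "sku_5"]            # keyword search
print(reciprocal_rank_fusion([graph_hits, vector_hits, keyword_hits]))
# ['sku_7', 'sku_42', 'sku_5', 'sku_13', 'sku_99']
```

In a pipeline like the one described, a fused shortlist of this kind would typically be what gets handed to the LLM re-ranker, keeping its prompt small.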

Decision turnaround: days → minutes

KG relation schema: 30+ types

Context retention: 20 turns

ReAct · GraphRAG · VectorRAG · NebulaGraph · Neo4j · Milvus · Cypher

MCP Smart Customer Service Agent

Architect · 2025.09–now

Root-cause analysis of 100+ real conversations showed that the bottleneck was information silos, not model capability. Built 7 MCP-protocol data connectors (subscriptions, orders, logistics, accounts, products, Shopify sync, publish state) that supply context to the Fin AI Agent. Routing has two layers: a keyword fast path for high-frequency intents (orders, logistics, subscriptions) and an LLM agent fallback for complex intents (complaints, refunds). Access is strictly isolated through order-ownership validation, and PII redaction ensures that internal IDs, wholesale prices, and domestic-segment logistics details are never exposed.
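
A minimal sketch of the two-layer routing and the data-protection rules, assuming orders arrive as plain dicts. The keyword table, the field names (`customer_id`, `internal_id`, `wholesale_price`, `domestic_tracking`), and the `llm_fallback` callable are hypothetical; in the system described here the fallback is an LLM agent and the data comes from MCP connectors.

```python
from typing import Callable

# Hypothetical fast-path table for the high-frequency intents.
FAST_PATH_KEYWORDS = {
    "orders": ("order status", "my order", "order number"),
    "logistics": ("tracking", "shipping", "delivery"),
    "subscriptions": ("subscription", "renew", "cancel plan"),
}

# Hypothetical fields that must never reach the customer-facing agent.
SENSITIVE_FIELDS = {"internal_id", "wholesale_price", "domestic_tracking"}


def route(message: str, llm_fallback: Callable[[str], str]) -> str:
    """Layer 1: cheap keyword match. Layer 2: defer everything else to an LLM agent."""
    text = message.lower()
    for intent, keywords in FAST_PATH_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return llm_fallback(message)


def redact(record: dict) -> dict:
    """Drop sensitive fields before the record is used as agent context."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}


def fetch_order(order: dict, requester_id: str) -> dict:
    """Ownership check first, then redaction."""
    if order.get("customer_id") != requester_id:
        raise PermissionError("order does not belong to requester")
    return redact(order)


print(route("Where is my tracking number?", lambda m: "llm_agent"))  # logistics
print(route("I want a refund for a damaged item", lambda m: "llm_agent"))  # llm_agent
```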

Human handoff rate (expected): 12% → 3%

MCP data connectors: 7 tools

MCP · Router · Context Engineering · Data Redaction

Multi-Agent Product Inspection

Senior Algorithm Engineer · 2021.03–2024.03

Multimodal product governance for tens of millions of SKUs. Built a compliance knowledge base covering 50+ categories by parsing 200+ national standards documents with PP-Structure (layout analysis, with multimodal LLMs extracting tables and charts). Fine-tuned Qwen 7B for review-based intent recognition and attribute-level sentiment analysis. Used AutoGen to orchestrate a four-agent workflow (product parsing → risk assessment → compliance retrieval → inspection recommendation). BGE embeddings power RAG matching between standards documents and product attributes. A bad-case feedback loop drives automatic rule updates.
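
Embedding-based RAG matching of this kind reduces to nearest-neighbor search. The sketch below ranks standards clauses by cosine similarity to one product-attribute embedding; the 3-dimensional vectors and clause IDs are hypothetical placeholders for real BGE embeddings (which have hundreds of dimensions, e.g. 768 for bge-base), and a production system would use a vector index rather than a linear scan.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_standards(attr_vec, clause_vecs, top_k=2):
    """Rank standards clauses by similarity to one product-attribute embedding."""
    scored = [(cid, cosine(attr_vec, vec)) for cid, vec in clause_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical clause embeddings standing in for encoded standards text.
clauses = {
    "clause_a": [0.9, 0.1, 0.0],
    "clause_b": [0.0, 1.0, 0.0],
    "clause_c": [0.5, 0.5, 0.7],
}
print(match_standards([1.0, 0.0, 0.0], clauses))
```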

Intent / sentiment accuracy: 90% / 92%

Recommendation accuracy: 85%

Manual audit volume: −70%

PP-Structure · Qwen-7B · AutoGen · BGE · Multi-Agent

Writing

Technical notes on Agent RL, LLM post-training, and agent systems. Read the blog →

Experience

2024.07 – now

Algorithm Researcher

Sorcara US · Agent product lines (Coding / Decision / Customer Service)

2021.03 — 2024.03

Senior Algorithm Engineer

Alibaba Group · Taobao/Tmall product governance & multimodal AI

2018.03 — 2021.02

Algorithm Engineer

Beijing Shumei · Content safety & image retrieval at scale

2013 — 2017

B.Eng. in Optoelectronic Information Science & Engineering

Changzhou Institute of Technology

Contact

Email: john.hao.fu@gmail.com
GitHub: @johnhaofu
Status: Open to AI Agent Algorithm Engineer / LLM Algorithm Expert roles.