大数跨境

2025-10-15 Paper Picks | Recommender Systems

王老师运营实战
Editor's note: From the 369 articles published between 2025-09-24 and 2025-10-15, we have selected 10 outstanding works to share with readers. The main research directions include: generative recommendation (several papers), constraint-aware route recommendation from natural language, natural-language control of recommender systems, user preference alignment, bias evaluation in spoken dialogue LLMs, inherited popularity bias in cold-start recommendation, next point-of-interest recommendation under cold-start conditions, and differentiable efficient Top-K selection.



  1. OneRec-Think: In-Text Reasoning for Generative Recommendation
  2. AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents
  3. Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents
  4. CTRL-Rec: Controlling Recommender Systems With Natural Language
  5. MTRec: Learning to Align with User Preferences via Mental Reward Models
  6. Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations
  7. On Inherited Popularity Bias in Cold-Start Item Recommendation
  8. Do We Really Need SFT? Prompt-as-Policy over Knowledge Graphs for Cold-start Next POI Recommendation
  9. Reinforced Preference Optimization for Recommendation
  10. Differentiable Fast Top-K Selection for Large-Scale Recommendation

1.OneRec-Think: In-Text Reasoning for Generative Recommendation

Authors: Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, et al. (1 additional author not shown)

Affiliations: Kuaishou Inc., Beijing, China

https://arxiv.org/abs/2510.11639

Abstract

The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. OneRec-Think incorporates: (1) Itemic Alignment: cross-modal Item-Textual Alignment for semantic grounding; (2) Reasoning Activation: Reasoning Scaffolding to activate LLM reasoning within the recommendation context; and (3) Reasoning Enhancement, where we design a recommendation-specific reward function that accounts for the multi-validity nature of user preferences. Experiments across public benchmarks show state-of-the-art performance. Moreover, our proposed "Think-Ahead" architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time and validating the practical efficacy of the model's explicit reasoning capability.

Brief review: This paper proposes OneRec-Think, a new generative recommendation framework that addresses the lack of explicit, controllable reasoning in existing generative recommenders. The motivation is to exploit the generative capacity of large language models (LLMs) by integrating dialogue, reasoning, and personalized recommendation, improving both interpretability and accuracy. The method comprises three stages: itemic (item-text) semantic alignment, reasoning activation, and reasoning enhancement, notably a recommendation-specific reward function that captures the multi-validity nature of user preferences. Experiments show state-of-the-art performance on several public benchmarks, and an industrial deployment on Kuaishou achieved a 0.159% gain in APP Stay Time, validating the practical value of the model's explicit reasoning capability.

2.AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents

Authors: Mingdai Yang, Nurendra Choudhary, Jiangshu Du, Edward W. Huang, Philip S. Yu, Karthik Subbian, Danai Koutra

Affiliations: University of Illinois at Chicago; Amazon; University of Michigan

https://arxiv.org/abs/2510.05598

Abstract

Recent agent-based recommendation frameworks aim to simulate user behaviors by incorporating memory mechanisms and prompting strategies, but they struggle with hallucinating non-existent items and full-catalog ranking. Besides, a largely underexplored opportunity lies in leveraging LLMs' commonsense reasoning to capture user intent through substitute and complement relationships between items, which are usually implicit in datasets and difficult for traditional ID-based recommenders to capture. In this work, we propose a novel LLM-agent framework, AgentDR, which bridges LLM reasoning with scalable recommendation tools. Our approach delegates full-ranking tasks to traditional models while utilizing LLMs to (i) integrate multiple recommendation outputs based on personalized tool suitability and (ii) reason over substitute and complement relationships grounded in user history. This design mitigates hallucination, scales to large catalogs, and enhances recommendation relevance through relational reasoning. Through extensive experiments on three public grocery datasets, we show that our framework achieves superior full-ranking performance, yielding on average a twofold improvement over its underlying tools. We also introduce a new LLM-based evaluation metric that jointly measures semantic alignment and ranking correctness.

Brief review: This paper proposes AgentDR, a new agent-based recommendation framework that addresses two weaknesses of existing agent-style recommenders: hallucinating non-existent items and the inability to rank large catalogs. The authors combine the commonsense reasoning of large language models (LLMs) with the scalability of traditional recommendation tools: full-ranking tasks are delegated to traditional models, while the LLM integrates their outputs according to personalized tool suitability and reasons over substitute and complement relationships grounded in user history, improving recommendation relevance and accuracy. Experiments on three public grocery datasets show substantial gains in full-ranking performance, on average a twofold improvement over the underlying tools. The paper also introduces a new LLM-based evaluation metric that jointly measures semantic alignment and ranking correctness.
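As a rough illustration of step (i), merging several tools' full rankings under per-user tool weights, here is a minimal weighted reciprocal-rank fusion sketch. The tool names, items, and weights are all hypothetical; in the paper the suitability weighting comes from LLM reasoning, not fixed numbers.

```python
# Weighted reciprocal-rank fusion: a toy stand-in for letting an agent
# combine full rankings from several traditional recommenders, with
# per-user weights playing the role of "personalized tool suitability".
from collections import defaultdict

def fuse_rankings(rankings, tool_weights, k=60):
    """rankings: {tool_name: [item_id, ...]} in ranked order.
    tool_weights: {tool_name: float} per-user suitability scores."""
    scores = defaultdict(float)
    for tool, ranked_items in rankings.items():
        w = tool_weights.get(tool, 1.0)
        for rank, item in enumerate(ranked_items):
            scores[item] += w / (k + rank + 1)  # reciprocal-rank contribution
    return sorted(scores, key=scores.get, reverse=True)

rankings = {
    "item_cf": ["milk", "bread", "eggs"],
    "sequential": ["eggs", "butter", "milk"],
}
weights = {"item_cf": 0.8, "sequential": 1.2}  # this user trusts the sequential tool more
print(fuse_rankings(rankings, weights))
```

Because the sequential tool carries more weight for this user, its top item wins the fused ranking even though both tools rank "milk" highly.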

3.Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents

Authors: Tao Zhe, Rui Liu, Fateme Memar, Xiao Luo, Wei Fan, Xinyue Ye, Zhongren Peng, Dongjie Wang

Affiliations: University of Kansas; UW–Madison; University of Auckland; University of Alabama; University of Florida

https://arxiv.org/abs/2510.06078

Abstract

Route recommendation aims to provide users with optimal travel plans that satisfy diverse and complex requirements. Classical routing algorithms (e.g., shortest-path and constraint-aware search) are efficient but assume structured inputs and fixed objectives, limiting adaptability to natural-language queries. Recent LLM-based approaches enhance flexibility but struggle with spatial reasoning and the joint modeling of route-level and POI-level preferences. To address these limitations, we propose RouteLLM, a hierarchical multi-agent framework that grounds natural-language intents into constraint-aware routes. It first parses user queries into structured intents including POIs, paths, and constraints. A manager agent then coordinates specialized sub-agents: a constraint agent that resolves and formally checks constraints, a POI agent that retrieves and ranks candidate POIs, and a path refinement agent that refines routes via a routing engine with preference-conditioned costs. A final verifier agent ensures constraint satisfaction and produces the final route with an interpretable rationale. This design bridges linguistic flexibility and spatial structure, enabling reasoning over route feasibility and user preferences. Experiments show that our method reliably grounds textual preferences into constraint-aware routes, improving route quality and preference satisfaction over classical methods.

Brief review: This paper studies how to provide users with optimal travel plans that satisfy diverse and complex requirements expressed in natural language. Classical route-recommendation algorithms depend on structured inputs and fixed objectives and cannot flexibly handle natural-language queries, while recent LLM-based methods gain flexibility but struggle with spatial reasoning and with jointly modeling route-level and point-of-interest (POI)-level preferences. To address this, the paper proposes RouteLLM, a hierarchical multi-agent framework that grounds natural-language intents into constraint-aware routes. RouteLLM first parses the user query into structured intents covering POIs, paths, and constraints; a manager agent then coordinates specialized sub-agents for these subtasks, and a final verifier ensures constraint satisfaction and produces the route with an interpretable rationale. Experiments show the method reliably grounds textual preferences into constraint-aware routes, improving route quality and preference satisfaction.

4.CTRL-Rec: Controlling Recommender Systems With Natural Language

Authors: Micah Carroll, Adeline Foote, Kevin Feng, Marcus Williams, Anca Dragan, W. Bradley Knox, Smitha Milli

Affiliations: MATS; University of Washington; UC Berkeley; UT Austin; FAIR at Meta

https://arxiv.org/abs/2510.12742

Abstract

When users are dissatisfied with recommendations from a recommender system, they often lack fine-grained controls for changing them. Large language models (LLMs) offer a solution by allowing users to guide their recommendations through natural language requests (e.g., "I want to see respectful posts with a different perspective than mine"). We propose a method, CTRL-Rec, that allows for natural language control of traditional recommender systems in real-time with computational efficiency. Specifically, at training time, we use an LLM to simulate whether users would approve of items based on their language requests, and we train embedding models that approximate such simulated judgments. We then integrate these user-request-based predictions into the standard weighting of signals that traditional recommender systems optimize. At deployment time, we require only a single LLM embedding computation per user request, allowing for real-time control of recommendations. In experiments with the MovieLens dataset, our method consistently allows for fine-grained control across a diversity of requests. In a study with 19 Letterboxd users, we find that CTRL-Rec was positively received by users and significantly enhanced users' sense of control and satisfaction with recommendations compared to traditional controls.

Brief review: CTRL-Rec tackles users' lack of fine-grained control over recommendations by adding natural-language control to traditional recommender systems. The method uses a large language model to simulate user judgments of items under a given request and trains embedding models to approximate those simulated judgments. Experiments show that CTRL-Rec enables real-time, fine-grained control across diverse requests on the MovieLens dataset, and in a study with Letterboxd users it significantly improved users' sense of control and satisfaction with recommendations without reducing engagement.
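The deployment-time mechanic described above can be sketched in a few lines: one embedding computation per request, then cheap dot products blended into the base recommender's scores. The `embed_request` function below is a deterministic placeholder, not the paper's trained request encoder, and `alpha` is an illustrative blending weight.

```python
# A minimal sketch of the CTRL-Rec serving idea: embed the language request
# once, score every item by dot product with precomputed item embeddings,
# and add that request score into the traditional recommender's ranking.
import numpy as np

def embed_request(text, dim=4):
    # Placeholder encoder: deterministic pseudo-embedding seeded by the text.
    rng = np.random.default_rng(sum(map(ord, text)))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def rerank(base_scores, item_embs, request, alpha=0.5):
    """base_scores: (n,) from the traditional recommender; item_embs: (n, d)
    precomputed item embeddings; alpha weights the language request."""
    q = embed_request(request)                 # one embedding call per request
    return base_scores + alpha * (item_embs @ q)

base = np.array([0.9, 0.5, 0.1])
items = np.eye(3, 4)                           # toy item embeddings
scores = rerank(base, items, "respectful posts with a different perspective")
print(np.argsort(-scores))
```

The key property this illustrates is the cost profile: the LLM-derived computation is a single request embedding, so item scoring stays at one dot product per item and the system can respond in real time.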

5.MTRec: Learning to Align with User Preferences via Mental Reward Models

Authors: Mengchen Zhao, Yifan Gao, Yaqing Hou, Xiangyang Li, Pengjie Gu, Zhenhua Dong, Ruiming Tang, Yi Cai

Affiliations: South China University of Technology; Dalian University of Technology; Huawei Noah’s Ark Lab; Nanyang Technological University

https://arxiv.org/abs/2509.22807

Abstract

Recommendation models are predominantly trained using implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering their internal satisfaction on recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7 percent increase in average user viewing time.

Brief review: This paper proposes MTRec, a recommendation framework that addresses the misalignment between recommender systems and users' real preferences. Because explicit feedback is scarce, recommenders typically rely on implicit signals such as clicks, but these can mislead the system when they fail to reflect genuine satisfaction. MTRec introduces a mental reward model to quantify users' internal satisfaction, learns it with a distributional inverse reinforcement learning approach, and uses it to guide recommendation models toward users' real preferences. Experiments show significant gains across a variety of recommendation models, and an online test on an industrial short-video platform observed a 7% increase in average user viewing time.
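To make the training signal concrete, here is a toy sketch of the core idea: a satisfaction score reweights each click's contribution to the loss, so that a clickbait click (clicked but unsatisfying) teaches the model less than a genuinely satisfying one. The scalar `mental_rewards` input stands in for the paper's distributional-IRL-trained mental reward model; this is not the paper's actual objective.

```python
# Reward-weighted click loss: a stand-in for how a learned mental reward
# model could temper misleading implicit feedback during training.
import math

def weighted_click_loss(pred_probs, clicks, mental_rewards):
    """pred_probs: model click probabilities; clicks: 0/1 labels;
    mental_rewards: per-interaction satisfaction in [0, 1]."""
    total = 0.0
    for p, y, r in zip(pred_probs, clicks, mental_rewards):
        ce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += r * ce if y == 1 else ce  # down-weight unsatisfying clicks
    return total / len(clicks)

# A clickbait click (low satisfaction) contributes less than a genuine one.
loss_genuine = weighted_click_loss([0.6], [1], [1.0])
loss_clickbait = weighted_click_loss([0.6], [1], [0.2])
print(loss_genuine, loss_clickbait)
```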

6.Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations

Authors: Yihao Wu, Tianrui Wang, Yizhou Peng, Yi-Wen Chao, Xuyi Zhuang, Xinsheng Wang, Shunshun Yin, Ziyang Ma

Affiliations: Nanyang Technological University; Soul AI Lab

https://arxiv.org/abs/2510.02352

Abstract

While biases in large language models (LLMs), such as stereotypes and cultural tendencies in outputs, have been examined and identified, their presence and characteristics in spoken dialogue models (SDMs) with audio input and output remain largely unexplored. Paralinguistic features, such as age, gender, and accent, can affect model outputs; when compounded by multi-turn conversations, these effects may exacerbate biases, with potential implications for fairness in decision-making and recommendation tasks. In this paper, we systematically evaluate biases in speech LLMs and study the impact of multi-turn dialogues with repeated negative feedback. Bias is measured using Group Unfairness Score (GUS) for decisions and similarity-based normalized statistics rate (SNSR) for recommendations, across both open-source models like Qwen2.5-Omni and GLM-4-Voice, as well as closed-source APIs such as GPT-4o Audio and Gemini-2.5-Flash. Our analysis reveals that closed-source models generally exhibit lower bias, while open-source models are more sensitive to age and gender, and recommendation tasks tend to amplify cross-group disparities. We found that biased decisions may persist in multi-turn conversations. This work provides the first systematic study of biases in end-to-end spoken dialogue models, offering insights towards fair and reliable audio-based interactive systems. To facilitate further research, we release the FairDialogue dataset and evaluation code.

Brief review: This paper investigates bias in spoken dialogue models (SDMs), with particular attention to how bias behaves across multi-turn conversations. While bias in text LLMs has been widely studied, evaluation for spoken dialogue models, which take audio as both input and output, remains limited. Systematically evaluating open-source and closed-source models on decision and recommendation tasks, the authors find that closed-source models generally exhibit lower bias, while open-source models are more sensitive to age and gender. The experiments also show that biased decisions can persist across multi-turn conversations. To support further research, the authors release the FairDialogue dataset and evaluation code.
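For intuition about what a group-fairness score over decisions measures, here is a hedged sketch: the mean absolute pairwise gap in positive-decision rates across demographic groups. This is one plausible reading of a group-unfairness metric like the paper's GUS; the paper's exact formula may differ, so treat this only as an illustration of the quantity being tracked.

```python
# Mean absolute pairwise gap in positive-decision rates across groups:
# an illustrative group-unfairness measure (0 = perfectly equal rates).
from itertools import combinations

def group_unfairness(decisions_by_group):
    """decisions_by_group: {group_name: [0/1 decisions]}"""
    rates = {g: sum(d) / len(d) for g, d in decisions_by_group.items()}
    pairs = list(combinations(rates, 2))
    return sum(abs(rates[a] - rates[b]) for a, b in pairs) / len(pairs)

decisions = {
    "young": [1, 1, 1, 0],   # 75% approval rate
    "old":   [1, 0, 0, 0],   # 25% approval rate
}
print(group_unfairness(decisions))
```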

7.On Inherited Popularity Bias in Cold-Start Item Recommendation

Authors: Gregor Meehan, Johan Pauwels

Affiliations: Queen Mary University of London

https://arxiv.org/abs/2510.11402

Abstract

Collaborative filtering (CF) recommender systems struggle with making predictions on unseen, or 'cold', items. Systems designed to address this challenge are often trained with supervision from warm CF models in order to leverage collaborative and content information from the available interaction data. However, since they learn to replicate the behavior of CF methods, cold-start models may therefore also learn to imitate their predictive biases. In this paper, we show that cold-start systems can inherit popularity bias, a common cause of recommender system unfairness arising when CF models overfit to more popular items, thereby maximizing user-oriented accuracy but neglecting rarer items. We demonstrate that cold-start recommenders not only mirror the popularity biases of warm models, but are in fact affected more severely: because they cannot infer popularity from interaction data, they instead attempt to estimate it based solely on content features. This leads to significant over-prediction of certain cold items with similar content to popular warm items, even if their ground truth popularity is very low. Through experiments on three multimedia datasets, we analyze the impact of this behavior on three generative cold-start methods. We then describe a simple post-processing bias mitigation method that, by using embedding magnitude as a proxy for predicted popularity, can produce more balanced recommendations with limited harm to user-oriented cold-start accuracy.

Brief review: This paper examines how generative cold-start recommenders inherit popularity bias from the warm collaborative filtering models that supervise them. The motivation is that CF models overfit to popular items, which leads cold-start models to over-recommend new items whose content merely resembles that of popular items. Analyzing three generative cold-start methods on several multimedia datasets, the authors propose a simple post-processing mitigation that uses embedding magnitude as a proxy for predicted popularity, producing more balanced recommendations with limited harm to user-oriented cold-start accuracy.
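The mitigation idea, using embedding magnitude as a popularity proxy, can be sketched as rescaling item embeddings toward unit norm, which tempers the inflated dot-product scores of cold items that happen to resemble popular warm items. The interpolation knob `strength` is an illustrative parameter, not the paper's exact procedure.

```python
# Post-processing popularity tempering: if embedding magnitude tracks
# predicted popularity, shrinking magnitudes toward a common norm evens
# out scores while preserving each item's direction (its content match).
import numpy as np

def temper_popularity(item_embs, strength=1.0):
    """item_embs: (n, d). strength=1 fully equalizes magnitudes;
    strength=0 leaves the embeddings unchanged."""
    norms = np.linalg.norm(item_embs, axis=1, keepdims=True)
    unit = item_embs / norms
    target = strength * 1.0 + (1 - strength) * norms  # blend toward norm 1
    return unit * target

embs = np.array([[3.0, 4.0], [0.3, 0.4]])  # same direction, very different norms
tempered = temper_popularity(embs, strength=1.0)
print(np.linalg.norm(tempered, axis=1))
```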

8.Do We Really Need SFT? Prompt-as-Policy over Knowledge Graphs for Cold-start Next POI Recommendation

Authors: Jinze Wang, Lu Zhang, Yiyang Cui, Zhishu Shen, Xingjun Ma, Jiong Jin, Tiehua Zhang

Affiliations: Swinburne University of Technology; Wuhan University of Technology; Chengdu University of Information Technology; Fudan University; Tongji University

https://arxiv.org/abs/2510.08012

Abstract

Next point-of-interest (POI) recommendation is crucial for smart urban services such as tourism, dining, and transportation, yet most approaches struggle under cold-start conditions where user-POI interactions are sparse. Recent efforts leveraging large language models (LLMs) address this challenge through either supervised fine-tuning (SFT) or in-context learning (ICL). However, SFT demands costly annotations and fails to generalize to inactive users, while static prompts in ICL cannot adapt to diverse user contexts. To overcome these limitations, we propose Prompt-as-Policy over knowledge graphs, a reinforcement-guided prompting framework that learns to construct prompts dynamically through contextual bandit optimization. Our method treats prompt construction as a learnable policy that adaptively determines (i) which relational evidences to include, (ii) the number of evidence per candidate, and (iii) their organization and ordering within prompts. More specifically, we construct a knowledge graph (KG) to discover candidates and mine relational paths, which are transformed into evidence cards that summarize rationales for each candidate POI. The frozen LLM then acts as a reasoning engine, generating recommendations from the KG-discovered candidate set based on the policy-optimized prompts. Experiments on three real-world datasets demonstrate that Prompt-as-Policy consistently outperforms state-of-the-art baselines, achieving average 7.7% relative improvements in Acc@1 for inactive users, while maintaining competitive performance on active users, without requiring model fine-tuning.

Brief review: This paper addresses next point-of-interest (POI) recommendation under cold-start conditions, where traditional methods perform poorly because user-POI interactions are sparse. The authors propose Prompt-as-Policy, a framework that constructs prompts dynamically via knowledge graphs and contextual bandit optimization rather than relying on costly supervised fine-tuning (SFT) or static prompts. Experiments on three real-world datasets show that, without any model fine-tuning, the method outperforms state-of-the-art baselines across user groups of different activity levels, especially for inactive users.
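To illustrate "prompt construction as a learnable policy", here is a toy epsilon-greedy bandit over prompt configurations: each arm is a prompt recipe (how many evidence cards per candidate, how they are ordered), and the reward is whether the frozen LLM's recommendation hit the true next POI. The paper uses a contextual bandit; this context-free version with a simulated reward is illustrative only, and all arm names are hypothetical.

```python
# Epsilon-greedy bandit over prompt recipes: a simplified sketch of
# learning which prompt construction works best from recommendation hits.
import random

class PromptPolicy:
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = arms                      # list of prompt configs
        self.counts = [0] * len(arms)
        self.values = [0.0] * len(arms)       # running mean reward per arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

arms = [{"evidence_per_candidate": k, "order": o}
        for k in (1, 3) for o in ("by_relevance", "by_recency")]
policy = PromptPolicy(arms)
for _ in range(200):                          # simulated feedback: arm 1 pays best
    a = policy.select()
    policy.update(a, 1.0 if a == 1 else 0.2)
print(arms[policy.select()])
```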

9.Reinforced Preference Optimization for Recommendation

Authors: Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang Wang

Affiliations: Taobao & Tmall Group of Alibaba; National University of Singapore

https://arxiv.org/abs/2510.12211

Abstract

Recent breakthroughs in large language models (LLMs) have fundamentally shifted recommender systems from discriminative to generative paradigms, where user behavior modeling is achieved by generating target items conditioned on historical interactions. Yet current generative recommenders still suffer from two core limitations: the lack of high-quality negative modeling and the reliance on implicit rewards. Reinforcement learning with verifiable rewards (RLVR) offers a natural solution by enabling on-policy sampling of harder negatives and grounding optimization in explicit reward signals. However, applying RLVR to generative recommenders remains non-trivial. Its unique generation space often leads to invalid or repetitive items that undermine sampling efficiency, and ranking supervision is sparse since most items receive identical zero rewards. To address these challenges, we propose Reinforced Preference Optimization for Recommendation (ReRe), a reinforcement-based paradigm tailored to LLM-based recommenders, an important direction in generative recommendation. ReRe incorporates constrained beam search to improve sampling efficiency and diversify hard negatives, while augmenting rule-based accuracy rewards with auxiliary ranking rewards for finer-grained supervision. Extensive experiments on three real-world datasets demonstrate that ReRe consistently outperforms both traditional and LLM-based recommenders in ranking performance. Further analysis shows that ReRe not only enhances performance across both base and SFT-initialized models but also generalizes robustly across different backbone families and scales. Beyond empirical gains, we systematically investigate the design space of RLVR in recommendation across generation, sampling strategy, reward modeling, and optimization algorithm, offering insights for future research.

Brief review: This paper proposes ReRe (Reinforced Preference Optimization for Recommendation), a reinforcement learning paradigm tailored to LLM-based recommenders. Current generative recommenders suffer from weak high-quality negative modeling and reliance on implicit rewards; ReRe uses constrained beam search to improve sampling efficiency and diversify hard negatives, and augments rule-based accuracy rewards with auxiliary ranking rewards for finer-grained supervision. Experiments on three real-world datasets show that ReRe outperforms both traditional and LLM-based recommenders in ranking performance and generalizes robustly across backbone families and scales. The study also offers valuable insights for future recommender-system design.

10.Differentiable Fast Top-K Selection for Large-Scale Recommendation

Authors: Yanjie Zhu, Zhen Zhang, Yunli Wang, Zhiqiang Wang, Yu Li, Rufan Zhou, Shiyang Wen, Peng Jiang, Chenhao Lin, Jian Yang

Affiliations: Xi’an Jiaotong University; Kuaishou Technology; M-A-P

https://arxiv.org/abs/2510.11472

Abstract

Cascade ranking is a widely adopted paradigm in large-scale information retrieval systems for Top-K item selection. However, the Top-K operator is non-differentiable, hindering end-to-end training. Existing methods include Learning-to-Rank approaches (e.g., LambdaLoss), which optimize ranking metrics like NDCG and suffer from objective misalignment, and differentiable sorting-based methods (e.g., ARF, LCRON), which relax permutation matrices for direct Top-K optimization but introduce gradient conflicts through matrix aggregation. A promising alternative is to directly construct a differentiable approximation of the Top-K selection operator, bypassing the use of soft permutation matrices. However, even state-of-the-art differentiable Top-K operators (e.g., LapSum) require O(n log n) complexity due to their dependence on sorting for solving the threshold. Thus, we propose DFTopK, a novel differentiable Top-K operator achieving optimal O(n) time complexity. By relaxing normalization constraints, DFTopK admits a closed-form solution and avoids sorting. DFTopK also avoids the gradient conflicts inherent in differentiable sorting-based methods. We evaluate DFTopK on both the public benchmark RecFlow and an industrial system. Experimental results show that DFTopK significantly improves training efficiency while achieving superior performance, which enables us to scale up training samples more efficiently. In the online A/B test, DFTopK yielded a +1.77% revenue lift with the same computational budget compared to the baseline. To the best of our knowledge, this work is the first to introduce differentiable Top-K operators into recommendation systems and the first to achieve theoretically optimal linear-time complexity for Top-K selection. We have open-sourced our implementation to facilitate future research in both academia and industry.

Brief review: This paper proposes DFTopK, a new differentiable Top-K selection operator for cascade ranking in large-scale recommender systems. The standard Top-K operator is non-differentiable, making end-to-end training difficult, and existing methods suffer from gradient conflicts and high complexity. By relaxing the normalization constraints, DFTopK attains the optimal O(n) time complexity, avoiding both sorting and the gradient conflicts of differentiable-sorting approaches. In experiments on the RecFlow benchmark and a real industrial system, DFTopK significantly improved training efficiency, and an online A/B test under the same computational budget delivered a 1.77% revenue lift.
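To show what a differentiable Top-K operator computes, here is a generic smooth top-k relaxation: each score maps to a soft indicator sigmoid((s - tau) / temperature), with the threshold tau chosen so the indicators sum to k. This sketch solves for tau by bisection; DFTopK's contribution is precisely avoiding such a search with a closed-form O(n) solution under relaxed normalization, so treat this as a reference implementation of the operator's semantics, not the paper's algorithm.

```python
# Smooth top-k selection: soft 0/1 membership scores that sum to k and
# are differentiable in the input scores (here, tau is found by bisection).
import numpy as np

def soft_topk(scores, k, temperature=0.1, iters=60):
    def mass(tau):
        return 1.0 / (1.0 + np.exp(-(scores - tau) / temperature))
    lo, hi = scores.min() - 10, scores.max() + 10
    for _ in range(iters):            # mass(tau) decreases as tau grows
        mid = (lo + hi) / 2
        if mass(mid).sum() > k:
            lo = mid                  # too much mass: raise the threshold
        else:
            hi = mid
    return mass((lo + hi) / 2)

scores = np.array([2.0, -1.0, 0.5, 3.0, 0.0])
probs = soft_topk(scores, k=2)
print(probs.round(2))  # near 1 for the two largest scores, near 0 elsewhere
```

As the temperature shrinks, the soft indicators approach the hard Top-K mask, while staying differentiable for end-to-end training of the upstream ranking model.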


We welcome your valuable suggestions in the comments! Including but not limited to:

  • Point out shortcomings in this post's paper reviews!
  • Share recent papers you find even more worth recommending, and explain why!

END

Recommended Reading

2025-10-13 Paper Picks | Latest Advances in Large Language Models
2025-10-10 Paper Picks | Latest Advances in Multimodal Large Models
2025-10-08 Paper Picks | Latest Advances in Large Language Models
2025-10-03 Paper Picks | Latest Advances in Large Language Models

[Disclaimer] Content sourced from the internet.