Large language models for patch review

October 16, 2025 

TL;DR
The free-software community is discussing how large language models (LLMs) can assist kernel development. Chris Mason has proposed using LLMs to review patches in order to lighten the load on maintainers, and has written a set of prompts that produce email-style reviews. Most developers support the idea, seeing it as safer than generating code, though some worry about lock-in to proprietary models. The discussion centers on whether the tools should be run by maintainers or by submitters, and how to integrate them into the review process. Linus Torvalds acknowledges the potential but is wary of false positives and added work. Overall, LLM review tools look set to become part of the kernel-development workflow.

The primary benefit from subscribing to LWN is helping to keep us publishing, but, beyond that, subscribers get immediate access to all site content and access to a number of extra site features. Please sign up today!  


Reference:
  • Subscribe to LWN, https://lwn.net/subscribe/
  • [MAINTAINERS / KERNEL SUMMIT] AI patch review tools
  • Mason's review-prompts repository, https://github.com/masoncl/review-prompts
  • Core review guidance, https://github.com/masoncl/review-prompts/blob/main/review-core.md
  • Locking guidance, https://github.com/masoncl/review-prompts/blob/main/locking.md
  • Networking guidance, https://github.com/masoncl/review-prompts/blob/main/networking.md
  • Fifth Generation Computer Systems, https://en.wikipedia.org/wiki/Fifth_Generation_Computer_Systems
  • review-prompts README, https://github.com/masoncl/review-prompts/blob/main/README.md
  • On the use of LLM assistants for kernel development
  • The kernel and BitKeeper part ways, https://lwn.net/Articles/130746/
  • Intel Linux kernel performance project, https://www.intel.com/content/www/us/en/developer/topic-technology/open/linux-kernel-performance/overview.html
  • Fighting the AI scraperbot scourge, https://lwn.net/Articles/1008897/
  • Linux Kernel Maintainer Summit, https://events.linuxfoundation.org/linux-kernel-maintainer-summit/



There have been many discussions in the free-software community about the role of large language models (LLMs) in software development. For the most part, though, those conversations have focused on whether projects should accept code output by those models, and under what conditions. But there are other ways in which these systems might participate in the development process. Chris Mason recently started a discussion on the Kernel Summit discussion list about how these models can be used to review patches, rather than create them.




Mason's focus was on how LLMs might reduce the load on kernel maintainers by catching errors before they hit the mailing lists, and by helping contributors increase the quality of their submissions. To that end, he has put together a set of prompts that will produce reviews in a format that maintainers are used to: "The reviews are meant to look like emails on lkml, and even when wildly wrong they definitely succeed there". He included a long list of sample reviews, some of which hit the mark and others of which did not.




The prompts are interesting in their own right; they can be seen as constituting the sort of comprehensive patch-review documentation that nobody ever quite got around to writing for humans to use. Perhaps that reflects a higher level of confidence that the LLM will actually read all of this material. These prompts add up to thousands of lines of material, starting with core guidance like:




Struct changes → verify all users use the new struct correctly



Public API changes → verify documentation updates [...]



Tone Requirements:  


  • Conversational: Target kernel experts, not beginners

  • Factual: No drama, just technical observations

  • Questions: Frame as questions about the code, not accusations


Most of the prompts consist of guidance specific to subsystems like locking ("You're not smart enough to understand smp_mb(), smp_rmb(), or smp_wmb() bugs yet") and networking ("Socket can outlive its file descriptor"). All told, it resembles the sort of rule collection one saw in the expert systems that were going to take over the world in the 1980s. As noted in the README file, "the false positive rate is pretty high right now, at ~50%", so there is still some room for improvement.


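As a rough illustration of how such a layered prompt collection might be wired into a review script, the sketch below selects which prompt files to feed a model based on the paths a patch touches. The file names (review-core.md, locking.md, networking.md) come from Mason's repository, but the path-to-subsystem mapping and the selection logic here are invented for illustration and are not part of his tooling:

```python
import re

# The prompt file names are real; this path-to-prompt mapping is a
# hypothetical example, not taken from Mason's repository.
SUBSYSTEM_PROMPTS = {
    "kernel/locking/": "locking.md",
    "net/": "networking.md",
}
CORE_PROMPT = "review-core.md"

def prompts_for_patch(patch_text):
    """Choose prompt files for a patch by parsing the paths it
    touches out of its unified-diff headers."""
    touched = re.findall(r"^\+\+\+ b/(\S+)", patch_text, re.MULTILINE)
    selected = [CORE_PROMPT]  # the core guidance always applies
    for path in touched:
        for prefix, prompt in SUBSYSTEM_PROMPTS.items():
            if path.startswith(prefix) and prompt not in selected:
                selected.append(prompt)
    return selected
```

Under this scheme, a patch touching net/core/sock.c would be reviewed against review-core.md plus networking.md, while one touching only filesystem code would get the core guidance alone.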


In the ensuing discussion, nobody seemed to think that using LLMs in this way was a bad idea. Sasha Levin called it "a really great subject to discuss", and said that, in the previous discussions on LLM use by kernel developers, the concerns that were raised about LLMs drowned out any attempt to find the places where they could be useful. Paul McKenney remarked that using this technology to review code written by others "seems much safer than using it to generate actual code". Krzysztof Kozlowski noted that Qualcomm has created a similar system and made it available.




There were some concerns raised about the proprietary nature of these systems; Konstantin Ryabitsev was just one of a few who drew parallels with the BitKeeper experience that (briefly) brought kernel development to a halt just over 20 years ago. Laurent Pinchart stated clearly that there are limits to how much proprietary tools can be used or required:




Forcing contributors to pay for access to proprietary tools is not acceptable. Forcing contributors to even run proprietary tools is not acceptable.




He also expressed concerns that the companies behind LLMs would make them available to developers for free to encourage adoption — until the community is well locked in, at which point access could quickly become expensive. Mason, though, was unworried about lock-in, saying that the prompts are sufficiently generic to be adaptable to any system. James Bottomley suggested that LLMs would not be proprietary forever, but Pinchart argued against relying on proprietary software in the hope that there will eventually be free alternatives.




There was some disagreement over who an LLM-based review tool should be created for. Mason's target was maintainers, but Andrew Lunn argued that the plan should be for developers to run these tools themselves before posting code for review. That, he said, would further reduce the workload on maintainers, who would only need to run LLM review to verify that the submitter had already done so.




Pinchart, along with others, pointed out that getting developers to use the tools (such as checkpatch.pl) that exist now is difficult; he wondered how submitters could be encouraged to run any new tools. Tim Bird suggested annotating patches with a list of the tools that have been run on them so that maintainers could see that history. Bottomley, instead, said that these tools should be run automatically on patches sent to the mailing lists, much like the checks that the 0day robot runs on posted patches now. Bird, though, said that running the tools should be expected of submitters. "It then becomes a cost for the contributor instead of the upstream community, which is going to scale better."


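Bird's annotation idea could take the form of a Git-style trailer on a patch's changelog. The trailer name and format below are invented for illustration; no format was actually specified in the discussion:

```python
def annotate_tools_run(commit_message, tools):
    """Append one hypothetical 'Tools-run:' trailer per tool, so a
    maintainer can see which checks the submitter already ran.
    The trailer name is made up here, not an established convention."""
    message = commit_message.rstrip("\n")
    for tool in tools:
        message += f"\nTools-run: {tool}"
    return message + "\n"
```

A maintainer, or a list robot of the kind Bottomley envisioned, could then scan for these trailers instead of re-running every check itself.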


Mason was clear in his belief that LLM-generated reviews should happen in public as part of the submission process:



I think it's also important to remember that AI is sometimes wildly wrong. Having the reviews show up on a list where more established developers can call bullshit definitely helps protect against wasting people's time.




Linus Torvalds, in his one contribution to the discussion, agreed. He was about the only one to express concerns about the technology, saying "I think we've all seen the garbage end of AI, and how it can generate more work rather than less". Mason agreed that Torvalds's concerns were relevant, based on his own experience:




My first prompts told AI to assume the patches had bugs, and it would consistently just invent bugs. That's not the end of the world, but the explanations are always convincing enough that you'd waste a bunch of time tracking it down.




Torvalds mentioned the scraper problem as well. His concerns notwithstanding, he believes that this technology will prove helpful, but he feels that its initial adoption has to be aimed at making life easier for maintainers. "So I think that only once any AI tools are actively helping maintainers in a day-to-day workflow should people even *look* at having non-maintainers use them".




The conversation wound down shortly after that. One clear conclusion, though, is that these tools seem destined to play an increasing role in the kernel-development process. At some point, we will likely start seeing machine-generated reviews showing up on the mailing lists; then, perhaps, the real value of LLM-based patch-review tools will start to become clear. It will be interesting to see how the inevitable related discussion at the 2025 Maintainer Summit in December plays out.


