

NeurIPS-Grade Figures in 30 Seconds: A Hands-On Test of Nano Banana Pro Across the Full Research-Figure Workflow

跨境大白

2025-11-26

Intro: stop hand-drawing everything in Visio


Source | PaperWeekly

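While everyone else is using Nano Banana Pro to generate viral images, I put it to serious use: research.

The result? The method figures it draws flatten three years of my hand-drawn work.

We all know research figures are a black hole for time.

Logic diagrams that will not align, modules that get messier the more you draw, color schemes that look hopelessly dated... You spend a whole day on a poster, and your advisor frowns after one glance.

Those days of "the code runs, but the figure will not come together" can end today.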

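I had Nano Banana Pro cover the three scenarios that matter most in research figures:

Method figures (Method): tests whether the logical structure comes out clearly
Experiment figures (Experiment): tests whether the data is presented professionally
Concept figures (Concept / Idea Figure): tests whether abstract content can be conveyed intuitively

After testing, all I can say is that it outclasses traditional drawing tools.

Its output is clearly aimed at the standard of figures in accepted NeurIPS and ICLR papers.

At the end of this post I have put together an all-purpose research-figure prompt. If you do not care about the review, scroll to the bottom and copy it.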

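For the first test I went straight to a hard case: Cambrian-S.

The author list is star-studded (LeCun + Fei-Fei Li + Saining Xie), yet the method section happens to lack a single overall architecture figure.

How does the text come in? How are the visual features fused? How is the Mamba-Transformer backbone connected? You have to imagine all of it.

I broke the paper's method description down, in order, into a structured module list and handed it straight to Nano Banana Pro (NBP from here on), without even a rough sketch.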

MAIN ARCHITECTURE (from the Method section):

1. Inputs:
   - Image frames I ∈ ℝ^{H×W×3}
   - Instruction prompt p

2. Encoders:
   - Image Encoder:
       • Extracts visual feature map F_i from input images.
   - Text Encoder:
       • Tokenizes prompt p into embeddings T ∈ ℝ^{T_p×D}

3. Feature Projection & Fusion:
   - Visual Feature Projector:
       • Projects F_i into V ∈ ℝ^{T_v×D}
   - Multi-Modal Mixer:
       • Concatenates V and T into Z ∈ ℝ^{(T_v+T_p)×D}
       • Applies mixer layers to unify modalities

4. Core Backbone:
   - Transformer Stack (L layers)
       • Each layer contains:
           – Multi-Head Self-Attention (MHSA)
           – Feed-Forward Network (FFN)
           – Residual + LayerNorm

5. Multi-Scale Routing Module:
   - Occurs at predefined stages s₁ and s₂
   - Token routing:
       • Split Z into Active Tokens and Idle Tokens
       • Only Active Tokens pass through deeper layers
       • Idle Tokens are temporarily held

   - Merge Unit:
       • Idle Tokens rejoin Active Tokens after deeper blocks

6. Memory Retrieval Module:
   - Memory Bank M ∈ ℝ^{N_m×D}
   - Query generation: Q = Z_q W_q
   - Key matching: attention weights = softmax(Q Mᵀ)
   - Retrieval: R = weighted sum of memory vectors
   - Fusion: Z ← Z + R (before block s₃)

7. Output Head:
   - Task-specific head depending on target task:
       • token outputs O ∈ ℝ^{T_o×D}
       • or class logits

DATA FLOW:
Images → Image Encoder → Projector → V  
Prompt → Text Encoder → T  
V + T → Mixer → Z  
Z → Transformer + Routing + Memory Retrieval → Output Head → Final output
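A module list like the one above is effectively nested data, so it can be kept in code and rendered to text rather than retyped for each revision. The following is a minimal sketch under that assumption; `render_items`, `render_module_list`, and the tuple-based data layout are hypothetical names chosen for illustration, not part of any Nano Banana Pro tooling.

```python
# One bullet style per nesting depth, mirroring the list above.
BULLETS = ["-", "•", "–"]


def render_items(items, depth=0):
    """Render nested modules as indented bullet lines.

    Each item is either a plain label (str) or a (label, children) tuple.
    """
    lines = []
    bullet = BULLETS[min(depth, len(BULLETS) - 1)]
    indent = "   " + "    " * depth
    for item in items:
        label, children = (item, []) if isinstance(item, str) else item
        suffix = ":" if children else ""  # parent modules end with a colon
        lines.append(f"{indent}{bullet} {label}{suffix}")
        lines.extend(render_items(children, depth + 1))
    return lines


def render_module_list(sections):
    """Render [(title, items), ...] as a numbered outline."""
    out = []
    for number, (title, items) in enumerate(sections, start=1):
        out.append(f"{number}. {title}:")
        out.extend(render_items(items))
        out.append("")  # blank line between sections
    return "\n".join(out).rstrip()


# The first two sections of the list above, written as data.
sections = [
    ("Inputs", ["Image frames I", "Instruction prompt p"]),
    ("Encoders", [
        ("Image Encoder", ["Extracts visual feature map F_i from input images."]),
        ("Text Encoder", ["Tokenizes prompt p into embeddings T"]),
    ]),
]
outline = render_module_list(sections)
print(outline)
```

Keeping the structure as data means reordering or renaming a module is a one-line change, and the numbering is regenerated automatically.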

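Thirty seconds later, I froze.

It had not just drawn the figure correctly; it had drawn it as if it understood it.

The dual-stream input of the Multi-Modal Mixer, the hierarchical structure of Memory Retrieval, the forked paths of Active/Idle routing...

It not only straightened out the logic but also matched the "clean, flat, restrained" look of top-conference figures on its own.

No tuning line widths, no snapping to a grid. You give it the logic, and it gives you back something professional.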

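With the architecture figure done, the second round went straight to the most abstract kind of figure in a paper: the concept figure.

This kind is the hardest to draw. Too literal and it reads like an instruction manual; too loose and it reads like mysticism.

I tested with Cambrian-S's best-known figure, Figure 1 (the five-stage cognition framework).

From language to space, and then to world models. Even a human needs a long time to plan out a figure for concepts this abstract.

I wrote the structural logic of that figure entirely into the prompt and asked NBP to reproduce the structure and improve the style.

This is what "presentable" actually looks like: a low-saturation pastel palette, clean spacing, and the sense of depth in the 3D video corridor at the bottom.

I used to need a professional designer for a figure like this; NBP produced it in one shot in 30 seconds.

If your paper is missing a headline figure, it is worth a try.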

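The first two I could accept, but line plots of experimental data? Could it really get those right?

I tried it on Figure 3 of Mamba-3, giving it only the axis meanings, the data points, and the model names.

I was proven wrong again:

Flat, no glassy highlights, no gradients, uniform line widths, a restrained palette. The axis labels are clear and do not steal attention. It even looks cleaner and more professional than the figure in the original paper.

This is not "viewable"; this is "submittable".

Then came the most annoying case, visualizing a large table. I threw in the data from Mamba-3's Table 3:

It promptly handed back a clear bar chart:

Stable colors and accurate proportions, fully in line with top-conference chart conventions.

This kind of chart used to cost me half an hour of palette tweaking in Matplotlib; now it takes 30 seconds.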
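For the plot and table tests above, the input was only axis meanings, data points, and model names. One way to keep that input exact is to serialize it from code instead of typing numbers by hand. The sketch below is an assumption about how that could be done; `render_plot_spec` and the `PLOT DATA` layout are hypothetical, and the numbers are placeholders, not values from Mamba-3 or any other paper.

```python
def render_plot_spec(title, x_label, y_label, x_values, series):
    """Serialize plot data into a plain-text block for a figure prompt.

    series maps a model name to its y-values, one per entry in x_values.
    """
    lines = [
        "PLOT DATA:",
        f"- Title: {title}",
        f"- X axis: {x_label}",
        f"- Y axis: {y_label}",
        "- Series (x, y):",
    ]
    for name, y_values in series.items():
        # Catch a dropped or extra data point before it reaches the prompt.
        if len(y_values) != len(x_values):
            raise ValueError(
                f"{name}: expected {len(x_values)} points, got {len(y_values)}"
            )
        points = ", ".join(f"({x}, {y})" for x, y in zip(x_values, y_values))
        lines.append(f"   • {name}: {points}")
    return "\n".join(lines)


# Placeholder numbers for illustration only; not taken from any paper.
spec = render_plot_spec(
    title="Example scaling curve",
    x_label="Training tokens (B)",
    y_label="Validation loss",
    x_values=[1, 2, 4],
    series={"Model A": [3.2, 2.9, 2.7], "Model B": [3.1, 2.8, 2.5]},
)
print(spec)
```

The resulting text block can be pasted into the prompt in place of a module list, with every data point stated explicitly.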

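📝 Copy-the-homework time: the all-purpose prompt

After these tests, my takeaway on how NBP works is this: you supply the logic (Text), and it supplies the aesthetics (Visuals).

To save everyone some detours, I have distilled an all-purpose research-figure prompt template.

Just fill your paper's content, in order, into the structure below. Method figures, pipelines, experiment plots, and tables can all be derived from this template.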

You are an expert ML illustrator.
Draw a clean, NeurIPS/ICLR-style scientific figure using Nano Banana Pro.

GOAL:
Create a professional, publication-quality diagram that exactly follows the
structure and logic provided in the MODULE LIST below.
Do not invent components, do not reinterpret, do not add creativity.
Strictly follow the logical flow.

GLOBAL RULES:

- Flat, clean NeurIPS style (no gradients, no gloss, no shadows)
- Consistent thin line weights
- Professional pastel palette
- Rounded rectangles for blocks
- Arrows must clearly indicate data flow
- No long sentences, only short labels
- Keep spacing clean and balanced
- All modules must appear exactly once unless specified

LAYOUT:

- Horizontal left → right layout (recommended)
- Or vertical top → bottom if modules are inherently sequential
- Align components cleanly in straight lines
- Respect the module order exactly as listed

MODULE LIST (FILL THIS WITH YOUR PAPER'S CONTENT):

1. Input(s):
   - [Your input items]

2. Preprocessing / Encoding / Embedding:
   - [Your modules]

3. Core Architecture / Stages / Blocks:
   - [Your modules in exact order]

4. Special Mechanisms (optional):
   - [Attention / memory / routing / dynamic paths]

5. Output Head:
   - [Your output block]

NOTES (Optional but useful):

- Specify any required two-branch or multi-branch flow
- Specify “A and B must merge here”
- Specify “keep this as a single tall block with submodules”
- If experimental plot → replace section above with structured numbers

STYLE REQUIREMENTS:

- NeurIPS 2024 visual tone
- Very light background
- Text left-aligned inside blocks
- Arrows short and clean
- Use consistent vertical spacing

Generate the final diagram.
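Filling the template above can also be scripted, so the fixed rules stay untouched and only the module list changes between figures. This is a minimal sketch using Python's standard `string.Template`; the abbreviated `PROMPT_TEMPLATE`, the `SLOTS` list, and `build_prompt` are hypothetical helpers for illustration, and the template text is shortened here (in practice, paste the full version).

```python
from string import Template

# Abbreviated copy of the template above; paste the full text in practice.
PROMPT_TEMPLATE = Template("""\
You are an expert ML illustrator.
Draw a clean, NeurIPS/ICLR-style scientific figure using Nano Banana Pro.

MODULE LIST:

$module_list

NOTES:

$notes

Generate the final diagram.""")

# The five slots of the MODULE LIST, in the template's order.
SLOTS = [
    "Input(s)",
    "Preprocessing / Encoding / Embedding",
    "Core Architecture / Stages / Blocks",
    "Special Mechanisms (optional)",
    "Output Head",
]


def build_prompt(modules, notes=()):
    """Fill the template; modules maps a slot name to a list of short labels."""
    unknown = set(modules) - set(SLOTS)
    if unknown:
        raise KeyError(f"unknown slots: {sorted(unknown)}")
    blocks = []
    number = 0
    for slot in SLOTS:
        labels = modules.get(slot)
        if not labels:
            continue  # skip slots with no content
        number += 1
        body = "\n".join(f"   - {label}" for label in labels)
        blocks.append(f"{number}. {slot}:\n{body}")
    note_lines = "\n".join(f"- {note}" for note in notes) or "- (none)"
    return PROMPT_TEMPLATE.substitute(
        module_list="\n\n".join(blocks), notes=note_lines
    )


prompt = build_prompt(
    {
        "Input(s)": ["Image frames", "Instruction prompt"],
        "Output Head": ["Task-specific head"],
    },
    notes=["V and T must merge at the Mixer"],
)
print(prompt)
```

Rejecting unknown slot names catches typos early, and renumbering after skipped slots keeps the list contiguous.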

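If a deadline is chasing you down, or you are tearing your hair out over figure-revision requests in a rebuttal, I strongly suggest giving it a try.

Try it and you will see: what it saves you is not just time, it is your sanity.

Research is hard enough already. Leave the drawing to AI.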
