Kling, Give Me My Money Back! ...Wait, Its Video Generation Is This Good?

刀哥聊AI
2025-12-04
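On Double 11 (Singles' Day), I gave in to an impulse buy...

At the time I was quite pleased with myself: Kling, which practically never goes on sale, was offering 50% off annual memberships across the board! I bought the Gold membership right away and thought I had scored a real bargain.

And then... nothing. I barely used it after buying. Once NBP (Nano Banana Pro, abbreviated as NBP from here on) was released on November 20, Kling's image generation became effectively obsolete, and even the launch of O1, its strongest model ever, couldn't save it. Don't believe me? Let's take a look together.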

The tests below were mainly run on the design Agent platform Lovart: https://lovart.ai
A small portion of the videos were made on the Kling official site: https://app.klingai.com/
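Kling O1 Image Generation vs. Nano Banana Pro Image Generation
Same "deconstruct anything" prompt, Kling on the left, NBP on the right. Which one would you use?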

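I explicitly told it: a fully deconstructed [peony], with the tweezers fixed near the outer edge of the frame and not blocking any petal or label. Kling just couldn't do it, even with O1, its most advanced model.

NBP draws on enhanced reasoning, world knowledge, and real-time information to generate more accurate, context-rich visuals. NBP wins.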

Two more rounds
🧙
Create a retro, 1950s-style infographic about the history of Chinese restaurants. Make sure all text is clearly legible and styled to fit the era.

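Kling couldn't even write the character 餐 ("meal") correctly: garbled text, nonsensical characters, repeated characters. And NBP? It generates better, more accurate, and legible multilingual text directly in the image, with solid Chinese support. Another win.
You might think I'm picking on Kling, so next I tested a classic example from its official docs (可灵 O1 - 图片 O1 使用指南 - 轻雀文档, the Kling O1 image user guide): multi-object compositing.
I didn't use its built-in object library here, because I was worried the model had been trained on it. I picked five images from my own computer.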




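Kling is quite convenient to use. The result? Kling on the left, NBP on the right.

The person Kling O1 generated has essentially nothing in common with my original person apart from the curly hair. The cute part is that the kitten ended up wearing the hat...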

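NBP, by contrast, preserved the person's features well and got the fruit-cutting action right, although one piece of clothing was still missing. That is easy to fix with Lovart's Touch Edit feature.
Select the black clothing, upload the fox-spirit woman (狐灵女子) image, ask it to swap the outfit, and we quickly get the picture we want:
See? On Lovart you get Nano Banana Pro, an absolutely SOTA top-tier image model, and it works seamlessly with Lovart's canvas, which makes it even more powerful.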

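All I need is the canvas's "context": it accurately perceives what content, imagery, and elements sit at the points I mark, then makes a precise selection and recognition. I point and talk, and the model gets the job done.
As for Kling's result, honestly I have no idea how to fix it:
Why is the text-to-image gap so large? The reason is simple: does Kling have a text LLM worth showing off? No. NBP, on the other hand, is built on the state-of-the-art Gemini 3. Standing on the shoulders of a giant, NBP has a few flaws but is still a godlike presence in text-to-image.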

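Cost is a big issue too. On the Kling official site, each image costs 2 灵感 (Kling's credits), while I use NBP for free with my Lovart membership. My Kling credits are gone after 10 videos and 300 images, whereas I can't even use up my Lovart credits. It really is this year's best-value AI purchase.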
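To put the credit figures above into numbers, here is a minimal sketch. The only price taken from the text is 2 credits per image; the per-video price is not stated in this article, so `video_cost` is a hypothetical parameter you would fill in yourself, and the function names are made up for illustration.

```python
# Rough credit arithmetic for the Kling figures quoted above.
IMAGE_COST = 2  # credits per image on the Kling site (from the text)


def image_credits(num_images: int, cost_per_image: int = IMAGE_COST) -> int:
    """Credits consumed by generating num_images images."""
    return num_images * cost_per_image


def total_credits(num_images: int, num_videos: int, video_cost: int) -> int:
    """Total credits spent; video_cost is a hypothetical per-video price."""
    return image_credits(num_images) + num_videos * video_cost


# 300 images at 2 credits each
print(image_credits(300))  # 600
```

So the 300 images alone account for 600 credits; whatever the 10 videos cost comes on top of that.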

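What's more, can you believe annual members get unlimited generations at zero credits? Even 4K images are free! Take the recently viral nine-panel movie storyboard as an example. Enter this very long prompt (no need to read the content, just copy and use it):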
🏖️
<role>
You are an award-winning trailer director + cinematographer + storyboard artist. Your job: turn ONE reference image into a cohesive cinematic short sequence, then output AI-video-ready keyframes.
</role>

<input>
User provides: one reference image (image).
</input>

<non-negotiable rules - continuity & truthfulness>
  1. First, analyze the full composition: identify ALL key subjects (person/group/vehicle/object/animal/props/environment elements) and describe spatial relationships and interactions (left/right/foreground/background, facing direction, what each is doing).
  2. Do NOT guess real identities, exact real-world locations, or brand ownership. Stick to visible facts. Mood/atmosphere inference is allowed, but never present it as real-world truth.
  3. Strict continuity across ALL shots: same subjects, same wardrobe/appearance, same environment, same time-of-day and lighting style. Only action, expression, blocking, framing, angle, and camera movement may change.
  4. Depth of field must be realistic: deeper in wides, shallower in close-ups with natural bokeh. Keep ONE consistent cinematic color grade across the entire sequence.
  5. Do NOT introduce new characters/objects not present in the reference image. If you need tension/conflict, imply it off-screen (shadow, sound, reflection, occlusion, gaze).
</non-negotiable rules - continuity & truthfulness>

<goal>
Expand the image into a 10–20 second cinematic clip with a clear theme and emotional progression (setup → build → turn → payoff).
The user will generate video clips from your keyframes and stitch them into a final sequence.
</goal>

<step 1 - scene breakdown>
Output (with clear subheadings):
  • Subjects: list each key subject (A/B/C…), describe visible traits (wardrobe/material/form), relative positions, facing direction, action/state, and any interaction.
  • Environment & Lighting: interior/exterior, spatial layout, background elements, ground/walls/materials, light direction & quality (hard/soft; key/fill/rim), implied time-of-day, 3–8 vibe keywords.
  • Visual Anchors: list 3–6 visual traits that must stay constant across all shots (palette, signature prop, key light source, weather/fog/rain, grain/texture, background markers).
</step 1 - scene breakdown>

<step 2 - theme & story>
From the image, propose:
  • Theme: one sentence.
  • Logline: one restrained trailer-style sentence grounded in what the image can support.
  • Emotional Arc: 4 beats (setup/build/turn/payoff), one line each.
</step 2 - theme & story>

<step 3 - cinematic approach>
Choose and explain your filmmaking approach (must include):
  • Shot progression strategy: how you move from wide to close (or reverse) to serve the beats
  • Camera movement plan: push/pull/pan/dolly/track/orbit/handheld micro-shake/gimbal—and WHY
  • Lens & exposure suggestions: focal length range (18/24/35/50/85mm etc.), DoF tendency (shallow/medium/deep), shutter “feel” (cinematic vs documentary)
  • Light & color: contrast, key tones, material rendering priorities, optional grain (must match the reference style)
</step 3 - cinematic approach>

<step 4 - keyframes for AI video (primary deliverable)>
Output a Keyframe List: default 9–12 frames (later assembled into ONE master grid). These frames must stitch into a coherent 10–20s sequence with a clear 4-beat arc.
Each frame must be a plausible continuation within the SAME environment.

Use this exact format per frame:

[KF# | suggested duration (sec) | shot type (ELS/LS/MLS/MS/MCU/CU/ECU/Low/Worm’s-eye/High/Bird’s-eye/Insert)]
  • Composition: subject placement, foreground/mid/background, leading lines, gaze direction
  • Action/beat: what visibly happens (simple, executable)
  • Camera: height, angle, movement (e.g., slow 5% push-in / 1m lateral move / subtle handheld)
  • Lens/DoF: focal length (mm), DoF (shallow/medium/deep), focus target
  • Lighting & grade: keep consistent; call out highlight/shadow emphasis
  • Sound/atmos (optional): one line (wind, city hum, footsteps, metal creak) to support editing rhythm
Hard requirements:
  • Must include: 1 environment-establishing wide, 1 intimate close-up, 1 extreme detail ECU, and 1 power-angle shot (low or high).
  • Ensure edit-motivated continuity between shots (eyeline match, action continuation, consistent screen direction / axis).
</step 4 - keyframes for AI video>

<step 5 - contact sheet output (MUST OUTPUT ONE BIG GRID IMAGE)>
You MUST additionally output ONE single master image: a Cinematic Contact Sheet / Storyboard Grid containing ALL keyframes in one large image.
  • Default grid: 3x3. If more than 9 keyframes, use 4x3 or 5x3 so every keyframe fits into ONE image.
Requirements:
  1. The single master image must include every keyframe as a separate panel (one shot per cell) for easy selection.
  2. Each panel must be clearly labeled: KF number + shot type + suggested duration (labels placed in safe margins, never covering the subject).
  3. Strict continuity across ALL panels: same subjects, same wardrobe/appearance, same environment, same lighting & same cinematic color grade; only action/expression/blocking/framing/movement changes.
  4. DoF shifts realistically: shallow in close-ups, deeper in wides; photoreal textures and consistent grading.
  5. After the master grid image, output the full text breakdown for each KF in order so the user can regenerate any single frame at higher quality.
</step 5 - contact sheet output>

<final output format>
Output in this order:
A) Scene Breakdown
B) Theme & Story
C) Cinematic Approach
D) Keyframes (KF# list)
E) ONE Master Contact Sheet Image (All KFs in one grid)
</final output format>
Check out the result. So cool!
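The grid rule in step 5 of the prompt (3x3 by default, 4x3 or 5x3 when there are more than 9 keyframes) can be sketched as a small helper. This is only my reading of the prompt: I assume "4x3" means 4 columns by 3 rows, and `pick_grid` is a hypothetical name, not something Lovart or NBP exposes.

```python
import math


def pick_grid(num_keyframes: int, rows: int = 3) -> tuple[int, int]:
    """Return (columns, rows) so every keyframe fits in one contact sheet.

    Assumption: grids are written as columns x rows, and the prompt only
    names 3x3, 4x3, and 5x3, so anything larger is rejected.
    """
    if num_keyframes < 1:
        raise ValueError("need at least one keyframe")
    # Never go below the default 3 columns; add columns as frames grow.
    cols = max(3, math.ceil(num_keyframes / rows))
    if cols > 5:
        raise ValueError("the prompt only names grids up to 5x3")
    return cols, rows


for n in (9, 10, 12, 13):
    print(n, pick_grid(n))
```

With the prompt's default of 9 to 12 keyframes, this lands on either 3x3 or 4x3.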
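Nano Banana Pro has already evolved more than 700 use cases. If you're interested, reply [NBP] in the account backend to get 720 great prompts across 37 categories.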
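Kling O1 Video Generation
I originally paid up because Kling is a closed ecosystem. It is the strongest domestic model, yet on other aggregator platforms you either can't use Kling at all or only get older models. Want the latest? Your only option is a Kling membership.

But this time, Lovart has actually integrated Kling's latest O1! When it comes to video, Kling O1 really is godlike.
Same prompt: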
🧙
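The woman in image 1 puts on the hat from image 3 and starts fighting with the woman in image 2 over an apple to eat, while the kitten beside them says: stop fighting, stop fighting, I'll go pick another apple for you.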

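Kling O1 rendered it almost perfectly:
Veo3 was thoroughly beaten. Are they playing house?

The physics and comprehension that Google's NBP is so proud of completely fall apart when it comes to video. Why?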

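Because text-to-image is "reciting photos", while text-to-video is "simulating reality".
Text-to-image is like a highly gifted painter who paints a single instant from memory and can sacrifice physical logic for the sake of a pleasing composition.

Text-to-video is like a junior physicist: it has to derive the state one second later by computing from the state one second earlier, so physical laws are a necessary means for its survival (lowering the loss).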
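To make the "derive the next state from the previous one" analogy concrete, here is a toy sketch: a falling ball advanced frame by frame with explicit Euler steps. It only illustrates the analogy and is not how Kling or any video model actually works; `step`, `rollout`, and the constants are all made up for this example.

```python
GRAVITY = -9.8  # m/s^2, acting downward


def step(state, dt=0.1):
    """Advance (height, velocity) by one time step with explicit Euler."""
    height, velocity = state
    new_height = height + velocity * dt      # position uses the old velocity
    new_velocity = velocity + GRAVITY * dt   # velocity changes under gravity
    return (new_height, new_velocity)


def rollout(initial, num_frames, dt=0.1):
    """Build a sequence of frames, each computed only from the previous one."""
    frames = [initial]
    for _ in range(num_frames - 1):
        frames.append(step(frames[-1], dt))
    return frames


for height, velocity in rollout((10.0, 0.0), 5):
    print(f"height={height:.3f} m, velocity={velocity:.2f} m/s")
```

Every frame depends on the one before it, so an error in the physics compounds over the whole clip. A single still image has no such chain to keep consistent.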

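Strong physics alone isn't enough. Kling's trump-card update is that you can edit videos directly.
First we make a video of a woman playing billiards.
Then comes the godlike move: select the video we just generated and change it to Einstein playing billiards on a grassland.
Editing video just by saying what you want. Mind-blowing!!!
For now this feature is only available on the Kling official site. I hope it comes to Lovart soon.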

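Kling Video O1 is the world's first unified multimodal video model. Following the Multi-modal Visual Language (MVL) concept, it uses natural language as the semantic backbone, combined with multimodal descriptions such as videos and image subjects, to understand our intent precisely, which makes operation more intuitive and creation more efficient.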

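O1's only problem is that it has no sound. But before I could even nitpick, Kling released Kling 2.6 last night, with synchronized audio and video! Chinese support looks good. This is a challenge to Sora 2!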

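Kling AI is without a doubt the pride of Chinese-made video models! As for this Gold membership, I'm keeping it after all! Next time we'll do a detailed comparison of O1 and 2.6!
Another big deal worth grabbing: from December 1 to December 7, buying a Lovart membership gets you a limited-time discount of up to 50% OFF! During the membership period, you can get up to 365 days of zero-credit, unlimited use of NanoBananaPro and Kling O1!
Postscript
On December 1, DeepSeek released V3.2, drawing level with GPT5. On December 2, O1 gave us this surprise. China's AI really is catching up.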

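The prevailing view in AI circles right now is that the path of physical understanding toward artificial general intelligence (AGI) lies not in text (LLMs), nor in images, but in video. I look forward to more surprises from Kling, Doubao, Hailuo, Vidu, and Qwen. (P.S. All of these video models are available on Lovart.)

Creating content takes effort. If this helped you, please like, share, and follow. See you next time!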



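I'm 刀哥 (Dao Ge). I spent several years at big tech companies and am now a founder building for overseas markets, researching AI tools and AI programming in depth. Follow me to learn more about AI! See you next time!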
