Seedance 2.0 Basic Formula: Multimodal Reference · Edit · Extend · Combined Tasks
Overview
Seedance 2.0 series models natively support joint audio and video generation, with excellent semantic understanding and multimodal interaction capabilities. The basic formula covers three core task types — multimodal reference, video editing, and video extension — along with their combined usage, serving as the fundamental starting point for creating with Seedance 2.0.
Below we introduce the applicable scenarios, recommended phrasing, and notes for each of these four task types.
1. Multimodal Reference
Concept
Extract certain elements (such as subject, style, scene, sound effects) from source material to generate a brand new video. The original material's temporal dimension and narrative logic are not preserved — only its visual/auditory features are borrowed.
Applicable Scenarios
- Motion transfer (make character A perform character B's actions)
- Subject reuse (place a character from source material into a new scene)
- Atmosphere borrowing (reference the lighting, color palette, and rhythm of a video)
Recommended Phrasing
| Reference Type | Recommended Phrasing |
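| Image reference | Referencing [reference dimension] from <Image N> |
| Video reference | Referencing [reference dimension] from <Video N> |
| Audio reference | Referencing [reference dimension] from <Audio N> |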
Usage Tips
- Use <Image N>, <Video N>, and <Audio N> to precisely refer to source material
- Upload materials in order, and reference them in the same order within the prompt
- Place important reference materials toward the beginning of the prompt
- Avoid too many reference materials; typically 4–5 total works best
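The tips above can be sketched as a small prompt builder. This is only an illustration, not an official Seedance API: the `Material` class, the `build_reference_prompt` function, and the `<Image N>` tag notation (borrowed from the combined-task template later in this guide) are all assumptions. The sketch numbers tags per material type in upload order and warns when more than five materials are used.

```python
from dataclasses import dataclass

# The tips above suggest 4-5 reference materials in total.
MAX_RECOMMENDED_MATERIALS = 5


@dataclass(frozen=True)
class Material:
    """Hypothetical description of one uploaded reference material."""
    kind: str       # "Image", "Video", or "Audio"
    dimension: str  # what to borrow, e.g. "character appearance"


def build_reference_prompt(materials, request):
    """Return (prompt, warnings); tags are numbered per kind in upload order."""
    if not materials:
        raise ValueError("at least one reference material is required")
    counters = {}
    clauses = []
    for material in materials:
        counters[material.kind] = counters.get(material.kind, 0) + 1
        tag = f"<{material.kind} {counters[material.kind]}>"
        clauses.append(f"the {material.dimension} from {tag}")
    warnings = []
    if len(materials) > MAX_RECOMMENDED_MATERIALS:
        warnings.append(
            f"{len(materials)} materials used; 4-5 in total usually works best"
        )
    # References go first, so the most important material leads the prompt.
    prompt = "Referencing " + ", ".join(clauses) + f", generate {request}"
    return prompt, warnings


if __name__ == "__main__":
    prompt, warnings = build_reference_prompt(
        [Material("Image", "character appearance"), Material("Video", "dance motion")],
        "a street performance at dusk.",
    )
    print(prompt)
    print(warnings)
```

Listing the materials in the same order they were uploaded keeps the tag numbers aligned with the upload order.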
2. Video Editing
Concept
Make partial or global modifications on top of the original video. Parts not mentioned remain unchanged by default. Unlike "reference" (which generates a new video), the core of editing is preserving the original video's timeline, motion, and narrative, modifying only the specified elements.
Applicable Scenarios
- Local replacement (swap a person/object in the frame with something new)
- Subject removal (erase unwanted characters or objects from the frame)
- Attribute modification (change color, material, style, etc.)
Recommended Phrasing
| Operation | Recommended Phrasing |
| Add element | Clearly describe + + |
| Modify element | Strictly edit <Video N>, [specific edit content] |
| Remove element | Specify the element to remove; for elements that should remain unchanged, emphasizing them in the prompt yields better results |
Notes
For video editing and extension tasks, use <Video N> directly to refer to the video. Do not write "referencing <Video N>", as this may cause the model to misidentify the request as a reference task.
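As a rough illustration of the note above, the following sketch builds an edit prompt that names the video directly and flags prompts that put "referencing" in front of a video tag. The function names and the `<Video N>` tag notation are assumptions made for this example, not part of any Seedance SDK.

```python
import re

# Matches "referencing <Video 1>" or "reference <Video 2>", case-insensitively.
_REFERENCE_BEFORE_VIDEO = re.compile(
    r"\breferenc(?:e|ing)\s+<Video \d+>", re.IGNORECASE
)


def build_edit_prompt(video_index, change, keep_unchanged=()):
    """Build an edit prompt that refers to the video directly."""
    prompt = f"Strictly edit <Video {video_index}>, {change}"
    if keep_unchanged:
        # Emphasizing what must stay the same tends to help removal edits.
        prompt += ", keep " + " and ".join(keep_unchanged) + " unchanged"
    return prompt + "."


def looks_like_reference_task(prompt):
    """Return True if the prompt says 'referencing <Video N>'."""
    return bool(_REFERENCE_BEFORE_VIDEO.search(prompt))


if __name__ == "__main__":
    good = build_edit_prompt(
        1, "remove the bicycle on the left",
        ["the background", "the camera movement"],
    )
    print(good)
    print(looks_like_reference_task(good))
    print(looks_like_reference_task("Referencing <Video 1>, change the dress to blue"))
```

The check only targets "referencing" placed directly before a video tag, so a combined prompt that references a dimension from another material is not flagged.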
3. Video Extension
Concept
Continue from the original video along the temporal dimension, requiring consistent audio/video style, subject, and narrative. The model understands the ending state and motion trend of the original video, naturally extending to generate subsequent content.
Applicable Scenarios
- Continuing the storyline (what happens next?)
- Extending an action (the action hasn't finished yet — continue demonstrating it)
- Completing clips (add content before or after the video)
Recommended Phrasing
| Type | Recommended Phrasing |
| Extend backward | Extend <Video N> backward, generate [new content] |
| Extend forward | Extend <Video N> forward, generate [new content] |
| Track completion |
Notes
- The model automatically crops the transition (join) portion for compositing; the original segments of the input video are not regenerated
- For editing/extension tasks, use <Video N> directly; do not use "referencing"
- Track completion supports up to 3 video segments as input, with a total duration not exceeding 15 seconds
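The track completion limits stated above (at most 3 segments, at most 15 seconds in total) can be checked before submitting a job. The helper below is a minimal sketch; the function name and the idea of passing durations as a list of seconds are assumptions for illustration.

```python
MAX_SEGMENTS = 3           # limit stated in the notes above
MAX_TOTAL_SECONDS = 15.0   # limit stated in the notes above


def check_track_completion_inputs(durations_seconds):
    """Return a list of problems; an empty list means the inputs fit the limits."""
    problems = []
    if not durations_seconds:
        problems.append("at least one video segment is required")
    if len(durations_seconds) > MAX_SEGMENTS:
        problems.append(
            f"{len(durations_seconds)} segments given; at most {MAX_SEGMENTS} are supported"
        )
    total = sum(durations_seconds)
    if total > MAX_TOTAL_SECONDS:
        problems.append(
            f"total duration {total:.1f}s exceeds {MAX_TOTAL_SECONDS:.0f}s"
        )
    return problems


if __name__ == "__main__":
    print(check_track_completion_inputs([4.0, 5.0, 6.0]))
    print(check_track_completion_inputs([5.0, 5.0, 5.0, 1.0]))
```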
4. Combined Tasks
Concept
Multimodal reference, video editing, and video extension tasks support combined usage to achieve more complex creative needs. For example: reference an image's style, edit an element in another video, then extend the result backward.
Regardless of the task type, prompts must include the following essential elements:
┌─ Essential Elements ──────────────────────┐
│ │
│ Precise subject (who? what object?) │
│ Action details (what are they doing?) │
│ Scene environment (where? what mood?) │
│ Lighting & color (what light? what tone?) │
│ Camera movement (how is it shot?) │
│ Visual style (what art style?) │
│ Quality constraints (HD? cinematic?) │
│ Constraints (no watermarks/subtitles/logos)│
│ │
└─────────────────────────────────────────────┘
These elements can be simplified as: Who + What action + Where + What light + How shot + What style + What quality + No issues. In combined tasks, each element can come from different reference materials, enabling complex creations like "reference subject A from material A + scene B + action C, to edit video D."
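One way to keep all eight essential elements in view is to hold them in a small structure and refuse to compose a prompt while any is blank. The `PromptElements` class below is a hypothetical helper written for this guide; the field names and the comma-joined output format are assumptions, not a required syntax.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptElements:
    """The eight essential elements, in the order listed above."""
    subject: str      # who? what object?
    action: str       # what are they doing?
    scene: str        # where? what mood?
    lighting: str     # what light? what tone?
    camera: str       # how is it shot?
    style: str        # what art style?
    quality: str      # HD? cinematic?
    constraints: str  # no watermarks/subtitles/logos

    def missing(self):
        """Names of elements that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def compose(self):
        """Join all elements into one prompt, failing if any is blank."""
        gaps = self.missing()
        if gaps:
            raise ValueError("missing elements: " + ", ".join(gaps))
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(parts) + "."


if __name__ == "__main__":
    elements = PromptElements(
        subject="A silver-haired violinist",
        action="plays a slow melody",
        scene="on a rainy rooftop at night",
        lighting="cool neon backlight",
        camera="slow push-in",
        style="cinematic realism",
        quality="high definition",
        constraints="no watermarks, subtitles, or logos",
    )
    print(elements.compose())
```

In a combined task, each field could be filled from a different reference material, for example a subject taken from one image and a scene from another.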
Applicable Scenarios
Reference the features of one piece of material to edit or extend another piece of material.
Recommended Phrasing
Referencing [reference dimension] from <Image/Video N>, strictly edit <Video X>, [specific edit content]
Example
Referencing the character appearance from Image 1, strictly edit Video 1, changing the red dress in it to a blue dress.
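The recommended phrasing above can be filled in programmatically. The sketch below assumes the angle-bracket tag form from the template (the example sentence writes "Image 1" without brackets) and uses a hypothetical `build_combined_prompt` function; it is not tied to any official client.

```python
def build_combined_prompt(dimension, ref_kind, ref_index, video_index, edit):
    """Fill the combined-task template: reference one material, edit another."""
    if ref_kind not in ("Image", "Video"):
        raise ValueError("ref_kind must be 'Image' or 'Video'")
    return (
        f"Referencing the {dimension} from <{ref_kind} {ref_index}>, "
        f"strictly edit <Video {video_index}>, {edit}."
    )


if __name__ == "__main__":
    print(
        build_combined_prompt(
            "character appearance", "Image", 1, 1,
            "changing the red dress in it to a blue dress",
        )
    )
```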
Five Tasks Quick Comparison
| Task | Generates New Video | Preserves Original Timeline | Typical Keywords |
| Multimodal Reference | Yes, entirely new video | No | referencing...in..., generate... |
| Edit — Add Element | No, modifies original video | Yes | add...to... |
| Edit — Modify Element | No, modifies original video | Yes | change...to... |
| Edit — Remove Element | No, modifies original video | Yes | remove..., keep...unchanged |
| Video Extension | Yes, continues from original | Inherits original end state | extend forward/backward..., generate... |
| Track Completion | Yes, splices multiple segments | Transitions between segments | |
| Combined Tasks | Mixed | Mixed | Referencing..., strictly edit... |
Summary
The basic formula of Seedance 2.0 revolves around the three core capabilities of "Reference, Edit, Extend" and their combination. Remember one principle for each:
Reference — borrow elements, generate a new video
Edit — modify content, preserve the original video
Extend — continue in time, join with the original video
Combine — mix usage, achieve complex creations
Equally important: for editing and extension tasks, use <Video N> directly to refer to materials. Do not add the word "reference"; otherwise the model will treat the request as a reference task, first referencing your material and then editing or extending it, which adds unnecessary uncertainty.