Seedance 2.0

Seedance 2.0 Basic Formula: Multimodal Reference · Edit · Extend · Combined Tasks

2026-06-04 · 10 min read · Tomato AI Team

Overview

The Seedance 2.0 series models natively support joint audio and video generation, with strong semantic understanding and multimodal interaction capabilities. The basic formula covers three core task types (multimodal reference, video editing, and video extension) along with their combined usage, and it is the starting point for creating with Seedance 2.0.

Below we introduce the applicable scenarios, recommended phrasing, and notes for each of these four task types.

1. Multimodal Reference

Concept

Extract certain elements (such as subject, style, scene, sound effects) from source material to generate a brand new video. The original material's temporal dimension and narrative logic are not preserved — only its visual/auditory features are borrowed.

Applicable Scenarios

Motion transfer (make character A perform character B's actions)
Subject reuse (place a character from source material into a new scene)
Atmosphere borrowing (reference the lighting, color palette, and rhythm of a video)

Recommended Phrasing

Reference Type	Recommended Phrasing
Image reference	Referencing <Image N>'s [reference dimension], generate...
Video reference	Referencing <Video N>...
Audio reference	Referencing <Audio N>...

Usage Tips

Use <Image N>, <Video N>, and <Audio N> to precisely refer to source material
Upload materials in order, and reference them in order within the prompt
Place important reference materials toward the beginning of the prompt
Avoid too many reference materials; typically 4–5 total works best
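
The tips above can be sketched as a small prompt builder. This is only an illustration: the `Reference` class, the function names, and the per-type numbering ("Image 1", "Video 1") are assumptions made for the example, not part of any Seedance API. It labels materials in upload order, cites them in that same order, and rejects more than five materials.

```python
from dataclasses import dataclass

MAX_REFERENCES = 5  # upper end of the "4-5 total" guideline (assumed hard cap)


@dataclass
class Reference:
    kind: str       # "Image", "Video", or "Audio"
    dimension: str  # what to borrow, e.g. "character appearance"


def label_references(refs):
    """Assign labels such as 'Image 1' in upload order, counted per kind."""
    counters = {}
    labels = []
    for ref in refs:
        counters[ref.kind] = counters.get(ref.kind, 0) + 1
        labels.append(f"{ref.kind} {counters[ref.kind]}")
    return labels


def build_reference_prompt(refs, instruction):
    """Build a reference-style prompt that cites materials in upload order."""
    if not refs:
        raise ValueError("at least one reference material is required")
    if len(refs) > MAX_REFERENCES:
        raise ValueError(
            f"too many reference materials: {len(refs)} > {MAX_REFERENCES}"
        )
    labels = label_references(refs)
    clauses = [
        f"the {ref.dimension} from {label}" for ref, label in zip(refs, labels)
    ]
    return f"Referencing {', '.join(clauses)}, generate {instruction}"
```

Putting the most important material first in `refs` also places it first in the prompt, which matches the ordering tip.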

2. Video Editing

Concept

Make partial or global modifications on top of the original video. Parts not mentioned remain unchanged by default. Unlike "reference" (which generates a new video), the core of editing is preserving the original video's timeline, motion, and narrative, modifying only the specified elements.

Applicable Scenarios

Local replacement (swap a person/object in the frame with something new)
Subject removal (erase unwanted characters or objects from the frame)
Attribute modification (change color, material, style, etc.)

Recommended Phrasing

Operation	Recommended Phrasing
Add element	Clearly describe + +
Modify element	Strictly edit <Video N>, ...
Remove element	Specify the element to remove; for elements that should remain unchanged, emphasizing them in the prompt yields better results

Notes

For video editing and extension tasks, use <Video N> directly to refer to the video. Do not write "referencing <Video N>", as this may cause the model to misidentify the request as a reference task.
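
One way to apply this note is a simple check before submitting a pure edit or extension prompt. The sketch below is hypothetical: the function names and the word list are assumptions, and the check is a plain text match, not anything the model itself exposes. It should not be applied to combined tasks, where "referencing" is intended.

```python
import re

# Wording that may push the model toward a reference task (assumed list).
REFERENCE_WORDS = re.compile(r"\breferenc(?:e|es|ed|ing)\b", re.IGNORECASE)


def build_edit_prompt(video_label, change, keep=None):
    """Build an edit prompt that names the video directly."""
    prompt = f"Strictly edit {video_label}, {change}"
    if keep:
        # Emphasizing what stays unchanged follows the removal tip above.
        prompt += f", keep {keep} unchanged"
    return prompt + "."


def check_edit_prompt(prompt):
    """Return a list of warnings for a pure edit or extension prompt."""
    warnings = []
    if REFERENCE_WORDS.search(prompt):
        warnings.append('avoid "reference" wording in edit/extension prompts')
    return warnings
```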

3. Video Extension

Concept

Continue from the original video along the temporal dimension, requiring consistent audio/video style, subject, and narrative. The model understands the ending state and motion trend of the original video, naturally extending to generate subsequent content.

Applicable Scenarios

Continuing the storyline (what happens next?)
Extending an action (the action hasn't finished yet — continue demonstrating it)
Completing clips (add content before or after the video)

Recommended Phrasing

Type	Recommended Phrasing
Extend backward	Extend <Video N> backward, generate...
Extend forward	Extend <Video N> forward, generate...
Track completion

Notes

The model automatically crops the join (transition) portion for compositing; the original segments of the input video are not regenerated
For editing and extension tasks, use <Video N> directly; do not write "referencing <Video N>"
Track completion supports up to 3 video segments as input, with a total duration not exceeding 15 seconds
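
The track completion limits can be checked before upload. This is a minimal sketch, assuming segment durations are already known in seconds; the function name and the constants are illustrative and simply restate the limits in the note above.

```python
MAX_SEGMENTS = 3          # "up to 3 video segments"
MAX_TOTAL_SECONDS = 15.0  # "total duration not exceeding 15 seconds"


def validate_track_completion(durations):
    """Check segment durations (seconds) and return their total."""
    if not durations:
        raise ValueError("at least one video segment is required")
    if len(durations) > MAX_SEGMENTS:
        raise ValueError(
            f"too many segments: {len(durations)} > {MAX_SEGMENTS}"
        )
    if any(d <= 0 for d in durations):
        raise ValueError("segment durations must be positive")
    total = sum(durations)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(
            f"total duration {total}s exceeds {MAX_TOTAL_SECONDS}s"
        )
    return total
```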

4. Combined Tasks

Concept

Multimodal reference, video editing, and video extension tasks support combined usage to achieve more complex creative needs. For example: reference an image's style, edit an element in another video, then extend the result backward.

Regardless of the task type, prompts must include the following essential elements:

Precise subject (who? what object?)
Action details (what are they doing?)
Scene environment (where? what mood?)
Lighting & color (what light? what tone?)
Camera movement (how is it shot?)
Visual style (what art style?)
Quality constraints (HD? cinematic?)
Constraints (no watermarks/subtitles/logos)

These elements can be simplified as: Who + What action + Where + What light + How shot + What style + What quality + No issues. In combined tasks, each element can come from different reference materials, enabling complex creations like "reference subject A from material A + scene B + action C, to edit video D."
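
The eight elements can be treated as a checklist when assembling a prompt. The sketch below is one hypothetical way to do that: the `PromptElements` class and its field names are assumptions for illustration, and joining the parts with commas is a simplification, not a required prompt format.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptElements:
    subject: str      # who or what object
    action: str       # what they are doing
    scene: str        # where, and what mood
    lighting: str     # what light and tone
    camera: str       # how it is shot
    style: str        # what art style
    quality: str      # e.g. HD, cinematic
    constraints: str  # e.g. no watermarks, subtitles, or logos


def assemble_prompt(elements):
    """Join the elements in order, refusing to proceed if any is empty."""
    missing = [
        f.name for f in fields(elements)
        if not getattr(elements, f.name).strip()
    ]
    if missing:
        raise ValueError(f"missing essential elements: {', '.join(missing)}")
    parts = [getattr(elements, f.name).strip() for f in fields(elements)]
    return ", ".join(parts) + "."
```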

Applicable Scenarios

Reference the features of one piece of material to edit or extend another piece of material.

Recommended Phrasing

Referencing [reference dimension] from <Image/Video N>, strictly edit <Video X>, [specific edit content]

Example

Referencing the character appearance from Image 1, strictly edit Video 1, changing the red dress in it to a blue dress.
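
The template above maps directly onto a small formatting helper. This is a sketch only: `build_combined_prompt` and its parameter names are made up for the example, and it fills the template with plain strings.

```python
def build_combined_prompt(dimension, source_label, target_label, edit):
    """Fill the combined-task template: reference one material, edit another."""
    values = {
        "dimension": dimension,
        "source_label": source_label,
        "target_label": target_label,
        "edit": edit,
    }
    for name, value in values.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return (
        f"Referencing the {dimension} from {source_label}, "
        f"strictly edit {target_label}, {edit}."
    )
```

Called with `"character appearance"`, `"Image 1"`, `"Video 1"`, and `"changing the red dress in it to a blue dress"`, it reproduces the example sentence above.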

Task Quick Comparison

Task	Generates New Video	Preserves Original Timeline	Typical Keywords
Multimodal Reference	Yes, entirely new video	No	referencing...in..., generate...
Edit — Add Element	No, modifies original video	Yes	add...to...
Edit — Modify Element	No, modifies original video	Yes	change...to...
Edit — Remove Element	No, modifies original video	Yes	remove..., keep...unchanged
Video Extension	Yes, continues from original	Inherits original end state	extend forward/backward..., generate...
Track Completion	Yes, splices multiple segments	Transitions between segments
Combined Tasks	Mixed	Mixed	Referencing..., strictly edit...

Summary

The basic formula of Seedance 2.0 revolves around three core capabilities, Reference, Edit, and Extend, plus their combination. Remember one principle for each:

Reference — borrow elements, generate a new video

Edit — modify content, preserve the original video

Extend — continue in time, join with the original video

Combine — mix usage, achieve complex creations

Equally important: for editing and extension tasks, refer to materials directly by label, such as <Video N>. Do not add the word "reference"; otherwise the model may treat the prompt as a reference task, first referencing your material and then editing or extending it, which adds unnecessary uncertainty.
