Seedance 2.0

Seedance 2.0 Basic Formula: Multimodal Reference · Edit · Extend · Combined Tasks

2026-06-04 · 10 min read · Tomato AI Team


Overview

Seedance 2.0 series models natively support joint audio and video generation, with excellent semantic understanding and multimodal interaction capabilities. The basic formula covers three core task types — multimodal reference, video editing, and video extension — along with their combined usage, serving as the fundamental starting point for creating with Seedance 2.0.

Below we introduce the applicable scenarios, recommended phrasing, and notes for each of these four task types.


1. Multimodal Reference

Concept

Extract certain elements (such as subject, style, scene, sound effects) from source material to generate a brand new video. The original material's temporal dimension and narrative logic are not preserved — only its visual/auditory features are borrowed.

Applicable Scenarios

  • Motion transfer (make character A perform character B's actions)
  • Subject reuse (place a character from source material into a new scene)
  • Atmosphere borrowing (reference the lighting, color palette, and rhythm of a video)

Recommended Phrasing

Reference Type  | Recommended Phrasing
Image reference | Referencing <Image N>'s [reference dimension], generate...
Video reference | Referencing <Video N>'s [reference dimension], generate...
Audio reference | Referencing <Audio N>'s timbre/voice, generate...

Usage Tips

  • Use <Image N>, <Video N>, <Audio N> to precisely refer to source material
  • Upload materials in order, and reference them in order within the prompt
  • Place important reference materials toward the beginning of the prompt
  • Avoid too many reference materials; typically 4–5 total works best
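
The tips above can be sketched as a small prompt builder. This is only an illustration, not part of any Seedance API: `Material`, `build_reference_prompt`, and the hard limit of 5 are hypothetical, and the code assumes materials are named "Image 1", "Video 1", and so on, numbered per type in upload order.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: treat the "4-5 total" tip as a hard upper bound of 5.
MAX_RECOMMENDED_MATERIALS = 5


@dataclass(frozen=True)
class Material:
    kind: str     # "Image", "Video", or "Audio"
    feature: str  # what to borrow, e.g. "color palette"


def build_reference_prompt(materials: List[Material], goal: str) -> str:
    """Number materials per kind in upload order and reference them in that order."""
    if not materials:
        raise ValueError("at least one reference material is required")
    if len(materials) > MAX_RECOMMENDED_MATERIALS:
        raise ValueError("too many reference materials; 4-5 in total tends to work best")
    counters: Dict[str, int] = {}
    clauses = []
    for m in materials:
        counters[m.kind] = counters.get(m.kind, 0) + 1
        clauses.append(f"{m.kind} {counters[m.kind]}'s {m.feature}")
    return f"Referencing {', '.join(clauses)}, generate {goal}"


prompt = build_reference_prompt(
    [Material("Image", "character appearance"), Material("Video", "camera movement")],
    "a woman walking through a rainy street at night.",
)
print(prompt)
```

Because the list order drives both the numbering and the position in the prompt, putting the most important material first in the list also places it at the start of the prompt.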

2. Video Editing

Concept

Make partial or global modifications on top of the original video. Parts not mentioned remain unchanged by default. Unlike "reference" (which generates a new video), the core of editing is preserving the original video's timeline, motion, and narrative, modifying only the specified elements.

Applicable Scenarios

  • Local replacement (swap a person/object in the frame with something new)
  • Subject removal (erase unwanted characters or objects from the frame)
  • Attribute modification (change color, material, style, etc.)

Recommended Phrasing

Operation      | Recommended Phrasing
Add element    | Clearly describe ... + ... + ...
Modify element | Strictly edit <Video N>, changing [original element] to [new element]
Remove element | Specify the element to remove; for elements that should remain unchanged, emphasizing them in the prompt yields better results

Notes

For video editing and extension tasks, refer to the video directly as <Video N>. Do not write "referencing <Video N>", as this may cause the model to misidentify the request as a reference task.
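
A minimal sketch of how this note could be enforced in a prompt-building script. The helper names are hypothetical, the "Video 1" naming is an assumption, and the regular expression is only a rough heuristic for spotting reference wording.

```python
import re
from typing import Sequence


def build_edit_prompt(video_index: int, old: str, new: str, keep: Sequence[str] = ()) -> str:
    """Build a 'modify element' prompt that names the video directly."""
    prompt = f"Strictly edit Video {video_index}, changing {old} to {new}."
    if keep:
        # Elements that must stay untouched are stated explicitly.
        prompt += f" Keep {', '.join(keep)} unchanged."
    return prompt


# Heuristic: any form of the word "reference" is flagged in edit/extension prompts.
_REFERENCE_WORDING = re.compile(r"\breferenc(?:e|es|ed|ing)\b", re.IGNORECASE)


def uses_reference_wording(prompt: str) -> bool:
    return _REFERENCE_WORDING.search(prompt) is not None


edit_prompt = build_edit_prompt(1, "the red dress", "a blue dress", keep=["the background"])
print(edit_prompt)
print(uses_reference_wording(edit_prompt))
print(uses_reference_wording("Referencing Video 1, change the dress color."))
```

The check is deliberately strict: it would also flag a harmless phrase such as "a reference book on the table", so treat a hit as a prompt to review the wording rather than as an error.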


3. Video Extension

Concept

Continue from the original video along the temporal dimension, requiring consistent audio/video style, subject, and narrative. The model understands the ending state and motion trend of the original video, naturally extending to generate subsequent content.

Applicable Scenarios

  • Continuing the storyline (what happens next?)
  • Extending an action (the action hasn't finished yet — continue demonstrating it)
  • Completing clips (add content before or after the video)

Recommended Phrasing

Type             | Recommended Phrasing
Extend backward  | Extend <Video N> backward, generate...
Extend forward   | Extend <Video N> forward, generate...
Track completion | <Video 1> + [transition description] + then <Video 2> + ...

Notes

  • The model automatically crops the join (transition) portion for compositing; original segments of the input video are not regenerated
  • For editing/extension tasks, use <Video N> directly; do not write "referencing <Video N>"
  • Track completion supports up to 3 video segments as input, with a total duration not exceeding 15 seconds
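
The track completion limits stated above (at most 3 segments, at most 15 seconds in total) are easy to check before uploading. The function below is a hypothetical pre-flight check, not part of any official SDK; durations are assumed to be in seconds.

```python
from typing import List

MAX_SEGMENTS = 3          # limit stated in the notes above
MAX_TOTAL_SECONDS = 15.0  # limit stated in the notes above


def check_track_completion(durations: List[float]) -> List[str]:
    """Return a list of problems; an empty list means the inputs fit the limits."""
    problems = []
    if not durations:
        problems.append("no segments given")
    if len(durations) > MAX_SEGMENTS:
        problems.append(f"{len(durations)} segments given, at most {MAX_SEGMENTS} supported")
    total = sum(durations)
    if total > MAX_TOTAL_SECONDS:
        problems.append(f"total duration {total:g}s exceeds {MAX_TOTAL_SECONDS:g}s")
    return problems


print(check_track_completion([4, 5, 6]))     # fits both limits
print(check_track_completion([5, 5, 6]))     # too long in total
print(check_track_completion([3, 3, 3, 3]))  # too many segments
```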

4. Combined Tasks

Concept

Multimodal reference, video editing, and video extension tasks support combined usage to achieve more complex creative needs. For example: reference an image's style, edit an element in another video, then extend the result backward.

Regardless of the task type, prompts must include the following essential elements:

┌─ Essential Elements ───────────────────────────┐
│                                                │
│   Precise subject (who? what object?)          │
│   Action details (what are they doing?)        │
│   Scene environment (where? what mood?)        │
│   Lighting & color (what light? what tone?)    │
│   Camera movement (how is it shot?)            │
│   Visual style (what art style?)               │
│   Quality constraints (HD? cinematic?)         │
│   Constraints (no watermarks/subtitles/logos)  │
│                                                │
└────────────────────────────────────────────────┘

These elements can be summarized as: Who + What action + Where + What light + How shot + What style + What quality + No issues. In combined tasks, each element can come from a different reference material, enabling complex creations such as "take subject A, scene B, and action C from separate reference materials, then use them to edit video D."
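
One way to keep all eight elements in view is to treat them as fields of a checklist. The sketch below is an illustration only: `PromptElements`, `missing_elements`, and `compose` are hypothetical names, and joining the parts with commas is just one simple way to assemble them.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PromptElements:
    subject: str = ""      # who? what object?
    action: str = ""       # what are they doing?
    scene: str = ""        # where? what mood?
    lighting: str = ""     # what light? what tone?
    camera: str = ""       # how is it shot?
    style: str = ""        # what art style?
    quality: str = ""      # HD? cinematic?
    constraints: str = ""  # no watermarks/subtitles/logos


def missing_elements(p: PromptElements) -> List[str]:
    """Names of elements that are still empty, in checklist order."""
    return [f.name for f in fields(p) if not getattr(p, f.name).strip()]


def compose(p: PromptElements) -> str:
    """Join the elements in checklist order; refuse if any is missing."""
    missing = missing_elements(p)
    if missing:
        raise ValueError(f"missing essential elements: {', '.join(missing)}")
    return ", ".join(getattr(p, f.name).strip() for f in fields(p)) + "."


draft = PromptElements(subject="a red fox", action="leaps over a fallen log")
print(missing_elements(draft))
```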

Applicable Scenarios

Reference the features of one piece of material to edit or extend another piece of material.

Recommended Phrasing

Referencing [reference dimension] from <Image/Video N>, strictly edit <Video X>, [specific edit content]

Example

Referencing the character appearance from Image 1, strictly edit Video 1, changing the red dress in it to a blue dress.
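
The recommended phrasing can also be filled in programmatically. The function below is a hypothetical helper that follows the template above literally; it reproduces the example prompt when given the same parts.

```python
def build_combined_prompt(ref_kind: str, ref_index: int, dimension: str,
                          video_index: int, edit: str) -> str:
    """Fill in: Referencing [dimension] from <Image/Video N>, strictly edit <Video X>, [edit]."""
    if ref_kind not in ("Image", "Video"):
        raise ValueError("the template above only covers Image or Video references")
    return (
        f"Referencing {dimension} from {ref_kind} {ref_index}, "
        f"strictly edit Video {video_index}, {edit}."
    )


combined = build_combined_prompt(
    "Image", 1, "the character appearance",
    1, "changing the red dress in it to a blue dress",
)
print(combined)
```

Note that only the reference material is introduced with "Referencing"; the video being edited is named directly, in line with the note in the Video Editing section.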


Five Tasks Quick Comparison

Task                  | Generates New Video            | Preserves Original Timeline  | Typical Keywords
Multimodal Reference  | Yes, entirely new video        | No                           | referencing...in..., generate...
Edit: Add Element     | No, modifies original video    | Yes                          | add...to...
Edit: Modify Element  | No, modifies original video    | Yes                          | change...to...
Edit: Remove Element  | No, modifies original video    | Yes                          | remove..., keep...unchanged
Video Extension       | Yes, continues from original   | Inherits original end state  | extend forward/backward..., generate...
Track Completion      | Yes, splices multiple segments | Transitions between segments | <Video 1> + transition + then <Video 2>
Combined Tasks        | Mixed                          | Mixed                        | Referencing..., strictly edit...

Summary

The basic formula of Seedance 2.0 revolves around three core capabilities, "Reference, Edit, Extend," plus their combination. Remember one principle for each:

Reference — borrow elements, generate a new video

Edit — modify content, preserve the original video

Extend — continue in time, join with the original video

Combine — mix usage, achieve complex creations

Equally important: for editing and extension tasks, refer to materials directly as <Video N>. Do not add the word "reference"; otherwise the model will treat the request as a reference task, first referencing your material and then editing or extending it, which adds unnecessary uncertainty.
