AI Video Prompt Guide: Sora, Veo 3, and Runway
AI video generation is moving fast. Sora, Veo 3, and Runway can produce surprisingly good clips, but only if you prompt them precisely. Vague prompts get you vague footage. This guide teaches you a structured approach to video prompting that works across all major tools.
The SCAAL Framework for Video Prompts
SCAAL stands for Subject, Camera, Action, Atmosphere, and Length. It gives you a checklist for every video prompt, ensuring you cover the elements that matter most for motion content.
- Subject -- What's in the frame? Be specific about appearance, number of subjects, and their starting position. "A woman in a red coat standing at the edge of a pier" is better than "a person near water."
- Camera -- How does the viewer see the scene? Include shot type (wide, medium, close-up) and any camera movement (slow dolly in, orbit, static tripod). We'll cover this in more detail in the next section.
- Action -- What happens during the clip? Describe the primary motion. "The woman turns to face the camera as wind catches her hair" gives the model a clear event to animate.
- Atmosphere -- What's the lighting, weather, and mood? "Overcast afternoon, soft diffused light, melancholic tone" sets the visual feeling of the entire clip.
- Length -- How long is the clip, and what's the pacing? Most tools generate 4-10 second clips. Mention pacing explicitly: "slow motion" or "real-time speed."
According to the Sora prompting guide, describing temporal pacing helps the model distribute motion evenly across the clip's duration.
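The five SCAAL fields map naturally onto a small prompt-template helper. This is a minimal sketch in Python; the `ScaalPrompt` class and the comma-joined rendering are illustrative conventions, not part of any tool's API.

```python
from dataclasses import dataclass

@dataclass
class ScaalPrompt:
    """One field per SCAAL element. Names are illustrative, not a tool API."""
    subject: str
    camera: str
    action: str
    atmosphere: str
    length: str

    def render(self) -> str:
        # Join the elements into a single comma-separated prompt string,
        # keeping the SCAAL order: subject, camera, action, atmosphere, length.
        return ", ".join([self.subject, self.camera, self.action,
                          self.atmosphere, self.length])

prompt = ScaalPrompt(
    subject="a woman in a red coat standing at the edge of a pier",
    camera="medium shot, slow dolly in",
    action="she turns to face the camera as wind catches her hair",
    atmosphere="overcast afternoon, soft diffused light, melancholic tone",
    length="6 seconds, real-time speed",
)
print(prompt.render())
```

Filling in the dataclass forces you to answer each SCAAL question before you generate, which is the real point of the checklist.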
Camera Movement and Framing
Camera movement is what separates a video prompt from an image prompt. You're not describing a frozen moment; you're directing a shot. Use cinematic terms that the models have learned from film production data.
Common camera movements and when to use them:
- Static (tripod, locked off) for dialogue or calm scenes
- Slow dolly in for building tension or drawing attention
- Tracking shot for following a moving subject
- Orbit (arc around subject) for revealing three-dimensional form
- Crane up/down for dramatic reveals
- Handheld for documentary feel or urgency
Framing rules from cinematography apply directly. Start with the shot size: extreme wide for establishing context, wide for full body and environment, medium for waist-up interaction, close-up for emotion, extreme close-up for detail. Then add movement on top: "Medium shot, slow dolly in to close-up as the subject speaks."
The Runway documentation notes that specifying a single, clear camera movement per clip produces much better results than combining multiple movements. If you need a dolly-to-pan, generate two clips and cut between them. One movement per clip is the reliable rule.
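The one-movement-per-clip rule is easy to enforce with a quick lint pass over a draft prompt. A sketch, assuming a naive case-insensitive substring match against a hand-picked keyword list (both the list and the function names are hypothetical):

```python
# Common camera-movement keywords; extend as needed (illustrative list).
CAMERA_MOVES = ["dolly", "tracking", "orbit", "crane", "pan", "tilt",
                "handheld", "zoom"]

def count_camera_moves(prompt: str) -> list[str]:
    """Return the movement keywords found in a prompt (simple substring match)."""
    text = prompt.lower()
    return [move for move in CAMERA_MOVES if move in text]

def check_single_move(prompt: str) -> bool:
    """True if the prompt follows the one-movement-per-clip rule."""
    return len(count_camera_moves(prompt)) <= 1

check_single_move("medium shot, slow dolly in to close-up")      # passes
check_single_move("slow dolly in, then pan left to the window")  # fails: two moves
```

When the check fails, split the prompt into two clips and cut between them, as described above.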
Lighting, Atmosphere, and Mood
Lighting in video prompts works the same way as in image prompts, but you can also change it over time. "The scene starts in shadow, then warm sunlight breaks through the clouds" gives the model a lighting transition to animate. This creates visual interest that static lighting can't match.
Weather and environment act as atmosphere modifiers. "Rain-soaked street reflecting neon signs" tells the model about the ground surface, the light sources, and the reflections all at once. "Dusty desert highway at noon" implies harsh overhead light, washed-out colors, and heat haze. Use environmental details as shortcuts for complex lighting setups.
Mood keywords shape the overall tone. "Eerie," "joyful," "tense," "peaceful," and "cinematic" each push the generation in a different direction. Pair mood words with specific visual cues for stronger results: "eerie fog rolling through an abandoned hospital corridor, flickering fluorescent lights" is more actionable than "scary mood."
According to the Google Veo documentation, atmosphere keywords placed early in the prompt carry more weight. If mood is critical to your video, mention it before the action description, not after. "Nostalgic, sun-drenched afternoon: a child runs through a sprinkler on a suburban lawn" front-loads the feeling you want.
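The front-loading advice can also be automated: reorder a list of prompt fragments so that any fragment containing a mood keyword comes first. A minimal sketch; the mood vocabulary and the colon-separated output format are assumptions, not anything specified by Veo's documentation.

```python
# Illustrative mood vocabulary; not an official keyword list.
MOOD_WORDS = {"eerie", "joyful", "tense", "peaceful", "nostalgic",
              "melancholic", "cinematic"}

def front_load_mood(parts: list[str]) -> str:
    """Reorder prompt fragments so mood-bearing fragments come first."""
    moody = [p for p in parts if any(w in p.lower() for w in MOOD_WORDS)]
    rest = [p for p in parts if p not in moody]
    if moody and rest:
        # Mood first, then a colon, then the action/scene description.
        return ", ".join(moody) + ": " + ", ".join(rest)
    return ", ".join(moody or rest)

front_load_mood([
    "a child runs through a sprinkler on a suburban lawn",
    "nostalgic, sun-drenched afternoon",
])
# The mood fragment is moved ahead of the action description.
```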
Aspect Ratios by Platform
Choosing the right aspect ratio before you generate saves you from awkward cropping later. Each platform has an ideal format, and generating in the wrong ratio means losing parts of your carefully composed frame.
16:9 (landscape): YouTube, desktop web, presentations, TV. This is the default for most video tools and the safest choice when you're unsure where the video will be used.
9:16 (vertical): TikTok, Instagram Reels, YouTube Shorts, Snapchat. Center the subject or place it slightly above center, and keep important detail away from the very top and bottom of the frame, where platform UI elements overlap.
1:1 (square): Instagram feed, LinkedIn video, some ad placements. Square works well for product showcases and talking-head clips where a centered composition is natural.
4:5 (portrait): Instagram feed (maximizes screen space), Facebook feed. A good compromise between vertical and square when you want height without full 9:16.
Most AI video tools let you set aspect ratio as a parameter. Set it before generating, not after. The model composes the scene based on the frame shape, so a wide establishing shot generated in 16:9 and then cropped to 9:16 will lose its intended composition. Plan your ratio around the destination platform from the start.
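The platform guidance above reduces to a simple lookup table you can wire into a generation script. A sketch; the platform keys and the 16:9 fallback are conventions chosen here, not tool presets.

```python
# Platform-to-ratio map, following the guidance above (width, height).
ASPECT_RATIOS = {
    "youtube": (16, 9),
    "tiktok": (9, 16),
    "instagram_reels": (9, 16),
    "youtube_shorts": (9, 16),
    "instagram_feed": (4, 5),
    "linkedin": (1, 1),
}

def ratio_for(platform: str, default: tuple[int, int] = (16, 9)) -> tuple[int, int]:
    """Look up the recommended aspect ratio, falling back to 16:9 when unsure."""
    return ASPECT_RATIOS.get(platform.lower(), default)

ratio_for("TikTok")   # vertical, (9, 16)
ratio_for("webinar")  # unknown destination, falls back to (16, 9)
```

Resolving the ratio from the destination platform, before the prompt is sent, keeps the "set it before generating, not after" rule from being forgotten.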
Commercial and Product Videos
AI-generated video is increasingly viable for product marketing, social ads, and explainer content. The key to commercial-quality output is treating the prompt like a creative brief, not a casual request.
For product shots, describe the product with precision: material, color, size relative to the frame, and surface finish. "A matte black wireless earbud case resting on a marble surface, soft studio lighting from above, slow 180-degree orbit" gives the model enough detail to produce a usable product hero shot.
For lifestyle or aspirational content, focus on the feeling you want the viewer to associate with the product. "A person opens a laptop in a bright, minimalist café, golden morning light from a large window, shallow depth of field on the screen" tells a visual story without being a hard sell.
The Sora guide recommends keeping generated clips short (under 10 seconds) and compositing them in an editor for longer sequences. This gives you more control over pacing, transitions, and brand consistency. Generate multiple angles of the same scene and cut between them for a professional result that feels like it was shot with multiple cameras.
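The multi-angle workflow can be scripted by holding the scene description fixed and varying only the camera setup per clip. A sketch under that assumption; the function name and the comma-joined prompt format are illustrative.

```python
def multi_angle_prompts(scene: str, cameras: list[str]) -> list[str]:
    """One prompt per camera setup, holding the scene description fixed."""
    return [f"{scene}, {camera}" for camera in cameras]

clips = multi_angle_prompts(
    "a matte black wireless earbud case on a marble surface, "
    "soft studio lighting from above",
    [
        "wide establishing shot, static tripod",
        "close-up, slow 180-degree orbit",
        "extreme close-up on the case hinge, static",
    ],
)
# Three short prompts for three short clips, ready to cut together in an editor.
```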
Troubleshooting Common Issues
Flickering or jittering subjects: This usually happens when the prompt describes too much simultaneous movement. Simplify the action. Reduce the number of moving elements to one or two. If a background should stay still, say "static background" explicitly.
Unnatural motion: If movement looks robotic or too smooth, add natural modifiers. "Slight sway," "natural gait," "wind-blown" introduce organic imperfection that makes motion feel real. Avoid mechanical terms like "rotate 45 degrees" unless you specifically want mechanical motion.
Wrong style or era: If the video looks like the wrong decade or genre, add stronger style anchors. "Shot on 16mm film, 1990s color grading" or "clean digital cinema, modern color science" force the model toward a specific visual era. The Runway docs suggest using reference keywords from real filmmaking to guide visual style more precisely.
Inconsistent subjects across clips: If you're generating multiple clips of the same character or product, copy your subject description exactly between prompts. Change only the action and camera. Keeping the subject text identical helps the model maintain visual consistency, though it's not guaranteed. For mission-critical consistency, use tools that support character or style references natively.
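The copy-exactly rule is easiest to follow when the subject string lives in one place and every clip prompt is built from it. A minimal sketch; the subject and clip descriptions here are hypothetical.

```python
# Single source of truth for the subject description (hypothetical subject).
SUBJECT = "a silver robot barista with a chipped left arm"

def clip_prompt(action: str, camera: str) -> str:
    # The subject string is reused verbatim; only action and camera change.
    return f"{SUBJECT}, {camera}, {action}"

prompts = [
    clip_prompt("pours steamed milk into a cup", "medium shot, static tripod"),
    clip_prompt("hands the cup to a customer", "close-up, slow dolly in"),
]

# Sanity check: every prompt starts with the identical subject text.
assert all(p.startswith(SUBJECT) for p in prompts)
```

Editing the subject in one constant, rather than in each prompt, removes the most common source of drift between clips.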
Text or logos appearing garbled: AI video models struggle with readable text. Avoid prompting for on-screen text. Add titles, logos, and captions in post-production instead.