Veo 3 UGC Video System in Claude Code
The complete pipeline for deconstructing UGC videos, engineering Veo 3 prompts, generating images, animating to video, and producing full AI UGC ads — all without leaving your terminal.
Previously, creating a Veo 3 UGC video ad required bouncing between three platforms:
| Before (Old Stack) | After (Claude Code) |
|---|---|
| Gemini Gem — Upload video, get UGC deconstruction + Veo 3 prompt | /veo3-prompt — Same analysis + prompt generation, with superior structured output |
| Custom GPT — Script writing, hook generation, ad copy | /hook-doctor + /copy-alchemist — Specialized skills for each task |
| VeoStack App — localhost:3333 web UI for prompt gen + kie.ai pipeline | /kieai + /ai-ugc-creator — Direct API calls from terminal, no server needed |
The system is a 6-stage pipeline where each stage maps to a dedicated Claude skill. You call them sequentially (or skip stages you don't need):
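1. /veo3-prompt: deconstruct the reference UGC frame
2. /veo3-prompt: engineer the Veo 3 prompt
3. /hook-doctor (with /copy-alchemist): write the hook and script
4. /kieai: generate storyboard images
5. /kieai: animate images to video
6. /ai-ugc-creator: orchestrate the full ad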
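The stage-to-skill mapping, and the option to skip stages, can be sketched as a small lookup. This is only an illustration: the `Stage` class, the `plan` helper, and the stage labels are hypothetical names chosen here, and only the slash commands come from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    number: int
    label: str  # illustrative label, not defined by the skills themselves
    skill: str  # slash command named in this guide


PIPELINE = (
    Stage(1, "deconstruct the reference", "/veo3-prompt"),
    Stage(2, "engineer the Veo 3 prompt", "/veo3-prompt"),
    Stage(3, "write hook and script", "/hook-doctor"),
    Stage(4, "generate storyboard images", "/kieai"),
    Stage(5, "animate images to video", "/kieai"),
    Stage(6, "orchestrate the full ad", "/ai-ugc-creator"),
)


def plan(skip=()):
    """Return the stages to run, in order, minus any skipped stage numbers."""
    skipped = set(skip)
    unknown = skipped - {stage.number for stage in PIPELINE}
    if unknown:
        raise ValueError(f"unknown stage numbers: {sorted(unknown)}")
    return [stage for stage in PIPELINE if stage.number not in skipped]


# Example: pasting the prompt straight into Veo 3, so stages 4-6 are skipped.
for stage in plan(skip=(4, 5, 6)):
    print(stage.number, stage.skill, stage.label)
```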
How Data Flows
- You upload an image or screenshot of a UGC video you want to recreate
- /veo3-prompt deconstructs the image into UGC components (camera, lighting, subject, motion, audio) and synthesizes a 2048-character Veo 3 prompt
- You copy the Veo 3 prompt into Google Veo 3 to generate the video (Path A), or continue in Claude Code (Path B):
- /hook-doctor writes scroll-stopping hooks; /copy-alchemist writes the full ad script
- /kieai generates storyboard images via Nano Banana 2, then animates them to 5s video clips via Kling
- /ai-ugc-creator orchestrates the full pipeline: script → storyboard → images → video → export specs
Path B (kie.ai + Kling): Use /kieai to generate images + animate entirely within Claude Code. Lower cost, more control, no waitlist.
What this does: You upload a screenshot or frame from a UGC video you like, and Claude reverse-engineers every element that makes it feel authentic — the camera type, framing imperfections, lighting, audio quality, subject performance, and social context.
How to Use It
1. Take a screenshot of any UGC video (TikTok, Reels, etc.)
2. In Claude Code, type: /veo3-prompt
3. Upload the screenshot when prompted
4. Say: "Deconstruct this and give me a Veo 3 prompt"
What Claude Analyzes
Claude infers the camera model based on visual evidence:
- Aspect ratio: 9:16 = phone vertical, 16:9 = landscape/webcam
- Lens distortion: Wide-angle barrel distortion = front camera selfie
- Dynamic range: Crushed shadows = older phone; HDR bloom = iPhone 14+
- Noise pattern: Fine luminance noise = good sensor; chroma blotching = cheap sensor
- Framing: Off-center, too much headroom, rule of thirds broken
- Camera motion: Handheld wobble, selfie grip jitter, abrupt pans
- Lighting: Harsh overhead, uneven window light, ring light catch in eyes
- Editing: Single take, rough jump cuts, no color grading
- Visual noise: Grain in shadows, minor lens flare, JPEG compression
- Background: Room echo, AC hum, muffled traffic, kitchen sounds
- Mic quality: Phone-mic echo, proximity bass boost, wind noise
- Delivery: Filler words ("um", "like"), natural gestures, eye contact with camera
- Body language: Relaxed posture, casual hand movements, authentic expressions
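Of the cues above, the aspect-ratio rule is the only one that reduces to simple arithmetic. A minimal sketch of that single check follows, assuming you already know the screenshot's pixel dimensions. The function name and the `tolerance` parameter are hypothetical, and the remaining cues (distortion, noise, lighting, audio) still need visual judgement.

```python
def classify_aspect(width: int, height: int, tolerance: float = 0.02) -> str:
    """Map pixel dimensions to the orientation buckets used in the list above."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    if abs(ratio - 9 / 16) <= tolerance:
        return "phone vertical"
    if abs(ratio - 16 / 9) <= tolerance:
        return "landscape/webcam"
    # Full-screen phone screenshots are often taller than 9:16 and land here.
    return "other"


print(classify_aspect(1080, 1920))
print(classify_aspect(1920, 1080))
```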
What this does: Claude takes everything from the deconstruction and compresses it into a single, copy-paste-ready Veo 3 prompt — exactly 2048 characters, following the official 6-section template.
The 6 Sections (Always in This Order)
| # | Section | What It Contains |
|---|---|---|
| 1 | Cinematography & Shot Type | Shot size, camera model, framing, movement, focus, resolution, color grade, filename |
| 2 | Subject Description | Name, age, ethnicity, hair, face, eyes, skin, build, clothing, accessories |
| 3 | Action & Physics | Position, posture, specific movements in beats (3 minimum) |
| 4 | Environment & Lighting | Atmosphere, mood, light source and quality, shadow details |
| 5 | Audio & Dialogue | Mic type, audio quality, background sounds, voice characteristics, exact dialogue with filler words |
| 6 | Style Guidelines & Negatives | Visual style keywords, editing style, universal quality control negatives list |
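If you assemble or edit a prompt by hand, a quick check of section order and length can catch mistakes before you paste. The sketch below is an assumption-laden illustration: `assemble_prompt` is a hypothetical helper, the bracketed header format is one possible layout, and 2048 is treated as an upper limit rather than an exact target.

```python
SECTION_ORDER = [
    "Cinematography & Shot Type",
    "Subject Description",
    "Action & Physics",
    "Environment & Lighting",
    "Audio & Dialogue",
    "Style Guidelines & Negatives",
]
MAX_CHARS = 2048  # treated here as a ceiling; an assumption about how the limit applies


def assemble_prompt(sections: dict) -> str:
    """Join the six sections in the fixed order and enforce the length limit."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    parts = [
        f"[{number}. {name.upper()}] {sections[name].strip()}"
        for number, name in enumerate(SECTION_ORDER, start=1)
    ]
    prompt = " ".join(parts)
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {MAX_CHARS}")
    return prompt
```

Because the order lives in `SECTION_ORDER`, a dictionary filled in any order still produces a prompt with the camera section first.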
Using the Output
- Copy the prompt from the code block Claude outputs
- Paste directly into Google Veo 3 (AI Test Kitchen or Flow)
- Generate — the video should match the UGC feel of your reference
- Iterate: If close but not right, tell Claude to change one variable (e.g., "make the lighting warmer" or "change to golden hour")
Before generating visuals, nail the script. Two skills handle this:
/hook-doctor — The First 1.5 Seconds
/hook-doctor "Write 10 scroll-stopping hooks for a UGC ad about Daily Dosey dog supplement pouches. Target: female dog owners 25-45 on Instagram Reels."
Returns 10 hooks ranked by pattern type (curiosity, controversy, transformation, social proof, etc.). Pick the strongest one for your Veo 3 prompt dialogue.
/copy-alchemist — The Full Script
/copy-alchemist "Write a 15-second UGC video ad script for Daily Dosey.
Hook: [paste winning hook from hook-doctor]
Structure: Hook (0-3s) > One Benefit (3-12s) > CTA (12-15s)
Tone: casual, authentic, like texting a friend
Include filler words for realism."
The script output feeds directly into Section 5 (Audio & Dialogue) of your Veo 3 prompt.
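The Hook > One Benefit > CTA structure is a timeline that has to add up to 15 seconds with no gaps. A small sketch of that arithmetic, using a hypothetical `check_timeline` helper, might look like this:

```python
# Segment boundaries in seconds, taken from the structure in the example prompt.
SEGMENTS = [("Hook", 0, 3), ("One Benefit", 3, 12), ("CTA", 12, 15)]


def check_timeline(segments, total_seconds=15):
    """Verify segments are contiguous and fill the ad; return each duration."""
    cursor = 0
    durations = {}
    for name, start, end in segments:
        if start != cursor:
            raise ValueError(f"{name} starts at {start}s, expected {cursor}s")
        if end <= start:
            raise ValueError(f"{name} has no positive duration")
        durations[name] = end - start
        cursor = end
    if cursor != total_seconds:
        raise ValueError(f"timeline ends at {cursor}s, expected {total_seconds}s")
    return durations


print(check_timeline(SEGMENTS))
```

The benefit gets 9 of the 15 seconds, which is why the script should carry a single benefit rather than several.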
If you're using Path B (kie.ai + Kling) instead of pasting into Veo 3, this is where you generate your storyboard images.
How to Call It
/kieai "Generate an image: A 28-year-old Indian woman with shoulder-length black hair, sitting in a modern kitchen, holding a Daily Dosey stand-up pouch, looking at camera with a surprised expression, natural window lighting, slightly off-center framing, photorealistic, iPhone quality, natural lighting, no text, no watermarks"
What Happens Under the Hood
- Claude sends a POST request to kie.ai's createTask endpoint with the Nano Banana 2 model
- Gets back a taskId
- Polls every 10-15 seconds until state = "success"
- Returns the image URL you can view and download
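The submit-then-poll pattern above can be sketched without committing to kie.ai's actual request format. In this illustration `fetch_state` is a hypothetical callable you would supply to query the task. The `"success"` state comes from this guide, while the `"fail"` state and the shape of the returned record are assumptions.

```python
import time
from typing import Callable


def wait_for_task(
    task_id: str,
    fetch_state: Callable[[str], dict],
    interval_seconds: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task record reports success, then return that record."""
    for _ in range(max_attempts):
        record = fetch_state(task_id)
        state = record.get("state")
        if state == "success":
            return record
        if state == "fail":  # assumed failure state; check the real API docs
            raise RuntimeError(f"task {task_id} failed: {record}")
        sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable: a stand-in that records the requested delays avoids waiting in real time.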
UGC Image Authenticity Tricks
Add these to your image prompts to make AI images look like real phone photos:
- End every prompt with: "photorealistic, iPhone quality, natural lighting, no text, no watermarks"
- Describe imperfect framing: "slightly off-center, too much headroom"
- Include environmental mess: "messy desk in background", "laundry basket visible"
- Specify the social scenario: "selfie taken in a bathroom mirror" not just "woman smiling"
- For Indian market: always specify "Indian woman/man" explicitly
- For TFT products: Daily Dosey is a stand-up pouch (NEVER jar/bottle)
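These tricks are easy to forget when writing prompts by hand, so one option is to build each prompt from parts. The helper below is a hypothetical sketch: the function name and parameters are illustrative, and only the suffix and example phrases come from the list above.

```python
AUTHENTICITY_SUFFIX = (
    "photorealistic, iPhone quality, natural lighting, no text, no watermarks"
)


def build_image_prompt(
    scene: str,
    framing: str = "slightly off-center, too much headroom",
    clutter: str = "",
) -> str:
    """Combine scene, imperfect framing, optional clutter, and the fixed suffix."""
    parts = [scene.strip().rstrip(",. "), framing]
    if clutter:
        parts.append(clutter)
    parts.append(AUTHENTICITY_SUFFIX)
    return ", ".join(parts)


print(
    build_image_prompt(
        "Indian woman taking a selfie in a bathroom mirror, holding a Daily Dosey stand-up pouch",
        clutter="laundry basket visible",
    )
)
```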
Once you have storyboard images, animate them into 5-second video clips using Kling AI via the same /kieai skill.
How to Call It
/kieai "Animate this image to video:
Image URL: [paste the URL from Step 4]
Motion: The woman looks at camera with a surprised expression, then holds up the pouch and smiles. Natural handheld selfie motion with subtle shake. Slight zoom-in on product.
Duration: 5 seconds
Aspect: 9:16
Model: kling-v2.1-pro-i2v"
Video Model Tiers
| Model | Resolution | Cost/5s | Best For |
|---|---|---|---|
| kling-v2.1-standard-i2v | 720p | $0.125 | Quick tests, drafts |
| kling-v2.1-pro-i2v | 1080p | $0.25 | Production ads (recommended) |
| kling-v2.1-master-i2v | 1080p+ | $0.80 | Premium quality, hero shots |
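To budget a multi-shot ad, multiply the per-clip price by the number of clips and by however many takes you expect per shot. The sketch below uses the prices from the table. The `estimate_video_cost` helper and its `takes_per_clip` parameter are hypothetical, and image generation costs are not included because this guide does not list them.

```python
from decimal import Decimal

# Cost per 5-second clip in USD, copied from the table above.
COST_PER_CLIP = {
    "kling-v2.1-standard-i2v": Decimal("0.125"),
    "kling-v2.1-pro-i2v": Decimal("0.25"),
    "kling-v2.1-master-i2v": Decimal("0.80"),
}


def estimate_video_cost(model: str, clips: int, takes_per_clip: int = 1) -> Decimal:
    """Return the animation cost in USD for the given number of 5-second clips."""
    if clips < 0 or takes_per_clip < 1:
        raise ValueError("clips must be >= 0 and takes_per_clip >= 1")
    return COST_PER_CLIP[model] * clips * takes_per_clip


# A 6-shot ad on the Pro tier, one take per shot.
print(estimate_video_cost("kling-v2.1-pro-i2v", clips=6))
```

At one take per shot, a 6-shot ad costs $0.75 on Standard, $1.50 on Pro, and $4.80 on Master.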
Motion Prompt Best Practices
- Describe in beats: "She looks up, pauses, then holds up the product" (not "she moves naturally")
- Camera motion: Always include "Natural handheld selfie motion with subtle shake" for UGC feel
- Keep it simple: 2-3 actions max for a 5-second clip
- Pure static = looks AI. Pure chaos = also looks AI. The middle ground is real.
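The practices above amount to a small template: a short list of beats joined in order, followed by the handheld camera line. A minimal sketch, with a hypothetical `build_motion_prompt` helper that rejects more than three beats:

```python
HANDHELD = "Natural handheld selfie motion with subtle shake."


def build_motion_prompt(beats, max_beats: int = 3) -> str:
    """Join 1 to max_beats actions into one sentence and add the camera line."""
    cleaned = [beat.strip().rstrip(".") for beat in beats if beat.strip()]
    if not 1 <= len(cleaned) <= max_beats:
        raise ValueError(f"expected 1 to {max_beats} beats, got {len(cleaned)}")
    if len(cleaned) == 1:
        action = cleaned[0]
    else:
        action = ", ".join(cleaned[:-1]) + ", then " + cleaned[-1]
    return f"{action}. {HANDHELD}"


print(build_motion_prompt(["She looks up", "pauses", "holds up the product"]))
```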
For brands that need a consistent AI creator across multiple ads, the /ai-ugc-creator skill includes an AI Influencer module with identity locking.
How to Create a Character
/ai-ugc-creator "Build an AI influencer character for Treat for Tails:
- Female, 28-30, Indian
- Warm, approachable, dog-mom energy
- Casual style (oversized tees, messy bun)
- Generate a character template + model sheet"
Identity Lock System
- Character Template: Claude generates a reusable text block with exact physical features
- Model Sheet: Multi-angle reference grid (front, 3/4, side) generated via /kieai
- Consistency Test: Generate 5+ images in different settings — the character should be recognizable
- Repeat the FULL character description in every single prompt — never abbreviate
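The "never abbreviate" rule is mechanical enough to check automatically. The sketch below is illustrative only: `lock_identity` and `find_unlocked` are hypothetical helpers, and the character text is a placeholder for whatever template Claude generates.

```python
def lock_identity(character_template: str, scene: str) -> str:
    """Prefix a scene description with the full, unabbreviated character template."""
    return f"{character_template.strip()} {scene.strip()}"


def find_unlocked(prompts, character_template: str):
    """Return the indexes of prompts that do not contain the full template."""
    template = character_template.strip()
    return [index for index, prompt in enumerate(prompts) if template not in prompt]


# Placeholder template; a real one would list exact physical features.
CHARACTER = "A 28-year-old Indian woman with a messy bun, wearing an oversized tee."

prompts = [
    lock_identity(CHARACTER, "She sits on a sofa holding a stand-up pouch."),
    "The same woman stands in a kitchen.",  # abbreviated, so identity may drift
]
print(find_unlocked(prompts, CHARACTER))
```

Running `find_unlocked` over a batch before generation flags any prompt where the description was shortened to something like "the same woman".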
For a complete production-ready UGC ad (not just a single Veo 3 clip), use the full orchestration pipeline:
/ai-ugc-creator "Create a 6-shot UGC ad for Daily Dosey dog supplement.
Platform: Instagram Reels 9:16
Character: 28-year-old Indian woman, dog mom
Setting: Modern apartment, living room + kitchen
Structure: Hook > Problem > Discovery > Demo > Result > CTA"
What Claude Produces
- Ad script with voiceover text per shot
- 6 image prompts (Nano Banana 2) — character locked across all
- 6 motion prompts (Kling) — one per shot with camera directions
- Platform export specs (resolution, duration, aspect ratio)
Then call /kieai for each shot to generate images and animate them. The skill handles this sequentially — generate image, wait for completion, animate, wait, move to next shot.
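The sequential loop described above (generate image, wait, animate, wait, next shot) can be sketched as follows. This is not the skill's actual implementation: `generate_image` and `animate` are hypothetical callables that each block until their task completes, and the shot dictionary keys are illustrative.

```python
from typing import Callable


def produce_shots(
    shots,
    generate_image: Callable[[str], str],
    animate: Callable[[str, str], str],
):
    """Process shots one at a time: image first, then its animation."""
    clips = []
    for number, shot in enumerate(shots, start=1):
        # Each call is assumed to return only once its task has finished.
        image_url = generate_image(shot["image_prompt"])
        clip_url = animate(image_url, shot["motion_prompt"])
        clips.append({"shot": number, "image_url": image_url, "clip_url": clip_url})
    return clips
```

Running shots strictly in order is slower than submitting everything at once, but it lets you stop early if the first image shows the character drifting.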
Post-Production (Manual Steps)
- Voiceover: Run the script through ElevenLabs (or use Veo 3's native audio for speech)
- Lip sync: Use ElevenLabs Flows or CapCut for lip-syncing
- Assembly: Stitch clips in CapCut, add text overlays, export
[1. CINEMATOGRAPHY]
Shot Size: Selfie Shot (Vertical 9:16). Camera: IPHONE 15 PRO Front Camera (~24mm equivalent). Framing: [FRAMING], filmed [LOCATION]. Movement: [MOVEMENT]. Focus: [DEPTH OF FIELD]. Resolution: 720x1280 (Vertical). Grade: iPhone HDR auto-tone; [COLOR PALETTE]; [FILTER]. Filename: "IMG_[XXXX].MOV".
[2. SUBJECT]
Subject: [NAME], a [AGE] [ETHNICITY] [GENDER] with [HAIR]. Face: [FACIAL FEATURES]. Eyes: [COLOR] [SHAPE] eyes [DETAILS]. Skin: [TONE with undertones, natural realistic pores]. Build: [BUILD]. Attire: [CLOTHING] and [ACCESSORIES].
[3. ACTION & PHYSICS]
Position: [He/She] [sits/stands] [WHERE]. Physics: Holds phone at arm's length. [POSTURE].
Movements:
- [Beat 1]
- [Beat 2]
- [Beat 3]
[4. ENVIRONMENT & LIGHTING]
Atmosphere: [MOOD] -- like [he/she]'s [EMOTIONAL CONTEXT]. Lighting: [SOURCE & QUALITY], illuminating face [HOW]. Shadows: [SHADOW DETAILS].
[5. AUDIO & DIALOGUE]
Audio: [PHONE] internal mic. [QUALITY]. [BG SOUNDS]. Voice: [CHARACTERISTICS]. Tone: [TONE]. Dialogue: [NAME] says: "[SCRIPT WITH FILLER WORDS. 3-8 SENTENCES.]"
[6. STYLE & NEGATIVES]
Style: Smartphone selfie, handheld realism, direct-to-camera, raw unfiltered [PLATFORM] aesthetic, [EDITING STYLE]. Negatives: Subtitles, captions, watermark, text overlays, logo, branding, blurry, artifacts, cartoon effects, distorted hands, artificial lighting, oversaturation.
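When filling the template by hand, it is easy to leave a bracketed placeholder behind. The sketch below shows one way to fill placeholders and list any that remain. The helper names and the sample values are hypothetical, and the regular expression assumes section headers always start with a digit, a period, and a space.

```python
import re

# Matches [PLACEHOLDER] but skips section headers such as "[2. SUBJECT]".
PLACEHOLDER = re.compile(r"\[(?!\d+\. )([^\[\]]+)\]")


def fill(template: str, values: dict) -> str:
    """Replace each placeholder that has a value; leave the others untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def unfilled_placeholders(prompt: str):
    """List placeholder names still present in the prompt."""
    return [match.group(1) for match in PLACEHOLDER.finditer(prompt)]


fragment = "[2. SUBJECT] Subject: [NAME], a [AGE] [ETHNICITY] [GENDER] with [HAIR]."
draft = fill(fragment, {"NAME": "Asha", "AGE": "28-year-old"})  # sample values only
print(draft)
print(unfilled_placeholders(draft))
```

An empty list from `unfilled_placeholders` is a reasonable gate before pasting the prompt into Veo 3.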
For Image Prompts (kie.ai / Midjourney)
| Trick | What to Add | Why It Works |
|---|---|---|
| Kill beautification | --stylize 0 --style raw (MJ only) | Removes AI "perfection" that screams fake |
| Specify device | "taken on iPhone 11" | Triggers device-specific rendering characteristics |
| Add filename | "IMG_4673.HEIC" | HEIC = higher dynamic range; JPG = grainier |
| Social platform | "Posted on Instagram" | Applies platform-specific compression artifacts |
| Timeframe | "Posted in 2016" | Matches era-specific phone camera quality |
| Controlled randomness | --weird 4 (MJ only) | Introduces natural imperfection |
| Social scenario | "photo taken at a work party" | Contextualizes the pose and setting |
For Video Prompts (Veo 3 / Kling)
- Camera motion: "Subtle handheld sway and jitter consistent with a selfie grip" — not "smooth" or "static"
- Imperfect framing: "Slightly off-center, too much headroom on the left"
- Lighting flaws: "Uneven natural light, slight overexposure on the right cheek"
- Audio imperfections: "Faint AC hum", "slight room echo"
- Filler words in dialogue: "uh", "like", "you know", "honestly" — real people don't speak in clean sentences
- Environmental clutter: "Messy desk visible behind", "laundry basket in corner"
[Cinematography/Lens] + [Subject] + [Action/Physics] + [Environment] + [Lighting] + [Audio/Dialogue]
This order gives Veo the visual hierarchy it needs. Camera first = it "sets up the shot" before populating it.
Weak: "Actor walks across the room"
Strong: "Actor takes four steps to the window, pauses, and pulls the curtain in the final second"
Describe actions in beats or counts — small steps, gestures, pauses. This gives Veo timing anchors.
For multi-shot ads, name 3-5 specific colors to keep palette stable.
Weak: "bright room"
Strong: "Soft window light with a warm lamp fill and a cool edge from the hallway"
Describe both the quality of light AND the color anchors.
- Format: Character Name: "Line of dialogue."
- Timing: A 4-second shot fits ONE short exchange
- Long speeches break lip-sync, so keep it concise
- Always specify ambient audio even for "silent" shots
- Upload a reference image (from kie.ai or Midjourney)
- Veo uses it as an anchor for the first frame
- Your text prompt defines what happens next
- This is the best way to maintain character across shots
Change one variable at a time when a result is close:
- "Same shot, but change the lighting to Golden Hour"
- "Same action, but add the sound of a police siren"
If misfiring: freeze the camera, simplify the action, clear the background. Layer complexity back step by step.
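If you keep each prompt as named fields rather than one string, the one-variable rule becomes checkable. The sketch below is a hypothetical illustration: the field names are arbitrary, and the helpers only compare two versions of a prompt.

```python
def changed_fields(previous: dict, current: dict):
    """Return the sorted names of fields whose values differ between versions."""
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


def is_single_variable_change(previous: dict, current: dict) -> bool:
    """True when exactly one field changed, as the iteration rule recommends."""
    return len(changed_fields(previous, current)) == 1


v1 = {"lighting": "soft window light", "action": "holds up the pouch", "audio": "faint AC hum"}
v2 = {**v1, "lighting": "Golden Hour"}
v3 = {**v1, "lighting": "Golden Hour", "audio": "police siren"}

print(changed_fields(v1, v2), is_single_variable_change(v1, v2))
print(changed_fields(v1, v3), is_single_variable_change(v1, v3))
```

When two fields change at once, as in `v3`, you cannot tell which change caused a better or worse result.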
Pre-Production
- Reference video/image selected and screenshot taken
- Product identified (name, type, key benefit)
- Target platform chosen (TikTok/Reels/Shorts/Feed)
- Character template created (if multi-shot)
- One-benefit rule: single benefit identified for this ad
Script
- Hook written via /hook-doctor (1.5s scroll-stopper)
- Full script via /copy-alchemist (Hook > Benefit > CTA)
- Filler words included ("uh", "like", "honestly")
- Script fits 15-second format (3-8 sentences max)
Generation
- Veo 3 prompt generated via /veo3-prompt (2048 chars, 6 sections)
- OR storyboard images generated via /kieai
- Images pass UGC authenticity check (imperfect framing, natural lighting)
- Video clips animated via /kieai Kling (Pro model, 9:16)
- Motion prompts use beats/counts, not vague descriptions
Post-Production
- Voiceover added (ElevenLabs or Veo native audio)
- Lip sync verified (if talking head)
- Clips stitched in order (CapCut or similar)
- No visible AI artifacts (distorted hands, wrong proportions)
- No text overlays in the generated video (add in post only)
- Exported at correct specs for target platform
Platform Export Specs
| Platform | Aspect | Resolution | Duration |
|---|---|---|---|
| TikTok / Reels / Shorts | 9:16 | 1080x1920 | 6-30s |
| Instagram Feed | 1:1 | 1080x1080 | 15-60s |
| Facebook Feed | 4:5 | 1080x1350 | 15-30s |
| YouTube Pre-roll | 16:9 | 1920x1080 | 15-30s |
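The table can double as a pre-export check. The sketch below encodes the four rows and compares a finished file's dimensions and duration against them. The `check_export` helper is hypothetical, and it assumes you obtain the width, height, and duration from your editor or another tool.

```python
# Rows copied from the table above: resolution in pixels, duration range in seconds.
EXPORT_SPECS = {
    "TikTok / Reels / Shorts": {"aspect": "9:16", "resolution": (1080, 1920), "duration_s": (6, 30)},
    "Instagram Feed": {"aspect": "1:1", "resolution": (1080, 1080), "duration_s": (15, 60)},
    "Facebook Feed": {"aspect": "4:5", "resolution": (1080, 1350), "duration_s": (15, 30)},
    "YouTube Pre-roll": {"aspect": "16:9", "resolution": (1920, 1080), "duration_s": (15, 30)},
}


def check_export(platform: str, width: int, height: int, duration_s: float):
    """Return a list of mismatches against the platform spec (empty if none)."""
    spec = EXPORT_SPECS[platform]
    problems = []
    if (width, height) != spec["resolution"]:
        expected_w, expected_h = spec["resolution"]
        problems.append(f"resolution {width}x{height}, expected {expected_w}x{expected_h}")
    shortest, longest = spec["duration_s"]
    if not shortest <= duration_s <= longest:
        problems.append(f"duration {duration_s}s, expected {shortest}-{longest}s")
    return problems


print(check_export("TikTok / Reels / Shorts", 1080, 1920, 15))
print(check_export("TikTok / Reels / Shorts", 720, 1280, 45))
```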