⚡ AI-Powered Ad Pipeline

Create UGC Video Ads with Zero Actors

A complete 6-step pipeline using kie.ai's Nano Banana 2 for image generation and Kling AI for video animation. From script to final ad in under 5 minutes.

  • Cost per 6-shot ad: $2.24
  • Generation time (parallel): ~5 min
  • Cost savings vs human UGC: 97%

  1. Script: Claude
  2. Images: Nano Banana 2
  3. Refine: NB2 Upscale
  4. Video: Kling AI
  5. Voice: ElevenLabs
  6. Assembly: CapCut

01

Script & Concept (Claude AI)

Before generating any visuals, create the ad concept and shot list. Every great UGC ad follows a proven structure: hook, problem, discovery, demo, result, CTA.

💡

Pro Tip: Write the script first, then derive your storyboard from it. The visual should serve the story — not the other way around.

Script Prompt for Claude

Prompt
Create a UGC-style ad script for [PRODUCT]. The ad should feel like a real person talking to camera sharing their genuine experience.

Include:
1. A scroll-stopping hook (first 2 seconds)
2. Problem identification (3-5 seconds)
3. Product discovery moment (5-8 seconds)
4. Usage/demo moment (8-12 seconds)
5. Result/transformation (12-15 seconds)
6. CTA (final 2-3 seconds)

For each shot, describe:
- Camera angle (selfie, medium, close-up, POV)
- Character action and expression
- Background/setting
- Product placement

Ad Format Templates

| Format | Duration | Shots | Best For |
| --- | --- | --- | --- |
| Quick hook | 6-8s | 3-4 | TikTok / Reels |
| Standard UGC | 15-20s | 6-8 | Feed ads |
| Testimonial | 25-35s | 8-12 | YouTube / Meta |
| Problem-solution | 15s | 5-6 | Story ads |
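
The table also implies how long each shot can stay on screen. The following is a minimal Python sketch of that arithmetic, using only the ranges from the table; the names `FORMATS` and `seconds_per_shot` are illustrative and not part of any tool in this pipeline.

```python
# Duration (seconds) and shot-count ranges copied from the table above.
FORMATS = {
    "Quick hook": ((6, 8), (3, 4)),
    "Standard UGC": ((15, 20), (6, 8)),
    "Testimonial": ((25, 35), (8, 12)),
    "Problem-solution": ((15, 15), (5, 6)),
}


def seconds_per_shot(name):
    """Return the (shortest, longest) average shot length for a format."""
    (dur_min, dur_max), (shots_min, shots_max) = FORMATS[name]
    # Shortest average: least time spread over the most shots, and vice versa.
    return dur_min / shots_max, dur_max / shots_min


for name in FORMATS:
    low, high = seconds_per_shot(name)
    print(f"{name}: {low:.1f}-{high:.1f}s per shot")
```

Every format averages under 5 seconds per shot, so 5-second generated clips will usually need trimming during assembly.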

02

Generate Storyboard Images (Nano Banana 2 via kie.ai)

Use kie.ai's Nano Banana 2 model to generate photorealistic images for each shot in your storyboard. The key technique: the 3x3 grid strategy for character consistency.

API Configuration

| Parameter | Value |
| --- | --- |
| Endpoint | POST https://api.kie.ai/api/v1/jobs/createTask |
| Poll | GET https://api.kie.ai/api/v1/jobs/recordInfo?taskId={id} |
| Model | nano-banana-2 |
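
For readers who prefer Python to curl, here is a hedged sketch of the two requests in the table, built with the standard library only. The bearer-token header and the `model` / `input.prompt` body shape follow the curl examples in this guide; `build_create_task` and `build_poll` are hypothetical helper names, and nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.kie.ai/api/v1/jobs"


def build_create_task(api_key, prompt, model="nano-banana-2"):
    """Build (but do not send) the POST request that creates a task."""
    body = json.dumps({"model": model, "input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/createTask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def build_poll(api_key, task_id):
    """Build the GET request that reads a task's current record."""
    query = urllib.parse.urlencode({"taskId": task_id})
    return urllib.request.Request(
        f"{API_BASE}/recordInfo?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_create_task("YOUR_API_KEY", "A test prompt, photorealistic, no text")
print(req.get_method(), req.full_url)
# Sending is one more call: urllib.request.urlopen(req)
```

Building the body with `json.dumps` avoids the quoting problems that hand-written JSON strings run into when a prompt contains quotation marks.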

The 3x3 Grid Strategy

📌

Why a grid? Generating all 9 frames in a single image keeps the character looking consistent across all shots. Individual generations will drift.

Image Prompt
A 3x3 grid of 9 sequential shots for a UGC-style video ad. Each cell is a distinct camera angle of the same person.

Character: [Age] [ethnicity] [gender], [hair], [clothing], [distinguishing features]
Setting: [Location: kitchen, bathroom, living room, outdoors]
Product: [Exact product description with colors and branding]

Row 1 (top):
1. Selfie angle, excited expression, holding phone, [setting] background
2. Close-up face, talking to camera, natural lighting
3. Medium shot, holding [product], showing it to camera

Row 2 (middle):
4. POV shot looking down at [product] in hands
5. Close-up of [product] being used/applied/opened
6. Medium shot, character reacting positively to product

Row 3 (bottom):
7. Before/after comparison layout
8. Character smiling, holding product near face
9. Wide shot, character in full setting, product visible

Photorealistic, iPhone quality, natural lighting, no text, no watermarks
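
Once a grid image comes back, each cell has to be cut out before it can be animated on its own. This guide does not say how that is done, so the following is only one possible sketch: it computes the nine crop rectangles in pure Python. `grid_boxes` is a hypothetical helper, and the boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def grid_boxes(width, height, rows=3, cols=3):
    """Return crop boxes (left, upper, right, lower) in reading order."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer edges computed from the full size, so the cells tile
            # the image exactly even when it is not divisible by 3.
            left, right = c * width // cols, (c + 1) * width // cols
            upper, lower = r * height // rows, (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


boxes = grid_boxes(3072, 3072)
print(boxes[0])  # top-left cell: shot 1
print(boxes[4])  # centre cell: shot 5
```

With Pillow installed, `Image.open(path).crop(box)` would produce each frame. Every cell holds only a ninth of the grid's pixels, which is one reason to inspect and refine frames before animating them.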

Single Frame API Call

Bash
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "A 25-year-old Indian woman in a casual t-shirt, sitting on a couch in a modern living room, holding [PRODUCT] up to camera with an excited expression, selfie angle, iPhone quality, natural window lighting, photorealistic, no text"
    }
  }'

Image Prompt Rules

"photorealistic, iPhone quality, natural lighting, no text, no watermarks"

Append these five keywords to every image prompt so the output reads as real UGC rather than AI-generated marketing content.

Always specify "Indian woman/man" explicitly in your prompts. Without this, Nano Banana 2 defaults to Western-looking subjects.

Describe your character in exactly the same terms across all prompts — same age, hair color, clothing, and distinguishing features. Any variation will cause drift.
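
One way to guarantee identical wording is to define the character once and compose every prompt from it. The sketch below assumes a made-up character and a hypothetical `shot_prompt` helper; only the five style keywords come from this guide.

```python
# The five keywords from the rule above, appended to every prompt.
STYLE = "photorealistic, iPhone quality, natural lighting, no text, no watermarks"

# Hypothetical character: define it once and never paraphrase it.
CHARACTER = "a 25-year-old Indian woman with long black hair in a casual white t-shirt"


def shot_prompt(framing, action, setting):
    """Compose one image prompt around the fixed character description."""
    return f"{framing} of {CHARACTER}, {action}, {setting}, {STYLE}"


prompts = [
    shot_prompt("Selfie-angle photo", "holding [PRODUCT] up to camera", "modern living room"),
    shot_prompt("Close-up photo", "talking to camera", "modern living room"),
]
for p in prompts:
    print(p)
```

Because only the framing, action, and setting vary, a change to the character's description propagates to every shot at once.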

Aspect ratios:
  • 9:16 (vertical): TikTok, Reels, Shorts, Stories
  • 1:1 (square): Instagram Feed, Facebook Feed
  • 16:9 (horizontal): YouTube, pre-roll ads

Avoid:
  • Text in images (always say "no text")
  • Multiple people (hard to control consistency)
  • Extreme poses or complex hand positions
  • Branded backgrounds (logos, storefronts)

03

Detail Refinement (Nano Banana 2 Image-to-Image)

Critical Rule: Fix ALL image issues before moving to video. Problems are much harder to fix once a frame has been animated. Inspect product details at 100% zoom.

If product details are blurry or the character drifted between frames, use image-to-image editing to upscale and correct before animating.

Upscale Prompt
Upscale and sharpen the product details in this photo. The person is a [character description] holding [product]. Restore sharp product branding and labels while maintaining the natural UGC feel. Photorealistic, no text overlay.

API Call with Image Input

Bash
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "Upscale and sharpen the product details...",
      "image_url": "[URL_OF_BLURRY_FRAME]"
    }
  }'

04

Animate to Video (Kling AI via kie.ai)

Send each refined storyboard image to Kling AI's image-to-video endpoint. Motion prompts are the secret sauce — describing specific natural human movements makes AI video feel real.

Kling Model Pricing

| Model | Resolution | 5s Price | 10s Price | Use Case |
| --- | --- | --- | --- | --- |
| Standard | 720p | $0.125 | $0.25 | Testing / drafts |
| Pro | 1080p | $0.25 | $0.50 | Final ads |
| Master | 1080p+ | $0.80 | $1.60 | Hero content |
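
To compare tiers for a whole ad rather than a single clip, the prices can be dropped into a small lookup. This is an illustrative sketch; `KLING_PRICES` and `video_cost` are hypothetical names, and the numbers are copied from the table as given.

```python
# Prices in USD copied from the table above: tier -> {clip seconds: price}.
KLING_PRICES = {
    "Standard": {5: 0.125, 10: 0.25},
    "Pro": {5: 0.25, 10: 0.50},
    "Master": {5: 0.80, 10: 1.60},
}


def video_cost(tier, seconds, clips):
    """Total price for `clips` clips of `seconds` length at a given tier."""
    return round(KLING_PRICES[tier][seconds] * clips, 3)


for tier in KLING_PRICES:
    print(f"{tier}: 6 x 5s = ${video_cost(tier, 5, 6):.2f}")
```

Six 5-second Pro clips come to $1.50, which matches the video line in the cost breakdown.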

Image-to-Video API Call

Bash
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v2.1-pro-i2v",
    "input": {
      "prompt": "The woman looks at the camera and starts talking with natural hand gestures, slight head movements, warm smile. Handheld camera feel with subtle motion. Natural indoor lighting.",
      "image_url": "[STORYBOARD_FRAME_URL]",
      "duration": "5",
      "aspect_ratio": "9:16",
      "cfg_scale": 0.5
    }
  }'

Motion Prompt Templates

These are the bread and butter of natural-looking UGC video. Copy and customize for each shot type.

Motion
The person looks directly at camera and speaks naturally with slight head tilts and hand gestures. Subtle handheld camera motion. Warm natural lighting.
Motion
The person slowly raises the product into frame, turns it to show the label, then looks at camera with an excited expression. Smooth handheld motion.
Motion
Close-up of hands opening/applying/using [product]. Natural movement, slight camera drift as if filmed on a phone propped up nearby.
Motion
The person's eyes widen with surprise, then breaks into a genuine smile. Slight lean forward toward camera. Natural selfie angle motion.
Motion
First-person POV looking down at hands opening a package. Pull product out, hold it up. Slight natural hand shake as if filming with one hand.

Multi-Shot Video (Kling 3.0)

Multi-Shot
[0:04] Shot 1: Wide shot, woman sitting on couch, picks up phone, sees product ad, intrigued expression
[4:08] Shot 2: Medium shot, woman holding product, examining it closely, impressed expression
[8:12] Shot 3: Close-up selfie angle, woman talking to camera excitedly about product, natural gestures

Polling for Completion

Bash
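TASK_ID="your-task-id"
MAX_ATTEMPTS=40   # 40 polls x 15s: give up after about 10 minutes
for ((n = 1; n <= MAX_ATTEMPTS; n++)); do
  RESULT=$(curl -s "https://api.kie.ai/api/v1/jobs/recordInfo?taskId=$TASK_ID" \
    -H "Authorization: Bearer YOUR_API_KEY")
  # Quote "$RESULT" so the shell does not split or glob the JSON
  STATE=$(printf '%s' "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['state'])")
  if [ "$STATE" = "success" ]; then
    # resultJson is a JSON string inside the response, so it is decoded a second time
    VIDEO_URL=$(printf '%s' "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])")
    echo "Video: $VIDEO_URL"
    break
  # Accept either spelling of the failure state; confirm the exact value in the API docs
  elif [ "$STATE" = "failed" ] || [ "$STATE" = "fail" ]; then
    echo "Failed - adjust prompt and retry"
    break
  fi
  echo "Status: $STATE - waiting..."
  sleep 15
done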
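
The same polling logic can be sketched in Python with the HTTP call injected as a function, which makes it testable without network access. The response shape (`data.state`, and `data.resultJson` holding a JSON string with `resultUrls`) is taken from the Bash loop above; `wait_for_task` and `first_result_url` are hypothetical names, and the `"generating"` state in the demo is a stand-in for any in-progress value.

```python
import json
import time


def first_result_url(record):
    """Extract the first result URL from a successful task record."""
    # `resultJson` is itself a JSON string, so it needs a second decode.
    return json.loads(record["data"]["resultJson"])["resultUrls"][0]


def wait_for_task(fetch, task_id, interval=15, max_attempts=40, sleep=time.sleep):
    """Poll until the task succeeds; raise on failure or timeout.

    `fetch(task_id)` must return the decoded recordInfo response as a dict.
    """
    for _ in range(max_attempts):
        record = fetch(task_id)
        state = record["data"]["state"]
        if state == "success":
            return first_result_url(record)
        # The Bash loop checks for "failed"; "fail" is accepted as a precaution.
        if state in ("failed", "fail"):
            raise RuntimeError(f"task {task_id} failed: adjust prompt and retry")
        sleep(interval)  # any other state counts as still running
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"data": {"state": "generating"}},
    {"data": {"state": "success",
              "resultJson": json.dumps({"resultUrls": ["https://example.com/clip.mp4"]})}},
])
print(wait_for_task(lambda _id: next(canned), "demo-task", sleep=lambda s: None))
```

Bounding the number of attempts keeps a stuck task from blocking a batch run indefinitely.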

05

Voiceover (ElevenLabs)

Generate the UGC script voiceover separately, then sync with video. Voice selection is critical — the wrong voice kills authenticity instantly.

Voice Selection Tips

  • Match the character — pick voices that fit the apparent age/ethnicity
  • Indian market — use Indian-accented English voices
  • Authenticity over polish — choose slightly imperfect, conversational voices. Avoid "announcer" voices.
  • Voice cloning — use ElevenLabs voice cloning if you have a reference voice

Script Timing Rules

💡
  • Match voiceover pacing to shot durations
  • Leave 0.5s breathing room between shots
  • Hook line: fast, energetic
  • Product explanation: slower, clear
  • CTA: upbeat, direct
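
The timing rules can be turned into a rough word budget per shot. The sketch below is illustrative only: the 0.5-second gap comes from the rules above, while the speaking rate of 2.5 words per second is an assumption to replace with a measurement of your chosen voice. `word_budgets` is a hypothetical helper.

```python
BREATHING_ROOM = 0.5      # seconds of silence between shots, from the rules above
WORDS_PER_SECOND = 2.5    # ASSUMPTION: conversational pace; measure your own voice


def word_budgets(shot_seconds, pace=WORDS_PER_SECOND):
    """Return the maximum number of spoken words for each shot."""
    budgets = []
    for i, seconds in enumerate(shot_seconds):
        # No pause is needed after the final shot.
        gap = BREATHING_ROOM if i < len(shot_seconds) - 1 else 0.0
        budgets.append(int((seconds - gap) * pace))
    return budgets


# Six beats timed roughly like the script structure in step 01.
print(word_budgets([2, 3, 3, 4, 3, 3]))
```

Passing a higher `pace` for the hook and a lower one for the product explanation would reflect the fast and slow delivery the rules call for.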

06

Lip Sync + Assembly (CapCut / ElevenLabs Flows)

The final step: combine your animated clips with voiceover, add lip sync, captions, and music.

ElevenLabs Flows combines image/video models with TTS, lip-sync, sound effects, and music in one workspace. Upload your animated video + voiceover and get lip-synced output.

  1. Import all animated video clips into CapCut
  2. Import voiceover audio track
  3. Align clips to audio timing
  4. Add captions/subtitles (CapCut auto-generates these)
  5. Add background music (subtle, low volume)
  6. Export in platform-specific format

Use a programmatic lip-sync service when working at scale. It is the best fit when you're generating 10+ ads per batch and need automation.


Complete Batch Workflow (full automation script)

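This bash script generates the clips for a full 6-shot UGC ad programmatically: it submits all image tasks up front so they run concurrently, collects the image URLs, then submits all video tasks. Total generation time is about 90 seconds when tasks finish promptly; voiceover and assembly are separate steps.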

Bash — Full Pipeline
#!/bin/bash
set -euo pipefail
AUTH="Authorization: Bearer YOUR_API_KEY"
API="https://api.kie.ai/api/v1/jobs"

# One character description, reused verbatim in every image prompt. Each task is
# generated independently, so "the same woman" gives the model nothing to match.
CHAR="a 28-year-old Indian woman in casual clothes"

# Define your 6 shot prompts
SHOTS=(
  "Selfie angle, $CHAR, excited expression, modern kitchen background, iPhone quality, photorealistic, no text"
  "Close-up, $CHAR talking to camera, natural indoor lighting, warm expression, photorealistic, no text"
  "Medium shot, $CHAR holding [PRODUCT] up to camera, showing the label, impressed look, photorealistic, no text"
  "POV close-up of hands using [PRODUCT], natural lighting, kitchen counter, photorealistic, no text"
  "Medium shot, $CHAR reacting with genuine delight after using product, natural lighting, photorealistic, no text"
  "Selfie angle, $CHAR, big smile, holding product near face, giving thumbs up, photorealistic, no text"
)

# Motion prompts for each shot
MOTIONS=(
  "Woman picks up phone, looks at it, then looks at camera with excited expression. Subtle handheld motion."
  "Woman speaks to camera with natural gestures, slight head tilts, warm smile. Handheld selfie feel."
  "Woman holds up product, slowly rotates it to show label, nods approvingly. Natural motion."
  "Hands open product, pour/apply/use it. Slight camera shake as if phone propped nearby."
  "Woman touches face/hair/skin, reacts with genuine surprise and delight. Natural movement."
  "Woman leans toward camera, speaks enthusiastically, gives thumbs up. Energetic selfie motion."
)

# Build the JSON body with json.dumps so quotes in a prompt cannot break it.
# Usage: payload MODEL PROMPT [IMAGE_URL]
payload() {
  python3 - "$@" <<'PY'
import json, sys
body = {"model": sys.argv[1], "input": {"prompt": sys.argv[2]}}
if len(sys.argv) > 3:  # image-to-video: add the source frame and clip settings
    body["input"].update(image_url=sys.argv[3], duration="5",
                         aspect_ratio="9:16", cfg_scale=0.5)
print(json.dumps(body))
PY
}

# Submit a task and print its taskId.
submit() {
  curl -s -X POST "$API/createTask" -H "$AUTH" -H "Content-Type: application/json" -d "$1" |
    python3 -c "import sys,json; print(json.load(sys.stdin)['data']['taskId'])"
}

# Poll a task every 10s (up to about 10 minutes) and print its first result URL.
# This replaces fixed sleeps, which break whenever a task takes longer than expected.
wait_for_url() {
  local result state n
  for ((n = 0; n < 60; n++)); do
    result=$(curl -s "$API/recordInfo?taskId=$1" -H "$AUTH")
    state=$(printf '%s' "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['state'])")
    case "$state" in
      success)
        printf '%s' "$result" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])"
        return 0 ;;
      fail|failed)  # accept either spelling of the failure state
        echo "Task $1 failed - adjust prompt and retry" >&2
        return 1 ;;
    esac
    sleep 10
  done
  echo "Task $1 timed out" >&2
  return 1
}

echo "=== PHASE 1: Generating storyboard images ==="
IMAGE_TASKS=()
for i in "${!SHOTS[@]}"; do
  TID=$(submit "$(payload nano-banana-2 "${SHOTS[$i]}")")
  IMAGE_TASKS+=("$TID")
  echo "Shot $((i+1)) image task: $TID"
done

echo "=== Collecting image URLs ==="
IMAGE_URLS=()
for TID in "${IMAGE_TASKS[@]}"; do
  URL=$(wait_for_url "$TID")
  IMAGE_URLS+=("$URL")
  echo "Image: $URL"
done

echo "=== PHASE 2: Generating video clips ==="
VIDEO_TASKS=()
for i in "${!IMAGE_URLS[@]}"; do
  TID=$(submit "$(payload kling-v2.1-pro-i2v "${MOTIONS[$i]}" "${IMAGE_URLS[$i]}")")
  VIDEO_TASKS+=("$TID")
  echo "Shot $((i+1)) video task: $TID"
done

echo "=== Collecting video URLs ==="
for TID in "${VIDEO_TASKS[@]}"; do
  URL=$(wait_for_url "$TID")
  echo "Video: $URL"
done

Prompt Library (ready-to-use by vertical)

Copy-paste these prompts and customize with your product details. Each includes both image and motion prompts.

Image
A 25-year-old Indian woman in a bathroom mirror, morning routine, holding [product] near her face, natural morning light from window, dewy skin, casual pajamas, photorealistic, iPhone selfie quality, no text
Motion
She dabs product on her cheek, blends it gently, then leans into the mirror to check, slight smile forming. Natural selfie camera motion.
Image
A 30-year-old Indian woman in a modern kitchen, casual activewear, holding a glass/pouch of [product], bright kitchen lighting, clean counter, photorealistic, iPhone quality, no text
Motion
She pours/opens the product, takes a sip/bite, pauses, then nods with a pleasantly surprised expression. Slight handheld camera movement.
Image
A 22-year-old Indian man sitting at a cafe, laptop open, holding phone showing [app screen], casual streetwear, natural cafe lighting, photorealistic, no text
Motion
He taps the phone screen, scrolls, then turns the phone toward camera to show the screen. Excited head nod. Cafe ambiance.
Image
A 30-year-old Indian woman kneeling on the floor with a golden retriever, holding [product] treat pouch, living room setting, warm lighting, photorealistic, no text
Motion
She opens the pouch, offers a treat to the dog, dog eats it eagerly, she laughs and pets the dog. Natural movement.
Image
A 26-year-old Indian woman in front of a full-length mirror, wearing [product], adjusting it, modern bedroom, natural daylight, OOTD vibe, photorealistic, no text
Motion
She turns side to side checking the fit, smooths the fabric, then faces camera with a confident smile and slight pose. Mirror selfie feel.
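
Since these templates carry bracketed placeholders such as `[product]` and `[app screen]`, it is easy to send one to the API unfilled. The following sketch shows one way to substitute values and catch leftovers; `fill_prompt` is a hypothetical helper and the product name is invented for the example.

```python
import re

PLACEHOLDER = re.compile(r"\[[^\]]+\]")  # matches [product], [PRODUCT], [app screen], ...


def fill_prompt(template, **values):
    """Replace [name] placeholders (case-insensitive) and reject leftovers."""
    # Keyword `app_screen` fills the placeholder written as [app screen].
    lookup = {key.replace("_", " ").lower(): val for key, val in values.items()}

    def substitute(match):
        name = match.group(0)[1:-1].lower()
        return lookup.get(name, match.group(0))  # leave unknown names in place

    filled = PLACEHOLDER.sub(substitute, template)
    leftover = PLACEHOLDER.findall(filled)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return filled


template = ("A 30-year-old Indian woman in a modern kitchen, holding a pouch of "
            "[product], photorealistic, no text")
print(fill_prompt(template, product="mango protein shake"))  # invented product
```

Raising on leftovers means a forgotten placeholder fails before any paid generation call is made.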

Cost Breakdown (per ad and monthly)

$2.24 per ad

vs. $150-500 for a traditional UGC creator

| Component | Per Ad (6 shots) | Monthly (30 ads) |
| --- | --- | --- |
| Nano Banana 2 images (6x) | ~$0.24 | ~$7.20 |
| Kling Pro video (6x 5s) | ~$1.50 | ~$45.00 |
| ElevenLabs voiceover | ~$0.50 | ~$15.00 |
| Total | ~$2.24 | ~$67.20 |
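
The totals can be re-derived from the component prices, which is handy when a price changes. This sketch only reuses figures from the table and the quoted creator price range; the variable names are illustrative.

```python
# Per-ad component costs in USD, copied from the table above.
PER_AD = {
    "Nano Banana 2 images (6x)": 0.24,
    "Kling Pro video (6x 5s)": 1.50,
    "ElevenLabs voiceover": 0.50,
}
ADS_PER_MONTH = 30
CREATOR_RANGE = (150, 500)  # quoted price range for a traditional UGC creator

per_ad = round(sum(PER_AD.values()), 2)
monthly = round(per_ad * ADS_PER_MONTH, 2)


def savings_pct(creator_price):
    """Percentage saved per ad relative to a human creator's price."""
    return round(100 * (1 - per_ad / creator_price), 1)


print(f"Per ad: ${per_ad:.2f}, monthly: ${monthly:.2f}")
print("Savings %:", [savings_pct(p) for p in CREATOR_RANGE])
```

Against the quoted $150-500 range the saving works out to roughly 98.5-99.6%, so the 97% headline figure is on the conservative side.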

Quality Checklist (before publishing)

Run through this checklist before publishing any AI UGC ad.


Known Limitations (and workarounds)

| Issue | Workaround |
| --- | --- |
| Character face drifts between shots | Use the 3x3 grid strategy; regenerate if >15% drift |
| Product text/labels blur in video | Fix at image stage with upscaling before animating |
| Lip sync imperfect | Use captions to compensate; keep talking shots to 3-4s |
| Multi-person scenes inconsistent | Stick to single-person UGC format |
| Hands/fingers sometimes glitch | Use medium/wide shots; avoid extreme hand close-ups |
| Video feels too smooth/AI | Add slight grain + handheld shake in CapCut |

Export Settings (platform-specific specs)

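| Platform | Aspect | Duration | Resolution |
| --- | --- | --- | --- |
| TikTok | 9:16 | 6-15s | 1080x1920 |
| Instagram Reels | 9:16 | 6-30s | 1080x1920 |
| Instagram Feed | 1:1 | 15-60s | 1080x1080 |
| Facebook Feed | 1:1 or 4:5 | 15-30s | 1080x1080 (1:1) or 1080x1350 (4:5) |
| YouTube Shorts | 9:16 | 15-60s | 1080x1920 |
| YouTube Pre-roll | 16:9 | 15-30s | 1920x1080 |
| Meta Story Ads | 9:16 | 5-15s | 1080x1920 |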
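
These specs lend themselves to an automated pre-export check. The sketch below encodes the table as a lookup and reports mismatches; `EXPORT_SPECS` and `check_export` are hypothetical names, and the values are the ones listed above rather than anything verified against the platforms' current requirements.

```python
from fractions import Fraction

# Rows copied from the table above: aspect ratio, duration range (s), resolution.
# Facebook Feed also allows 1:1; only its listed 4:5 resolution is encoded here.
EXPORT_SPECS = {
    "TikTok": ("9:16", (6, 15), (1080, 1920)),
    "Instagram Reels": ("9:16", (6, 30), (1080, 1920)),
    "Instagram Feed": ("1:1", (15, 60), (1080, 1080)),
    "Facebook Feed": ("4:5", (15, 30), (1080, 1350)),
    "YouTube Shorts": ("9:16", (15, 60), (1080, 1920)),
    "YouTube Pre-roll": ("16:9", (15, 30), (1920, 1080)),
    "Meta Story Ads": ("9:16", (5, 15), (1080, 1920)),
}


def check_export(platform, width, height, seconds):
    """Return a list of problems; an empty list means the export fits."""
    _aspect, (low, high), (w, h) = EXPORT_SPECS[platform]
    problems = []
    if (width, height) != (w, h):
        problems.append(f"resolution should be {w}x{h}")
    if not low <= seconds <= high:
        problems.append(f"duration should be {low}-{high}s")
    return problems


# Sanity check: every listed resolution matches its aspect ratio.
for aspect, _, (w, h) in EXPORT_SPECS.values():
    a, b = map(int, aspect.split(":"))
    assert Fraction(w, h) == Fraction(a, b)

print(check_export("TikTok", 1080, 1920, 30))
```

A six-clip ad at 5 seconds per clip runs 30 seconds untrimmed, which is over the TikTok and Meta Story Ads limits in the table, so trim clips before export.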