AI UGC Creator Pipeline

01

Script & Concept (Claude AI)

Before generating any visuals, create the ad concept and shot list. Every great UGC ad follows a proven structure: hook, problem, discovery, demo, result, CTA.

💡

Pro Tip: Write the script first, then derive your storyboard from it. The visual should serve the story — not the other way around.

Script Prompt for Claude

Prompt

Create a UGC-style ad script for [PRODUCT]. The ad should feel like a real person talking to camera sharing their genuine experience.

Include:
1. A scroll-stopping hook (first 2 seconds)
2. Problem identification (3-5 seconds)
3. Product discovery moment (5-8 seconds)
4. Usage/demo moment (8-12 seconds)
5. Result/transformation (12-15 seconds)
6. CTA (final 2-3 seconds)

For each shot, describe:
- Camera angle (selfie, medium, close-up, POV)
- Character action and expression
- Background/setting
- Product placement
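As a minimal sketch, the [PRODUCT] placeholder in the prompt above can be filled from the shell before the prompt is pasted into Claude. The helper name `fill_product` and the sample product are hypothetical, used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: substitute every [PRODUCT] placeholder in a prompt template.
fill_product() {
  local template=$1 product=$2
  # The pattern is single-quoted so the brackets are literal; the replacement
  # is double-quoted so characters such as '&' are inserted as-is.
  local filled=${template//'[PRODUCT]'/"$product"}
  printf '%s\n' "$filled"
}

fill_product "Create a UGC-style ad script for [PRODUCT]." "a vitamin C face serum"
```

Keeping the template in one place and substituting the product at the last moment makes it easier to reuse the same script structure across products.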

Ad Format Templates

Format	Duration	Shots	Best For
Quick hook	6-8s	3-4	TikTok / Reels
Standard UGC	15-20s	6-8	Feed ads
Testimonial	25-35s	8-12	YouTube / Meta
Problem-solution	15s	5-6	Story ads

02

Generate Storyboard Images (Nano Banana 2 via kie.ai)

Use kie.ai's Nano Banana 2 model to generate photorealistic images for each shot in your storyboard. The key technique: the 3x3 grid strategy for character consistency.

API Configuration

Parameter	Value
Endpoint	POST https://api.kie.ai/api/v1/jobs/createTask
Poll	GET https://api.kie.ai/api/v1/jobs/recordInfo?taskId={id}
Model	nano-banana-2
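The configuration above can be wrapped in a few small shell functions. This is a sketch, not an official kie.ai client: the function names, the `KIE_API_KEY` environment variable, and the `json_escape` helper are assumptions for illustration, while the endpoints, the model name, and the `input.prompt` field come from this guide. The escaping only covers backslashes and double quotes, which is assumed to be enough for single-line prompts.

```bash
#!/usr/bin/env bash
API="https://api.kie.ai/api/v1/jobs"

# Escape backslashes and double quotes so a single-line prompt can sit
# inside a JSON string literal.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for a text-to-image task.
image_payload() {
  printf '{"model":"nano-banana-2","input":{"prompt":"%s"}}\n' "$(json_escape "$1")"
}

# Submit a task body; prints the raw JSON response. Needs KIE_API_KEY (assumed name).
create_task() {
  curl -s -X POST "$API/createTask" \
    -H "Authorization: Bearer ${KIE_API_KEY:?set KIE_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Fetch the record for a task ID; prints the raw JSON response.
get_task() {
  curl -s "$API/recordInfo?taskId=$1" \
    -H "Authorization: Bearer ${KIE_API_KEY:?set KIE_API_KEY first}"
}

# Print a payload locally (no network call) to check the JSON before sending.
image_payload 'Woman holding a "glow" serum, selfie angle, no text'
```

A typical call would be `create_task "$(image_payload "...")"`, followed by `get_task` with the returned task ID.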

The 3x3 Grid Strategy

📌

Why a grid? Generating all 9 frames in a single image keeps the character looking consistent across all shots. Individual generations will drift.

Image Prompt

A 3x3 grid of 9 sequential shots for a UGC-style video ad. Each cell is a distinct camera angle of the same person.

Character: [Age] [ethnicity] [gender], [hair], [clothing], [distinguishing features]
Setting: [Location: kitchen, bathroom, living room, outdoors]
Product: [Exact product description with colors and branding]

Row 1 (top):
1. Selfie angle, excited expression, holding phone, [setting] background
2. Close-up face, talking to camera, natural lighting
3. Medium shot, holding [product], showing it to camera

Row 2 (middle):
4. POV shot looking down at [product] in hands
5. Close-up of [product] being used/applied/opened
6. Medium shot, character reacting positively to product

Row 3 (bottom):
7. Before/after comparison layout
8. Character smiling, holding product near face
9. Wide shot, character in full setting, product visible

Photorealistic, iPhone quality, natural lighting, no text, no watermarks

Single Frame API Call

Bash

curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "A 25-year-old Indian woman in a casual t-shirt, sitting on a couch in a modern living room, holding [PRODUCT] up to camera with an excited expression, selfie angle, iPhone quality, natural window lighting, photorealistic, no text"
    }
  }'

Image Prompt Rules

"photorealistic, iPhone quality, natural lighting, no text, no watermarks"

Append these five keywords to every image prompt so the output reads as real UGC rather than AI-generated marketing content.

When targeting the Indian market, specify "Indian woman/man" explicitly in your prompts. Without this, Nano Banana 2 tends to default to Western-looking subjects.

Describe your character in exactly the same terms across all prompts: same age, hair color, clothing, and distinguishing features. Any variation can cause drift.
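One way to keep the character description identical across prompts is to define it once and reuse it for every shot. A minimal sketch, assuming a hypothetical `make_prompt` helper and an example character; only the style keywords are taken from the rule above.

```bash
#!/usr/bin/env bash
# Define the character once so every prompt uses exactly the same wording.
CHARACTER="A 25-year-old Indian woman, shoulder-length black hair, casual white t-shirt"
# The five style keywords from the rule above.
STYLE="photorealistic, iPhone quality, natural lighting, no text, no watermarks"

# Combine the fixed character, a per-shot description, and the fixed style suffix.
make_prompt() {
  printf '%s, %s, %s\n' "$CHARACTER" "$1" "$STYLE"
}

# Example per-shot details (illustrative only).
SHOT_DETAILS=(
  "selfie angle, excited expression, modern kitchen background"
  "medium shot, holding [PRODUCT] up to camera"
  "close-up face, talking to camera"
)

for shot in "${SHOT_DETAILS[@]}"; do
  make_prompt "$shot"
done
```

Because the character and style strings live in one place, changing the hair or clothing updates every shot at once instead of relying on careful copy-paste.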

Aspect ratios:
9:16 (vertical): TikTok, Reels, Shorts, Stories
1:1 (square): Instagram Feed, Facebook Feed
16:9 (horizontal): YouTube, pre-roll ads

Avoid:
Text in images (always say "no text")
Multiple people (hard to control consistency)
Extreme poses or complex hand positions
Branded backgrounds (logos, storefronts)

03

Detail Refinement (Nano Banana 2 Image-to-Image)

⚠

Critical Rule: Fix ALL image issues before moving to video; they are far harder to correct once the frame is animated. Inspect product details at 100% zoom.

If product details are blurry or the character drifted between frames, use image-to-image editing to upscale and correct before animating.

Upscale Prompt

Upscale and sharpen the product details in this photo. The person is a [character description] holding [product]. Restore sharp product branding and labels while maintaining the natural UGC feel. Photorealistic, no text overlay.

API Call with Image Input

Bash

curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "Upscale and sharpen the product details...",
      "image_url": "[URL_OF_BLURRY_FRAME]"
    }
  }'

04

Animate to Video (Kling AI via kie.ai)

Send each refined storyboard image to Kling AI's image-to-video endpoint. Motion prompts are the secret sauce — describing specific natural human movements makes AI video feel real.

Kling Model Pricing

Model	Resolution	5s Price	10s Price	Use Case
Standard	720p	$0.125	$0.25	Testing / drafts
Pro	1080p	$0.25	$0.50	Final ads
Master	1080p+	$0.80	$1.60	Hero content
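To compare tiers for a whole ad rather than a single clip, the table can be turned into a small calculator. This is a sketch: the `price_per_5s` and `clip_cost` helpers are hypothetical, the prices are copied from the table, and the 10-second price is assumed to be twice the 5-second price, as the table shows.

```bash
#!/usr/bin/env bash
# Price per 5-second clip, copied from the pricing table above.
price_per_5s() {
  case "$1" in
    standard) echo "0.125" ;;
    pro)      echo "0.25" ;;
    master)   echo "0.80" ;;
    *)        echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Cost of N clips at a given tier and duration (5 or 10 seconds).
# Usage: clip_cost <tier> <seconds> <clips>
clip_cost() {
  local tier=$1 seconds=$2 clips=$3 unit
  unit=$(price_per_5s "$tier") || return 1
  awk -v u="$unit" -v s="$seconds" -v n="$clips" \
    'BEGIN { printf "%.2f\n", u * (s / 5) * n }'
}

# Compare the three tiers for a six-shot ad made of 5-second clips.
for tier in standard pro master; do
  echo "$tier: \$$(clip_cost "$tier" 5 6) for six 5s clips"
done
```

Drafting on Standard and re-rendering only the approved shots on Pro keeps iteration cheap.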

Image-to-Video API Call

Bash

curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v2.1-pro-i2v",
    "input": {
      "prompt": "The woman looks at the camera and starts talking with natural hand gestures, slight head movements, warm smile. Handheld camera feel with subtle motion. Natural indoor lighting.",
      "image_url": "[STORYBOARD_FRAME_URL]",
      "duration": "5",
      "aspect_ratio": "9:16",
      "cfg_scale": 0.5
    }
  }'

Motion Prompt Templates

These are the bread and butter of natural-looking UGC video. Copy and customize for each shot type.

Motion

The person looks directly at camera and speaks naturally with slight head tilts and hand gestures. Subtle handheld camera motion. Warm natural lighting.

Motion

The person slowly raises the product into frame, turns it to show the label, then looks at camera with an excited expression. Smooth handheld motion.

Motion

Close-up of hands opening/applying/using [product]. Natural movement, slight camera drift as if filmed on a phone propped up nearby.

Motion

The person's eyes widen with surprise, then breaks into a genuine smile. Slight lean forward toward camera. Natural selfie angle motion.

Motion

First-person POV looking down at hands opening a package. Pull product out, hold it up. Slight natural hand shake as if filming with one hand.

Multi-Shot Video (Kling 3.0)

Multi-Shot

[0-4s] Shot 1: Wide shot, woman sitting on couch, picks up phone, sees product ad, intrigued expression
[4-8s] Shot 2: Medium shot, woman holding product, examining it closely, impressed expression
[8-12s] Shot 3: Close-up selfie angle, woman talking to camera excitedly about product, natural gestures

Polling for Completion

Bash

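TASK_ID="your-task-id"
while true; do
  # Fetch the current task record
  RESULT=$(curl -s "https://api.kie.ai/api/v1/jobs/recordInfo?taskId=$TASK_ID" \
    -H "Authorization: Bearer YOUR_API_KEY")
  # Quote "$RESULT" so the JSON reaches Python without word splitting or globbing
  STATE=$(printf '%s' "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['state'])")
  if [ "$STATE" = "success" ]; then
    # resultJson is itself a JSON string, so it is parsed a second time
    VIDEO_URL=$(printf '%s' "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])")
    echo "Video: $VIDEO_URL"
    break
  elif [ "$STATE" = "failed" ]; then
    echo "Failed - adjust prompt and retry"
    break
  elif [ -z "$STATE" ]; then
    # Empty state: the response was not the expected JSON (bad key, bad task ID, network error)
    echo "Unexpected response: $RESULT"
    break
  fi
  echo "Status: $STATE - waiting..."
  sleep 15
done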
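The loop above keeps running if a task never leaves the queue. A bounded variant might look like the sketch below. The `poll_until_done` name, its arguments, and the return codes are assumptions; the state-reporting command is passed in as an argument so the loop can be tried without calling the API, and the "success" and "failed" state names are taken from the loop above.

```bash
#!/usr/bin/env bash
# Poll a state-reporting command until it prints "success" or "failed",
# or until max_attempts is reached.
# Usage: poll_until_done <max_attempts> <sleep_seconds> <command...>
# Returns 0 on success, 1 on failure, 2 on timeout.
poll_until_done() {
  local max_attempts=$1 interval=$2 attempt state
  shift 2
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    state=$("$@")
    case "$state" in
      success) return 0 ;;
      failed)  return 1 ;;
    esac
    echo "Attempt $attempt/$max_attempts: state=$state, waiting ${interval}s..." >&2
    sleep "$interval"
  done
  return 2
}

# Stand-in for the real recordInfo lookup: reports "waiting" twice, then "success".
# It counts calls in a temp file because $(...) runs it in a subshell.
COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"
fake_state() {
  local n
  n=$(($(cat "$COUNTER_FILE") + 1))
  echo "$n" > "$COUNTER_FILE"
  if [ "$n" -ge 3 ]; then echo success; else echo waiting; fi
}

poll_until_done 5 0 fake_state && echo "done after $(cat "$COUNTER_FILE") checks"
rm -f "$COUNTER_FILE"
```

In real use, the command would be a function that calls `recordInfo` and prints the `data.state` field, with an interval of around 15 seconds as in the original loop.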

05

Voiceover (ElevenLabs)

Generate the UGC script voiceover separately, then sync with video. Voice selection is critical — the wrong voice kills authenticity instantly.

Voice Selection Tips

Match the character — pick voices that fit the apparent age/ethnicity
Indian market — use Indian-accented English voices
Authenticity over polish — choose slightly imperfect, conversational voices. Avoid "announcer" voices.
Voice cloning — use ElevenLabs voice cloning if you have a reference voice

Script Timing Rules

💡

Match voiceover pacing to shot durations
Leave 0.5s breathing room between shots
Hook line: fast, energetic
Product explanation: slower, clear
CTA: upbeat, direct
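The pacing rules reduce to simple arithmetic: the speaking time for a shot is its length minus the 0.5s gap. The sketch below turns that into a rough word budget per shot. The speaking rate of 2.5 words per second (about 150 words per minute) is an assumption, not a figure from this guide, and should be adjusted for the chosen voice; `word_budget` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough word budget for one shot.
# Usage: word_budget <shot_seconds> [words_per_second]
word_budget() {
  local shot=$1
  local rate=${2:-2.5}   # assumed conversational pace, not from the guide
  awk -v d="$shot" -v r="$rate" 'BEGIN {
    speak = d - 0.5              # leave 0.5s of breathing room between shots
    if (speak < 0) speak = 0
    printf "%d\n", int(speak * r)
  }'
}

for seconds in 3 5 10; do
  echo "${seconds}s shot: about $(word_budget "$seconds") words"
done
```

A faster rate suits the hook line and a slower one suits the product explanation, so the second argument can be varied per shot.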

06

Lip Sync + Assembly (CapCut / ElevenLabs Flows)

The final step: combine your animated clips with voiceover, add lip sync, captions, and music.

ElevenLabs Flows combines image/video models with TTS, lip-sync, sound effects, and music in one workspace. Upload your animated video + voiceover and get lip-synced output.

Manual assembly in CapCut:
1. Import all animated video clips into CapCut
2. Import the voiceover audio track
3. Align clips to the audio timing
4. Add captions/subtitles (CapCut auto-generates these)
5. Add background music (subtle, low volume)
6. Export in the platform-specific format

Programmatic lip sync is the option for scale: it is best when you're generating 10+ ads per batch and need automation.

Complete Batch Workflow: Full automation script

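This Bash script generates a full 6-shot UGC ad programmatically: it submits all image tasks without waiting for each to finish, collects the image URLs, then submits all video tasks. It relies on fixed waits (20s for images, 60s for videos), so total time is roughly 90 seconds. If a task has not finished when its URL is collected, the lookup fails, so lengthen the waits if that happens.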

Bash — Full Pipeline

#!/bin/bash
AUTH="Authorization: Bearer YOUR_API_KEY"
API="https://api.kie.ai/api/v1/jobs"

# Define your 6 shot prompts
SHOTS=(
  "A 28-year-old Indian woman in casual clothes, selfie angle, excited expression, modern kitchen background, iPhone quality, photorealistic, no text"
  "Close-up of the same woman talking to camera, natural indoor lighting, warm expression, photorealistic, no text"
  "Medium shot, same woman holding [PRODUCT] up to camera, showing the label, impressed look, photorealistic, no text"
  "POV close-up of hands using [PRODUCT], natural lighting, kitchen counter, photorealistic, no text"
  "Same woman, medium shot, reacting with genuine delight after using product, natural lighting, photorealistic, no text"
  "Same woman, selfie angle, big smile, holding product near face, giving thumbs up, photorealistic, no text"
)

# Motion prompts for each shot
MOTIONS=(
  "Woman picks up phone, looks at it, then looks at camera with excited expression. Subtle handheld motion."
  "Woman speaks to camera with natural gestures, slight head tilts, warm smile. Handheld selfie feel."
  "Woman holds up product, slowly rotates it to show label, nods approvingly. Natural motion."
  "Hands open product, pour/apply/use it. Slight camera shake as if phone propped nearby."
  "Woman touches face/hair/skin, reacts with genuine surprise and delight. Natural movement."
  "Woman leans toward camera, speaks enthusiastically, gives thumbs up. Energetic selfie motion."
)

echo "=== PHASE 1: Generating storyboard images ==="
IMAGE_TASKS=()
for i in "${!SHOTS[@]}"; do
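  # Build the JSON body with json.dumps so quotes in a prompt cannot break it
  PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'model':'nano-banana-2','input':{'prompt':sys.argv[1]}}))" "${SHOTS[$i]}")
  RESP=$(curl -s -X POST "$API/createTask" \
    -H "$AUTH" -H "Content-Type: application/json" \
    -d "$PAYLOAD")
  TID=$(printf '%s' "$RESP" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['taskId'])")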
  IMAGE_TASKS+=("$TID")
  echo "Shot $((i+1)) image task: $TID"
done

echo "Waiting 20s for images..."
sleep 20

echo "=== Collecting image URLs ==="
IMAGE_URLS=()
for TID in "${IMAGE_TASKS[@]}"; do
  RESULT=$(curl -s "$API/recordInfo?taskId=$TID" -H "$AUTH")
  URL=$(printf '%s' "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])")
  IMAGE_URLS+=("$URL")
  echo "Image: $URL"
done

echo "=== PHASE 2: Generating video clips ==="
VIDEO_TASKS=()
for i in "${!IMAGE_URLS[@]}"; do
  # Pass the prompt and URL as arguments so apostrophes or quotes cannot break the Python source
  PAYLOAD=$(python3 -c "
import sys, json
print(json.dumps({
  'model': 'kling-v2.1-pro-i2v',
  'input': {
    'prompt': sys.argv[1],
    'image_url': sys.argv[2],
    'duration': '5',
    'aspect_ratio': '9:16',
    'cfg_scale': 0.5
  }
}))
" "${MOTIONS[$i]}" "${IMAGE_URLS[$i]}")
  RESP=$(curl -s -X POST "$API/createTask" \
    -H "$AUTH" -H "Content-Type: application/json" \
    -d "$PAYLOAD")
  TID=$(printf '%s' "$RESP" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['taskId'])")
  VIDEO_TASKS+=("$TID")
  echo "Shot $((i+1)) video task: $TID"
done

echo "Waiting 60s for videos..."
sleep 60

echo "=== Collecting video URLs ==="
for TID in "${VIDEO_TASKS[@]}"; do
  RESULT=$(curl -s "$API/recordInfo?taskId=$TID" -H "$AUTH")
  URL=$(printf '%s' "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])")
  echo "Video: $URL"
done
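Because the script above pairs each motion prompt with an image by array index, a mismatch in array lengths would misalign or drop shots. A small preflight check can catch that, and a forgotten placeholder key, before any paid task is submitted. This is a sketch with a hypothetical `preflight` helper; the sample arrays are stand-ins for the real `SHOTS` and `MOTIONS`.

```bash
#!/usr/bin/env bash
# Verify that the prompt counts match and that the API key is not the placeholder.
# Usage: preflight <num_shots> <num_motions> <api_key>
preflight() {
  local shots=$1 motions=$2 key=$3
  if [ "$shots" -ne "$motions" ]; then
    echo "Mismatch: $shots image prompts vs $motions motion prompts" >&2
    return 1
  fi
  if [ -z "$key" ] || [ "$key" = "YOUR_API_KEY" ]; then
    echo "Set a real API key before running" >&2
    return 1
  fi
  echo "OK: $shots shots ready"
}

# Stand-in arrays; in the batch script these are the real SHOTS and MOTIONS.
SHOTS=("image prompt 1" "image prompt 2")
MOTIONS=("motion prompt 1" "motion prompt 2")
preflight "${#SHOTS[@]}" "${#MOTIONS[@]}" "example-key"
```

Placed at the top of the batch script as `preflight ... || exit 1`, the check stops the run before Phase 1 starts.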

Prompt Library: Ready-to-use by vertical

Copy-paste these prompts and customize with your product details. Each includes both image and motion prompts.

Image

A 25-year-old Indian woman in a bathroom mirror, morning routine, holding [product] near her face, natural morning light from window, dewy skin, casual pajamas, photorealistic, iPhone selfie quality, no text

Motion

She dabs product on her cheek, blends it gently, then leans into the mirror to check, slight smile forming. Natural selfie camera motion.

Image

A 30-year-old Indian woman in a modern kitchen, casual activewear, holding a glass/pouch of [product], bright kitchen lighting, clean counter, photorealistic, iPhone quality, no text

Motion

She pours/opens the product, takes a sip/bite, pauses, then nods with a pleasantly surprised expression. Slight handheld camera movement.

Image

A 22-year-old Indian man sitting at a cafe, laptop open, holding phone showing [app screen], casual streetwear, natural cafe lighting, photorealistic, no text

Motion

He taps the phone screen, scrolls, then turns the phone toward camera to show the screen. Excited head nod. Cafe ambiance.

Image

A 30-year-old Indian woman kneeling on the floor with a golden retriever, holding [product] treat pouch, living room setting, warm lighting, photorealistic, no text

Motion

She opens the pouch, offers a treat to the dog, dog eats it eagerly, she laughs and pets the dog. Natural movement.

Image

A 26-year-old Indian woman in front of a full-length mirror, wearing [product], adjusting it, modern bedroom, natural daylight, OOTD vibe, photorealistic, no text

Motion

She turns side to side checking the fit, smooths the fabric, then faces camera with a confident smile and slight pose. Mirror selfie feel.

Cost Breakdown: Per ad and monthly

$2.24 per ad

vs. $150-500 for a traditional UGC creator

Component	Per Ad (6 shots)	Monthly (30 ads)
Nano Banana 2 images (6x)	~$0.24	~$7.20
Kling Pro video (6x 5s)	~$1.50	~$45.00
ElevenLabs voiceover	~$0.50	~$15.00
Total	~$2.24	~$67.20
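The totals in the table can be reproduced, and recomputed for other shot counts or tiers, with a few lines of shell. In this sketch `ad_cost` is a hypothetical helper, and the $0.04 per-image price is derived from the table (~$0.24 for 6 images), not quoted from a price list.

```bash
#!/usr/bin/env bash
# Per-ad and monthly cost from per-unit prices.
# Usage: ad_cost <shots> <image_price> <clip_price> <voiceover_price> <ads_per_month>
ad_cost() {
  awk -v n="$1" -v img="$2" -v clip="$3" -v vo="$4" -v ads="$5" 'BEGIN {
    per_ad = n * img + n * clip + vo
    printf "per_ad=%.2f monthly=%.2f\n", per_ad, per_ad * ads
  }'
}

# 6 shots, $0.04 per image (derived), $0.25 per 5s Pro clip,
# $0.50 voiceover, 30 ads per month.
ad_cost 6 0.04 0.25 0.50 30
```

With these inputs the script prints `per_ad=2.24 monthly=67.20`, matching the table. Video clips account for about two thirds of the per-ad cost.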

Quality Checklist: Before publishing

Run through this checklist before publishing any AI UGC ad.

Known Limitations: And workarounds

Issue	Workaround
Character face drifts between shots	Use the 3x3 grid strategy; regenerate if >15% drift
Product text/labels blur in video	Fix at image stage with upscaling before animating
Lip sync imperfect	Use captions to compensate; keep talking shots to 3-4s
Multi-person scenes inconsistent	Stick to single-person UGC format
Hands/fingers sometimes glitch	Use medium/wide shots; avoid extreme hand close-ups
Video feels too smooth/AI	Add slight grain + handheld shake in CapCut

Export Settings: Platform-specific specs

Platform	Aspect	Duration	Resolution
TikTok	9:16	6-15s	1080x1920
Instagram Reels	9:16	6-30s	1080x1920
Instagram Feed	1:1	15-60s	1080x1080
Facebook Feed	1:1 or 4:5	15-30s	1080x1080 or 1080x1350
YouTube Shorts	9:16	15-60s	1080x1920
YouTube Pre-roll	16:9	15-30s	1920x1080
Meta Story Ads	9:16	5-15s	1080x1920
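When clips are generated in bulk, the table can be encoded as a lookup so the `aspect_ratio` sent to the video model matches the target platform. A minimal sketch follows; the `export_spec` helper and the platform keys are hypothetical, and for Facebook Feed it picks the 4:5 variant that corresponds to 1080x1350.

```bash
#!/usr/bin/env bash
# Look up aspect ratio and resolution for a platform, using the table above.
# Prints "<aspect> <resolution>".
export_spec() {
  case "$1" in
    tiktok|reels|shorts|stories) echo "9:16 1080x1920" ;;
    instagram-feed)              echo "1:1 1080x1080" ;;
    facebook-feed)               echo "4:5 1080x1350" ;;
    youtube-preroll)             echo "16:9 1920x1080" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Split the result into two variables for use in later commands.
read -r aspect resolution <<< "$(export_spec tiktok)"
echo "TikTok: aspect $aspect, export at $resolution"
```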
