Create UGC Video Ads with Zero Actors
A complete 6-step pipeline using kie.ai's Nano Banana 2 for image generation and Kling AI for video animation. From script to final ad in under 5 minutes.
Script & Concept (Claude AI)
Before generating any visuals, create the ad concept and shot list. Every great UGC ad follows a proven structure: hook, problem, discovery, demo, result, CTA.
Pro Tip: Write the script first, then derive your storyboard from it. The visual should serve the story — not the other way around.
Script Prompt for Claude
Ad Format Templates
| Format | Duration | Shots | Best For |
|---|---|---|---|
| Quick hook | 6-8s | 3-4 | TikTok / Reels |
| Standard UGC | 15-20s | 6-8 | Feed ads |
| Testimonial | 25-35s | 8-12 | YouTube / Meta |
| Problem-solution | 15s | 5-6 | Story ads |
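One way to turn a row of this table into a shot list skeleton is to spread the shots across the six beats named earlier. The sketch below is illustrative only: `FORMATS` copies the ranges from the table, while the `plan_shots` helper, the even time split, and the beat-mapping rule are assumptions rather than part of any tool.

```python
BEATS = ["hook", "problem", "discovery", "demo", "result", "CTA"]

# (min_seconds, max_seconds, min_shots, max_shots), copied from the table above
FORMATS = {
    "Quick hook": (6, 8, 3, 4),
    "Standard UGC": (15, 20, 6, 8),
    "Testimonial": (25, 35, 8, 12),
    "Problem-solution": (15, 15, 5, 6),
}


def plan_shots(format_name, shots, total_seconds):
    """Return one dict per shot with its beat and an even share of the runtime."""
    min_s, max_s, min_n, max_n = FORMATS[format_name]
    if not min_n <= shots <= max_n:
        raise ValueError(f"{format_name} expects {min_n}-{max_n} shots")
    if not min_s <= total_seconds <= max_s:
        raise ValueError(f"{format_name} expects {min_s}-{max_s} seconds")
    last = len(BEATS) - 1
    plan = []
    for i in range(shots):
        # Stretch the shot index over the beat list so that the first shot
        # is always the hook and the last shot is always the CTA.
        beat_index = (2 * i * last + (shots - 1)) // (2 * (shots - 1))
        plan.append({
            "shot": i + 1,
            "beat": BEATS[beat_index],
            "seconds": round(total_seconds / shots, 2),
        })
    return plan


for shot in plan_shots("Standard UGC", 6, 18):
    print(shot)
```

With six shots each beat gets exactly one shot; with fewer shots some beats are skipped, and with more shots some beats are held for two.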
Generate Storyboard Images (Nano Banana 2 via kie.ai)
Use kie.ai's Nano Banana 2 model to generate photorealistic images for each shot in your storyboard. The key technique: the 3x3 grid strategy for character consistency.
API Configuration
| Parameter | Value |
|---|---|
| Endpoint | POST https://api.kie.ai/api/v1/jobs/createTask |
| Poll | GET https://api.kie.ai/api/v1/jobs/recordInfo?taskId={id} |
| Model | nano-banana-2 |
The 3x3 Grid Strategy
Why a grid? Generating all 9 frames in a single image keeps the character looking consistent across all shots. Individual generations will drift.
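If the nine frames arrive as one grid image, each cell has to be cropped out before refinement or animation. Here is a minimal sketch of the crop arithmetic, assuming equal-sized cells with no gutters between them. The helper name `grid_boxes` is hypothetical; the boxes use the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.

```python
def grid_boxes(width, height, rows=3, cols=3):
    """Return crop boxes (left, upper, right, lower) in row-major order."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps neighbouring cells flush even when the
            # image size is not an exact multiple of the grid size.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


boxes = grid_boxes(3072, 3072)
print(boxes[0])  # (0, 0, 1024, 1024)
print(boxes[4])  # (1024, 1024, 2048, 2048)
```

Each box can then be passed to an image library to save the nine frames as separate files.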
Single Frame API Call
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "A 25-year-old Indian woman in a casual t-shirt, sitting on a couch in a modern living room, holding [PRODUCT] up to camera with an excited expression, selfie angle, iPhone quality, natural window lighting, photorealistic, no text"
    }
  }'
Image Prompt Rules
"photorealistic, iPhone quality, natural lighting, no text, no watermarks"
These five keywords push the output toward real-looking UGC rather than AI-generated marketing content.
If your audience is Indian, specify "Indian woman/man" explicitly in your prompts. Without this, Nano Banana 2 tends to default to Western-looking subjects.
Describe your character in exactly the same terms across all prompts: same age, hair color, clothing, and distinguishing features. Any variation invites drift.
Aspect ratios:
- 9:16 (vertical): TikTok, Reels, Shorts, Stories
- 1:1 (square): Instagram Feed, Facebook Feed
- 16:9 (horizontal): YouTube, pre-roll ads
Avoid:
- Text in images (always say "no text")
- Multiple people (hard to control consistency)
- Extreme poses or complex hand positions
- Branded backgrounds (logos, storefronts)
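The rules above are easier to follow when prompts are assembled from fixed parts instead of typed by hand. The sketch below reuses one character description verbatim and always appends the five keywords. The `build_image_prompt` helper and the sample character string are hypothetical, chosen only for illustration.

```python
REQUIRED_KEYWORDS = [
    "photorealistic",
    "iPhone quality",
    "natural lighting",
    "no text",
    "no watermarks",
]

# Hypothetical character description; reuse the exact same string for every shot.
CHARACTER = (
    "A 25-year-old Indian woman with shoulder-length black hair, "
    "wearing a casual white t-shirt"
)


def build_image_prompt(character, action, setting):
    """Join the fixed character text, the per-shot details, and the keywords."""
    parts = [character, action, setting] + REQUIRED_KEYWORDS
    return ", ".join(p.strip() for p in parts if p.strip())


shot_1 = build_image_prompt(CHARACTER, "holding [PRODUCT] up to camera", "modern living room")
shot_2 = build_image_prompt(CHARACTER, "talking to camera, selfie angle", "modern kitchen")
print(shot_1)
print(shot_2)
```

Only the action and setting change between shots, so the character wording cannot drift by accident.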
Detail Refinement (Nano Banana 2 Image-to-Image)
Critical Rule: Fix ALL image issues before moving to video. A flaw in a still is baked into every frame of the clip, so it is far harder to fix at the video stage. Inspect product details at 100% zoom.
If product details are blurry or the character drifted between frames, use image-to-image editing to upscale and correct before animating.
API Call with Image Input
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "input": {
      "prompt": "Upscale and sharpen the product details...",
      "image_url": "[URL_OF_BLURRY_FRAME]"
    }
  }'
Animate to Video (Kling AI via kie.ai)
Send each refined storyboard image to Kling AI's image-to-video endpoint. Motion prompts matter most here: describing specific, natural human movements is what makes AI video feel real.
Kling Model Pricing
| Model | Resolution | 5s Price | 10s Price | Use Case |
|---|---|---|---|---|
| Standard | 720p | $0.125 | $0.25 | Testing / drafts |
| Pro | 1080p | $0.25 | $0.50 | Final ads |
| Master | 1080p+ | $0.80 | $1.60 | Hero content |
Image-to-Video API Call
curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v2.1-pro-i2v",
    "input": {
      "prompt": "The woman looks at the camera and starts talking with natural hand gestures, slight head movements, warm smile. Handheld camera feel with subtle motion. Natural indoor lighting.",
      "image_url": "[STORYBOARD_FRAME_URL]",
      "duration": "5",
      "aspect_ratio": "9:16",
      "cfg_scale": 0.5
    }
  }'
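When the request body is built in code rather than typed into a curl command, a JSON serializer avoids breakage from quotes or apostrophes inside a motion prompt. A minimal sketch: the field names and model ID are copied from the call above, while the `build_video_payload` helper and its restriction of `duration` to the two lengths in the pricing table are assumptions.

```python
import json


def build_video_payload(motion_prompt, image_url, duration="5",
                        aspect_ratio="9:16", cfg_scale=0.5):
    """Return the JSON request body for an image-to-video task as a string."""
    # Assumption: only the 5s and 10s lengths from the pricing table are valid.
    if duration not in ("5", "10"):
        raise ValueError("duration must be '5' or '10'")
    body = {
        "model": "kling-v2.1-pro-i2v",
        "input": {
            "prompt": motion_prompt,
            "image_url": image_url,
            "duration": duration,
            "aspect_ratio": aspect_ratio,
            "cfg_scale": cfg_scale,
        },
    }
    # json.dumps escapes quotes and other special characters in the prompt.
    return json.dumps(body)


payload = build_video_payload(
    'Woman says "wow", tilts her head, and smiles. It\'s a handheld selfie feel.',
    "https://example.com/frame-1.png",  # placeholder URL
)
print(payload)
```

The resulting string can be passed as the `-d` argument to curl or as the body of any HTTP client.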
Motion Prompt Templates
These are the bread and butter of natural-looking UGC video. Copy and customize for each shot type.
Multi-Shot Video (Kling 3.0)
Polling for Completion
TASK_ID="your-task-id"
# Poll every 15s until the task reaches a terminal state.
while true; do
  RESULT=$(curl -s "https://api.kie.ai/api/v1/jobs/recordInfo?taskId=$TASK_ID" \
    -H "Authorization: Bearer YOUR_API_KEY")
  STATE=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['state'])")
  if [ "$STATE" = "success" ]; then
    # resultJson is itself a JSON string, so it is decoded a second time.
    VIDEO_URL=$(echo "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])")
    echo "Video: $VIDEO_URL"
    break
  elif [ "$STATE" = "failed" ] || [ "$STATE" = "fail" ]; then
    # Assumption: "fail" is also accepted in case the API uses that spelling.
    echo "Failed - adjust prompt and retry"
    break
  fi
  echo "Status: $STATE - waiting..."
  sleep 15
done
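The loop above runs until the task succeeds or fails, so a task that never leaves the queue would keep it polling indefinitely. A sketch of the same loop with a timeout follows. `fetch_state` is a hypothetical callable standing in for the `recordInfo` request, and the state names are taken from the shell loop, with `"fail"` accepted as a precaution.

```python
import time

# "failed" comes from the shell loop; "fail" is accepted as a precaution (assumption).
FAILURE_STATES = ("failed", "fail")


def wait_for_task(fetch_state, timeout_s=300, interval_s=15, sleep=time.sleep):
    """Poll fetch_state() until a terminal state, or raise TimeoutError."""
    waited = 0
    while True:
        state = fetch_state()
        if state == "success" or state in FAILURE_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"task still '{state}' after {waited}s")
        sleep(interval_s)
        waited += interval_s


# Demo with a stand-in fetcher; a real one would query recordInfo for the task.
fake_states = iter(["pending", "pending", "success"])
print(wait_for_task(lambda: next(fake_states), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.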
Voiceover (ElevenLabs)
Generate the UGC script voiceover separately, then sync with video. Voice selection is critical — the wrong voice kills authenticity instantly.
Voice Selection Tips
- Match the character — pick voices that fit the apparent age/ethnicity
- Indian market — use Indian-accented English voices
- Authenticity over polish — choose slightly imperfect, conversational voices. Avoid "announcer" voices.
- Voice cloning — use ElevenLabs voice cloning if you have a reference voice
Script Timing Rules
- Match voiceover pacing to shot durations
- Leave 0.5s breathing room between shots
- Hook line: fast, energetic
- Product explanation: slower, clear
- CTA: upbeat, direct
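The first two timing rules can be checked before any audio is generated by counting the words in each line against the time its shot allows. This is a rough sketch: the speaking rate of 2.5 words per second is an assumed conversational pace, not a figure from this guide, and both helper names are hypothetical.

```python
def max_words(shot_seconds, words_per_second=2.5, breathing_room=0.5):
    """Words that fit in a shot after reserving the breathing room."""
    speakable = max(shot_seconds - breathing_room, 0)
    return int(speakable * words_per_second)


def check_script(lines, shot_seconds):
    """Return (shot number, word count, word budget, fits) for each line."""
    report = []
    for number, (line, seconds) in enumerate(zip(lines, shot_seconds), start=1):
        words = len(line.split())
        budget = max_words(seconds)
        report.append((number, words, budget, words <= budget))
    return report


script = [
    "Okay, I did not expect this to actually work.",
    "I have tried everything and nothing fixed it until a friend sent me this one.",
]
for row in check_script(script, [5, 5]):
    print(row)
```

Lines that exceed the budget can be trimmed, or their shots lengthened, before the voiceover is recorded. Lower `words_per_second` for the slower product explanation and raise it for the hook.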
Lip Sync + Assembly (CapCut / ElevenLabs Flows)
The final step: combine your animated clips with voiceover, add lip sync, captions, and music.
ElevenLabs Flows combines image/video models with TTS, lip-sync, sound effects, and music in one workspace. Upload your animated video + voiceover and get lip-synced output.
- Import all animated video clips into CapCut
- Import voiceover audio track
- Align clips to audio timing
- Add captions/subtitles (CapCut auto-generates these)
- Add background music (subtle, low volume)
- Export in platform-specific format
Programmatic lip sync is the option for scale: it works best when you're generating 10+ ads per batch and need automation.
Complete Batch Workflow (full automation script)
This bash script generates the clips for a 6-shot UGC ad programmatically: it submits all image tasks back to back so they run concurrently, collects the image URLs, then submits all video tasks. Total time: ~90 seconds.
#!/bin/bash
# Generates 6 storyboard images, then animates each one. Requires curl and python3.
AUTH="Authorization: Bearer YOUR_API_KEY"
API="https://api.kie.ai/api/v1/jobs"

# Define your 6 shot prompts. The character is described in identical terms in
# every prompt, because each image task is generated independently.
SHOTS=(
  "A 28-year-old Indian woman in casual clothes, selfie angle, excited expression, modern kitchen background, iPhone quality, photorealistic, no text"
  "Close-up of a 28-year-old Indian woman in casual clothes talking to camera, natural indoor lighting, warm expression, photorealistic, no text"
  "Medium shot, a 28-year-old Indian woman in casual clothes holding [PRODUCT] up to camera, showing the label, impressed look, photorealistic, no text"
  "POV close-up of hands using [PRODUCT], natural lighting, kitchen counter, photorealistic, no text"
  "A 28-year-old Indian woman in casual clothes, medium shot, reacting with genuine delight after using product, natural lighting, photorealistic, no text"
  "A 28-year-old Indian woman in casual clothes, selfie angle, big smile, holding product near face, giving thumbs up, photorealistic, no text"
)

# Motion prompts for each shot
MOTIONS=(
  "Woman picks up phone, looks at it, then looks at camera with excited expression. Subtle handheld motion."
  "Woman speaks to camera with natural gestures, slight head tilts, warm smile. Handheld selfie feel."
  "Woman holds up product, slowly rotates it to show label, nods approvingly. Natural motion."
  "Hands open product, pour/apply/use it. Slight camera shake as if phone propped nearby."
  "Woman touches face/hair/skin, reacts with genuine surprise and delight. Natural movement."
  "Woman leans toward camera, speaks enthusiastically, gives thumbs up. Energetic selfie motion."
)

# Submit a JSON payload ($1) and print the new task ID.
create_task() {
  curl -s -X POST "$API/createTask" \
    -H "$AUTH" -H "Content-Type: application/json" \
    -d "$1" |
    python3 -c "import sys,json; print(json.load(sys.stdin)['data']['taskId'])"
}

# Poll task $1 every 10s (up to 10 minutes) and print its first result URL.
wait_for_url() {
  local result state
  for _ in $(seq 1 60); do
    result=$(curl -s "$API/recordInfo?taskId=$1" -H "$AUTH")
    state=$(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['state'])")
    case "$state" in
      success)
        # resultJson is itself a JSON string, so it is decoded a second time.
        echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(json.loads(d['resultJson'])['resultUrls'][0])"
        return 0 ;;
      fail|failed)
        # Assumption: "fail" is accepted too in case the API uses that spelling.
        echo "Task $1 failed" >&2
        return 1 ;;
    esac
    sleep 10
  done
  echo "Task $1 timed out" >&2
  return 1
}

echo "=== PHASE 1: Generating storyboard images ==="
IMAGE_TASKS=()
for i in "${!SHOTS[@]}"; do
  # json.dumps escapes any quotes inside the prompt.
  PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'model':'nano-banana-2','input':{'prompt':sys.argv[1]}}))" "${SHOTS[$i]}")
  TID=$(create_task "$PAYLOAD")
  IMAGE_TASKS+=("$TID")
  echo "Shot $((i+1)) image task: $TID"
done

echo "=== Collecting image URLs ==="
IMAGE_URLS=()
for TID in "${IMAGE_TASKS[@]}"; do
  URL=$(wait_for_url "$TID") || exit 1
  IMAGE_URLS+=("$URL")
  echo "Image: $URL"
done

echo "=== PHASE 2: Generating video clips ==="
VIDEO_TASKS=()
for i in "${!IMAGE_URLS[@]}"; do
  # The prompt and image URL are passed as arguments, not spliced into the code.
  PAYLOAD=$(python3 -c "
import sys, json
print(json.dumps({
    'model': 'kling-v2.1-pro-i2v',
    'input': {
        'prompt': sys.argv[1],
        'image_url': sys.argv[2],
        'duration': '5',
        'aspect_ratio': '9:16',
        'cfg_scale': 0.5
    }
}))
" "${MOTIONS[$i]}" "${IMAGE_URLS[$i]}")
  TID=$(create_task "$PAYLOAD")
  VIDEO_TASKS+=("$TID")
  echo "Shot $((i+1)) video task: $TID"
done

echo "=== Collecting video URLs ==="
for TID in "${VIDEO_TASKS[@]}"; do
  URL=$(wait_for_url "$TID") || continue
  echo "Video: $URL"
done
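The script reads each result URL by decoding JSON twice, because `resultJson` is a JSON string nested inside the JSON response. A small sketch of that step with guards for tasks that are not finished yet: the field names are the ones the script reads, while the `extract_result_urls` helper and the sample records are made up for illustration.

```python
import json


def extract_result_urls(record):
    """Return the result URLs from a recordInfo response, or [] if not ready."""
    data = record.get("data") or {}
    if data.get("state") != "success":
        return []
    raw = data.get("resultJson") or ""
    if not raw:
        return []
    # Second decode: resultJson holds a JSON document encoded as a string.
    return json.loads(raw).get("resultUrls", [])


# Made-up sample responses shaped like the fields the script reads.
finished = {
    "data": {
        "state": "success",
        "resultJson": json.dumps({"resultUrls": ["https://example.com/clip-1.mp4"]}),
    }
}
pending = {"data": {"state": "generating", "resultJson": ""}}

print(extract_result_urls(finished))  # ['https://example.com/clip-1.mp4']
print(extract_result_urls(pending))   # []
```

Returning an empty list for unfinished tasks lets a caller retry later instead of crashing on an empty `resultJson`.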
Prompt Library (ready-to-use by vertical)
Copy-paste these prompts and customize with your product details. Each includes both image and motion prompts.
Cost Breakdown (per ad and monthly)
About $2.24 per ad, vs. $150-500 for a traditional UGC creator.
| Component | Per Ad (6 shots) | Monthly (30 ads) |
|---|---|---|
| Nano Banana 2 images (6x) | ~$0.24 | ~$7.20 |
| Kling Pro video (6x 5s) | ~$1.50 | ~$45.00 |
| ElevenLabs voiceover | ~$0.50 | ~$15.00 |
| Total | ~$2.24 | ~$67.20 |
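The totals in this table follow from the unit prices, so they can be recomputed for other shot counts or Kling tiers. A sketch under the table's own figures: the per-image price of $0.04 is inferred from ~$0.24 for six images, the 10s clip price is taken as double the 5s price as in the pricing table, and the `cost_per_ad` helper is hypothetical.

```python
from decimal import Decimal

IMAGE_PRICE = Decimal("0.04")      # inferred: ~$0.24 for 6 images
VOICEOVER_PRICE = Decimal("0.50")  # per ad, from the table above
KLING_5S_PRICE = {                 # per 5s clip, from the Kling pricing table
    "Standard": Decimal("0.125"),
    "Pro": Decimal("0.25"),
    "Master": Decimal("0.80"),
}


def cost_per_ad(shots=6, tier="Pro", clip_seconds=5):
    """Estimated cost of one ad: images + video clips + voiceover."""
    if clip_seconds not in (5, 10):
        raise ValueError("clip_seconds must be 5 or 10")
    video = KLING_5S_PRICE[tier] * (clip_seconds // 5) * shots
    return IMAGE_PRICE * shots + video + VOICEOVER_PRICE


per_ad = cost_per_ad()
print(per_ad, per_ad * 30)  # 2.24 67.20
print(cost_per_ad(tier="Standard"))
```

`Decimal` is used so that the cents add up exactly instead of picking up floating-point rounding.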
Quality Checklist (before publishing)
Run through this checklist before publishing any AI UGC ad.
Known Limitations (and workarounds)
| Issue | Workaround |
|---|---|
| Character face drifts between shots | Use the 3x3 grid strategy; regenerate if >15% drift |
| Product text/labels blur in video | Fix at image stage with upscaling before animating |
| Lip sync imperfect | Use captions to compensate; keep talking shots to 3-4s |
| Multi-person scenes inconsistent | Stick to single-person UGC format |
| Hands/fingers sometimes glitch | Use medium/wide shots; avoid extreme hand close-ups |
| Video feels too smooth/AI | Add slight grain + handheld shake in CapCut |
Export Settings (platform-specific specs)
| Platform | Aspect | Duration | Resolution |
|---|---|---|---|
| TikTok | 9:16 | 6-15s | 1080x1920 |
| Instagram Reels | 9:16 | 6-30s | 1080x1920 |
| Instagram Feed | 1:1 | 15-60s | 1080x1080 |
| Facebook Feed | 1:1 or 4:5 | 15-30s | 1080x1080 or 1080x1350 |
| YouTube Shorts | 9:16 | 15-60s | 1080x1920 |
| YouTube Pre-roll | 16:9 | 15-30s | 1920x1080 |
| Meta Story Ads | 9:16 | 5-15s | 1080x1920 |
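These specs can be encoded once and used to check a clip before upload. A minimal sketch: the values are copied from the table, with 1080x1080 listed for Facebook Feed because its 1:1 option implies that size. The `check_export` helper is hypothetical.

```python
# platform: (aspect, min seconds, max seconds, allowed (width, height) sizes)
EXPORT_SPECS = {
    "TikTok": ("9:16", 6, 15, [(1080, 1920)]),
    "Instagram Reels": ("9:16", 6, 30, [(1080, 1920)]),
    "Instagram Feed": ("1:1", 15, 60, [(1080, 1080)]),
    "Facebook Feed": ("1:1 or 4:5", 15, 30, [(1080, 1080), (1080, 1350)]),
    "YouTube Shorts": ("9:16", 15, 60, [(1080, 1920)]),
    "YouTube Pre-roll": ("16:9", 15, 30, [(1920, 1080)]),
    "Meta Story Ads": ("9:16", 5, 15, [(1080, 1920)]),
}


def check_export(platform, width, height, seconds):
    """Return a list of problems; an empty list means the clip matches the spec."""
    aspect, min_s, max_s, sizes = EXPORT_SPECS[platform]
    problems = []
    if (width, height) not in sizes:
        wanted = " or ".join(f"{w}x{h}" for w, h in sizes)
        problems.append(f"resolution {width}x{height} should be {wanted} ({aspect})")
    if not min_s <= seconds <= max_s:
        problems.append(f"duration {seconds}s is outside {min_s}-{max_s}s")
    return problems


print(check_export("TikTok", 1080, 1920, 12))  # []
print(check_export("YouTube Pre-roll", 1080, 1920, 40))
```

The second call reports both a resolution and a duration problem, since a vertical 40s clip fits neither limit for pre-roll.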