compose-cinematic-mv
Music-first, beat-synced cinematic MV generator (Seedream + Kling) for atmospheric short videos, packaged as a Claude Code skill.
Summary
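- It turns a text intention and a piece of music into a short vertical cinematic music video by generating original scenes with Seedream 4.0, animating them with Kling, and cutting the shots on the music's beats with ffmpeg.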
- It is designed for developers who want to programmatically create mood-driven, beat-synced short videos in the contemplative 'lone figure in a vast landscape' style, without manual video editing.
Overview
name: compose-cinematic-mv
description: Generate a short cinematic music video from a text intention plus a piece of music, in the contemplative "lone figure in a vast landscape" style of Xiaohongshu creator Rainshae. Music-first, beat-synced editing. Use when the user wants to 生成/制作 一段 电影感/氛围感/卡点 短视频 (generate or make a cinematic, atmospheric, or beat-synced short video) or an MV from music plus an imagery/scene idea, turn a song and a mood into a cinematic vertical clip, or make a Rainshae-style 意境 (mood-piece) video. Triggers: 生成一段视频 (generate a video), 做个MV (make an MV), 卡点视频 (beat-synced video), 音乐+文字生成视频 (music + text to video), cinematic short video, Seedream+Kling 出片 (render with Seedream+Kling). Creates ORIGINAL scenes from learned style patterns; never copies the reference creator's specific scenes.
compose-cinematic-mv — music + intention → beat-synced cinematic MV
Turn a text intention (a mood / imagery / scene idea) plus music into a short vertical cinematic MV: design an original scene → first-frames with Seedream 4.0 (text-to-image) → motion with Kling (image-to-video) → cut the shots on the music's beats with ffmpeg. The aesthetic is modeled on Xiaohongshu creator Rainshae (lone figure in a vast landscape, cold cinematic mood, 3:4) — see reference/style-patterns.md.
The feeling comes from cuts riding the music, with short information-dense shots — not from pretty stills alone. So this skill is music-first and beat-synced.
Preconditions
- macOS with ffmpeg/ffprobe, curl, openssl, and bc.
- Credentials in ~/.config/rainshae-mv/credentials.env (chmod 600): ARK_API_KEY (Volcengine Ark / Seedream), KLING_AK and KLING_SK (Kling open platform). Source it with: set -a; . ~/.config/rainshae-mv/credentials.env; set +a
- A working internet path to ark.cn-beijing.volces.com and api-beijing.klingai.com. A local proxy will break these unless it is bypassed; see the proxy entry under "Things that will bite you" below.
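A preflight check catches most setup problems before any credits are spent. The sketch below is a hypothetical helper (preflight is not one of the skill's scripts); it assumes the credentials file is plain KEY=value lines, and it checks that the tools are on PATH, that the file mode is exactly 600, and that each key is non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper, not part of the skill's scripts.
# Prints one problem per line; returns non-zero if anything is wrong.
preflight() {
  local creds="${1:-$HOME/.config/rainshae-mv/credentials.env}"
  local problems=0 tool var

  for tool in ffmpeg ffprobe curl openssl bc; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      problems=$((problems + 1))
    fi
  done

  if [ ! -f "$creds" ]; then
    echo "missing credentials file: $creds"
    return 1
  fi

  # A bare mode in find -perm matches the permission bits exactly (BSD and GNU find).
  if [ -z "$(find "$creds" -prune -perm 600)" ]; then
    echo "credentials file is not chmod 600: $creds"
    problems=$((problems + 1))
  fi

  # Source the file in a subshell so the keys do not leak into this shell,
  # then read each required variable indirectly.
  for var in ARK_API_KEY KLING_AK KLING_SK; do
    if [ -z "$(. "$creds"; printf '%s' "${!var}")" ]; then
      echo "empty or unset: $var"
      problems=$((problems + 1))
    fi
  done

  [ "$problems" -eq 0 ]
}
```

A clean setup makes preflight print nothing and return 0, so it can gate the rest of the run.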
The workflow (music-first)
1. Design an original scene from the intention
From the user's intention, invent a NEW scene in the Rainshae idiom: the title formula {年份},你{第二人称处境} ("{year}, you {second-person situation}"), a lone figure, vast nature, strong mood, 3:4 framing. Do not reproduce the reference creator's specific scenes (no "英国白崖" [the white cliffs of England], "长安" [Chang'an], etc.). Read reference/style-patterns.md and skim ~/Documents/xhs-rainshae/Rainshae_视频提示词反推.md (Rainshae's video prompts, reverse-engineered) for the pattern, then create something fresh.
2. Find matching music FIRST, then analyze its beats
Pick a background track that fits the scene's mood before planning any shots. Use the reference videos as a guide to what fits: a slow tempo (~60–80 BPM), solo piano, ambient, or strings, melancholic yet serene ("深夜疗愈", late-night healing), with phrases that rise and fall rather than a static drone.
- Source royalty-free or public-domain audio (e.g. archive.org public-domain piano such as Satie's Gymnopédies/Gnossiennes, Chopin nocturnes, Debussy; or licensed ambient). Direct archive.org file links work; Pixabay blocks scraping (HTTP 403).
- Avoid noisy recordings. archive.org "Great 78 Project" transfers (identifiers starting with 78_ or containing gbia) have heavy surface hiss that denoising can't remove, so prefer modern, clean uploads. Verify the noise floor before committing: for a clean digital recording the quietest 0.5 s RMS should be very low (≲ −60 dB), while a noisy 78 sits around −40 dB. Check with:
- Download into the working directory, take a ~20–30 s excerpt, normalize loudness, and fade in and out (the 2.5 s fade-out starts at 23.5 s so it ends exactly at the 26 s excerpt length):
ffmpeg -ss <start> -t 26 -i src.mp3 -af "loudnorm=I=-16:TP=-1.5,afade=t=in:st=0:d=1.2,afade=t=out:st=23.5:d=2.5" -ar 44100 -ac 2 excerpt.mp3
- Detect downbeats with scripts/find_beats.sh excerpt.mp3, which prints space-separated beat times. These are your candidate cut points.
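The noise-floor rule and the beat detection above can be sketched as follows. Both functions are illustrative assumptions, not the skill's own code: quietest_rms measures 0.5 s windows with ffmpeg's astats filter, and pick_peaks shows one simple way to turn an RMS envelope (one "time dB" pair per line) into beat times: local maxima above a loudness floor, at least a minimum gap apart. find_beats.sh's actual rule may differ, and energy peaks only approximate musical downbeats.

```bash
#!/usr/bin/env bash
# Sketch only: none of these functions come from the skill's scripts.

# quietest_rms FILE: print the lowest 0.5 s RMS level (dBFS) in FILE.
# After resampling to 44.1 kHz, asetnsamples=22050 gives 0.5 s windows;
# p=0 avoids zero-padding the last window (padding would make it look quieter).
# astats with reset=1 measures each window on its own.
quietest_rms() {
  ffmpeg -v error -i "$1" -af \
    "aresample=44100,asetnsamples=n=22050:p=0,astats=metadata=1:reset=1,ametadata=mode=print:key=lavfi.astats.Overall.RMS_level:file=-" \
    -f null - | min_rms_level
}

# min_rms_level: read ametadata lines on stdin, print the minimum finite
# RMS_level. Digital-silence windows report -inf and are skipped.
min_rms_level() {
  awk -F= '/^lavfi\.astats\.Overall\.RMS_level=/ && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
    v = $2 + 0
    if (!seen || v < min) { min = v; seen = 1 }
  }
  END { if (seen) printf("%.1f\n", min); else exit 1 }'
}

# noise_verdict DB: "clean" at or below -60 dB (the rule of thumb above), else "noisy".
noise_verdict() {
  awk -v db="$1" 'BEGIN { print (db + 0 <= -60 ? "clean" : "noisy") }'
}

# pick_peaks THRESH_DB MIN_GAP: read "time rms_db" lines on stdin and print
# space-separated times of local maxima that are at least THRESH_DB loud and
# at least MIN_GAP seconds after the previously picked peak.
pick_peaks() {
  awk -v thr="$1" -v gap="$2" '
    { t[NR] = $1; v[NR] = $2 + 0 }
    END {
      last = -1e9; out = ""
      for (i = 2; i < NR; i++)
        if (v[i] > v[i-1] && v[i] >= v[i+1] && v[i] >= thr + 0 && t[i] - last >= gap + 0) {
          out = out (out == "" ? "" : " ") t[i]
          last = t[i] + 0
        }
      print out
    }'
}
```

For example, noise_verdict "$(quietest_rms src.mp3)" reports whether a candidate source clears the −60 dB bar before you cut an excerpt from it.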
3. Plan shots ON the beats (short!)
- Short shots, ~2–3 s each (5 s reads as boring because each shot carries little information). Set the timeline cut points to a subset of the detected downbeats.
- One distinct generated video per shot. Do not reuse or re-cut one clip into multiple shots (that was a cost-saving exception in the prototype, not the method).
- Mix shot types across the scene (wide establishing → medium figure → detail/atmosphere → closing long shot) and keep camera moves slow.
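At the recommended ~60–80 BPM one beat lasts 60/BPM ≈ 0.75–1.0 s, so a 2–3 s shot spans roughly two to four beats. A greedy pass over the beat list is one way to pick the subset. choose_cuts below is a hypothetical helper: it takes the first beat at least MIN_LEN seconds after the previous cut and flags any shot longer than MAX_LEN.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the skill's scripts.
# choose_cuts MIN_LEN MAX_LEN END BEAT...
# Greedy: starting from t=0, keep the first beat at least MIN_LEN seconds after
# the previous cut. Beats at or past END (the excerpt length) are ignored.
# A kept gap longer than MAX_LEN (sparse beats) is reported on stderr.
choose_cuts() {
  local min_len="$1" max_len="$2" end="$3"
  shift 3
  printf '%s\n' "$@" | awk -v min="$min_len" -v max="$max_len" -v stop="$end" '
    BEGIN { last = 0; out = "" }
    {
      b = $1 + 0
      if (b >= stop + 0) exit
      if (b - last >= min + 0) {
        if (b - last > max + 0)
          printf("warning: %.2fs shot ending at %s\n", b - last, $1) > "/dev/stderr"
        out = out (out == "" ? "" : " ") $1
        last = b
      }
    }
    END { print out }'
}
```

For instance, choose_cuts 2 3 26 $(scripts/find_beats.sh excerpt.mp3) prints cut times for a 26 s excerpt. The final shot runs from the last cut to the end of the excerpt, so check its length by eye.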
4. Generate first-frames — Seedream, in parallel
For each shot, write a model-ready English image prompt (subject, setting, era, framing, light, palette, mood, lens, film grain) that ends with the target aspect ratio (e.g. 3:4). Generate the frames concurrently (Seedream allows it):
set -a; . ~/.config/rainshae-mv/credentials.env; set +a
scripts/gen_image.sh "<prompt s1>" shot1.jpeg &
scripts/gen_image.sh "<prompt s2>" shot2.jpeg &
...
wait
View each frame (Read tool) and regenerate any that miss before spending video credits.
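The parallel generation step can be wrapped so that a bad prompt or a failed request is caught before any video credits are spent. gen_all_frames is a hypothetical helper: it assumes one prompt per line in a text file and that gen_image.sh takes (prompt, output path) as shown above, and it judges success by whether each shotN.jpeg exists and is non-empty rather than by exit codes. It also enforces the no-double-quotes rule from the gotchas.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/gen_image.sh, not part of the skill.
# gen_all_frames PROMPTS_FILE: line N becomes shotN.jpeg; all run concurrently.
# GEN_IMAGE overrides the generator command (e.g. a stub for testing).
gen_all_frames() {
  local prompts="$1" gen="${GEN_IMAGE:-scripts/gen_image.sh}"
  local n=0 i missing=0 prompt pid
  local -a pids=()

  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue
    n=$((n + 1))
    rm -f "shot$n.jpeg"   # never mistake a stale frame for a fresh one
    case "$prompt" in
      *\"*) echo "shot$n: prompt contains a double quote; skipped" >&2; continue ;;
    esac
    # </dev/null stops the background job from reading the prompts file.
    "$gen" "$prompt" "shot$n.jpeg" </dev/null &
    pids+=("$!")
  done < "$prompts"

  for pid in "${pids[@]}"; do wait "$pid" || true; done

  # Success means every expected frame exists and is non-empty.
  for ((i = 1; i <= n; i++)); do
    if [ ! -s "shot$i.jpeg" ]; then
      echo "shot$i.jpeg missing or empty" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Checking files instead of exit codes keeps the wrapper correct even if the generator script exits 0 on an API error, at the cost of not distinguishing why a frame is missing.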
5. Generate per-shot videos — Kling, parallel, short, fast poll
Submit at most 5 concurrent tasks and request short single clips (duration 5): a 5 s clip covers a ~2–3 s shot, costs less, and suits fast cutting. Poll every ~6–8 s.
scripts/kling.sh submit shot1.jpeg "<video prompt s1>" kling-v1-6 5 # -> task_id
# ...submit all, collect task_ids, then poll each until task_status=succeed
scripts/kling.sh poll <task_id> # -> url when done
curl --noproxy '*' -o clip1.mp4 "<url>"   # the CDN needs --noproxy
Parse with grep (avoid python; see the gotchas below): status with grep -oE '"task_status":"[a-z]+"', url with grep -oE '"url":"[^"]+"'.
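The poll step can be wrapped in a loop that uses only grep and cut. wait_for_kling is a hypothetical helper; it assumes scripts/kling.sh poll prints the raw JSON response and that task_status takes values such as submitted, processing, succeed, and failed. POLL_INTERVAL defaults to 7 s, inside the recommended 6–8 s window.

```bash
#!/usr/bin/env bash
# Hypothetical poll loop, not part of the skill's scripts.
# wait_for_kling TASK_ID: prints the result url on success, returns 1 on
# failure or after MAX_POLLS attempts. KLING, POLL_INTERVAL and MAX_POLLS
# are overridable (e.g. with a stub for testing).
wait_for_kling() {
  local task_id="$1" kling="${KLING:-scripts/kling.sh}"
  local interval="${POLL_INTERVAL:-7}" max_polls="${MAX_POLLS:-120}"
  local i resp status url
  for ((i = 0; i < max_polls; i++)); do
    resp=$("$kling" poll "$task_id")
    # "task_status":"succeed" -> field 4 when split on double quotes
    status=$(printf '%s' "$resp" | grep -oE '"task_status":"[a-z]+"' | head -n 1 | cut -d'"' -f4)
    case "$status" in
      succeed)
        url=$(printf '%s' "$resp" | grep -oE '"url":"[^"]+"' | head -n 1 | cut -d'"' -f4)
        if [ -n "$url" ]; then printf '%s\n' "$url"; return 0; fi
        echo "task $task_id succeeded but no url was found" >&2
        return 1 ;;
      failed)
        echo "task $task_id failed: $resp" >&2
        return 1 ;;
    esac
    sleep "$interval"
  done
  echo "task $task_id: gave up after $max_polls polls" >&2
  return 1
}
```

Typical use: url=$(wait_for_kling "$task_id") && curl --noproxy '*' -o clip1.mp4 "$url".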
6. Assemble beat-synced
Build plan.tsv (one line per shot, in order): clip<TAB>inpoint<TAB>duration, where the cumulative durations equal your chosen downbeat cut points. Then:
scripts/assemble.sh plan.tsv excerpt.mp3 out.mp4
Verify: probe the output duration and extract a couple of frames at the cut boundaries (Read tool) to confirm that each cut changes the image. Deliver the mp4 (or give its path if file delivery is unavailable).
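Because each duration is simply the gap between consecutive cut points, plan.tsv can be generated rather than typed. make_plan is a hypothetical helper that assumes the clips are named clip1.mp4, clip2.mp4, and so on in shot order, and that every shot uses the same in-point; adjust individual lines by hand if needed.

```bash
#!/usr/bin/env bash
# Hypothetical plan.tsv writer, not part of the skill's scripts.
# make_plan INPOINT END CUT...
# Shot i uses clip<i>.mp4 from INPOINT and lasts until the next cut, so the
# cumulative durations land exactly on the cut points; the last shot runs to
# END (the excerpt length). Warns when a shot needs more than a 5 s clip holds.
make_plan() {
  local inpoint="$1" end="$2"
  shift 2
  printf '%s\n' "$@" "$end" | awk -v in0="$inpoint" '
    BEGIN { prev = 0 }
    {
      cut = $1 + 0
      dur = cut - prev
      if (dur <= 0) {
        printf("cut %s is not after %s\n", $1, prev) > "/dev/stderr"
        exit 1
      }
      if (in0 + dur > 5)
        printf("shot %d needs %.2fs after in-point %s\n", NR, dur, in0) > "/dev/stderr"
      printf("clip%d.mp4\t%s\t%.3f\n", NR, in0, dur)
      prev = cut
    }'
}
```

For example, make_plan 0.5 26 $cuts > plan.tsv, then run assemble.sh as above. To spot-check a cut, ffmpeg -v error -ss 2.5 -i out.mp4 -frames:v 1 after_cut1.jpg grabs one frame just after a cut at 2.4 s.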
Things that will bite you (all learned the hard way)
- Local proxies break the APIs. If the user runs ClashX/Clash (e.g. HTTPS_PROXY=127.0.0.1:7890), TLS connections to volces.com/klingai.com get reset. The fix is on the user's side: rule mode plus DIRECT rules for volces.com and klingai.com. For downloads, always use curl --noproxy '*': the result CDNs (kechuangai.com for Kling, the Ark image CDN) are NOT covered by those rules, and the proxy MITMs them (self-signed certificates, truncated downloads).
- Prefer bash + curl + openssl over python. In this sandbox python intermittently throws PermissionError: Operation not permitted on stdlib import (_path_importer_cache). All scripts here are python-free. (If you do use python for SSL, python.org's interpreter lacks CA certs; use ssl.create_default_context(cafile="/etc/ssl/cert.pem").)
- Detect beats with ffmpeg, not librosa. numpy/librosa hit the same PermissionError, so find_beats.sh uses an ffmpeg RMS envelope.
- Kling caps concurrency at 5. Submitting a 6th in-flight task returns 1303 parallel task over resource pack limit. Submit in batches of ≤5: poll and download the first batch, then submit the next.
- Account setup, not code, is the usual blocker. Ark: the Seedream model must be activated in the Ark console (otherwise ModelNotOpen). Kling: the API resource pack must be purchased (otherwise 1102 Account balance not enough); it is separate from klingai.com consumer credits.
- Quoting: prompts passed to the scripts must not contain double quotes, because the JSON is built inline. Stick to letters and commas.
- Files from earlier sessions can become unreadable. Files downloaded or created in earlier sessions may carry com.apple.quarantine/provenance attributes, and the sandbox then denies access (Operation not permitted). Keep a run's working files together and don't depend on re-reading media from a prior session.
- Model ids: Seedream doubao-seedream-4-0-250828; Kling kling-v1-6 (std). Endpoints: ark.cn-beijing.volces.com/api/v3/images/generations and api-beijing.klingai.com/v1/videos/image2video.
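Batching under the 5-task cap amounts to starting at most five workers, waiting for all of them, and only then starting the next group. run_in_batches is a hypothetical helper; the worker is whatever per-shot command you write to submit, poll, and download one clip.

```bash
#!/usr/bin/env bash
# Hypothetical batching helper, not part of the skill's scripts.
# run_in_batches SIZE WORKER ITEM...
# Runs WORKER ITEM in the background for at most SIZE items at a time and
# waits for the whole batch before starting the next, so no more than SIZE
# tasks are ever in flight. Returns non-zero if any worker failed.
run_in_batches() {
  local size="$1" worker="$2"
  shift 2
  local failed=0 pid
  local -a pids=()
  while [ "$#" -gt 0 ]; do
    pids=()
    while [ "$#" -gt 0 ] && [ "${#pids[@]}" -lt "$size" ]; do
      "$worker" "$1" &
      pids+=("$!")
      shift
    done
    for pid in "${pids[@]}"; do
      wait "$pid" || failed=$((failed + 1))
    done
  done
  [ "$failed" -eq 0 ]
}
```

For example, run_in_batches 5 shot_worker 1 2 3 4 5 6 7, where shot_worker is your own function that handles shot N end to end. Waiting on the whole batch means one slow task delays the next batch; that is the simple price of never exceeding the cap.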
Files
- SKILL.md: this playbook.
- scripts/find_beats.sh: audio → downbeat times (ffmpeg RMS envelope).
- scripts/gen_image.sh: Seedream 4.0 text-to-image (curl); run in parallel.
- scripts/kling.sh: Kling image-to-video submit/poll (openssl JWT).
- scripts/assemble.sh: beat-synced assembler for (clip, in-point, duration) shots plus music.
- reference/style-patterns.md: the Rainshae style patterns (规律: title, visuals, pacing, music) used to create original scenes.
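kling.sh authenticates with a JWT signed via openssl. The sketch below shows the general HS256 construction in pure shell. The claim set (iss = access key, exp = now + 1800 s, nbf = now − 5 s) follows Kling's published Python example as understood here and should be checked against the current Kling API docs; kling_jwt and b64url are hypothetical names, not functions from kling.sh.

```bash
#!/usr/bin/env bash
# Sketch of an HS256 JWT built with openssl; not the actual kling.sh code.
# Claim set is an assumption based on Kling's published auth example.

# b64url: base64url-encode stdin (RFC 4648 alphabet, padding stripped).
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# kling_jwt ACCESS_KEY SECRET_KEY [NOW_EPOCH]
kling_jwt() {
  local ak="$1" sk="$2" now="${3:-$(date +%s)}"
  local header payload signing sig
  header=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
  payload=$(printf '{"iss":"%s","exp":%d,"nbf":%d}' "$ak" $((now + 1800)) $((now - 5)) | b64url)
  signing="$header.$payload"
  # HMAC-SHA256 over "header.payload" with the secret key, raw bytes then base64url.
  sig=$(printf '%s' "$signing" | openssl dgst -sha256 -hmac "$sk" -binary | b64url)
  printf '%s.%s\n' "$signing" "$sig"
}
```

The token would then be sent as Authorization: Bearer <token> on each API request, again assuming Kling's documented scheme. Passing NOW_EPOCH explicitly makes the output deterministic, which is handy for testing.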
Install & Usage
mkdir -p .claude/skills
Add the configuration to .claude/skills/compose-cinematic-mv.md, then invoke the skill in Claude Code with:
/compose-cinematic-mv
Usage Examples
/compose-cinematic-mv Generate a cinematic MV from the song 'Midnight Dreams' with the intention: 'A lone figure walking through a neon-lit rain-soaked city street at night, melancholic mood.'
Create a beat-synced short video using the track 'Ocean Waves' and the scene: 'A solitary silhouette standing on a cliff overlooking a stormy sea, cold cinematic mood.'
/compose-cinematic-mv Make a Rainshae-style 意境 video from the song 'Autumn Leaves' with intention: 'A person sitting on a bench in a misty forest, leaves falling, contemplative atmosphere.'
Frequently Asked Questions
What is compose-cinematic-mv?
This skill transforms a text intention and a piece of music into a short vertical cinematic music video by generating original scenes with Seedream 4.0, animating them with Kling, and cutting the shots on the music's beats using ffmpeg. It is designed for developers who want to programmatically create mood-driven, beat-synced short videos in the contemplative 'lone figure in a vast landscape' style, without manual video editing.
How to install compose-cinematic-mv?
To install compose-cinematic-mv: create the skills directory (mkdir -p .claude/skills), then add the config to .claude/skills/compose-cinematic-mv.md. Finally, run /compose-cinematic-mv in Claude Code.
What is compose-cinematic-mv best for?
compose-cinematic-mv is a community skill categorized under Development, created by YijiaDuan.
What can I use compose-cinematic-mv for?
compose-cinematic-mv is useful for:
- Generating a cinematic MV from a song and a mood description for social-media content.
- Creating beat-synced short videos for music promotion or album teasers.
- Rapidly prototyping visual concepts for music videos with AI-generated scenes.
- Automating the production of atmospheric vertical clips for platforms like Xiaohongshu or TikTok.
- Turning a poetic scene idea into a short film with synchronized music and motion.
- Producing multiple variations of a music video by tweaking the text intention or the music input.