Guide and Source Code for a Service That Converts Images into Natural-Looking Videos

오아름 샘 2025. 6. 26. 14:57

(G:ing) Image-to-Video Conversion Service: Environment Setup Guide

📋 System Requirements

OS: Windows 11
GPU: NVIDIA GPU (CUDA 12.7 support)
Python: 3.10
RAM: 16GB minimum (32GB recommended)
Storage: at least 50GB of free space

🚀 Step-by-Step Environment Setup

1. Prepare the Base Environment

1.1 Install Python 3.10

# Download and install Python 3.10.11
# https://www.python.org/downloads/release/python-31011/
# Be sure to check "Add Python to PATH" during installation

1.2 Install CUDA 12.7

# Download NVIDIA CUDA Toolkit 12.7
# https://developer.nvidia.com/cuda-12-7-0-download-archive
# After installation, confirm that nvcc is on the PATH
nvcc --version

1.3 Install Git

# Download and install Git for Windows
# https://git-scm.com/download/win
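2. Create the Project Directory and Virtual Environment

# Create the working directory
mkdir image_to_video_service
cd image_to_video_service

# Create a Python virtual environment
python -m venv venv

# Activate the virtual environment (Windows)
venv\Scripts\activate

3. Install the Core Libraries

3.1 Install PyTorch (CUDA 12.1 build)

# Install the PyTorch build for CUDA 12.1. The wheel bundles its own CUDA runtime,
# so it runs on the CUDA 12.7 setup above as long as the GPU driver supports CUDA 12.1 or later.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121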

3.2 Install the Base Packages

pip install -r requirements.txt

4. Install the Open-Source Models and Tools

4.1 Install Stable Video Diffusion (SVD)

# Install SVD support through the Diffusers library
pip install diffusers[torch] transformers accelerate
pip install opencv-python pillow numpy

4.2 Real-ESRGAN (image quality enhancement)

pip install realesrgan

4.3 Additional Tools

# Face detection and processing
pip install face-recognition mediapipe

# Video processing
pip install moviepy imageio imageio-ffmpeg

# Web interface
pip install gradio flask

# Utilities
pip install tqdm accelerate xformers

5. Model Download Script

5.1 Download the Required Models Automatically

# Run download_models.py to download the models
python download_models.py
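
The post runs download_models.py but does not show the script itself. As a rough idea of what it could look like, here is a minimal sketch that assumes the models are fetched with `snapshot_download` from `huggingface_hub` (installed alongside diffusers). The `MODELS` table, the `target_dir` helper, and the `--models-dir` option are hypothetical names chosen for illustration; only the SVD repo id (from the VideoGenerator code later in this post) and the `--force` flag (from the troubleshooting section) come from the original.

```python
import argparse
from pathlib import Path

# Hypothetical model table: the repo id is the one VideoGenerator uses later
# in this post; the folder name follows the project structure in this guide.
MODELS = {
    "svd": "stabilityai/stable-video-diffusion-img2vid-xt",
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Download the models used by the service")
    parser.add_argument("--force", action="store_true",
                        help="re-download files even if they are already present")
    parser.add_argument("--models-dir", default="models",
                        help="root directory for downloaded models")
    return parser.parse_args(argv)

def target_dir(models_dir, name):
    """Return the local directory a model is stored in, e.g. models/svd."""
    return Path(models_dir) / name

def download_all(models_dir="models", force=False):
    # Imported lazily so that --help works before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    for name, repo_id in MODELS.items():
        local_dir = target_dir(models_dir, name)
        local_dir.mkdir(parents=True, exist_ok=True)
        print(f"Downloading {repo_id} -> {local_dir}")
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir),
                          force_download=force)

def main(argv=None):
    args = parse_args(argv)
    download_all(args.models_dir, args.force)
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns this into a runnable script. Note that VideoGenerator loads the model by repo id from the Hugging Face cache, so with this layout you would either pass the local directory (for example `models/svd`) as `model_id` or drop `local_dir` and let the download populate the cache instead.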

6. Environment Variables

6.1 Create the .env File

# Add the following to the .env file
CUDA_VISIBLE_DEVICES=0
TORCH_HOME=./models
HF_HOME=./models/huggingface
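
One thing to watch with these variables is timing: they only help if they are in the process environment before the libraries that read them are imported. As far as I know, huggingface_hub resolves HF_HOME when it is first imported, so setting it after `import diffusers` may have no effect. The sketch below is a standard-library stand-in for what python-dotenv's `load_dotenv()` does; `load_env_file` is a hypothetical helper name, not part of the project.

```python
import os
from pathlib import Path

def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (existing values win)."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)  # do not override what the shell already set
        loaded[key] = os.environ[key]
    return loaded

# Intended placement: the very top of main.py, before torch, diffusers,
# or transformers are imported.
# load_env_file()
```

The same ordering applies when using `load_dotenv()`: call it before the heavy imports rather than inside a function that runs after them.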

7. Verify the Installation

7.1 Check That CUDA and PyTorch Work

# Run test_environment.py
python test_environment.py
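
test_environment.py is not listed in this post. A minimal sketch of such a check might look like the following, assuming it only needs to report the Python version, the PyTorch version, the CUDA version the wheel was built with, and whether a GPU is visible. `collect_report` and `format_report` are hypothetical names.

```python
import platform
import sys

def collect_report():
    """Gather the facts this guide asks you to verify: Python, PyTorch, CUDA."""
    report = {
        "python": platform.python_version(),
        "python_is_3_10": sys.version_info[:2] == (3, 10),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": False,
        "gpu": None,
    }
    try:
        import torch  # imported lazily so the script still reports when torch is missing
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version the wheel was built with
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report

def format_report(report):
    return "\n".join(f"{key:>16}: {value}" for key, value in report.items())

print(format_report(collect_report()))
```

If `cuda_available` comes back False while `torch_cuda` is set, the wheel is a CUDA build but the driver or GPU is not being picked up, which points at the driver rather than at PyTorch.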

7.2 Check GPU Memory

# Check GPU memory usage
python check_gpu.py
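
check_gpu.py is likewise not shown. One way to write it, sketched here under the assumption that per-device used and total memory is all that is needed, is with `torch.cuda.mem_get_info`, which returns free and total bytes for a device. `to_gib` and `gpu_memory_lines` are hypothetical names.

```python
def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 1024**3

def gpu_memory_lines():
    """Return one human-readable line per visible GPU."""
    try:
        import torch  # lazy import: the script degrades gracefully without torch
    except ImportError:
        return ["PyTorch is not installed"]
    if not torch.cuda.is_available():
        return ["CUDA is not available"]
    lines = []
    for idx in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(idx)
        lines.append(f"GPU {idx} ({torch.cuda.get_device_name(idx)}): "
                     f"{to_gib(total - free):.1f} / {to_gib(total):.1f} GiB used")
    return lines

for line in gpu_memory_lines():
    print(line)
```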

📁 Project Structure

image_to_video_service/
├── venv/                          # Python virtual environment
├── models/                        # Downloaded models
│   ├── svd/                      # Stable Video Diffusion
│   ├── realesrgan/               # Real-ESRGAN
│   └── mediapipe/                # MediaPipe models
├── src/                          # Source code
│   ├── core/                     # Core logic
│   ├── api/                      # API server
│   ├── utils/                    # Utilities
│   └── web/                      # Web interface
├── uploads/                      # Uploaded images
├── outputs/                      # Generated videos
├── static/                       # Static files
├── templates/                    # HTML templates
├── requirements.txt              # Package list
├── .env                         # Environment variables
└── main.py                      # Main entry point

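🔧 Performance Tuning

GPU Memory Optimization

# Release cached GPU memory and let cuDNN auto-tune its convolution algorithms
import torch
torch.cuda.empty_cache()
torch.backends.cudnn.benchmark = True

Batch Processing Optimization

# Adjust the batch size to the available GPU memory
BATCH_SIZE = 1  # for a 16GB GPU
MAX_FRAMES = 25  # number of frames to generate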
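
The frame count only fixes how many images are generated; the clip length also depends on the frame rate used at export. As a quick worked example, assuming the `fps=7` default that `generate_video` uses later in this post (`clip_seconds` and `DEFAULT_FPS` are names made up for this sketch):

```python
def clip_seconds(num_frames, fps):
    """Length of the exported clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps

MAX_FRAMES = 25   # from the settings above
DEFAULT_FPS = 7   # default fps of generate_video in src/core/video_generator.py

print(f"{clip_seconds(MAX_FRAMES, DEFAULT_FPS):.2f} s")  # 25 / 7 = 3.57 s
```

So with the defaults a generated clip runs for roughly three and a half seconds; raising fps shortens it, and lowering fps stretches the same 25 frames over a longer time.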

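🚨 Troubleshooting

Common Problems

1. CUDA Version Mismatch

# Check the CUDA version
# (nvidia-smi shows the highest version the driver supports; nvcc shows the installed toolkit)
nvidia-smi
nvcc --version

# Check the CUDA version PyTorch was built with
python -c "import torch; print(torch.version.cuda)"

2. Out of GPU Memory

# Monitor memory usage
nvidia-smi -l 1

# Free cached GPU memory from Python
torch.cuda.empty_cache()

3. Model Download Failure

# Clear the Hugging Face cache (Windows Command Prompt; in Git Bash use: rm -rf ~/.cache/huggingface/)
# If HF_HOME is set as in the .env file above, clear that directory instead.
rmdir /s /q "%USERPROFILE%\.cache\huggingface"

# Re-download the models manually
python download_models.py --force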
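
For the CUDA version mismatch in item 1, the three commands print three numbers that are easy to confuse. A conservative rule of thumb is that the version reported by nvidia-smi (what the driver supports) should be at least the version PyTorch was built with. NVIDIA's minor-version compatibility can relax this within a major release, so treat the sketch below as a sanity check rather than a strict test. `parse_version` and `driver_supports` are hypothetical helper names.

```python
import re

def parse_version(text):
    """Turn '12.7' or 'CUDA Version: 12.7' into (12, 7)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))

def driver_supports(driver_cuda, torch_cuda):
    """True when the driver's CUDA version is at least the one PyTorch was built with."""
    return parse_version(driver_cuda) >= parse_version(torch_cuda)

# Values from this guide: nvidia-smi reporting 12.7, PyTorch wheel built for 12.1
print(driver_supports("CUDA Version: 12.7", "12.1"))  # True
print(driver_supports("11.8", "12.1"))                # False
```

The nvcc version matters only when something is compiled locally; the prebuilt PyTorch wheel does not use it.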

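📊 Performance Benchmarks

Expected Processing Time (RTX 4090)

512x512 image → 25-frame video: about 30-60 seconds
1024x1024 image → 25-frame video: about 2-4 minutes
With high-quality upscaling: an additional 1-2 minutes

Memory Usage

Minimum: 12GB VRAM
Recommended: 16GB+ VRAM
RAM: 16GB+ system memory

🎯 Next Steps

After setup is complete, run: python main.py
Open the web interface: http://localhost:7860
Start the API server: python src/api/server.py
Verify that everything works with a test image

📞 Support

If you run into problems during installation:

Make sure the GPU driver is up to date
Check CUDA version compatibility
Resolve Python package version conflicts
Check the log file: logs/installation.log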

# requirements.txt
torch>=2.1.0
torchvision>=0.16.0
torchaudio>=2.1.0
diffusers>=0.24.0
transformers>=4.35.0
accelerate>=0.24.0
opencv-python>=4.8.0
pillow>=10.0.0
numpy>=1.24.0
gradio>=4.0.0
flask>=2.3.0
moviepy>=1.0.3
imageio>=2.31.0
imageio-ffmpeg>=0.4.9
face-recognition>=1.3.0
mediapipe>=0.10.0
realesrgan>=0.3.0
tqdm>=4.66.0
xformers>=0.0.22
python-dotenv>=1.0.0
requests>=2.31.0
scipy>=1.11.0  # imported by src/core/motion_enhancer.py

# ============================================================================
# main.py - Main entry point
# ============================================================================

import sys
import argparse
import multiprocessing as mp
from pathlib import Path

# Add the project root directory to the Python path
sys.path.append(str(Path(__file__).parent))

from src.web.gradio_app import launch_gradio_app
from src.api.server import launch_api_server
from src.utils.setup import setup_environment, check_requirements

def main():
    parser = argparse.ArgumentParser(description='Image-to-video conversion service')
    parser.add_argument('--mode', choices=['web', 'api', 'both'], default='web',
                      help='Run mode (web: Gradio web app, api: REST API, both: run both)')
    parser.add_argument('--port', type=int, default=7860, help='Port number')
    parser.add_argument('--host', default='127.0.0.1', help='Host address')
    parser.add_argument('--share', action='store_true', help='Create a Gradio share link')

    args = parser.parse_args()

    # Set up the environment and check the requirements
    print("🚀 Starting the image-to-video conversion service")
    print("=" * 50)

    setup_environment()

    if not check_requirements():
        print("❌ Requirements check failed. See the installation guide.")
        sys.exit(1)  # non-zero exit code so callers can detect the failure

    print("✅ Environment setup complete")

    # Start the service in the selected mode
    if args.mode == 'web':
        print(f"🌐 Starting the Gradio web interface: http://{args.host}:{args.port}")
        launch_gradio_app(port=args.port, host=args.host, share=args.share)

    elif args.mode == 'api':
        print(f"🔗 Starting the REST API server: http://{args.host}:{args.port}")
        launch_api_server(port=args.port, host=args.host)

    elif args.mode == 'both':
        print("🔄 Starting both the web interface and the API server")
        # Run the API server in a separate process on the next port.
        # Keyword arguments match the call in 'api' mode, whatever the parameter order is.
        api_process = mp.Process(target=launch_api_server,
                                kwargs={'port': args.port + 1, 'host': args.host})
        api_process.start()

        print(f"🌐 Gradio web interface: http://{args.host}:{args.port}")
        print(f"🔗 REST API server: http://{args.host}:{args.port + 1}")

        try:
            launch_gradio_app(port=args.port, host=args.host, share=args.share)
        finally:
            # Stop the API server and wait for the process to exit
            api_process.terminate()
            api_process.join()

if __name__ == "__main__":
    main()

# ============================================================================
# src/utils/setup.py - Environment setup utilities
# ============================================================================

import os
import sys
import torch
from pathlib import Path
from dotenv import load_dotenv

def setup_environment():
    """Set up environment variables and directories."""
    load_dotenv()

    # Create the required directories
    directories = [
        'models', 'uploads', 'outputs', 'logs',
        'models/svd', 'models/realesrgan', 'models/mediapipe'
    ]

    for dir_name in directories:
        # parents=True so nested paths work regardless of their order in the list
        Path(dir_name).mkdir(parents=True, exist_ok=True)

    # Set defaults for variables that .env did not define
    os.environ.setdefault('TORCH_HOME', './models')
    os.environ.setdefault('HF_HOME', './models/huggingface')

def check_requirements():
    """Check the system requirements."""
    checks = []

    # Check CUDA
    if torch.cuda.is_available():
        gpu_count = torch.cuda.device_count()
        gpu_name = torch.cuda.get_device_name(0)
        vram = torch.cuda.get_device_properties(0).total_memory / 1024**3

        print(f"✅ CUDA available: {gpu_count} GPU(s)")
        print(f"📱 GPU: {gpu_name}")
        print(f"💾 VRAM: {vram:.1f}GB")

        if vram < 10:
            print("⚠️  Warning: less than 10GB of VRAM. Performance may be affected.")

        checks.append(True)
    else:
        print("❌ CUDA is not available.")
        checks.append(False)

    # Check the Python version
    python_version = sys.version_info
    if python_version.major == 3 and python_version.minor == 10:
        print(f"✅ Python version: {python_version.major}.{python_version.minor}")
        checks.append(True)
    else:
        print(f"⚠️  Python version: {python_version.major}.{python_version.minor} (recommended: 3.10)")
        checks.append(True)  # other versions are allowed, with a warning

    return all(checks)

# ============================================================================
# src/core/video_generator.py - Core video generation logic
# ============================================================================

import logging
import uuid
from pathlib import Path
from typing import Optional, Tuple

import numpy as np
import torch
from PIL import Image
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video

class VideoGenerator:
    def __init__(self, model_id: str = "stabilityai/stable-video-diffusion-img2vid-xt"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipe = None
        self.model_id = model_id
        self.logger = logging.getLogger(__name__)

    def load_model(self):
        """Load the model."""
        try:
            self.logger.info(f"Loading model: {self.model_id}")

            self.pipe = StableVideoDiffusionPipeline.from_pretrained(
                self.model_id,
                torch_dtype=torch.float16,
                variant="fp16"
            )

            if self.device == "cuda":
                # Memory optimization: CPU offload moves each sub-model to the GPU
                # only while it runs, so the pipeline is not moved with .to("cuda") first
                self.pipe.enable_model_cpu_offload()
            else:
                self.pipe = self.pipe.to(self.device)

            # Not every pipeline class exposes VAE slicing, so guard the call
            if hasattr(self.pipe, "enable_vae_slicing"):
                self.pipe.enable_vae_slicing()

            self.logger.info("Model loaded")

        except Exception as e:
            self.logger.error(f"Failed to load model: {e}")
            raise

    def preprocess_image(self, image: Image.Image,
                        target_size: Tuple[int, int] = (1024, 576)) -> Image.Image:
        """Resize the image to fit inside target_size, then pad it to the exact size."""
        # Resize while preserving the aspect ratio
        img_ratio = image.width / image.height
        target_ratio = target_size[0] / target_size[1]

        if img_ratio > target_ratio:
            # Image is wider than the target: match the width
            new_width = target_size[0]
            new_height = int(target_size[0] / img_ratio)
        else:
            # Image is taller than the target: match the height
            new_height = target_size[1]
            new_width = int(target_size[1] * img_ratio)

        # Round down to a multiple of 8 (model requirement), never below 8
        new_width = max(8, (new_width // 8) * 8)
        new_height = max(8, (new_height // 8) * 8)

        image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)

        # The resized image never exceeds target_size, so center it on a black
        # canvas (letterbox) whenever it is smaller in either dimension
        if (new_width, new_height) != tuple(target_size):
            padded = Image.new('RGB', target_size, (0, 0, 0))
            paste_x = (target_size[0] - new_width) // 2
            paste_y = (target_size[1] - new_height) // 2
            padded.paste(image, (paste_x, paste_y))
            image = padded

        return image

    def generate_video(self,
                      image: Image.Image,
                      num_frames: int = 25,
                      fps: int = 7,
                      motion_bucket_id: int = 127,
                      noise_aug_strength: float = 0.02,
                      decode_chunk_size: int = 8,
                      seed: Optional[int] = None) -> str:
        """Generate a video from an image and return the output file path."""

        if self.pipe is None:
            self.load_model()

        # Seed the random number generators ("is not None" so that seed=0 also counts)
        generator = None
        if seed is not None:
            np.random.seed(seed)
            generator = torch.manual_seed(seed)

        # Preprocess the image
        processed_image = self.preprocess_image(image)

        self.logger.info(f"Starting video generation: {num_frames} frames, {fps} fps")

        try:
            # Generate the video frames
            frames = self.pipe(
                processed_image,
                decode_chunk_size=decode_chunk_size,
                generator=generator,
                motion_bucket_id=motion_bucket_id,
                noise_aug_strength=noise_aug_strength,
                num_frames=num_frames,
            ).frames[0]

            # Save as a video file; a UUID keeps file names unique even with a fixed seed
            Path("outputs").mkdir(exist_ok=True)
            output_path = f"outputs/generated_video_{uuid.uuid4().hex[:8]}.mp4"
            export_to_video(frames, output_path, fps=fps)

            self.logger.info(f"Video generation complete: {output_path}")
            return output_path

        except Exception as e:
            self.logger.error(f"Video generation failed: {e}")
            raise
        finally:
            # Free cached GPU memory
            torch.cuda.empty_cache()
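
A note on preprocess_image above: because the resized image always fits inside the target, any input that is not already 16:9 ends up with black bars. If bars are not wanted, one alternative is to scale the image until it covers the target and then center-crop the overflow (Pillow's `ImageOps.fit` follows the same idea). This is not what the code above does; the sketch below only works through the arithmetic with the standard library, and `cover_and_crop_box` is a hypothetical helper name.

```python
import math

def cover_and_crop_box(width, height, target=(1024, 576)):
    """Return (resized_size, crop_box) for a scale-to-cover plus center crop."""
    target_w, target_h = target
    # Scale by the larger ratio so that both dimensions reach the target
    scale = max(target_w / width, target_h / height)
    # The small epsilon keeps floating-point noise from rounding up an exact fit
    new_w = max(target_w, math.ceil(width * scale - 1e-9))
    new_h = max(target_h, math.ceil(height * scale - 1e-9))
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)

# A square 512x512 source is scaled to 1024x1024, then rows 224..800 are kept
print(cover_and_crop_box(512, 512))  # ((1024, 1024), (0, 224, 1024, 800))
```

With Pillow this would be applied as `image.resize(size, Image.Resampling.LANCZOS).crop(box)`. The trade-off is that cropping discards the edges of the picture, while the letterbox version keeps the whole picture but feeds black borders to the model.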

# ============================================================================
# src/core/face_animator.py - Facial expression animation
# ============================================================================

import cv2
import mediapipe as mp
import numpy as np
from PIL import Image
from typing import List, Tuple

class FaceAnimator:
    def __init__(self):
        self.mp_face_mesh = mp.solutions.face_mesh
        self.mp_drawing = mp.solutions.drawing_utils
        self.face_mesh = self.mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=1,
            refine_landmarks=True,
            min_detection_confidence=0.5
        )

    def detect_face_landmarks(self, image: np.ndarray) -> List[Tuple[int, int]]:
        """Detect face landmarks and return them as pixel coordinates."""
        # MediaPipe expects RGB input; the frame is converted from BGR (OpenCV order)
        rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = self.face_mesh.process(rgb_image)

        landmarks = []
        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
                h, w = image.shape[:2]
                for landmark in face_landmarks.landmark:
                    # Landmarks are normalized to [0, 1]; scale them to pixels
                    x = int(landmark.x * w)
                    y = int(landmark.y * h)
                    landmarks.append((x, y))

        return landmarks

    def animate_expression(self, frames: List[np.ndarray],
                          expression_type: str = "smile") -> List[np.ndarray]:
        """Apply an expression animation to the frames."""
        animated_frames = []

        for i, frame in enumerate(frames):
            landmarks = self.detect_face_landmarks(frame)

            if landmarks:
                # Adjust the landmarks for the expression; progress runs from 0 to 1
                modified_frame = self.apply_expression_morph(
                    frame, landmarks, expression_type, i / max(len(frames) - 1, 1)
                )
                animated_frames.append(modified_frame)
            else:
                # No face detected: keep the frame unchanged
                animated_frames.append(frame)

        return animated_frames

    def apply_expression_morph(self, frame: np.ndarray, landmarks: List[Tuple[int, int]],
                              expression: str, progress: float) -> np.ndarray:
        """Apply the expression morph."""
        # Map each expression to its landmark adjustment function
        expression_adjustments = {
            "smile": self.smile_adjustment,
            "surprise": self.surprise_adjustment,
            "blink": self.blink_adjustment
        }

        if expression in expression_adjustments:
            return expression_adjustments[expression](frame, landmarks, progress)

        return frame

    def smile_adjustment(self, frame: np.ndarray, landmarks: List[Tuple[int, int]],
                        progress: float) -> np.ndarray:
        """Smile adjustment (placeholder: returns the frame unchanged)."""
        # Raise the mouth corners (landmarks 61 and 291)
        mouth_corners = [61, 291]
        smile_intensity = np.sin(progress * np.pi) * 5  # smooth smile animation

        # Simple warp would go here (a real implementation needs a more elaborate algorithm)
        return frame

    def surprise_adjustment(self, frame: np.ndarray, landmarks: List[Tuple[int, int]],
                           progress: float) -> np.ndarray:
        """Surprise adjustment (placeholder: returns the frame unchanged)."""
        # Raise the eyebrows and open the eyes wide
        return frame

    def blink_adjustment(self, frame: np.ndarray, landmarks: List[Tuple[int, int]],
                        progress: float) -> np.ndarray:
        """Blink adjustment (placeholder: returns the frame unchanged)."""
        # Eyelid movement
        return frame
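
The smile adjustment above computes `np.sin(progress * np.pi) * 5` but does not yet use it. To see what that curve does over a clip, here is a small standard-library sketch. It assumes progress is normalized so that the first frame is 0 and the last frame is 1, and that the factor 5 is a peak offset in pixels; both the `smile_intensity` function and the `peak_px` parameter name are illustrative.

```python
import math

def smile_intensity(frame_idx, total_frames, peak_px=5.0):
    """Half-sine easing: 0 at the first frame, peak_px mid-clip, 0 at the last frame."""
    progress = frame_idx / max(total_frames - 1, 1)
    return math.sin(progress * math.pi) * peak_px

# Sample a 25-frame clip at the start, quarter points, middle, and end
curve = [round(smile_intensity(i, 25), 2) for i in (0, 6, 12, 18, 24)]
print(curve)  # [0.0, 3.54, 5.0, 3.54, 0.0]
```

The half-sine shape means the smile eases in, peaks at the middle frame, and eases back out, so a looped clip starts and ends on the neutral face.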

# ============================================================================
# src/core/motion_enhancer.py - Adds natural-looking motion
# ============================================================================

import cv2
import numpy as np
from typing import List
from scipy.ndimage import gaussian_filter

class MotionEnhancer:
    def __init__(self):
        # DeepFlow lives in cv2.optflow, which only the opencv-contrib-python build
        # provides; fall back to None when the plain opencv-python build is installed
        optflow = getattr(cv2, "optflow", None)
        self.optical_flow = optflow.createOptFlow_DeepFlow() if optflow is not None else None

    def add_hair_motion(self, frames: List[np.ndarray],
                       wind_strength: float = 0.5) -> List[np.ndarray]:
        """Add hair motion."""
        enhanced_frames = []

        for i, frame in enumerate(frames):
            if i == 0:
                enhanced_frames.append(frame)
                continue

            # Detect the hair region (simple color-based heuristic)
            hair_mask = self.detect_hair_region(frame)

            # Simulate a wind effect
            wind_vector = self.generate_wind_vector(i, wind_strength)

            # Apply the motion to the hair region
            enhanced_frame = self.apply_motion_to_region(
                frames[i-1], frame, hair_mask, wind_vector
            )

            enhanced_frames.append(enhanced_frame)

        return enhanced_frames

    def detect_hair_region(self, frame: np.ndarray) -> np.ndarray:
        """Detect the hair region."""
        # Convert to the HSV color space
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Hair color range: any hue and saturation with low brightness (dark pixels),
        # so dark clothing and backgrounds will match as well
        lower_hair = np.array([0, 0, 0])
        upper_hair = np.array([180, 255, 100])

        mask = cv2.inRange(hsv, lower_hair, upper_hair)

        # Remove noise with morphological operations
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

        return mask

    def generate_wind_vector(self, frame_idx: int, strength: float) -> np.ndarray:
        """Generate the wind vector for a frame."""
        # Sine-based motion for a natural look
        time_factor = frame_idx * 0.1

        wind_x = np.sin(time_factor) * strength
        wind_y = np.cos(time_factor * 0.7) * strength * 0.5

        return np.array([wind_x, wind_y])

    def apply_motion_to_region(self, prev_frame: np.ndarray, curr_frame: np.ndarray,
                              mask: np.ndarray, motion_vector: np.ndarray) -> np.ndarray:
        """Apply motion to a specific region."""
        # Compute dense optical flow between the two frames (Farneback).
        # calcOpticalFlowPyrLK is sparse and needs feature points, so it cannot be called with None.
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # NOTE: flow is not used yet; the warp below relies only on motion_vector

        # Per-pixel displacement: motion_vector inside the mask, zero elsewhere
        h, w = curr_frame.shape[:2]
        mask_2ch = np.stack([mask] * 2, axis=2).astype(np.float32) / 255.0
        flow_map = np.zeros((h, w, 2), dtype=np.float32)
        flow_map[:, :] = motion_vector
        flow_map = flow_map * mask_2ch

        # cv2.remap expects absolute source coordinates, so add the displacement to the pixel grid
        grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                     np.arange(h, dtype=np.float32))
        warped = cv2.remap(curr_frame,
                          grid_x + flow_map[:, :, 0],
                          grid_y + flow_map[:, :, 1],
                          cv2.INTER_LINEAR)

        # Blend the warped pixels in only where the mask is set
        mask_3d_bgr = np.stack([mask] * 3, axis=2) / 255.0
        result = curr_frame * (1 - mask_3d_bgr) + warped * mask_3d_bgr

        return result.astype(np.uint8)

    def add_cloth_motion(self, frames: List[np.ndarray]) -> List[np.ndarray]:
        """Add clothing motion."""
        # Detect the clothing region and apply gentle motion
        enhanced_frames = []

        for i, frame in enumerate(frames):
            if i == 0:
                enhanced_frames.append(frame)
                continue

            # Detect the clothing region (color- and texture-based)
            cloth_mask = self.detect_cloth_region(frame)

            # Subtle swaying effect
            motion_vector = np.array([
                np.sin(i * 0.05) * 0.3,
                np