@senspond


Measuring Similarity Between Two Images: SSIM and LPIPS Metrics with Python Implementations

Published: 2025-03-02 (Sun) 06:41

Updated: 2025-03-02 (Sun) 08:12


This post summarizes two metrics for measuring the similarity between two images, SSIM and LPIPS, and how to compute them in Python.

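Introduction

I submitted a paper on style transfer for mobile devices to an international journal as first author, and it is now in revision after peer review.

Three reviewers reviewed the paper, and addressing every comment will take some time.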

The visual quality comparison of the style transfer outputs is subjective and lacks quantitative evaluation. Incorporating metrics such as SSIM (Structural Similarity Index) or perceptual loss evaluations would provide a more objective assessment of the models' visual performance.

The above is the point raised by one of the reviewers.

SSIM (Structural Similarity Index Measure)

What is SSIM?

SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

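SSIM is a metric that jointly evaluates similarity in luminance, contrast, and structure. The closer the value is to 1, the more similar the two images are.

\mu_x, \mu_y : \text{mean of } x \text{ and } y \\ \sigma_x^2, \sigma_y^2 : \text{variance of } x \text{ and } y \\ \sigma_{xy} : \text{covariance of } x \text{ and } y \\ C_1 = (K_1 R)^2, \; C_2 = (K_2 R)^2 : \text{stabilizing constants, where } R \text{ is the data range}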

\sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)^2\\ \quad \sigma_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \mu_y)^2\\ \quad \sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)
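
To make the formula concrete, here is a minimal NumPy sketch that evaluates it once over the whole image, treating the image as a single window. This is a simplification for illustration only: library implementations such as skimage evaluate the same expression in a sliding window and average the local values. The function name global_ssim and the synthetic test images are made up for this example. K1 = 0.01 and K2 = 0.03 are the default values that appear in the skimage source, with C1 = (K1 R)^2 and C2 = (K2 R)^2 for data range R.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (single window, illustrative only)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()

    mu_x, mu_y = x.mean(), y.mean()
    # Sample variance and covariance (divide by N - 1), as in the formulas above.
    var_x = x.var(ddof=1)
    var_y = y.var(ddof=1)
    cov_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)

    # Stabilizing constants that keep the ratio finite for flat regions.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Synthetic images (assumption: random data stands in for real photos).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
noise = rng.integers(-20, 21, size=img.shape)
noisy = np.clip(img.astype(np.int16) + noise, 0, 255).astype(np.uint8)

score_same = global_ssim(img, img)     # identical images give a score of 1 (up to rounding)
score_noisy = global_ssim(img, noisy)  # added noise lowers the score
print(score_same, score_noisy)
```

Because the whole image is one window here, the value will generally differ from what skimage reports for the same pair of images.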

Python implementation

In Python, structural_similarity from skimage.metrics is the usual choice.

def structural_similarity(
    im1,
    im2,
    *,
    win_size=None,
    gradient=False,
    data_range=None,
    channel_axis=None,
    gaussian_weights=False,
    full=False,
    **kwargs,
): 
    check_shape_equality(im1, im2)
    float_type = _supported_float_type(im1.dtype)

    if channel_axis is not None:
        # loop over channels
        args = dict(
            win_size=win_size,
            gradient=gradient,
            data_range=data_range,
            channel_axis=None,
            gaussian_weights=gaussian_weights,
            full=full,
        )
        args.update(kwargs)
        nch = im1.shape[channel_axis]
        mssim = np.empty(nch, dtype=float_type)

        if gradient:
            G = np.empty(im1.shape, dtype=float_type)
        if full:
            S = np.empty(im1.shape, dtype=float_type)
        channel_axis = channel_axis % im1.ndim
        _at = functools.partial(utils.slice_at_axis, axis=channel_axis)
        for ch in range(nch):
            ch_result = structural_similarity(im1[_at(ch)], im2[_at(ch)], **args)
            if gradient and full:
                mssim[ch], G[_at(ch)], S[_at(ch)] = ch_result
            elif gradient:
                mssim[ch], G[_at(ch)] = ch_result
            elif full:
                mssim[ch], S[_at(ch)] = ch_result
            else:
                mssim[ch] = ch_result
        mssim = mssim.mean()
        if gradient and full:
            return mssim, G, S
        elif gradient:
            return mssim, G
        elif full:
            return mssim, S
        else:
            return mssim

    K1 = kwargs.pop('K1', 0.01)
    K2 = kwargs.pop('K2', 0.03)
    sigma = kwargs.pop('sigma', 1.5)
    if K1 < 0:
        raise ValueError("K1 must be positive")
    if K2 < 0:
        raise ValueError("K2 must be positive")
    if sigma < 0:
        raise ValueError("sigma must be positive")
    use_sample_covariance = kwargs.pop('use_sample_covariance', True)

    if gaussian_weights:
        # Set to give an 11-tap filter with the default sigma of 1.5 to match
        # Wang et. al. 2004.
        truncate = 3.5

    if win_size is None:
        if gaussian_weights:
            # set win_size used by crop to match the filter size
            r = int(truncate * sigma + 0.5)  # radius as in ndimage
            win_size = 2 * r + 1
        else:
            win_size = 7  # backwards compatibility

    if np.any((np.asarray(im1.shape) - win_size) < 0):
        raise ValueError(
            'win_size exceeds image extent. '
            'Either ensure that your images are '
            'at least 7x7; or pass win_size explicitly '
            'in the function call, with an odd value '
            'less than or equal to the smaller side of your '
            'images. If your images are multichannel '
            '(with color channels), set channel_axis to '
            'the axis number corresponding to the channels.'
        )

    if not (win_size % 2 == 1):
        raise ValueError('Window size must be odd.')

    if data_range is None:
        if np.issubdtype(im1.dtype, np.floating) or np.issubdtype(
            im2.dtype, np.floating
        ):
            raise ValueError(
                'Since image dtype is floating point, you must specify '
                'the data_range parameter. Please read the documentation '
                'carefully (including the note). It is recommended that '
                'you always specify the data_range anyway.'
            )
        if im1.dtype != im2.dtype:
            warn(
                "Inputs have mismatched dtypes. Setting data_range based on im1.dtype.",
                stacklevel=2,
            )
        dmin, dmax = dtype_range[im1.dtype.type]
        data_range = dmax - dmin
        if np.issubdtype(im1.dtype, np.integer) and (im1.dtype != np.uint8):
            warn(
                "Setting data_range based on im1.dtype. "
                + f"data_range = {data_range:.0f}. "
                + "Please specify data_range explicitly to avoid mistakes.",
                stacklevel=2,
            )

    ndim = im1.ndim

    if gaussian_weights:
        filter_func = gaussian
        filter_args = {'sigma': sigma, 'truncate': truncate, 'mode': 'reflect'}
    else:
        filter_func = uniform_filter
        filter_args = {'size': win_size}

    # ndimage filters need floating point data
    im1 = im1.astype(float_type, copy=False)
    im2 = im2.astype(float_type, copy=False)

    NP = win_size**ndim

    # filter has already normalized by NP
    if use_sample_covariance:
        cov_norm = NP / (NP - 1)  # sample covariance
    else:
        cov_norm = 1.0  # population covariance to match Wang et. al. 2004

    # compute (weighted) means
    ux = filter_func(im1, **filter_args)
    uy = filter_func(im2, **filter_args)

    # compute (weighted) variances and covariances
    uxx = filter_func(im1 * im1, **filter_args)
    uyy = filter_func(im2 * im2, **filter_args)
    uxy = filter_func(im1 * im2, **filter_args)
    vx = cov_norm * (uxx - ux * ux)
    vy = cov_norm * (uyy - uy * uy)
    vxy = cov_norm * (uxy - ux * uy)

    R = data_range
    C1 = (K1 * R) ** 2
    C2 = (K2 * R) ** 2

    A1, A2, B1, B2 = (
        2 * ux * uy + C1,
        2 * vxy + C2,
        ux**2 + uy**2 + C1,
        vx + vy + C2,
    )
    D = B1 * B2
    S = (A1 * A2) / D

    # to avoid edge effects will ignore filter radius strip around edges
    pad = (win_size - 1) // 2

    # compute (weighted) mean of ssim. Use float64 for accuracy.
    mssim = crop(S, pad).mean(dtype=np.float64)

    if gradient:
        # The following is Eqs. 7-8 of Avanaki 2009.
        grad = filter_func(A1 / D, **filter_args) * im1
        grad += filter_func(-S / B2, **filter_args) * im2
        grad += filter_func((ux * (A2 - A1) - uy * (B2 - B1) * S) / D, **filter_args)
        grad *= 2 / im1.size

        if full:
            return mssim, grad, S
        else:
            return mssim, grad
    else:
        if full:
            return mssim, S
        else:
            return mssim

The code above is the source of the structural_similarity function provided by skimage.metrics.

References for this implementation

[1] Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612. https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf, DOI: 10.1109/TIP.2003.819861
[2] Avanaki, A. N. (2009). Exact global histogram specification optimized for structural similarity. Optical Review, 16, 613-621. arXiv:0901.0065, DOI: 10.1007/s10043-009-0119-z

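Key parameters of structural_similarity

im1, im2: The two images to compare. They must have the same shape.
win_size: Side length of the sliding window used to compute local statistics. It must be odd. If omitted, it defaults to 7, or to 11 when gaussian_weights=True with the default sigma of 1.5.
gradient: If True, the gradient of SSIM is also computed and returned.
data_range: The range of the image data (maximum minus minimum possible value), for example 255 for uint8 images. It must be given explicitly for floating-point images.
channel_axis: For multichannel images (e.g. RGB), the index of the channel axis. SSIM is computed separately for each channel and the mean is returned. If None, the image is treated as single-channel.
gaussian_weights: If True, local means, variances, and covariances are computed with a Gaussian-weighted window instead of a uniform one.
full: If True, the SSIM map (the local SSIM value at each pixel) is returned in addition to the mean SSIM.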

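Example 1) Measuring SSIM on grayscale images

import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Compare as single-channel (grayscale) images
def calculate_ssim(image1_path, image2_path):
    img1 = cv2.imread(image1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(image2_path, cv2.IMREAD_GRAYSCALE)
    # cv2.imread returns None instead of raising when a file cannot be read
    if img1 is None or img2 is None:
        raise FileNotFoundError("Could not read one of the input images")

    # Both images must have the same shape; data_range is 255 for uint8
    return ssim(img1, img2, data_range=255)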

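Example 2) Measuring SSIM on color images


# Compare as multichannel (color) images (uses the imports from Example 1)
def calculate_ssim(image1_path, image2_path):
    # Load the images (OpenCV reads them as BGR uint8 arrays)
    img1 = cv2.imread(image1_path)
    img2 = cv2.imread(image2_path)

    # Compute SSIM per channel and average; the last axis holds the channels
    ssim_value = ssim(img1, img2, data_range=255, channel_axis=-1)
    return ssim_value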

LPIPS (Learned Perceptual Image Patch Similarity)

What is LPIPS?

LPIPS (Learned Perceptual Image Patch Similarity) is a metric developed to measure the difference between images in a way that matches human perceptual similarity judgments. Traditional quantitative metrics such as SSIM (Structural Similarity Index) and PSNR (Peak Signal-to-Noise Ratio) are computed directly from pixel values (pixel-wise errors for PSNR, local pixel statistics for SSIM), whereas LPIPS evaluates the similarity of two images by comparing feature maps from a deep network.

1. Basic concept of LPIPS

LPIPS measures image similarity from the difference between feature maps extracted by a neural network.
Because it is trained to agree with human perceptual judgments, it is well suited to evaluating GAN (Generative Adversarial Network) based models and style transfer models.
While SSIM and PSNR score differences at the pixel level, LPIPS compares higher-level image features, which gives a quality assessment closer to human intuition.

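2. How LPIPS is computed

LPIPS measures the similarity of two images with the following steps.

(1) Feature extraction

The input images x and y are passed through a pretrained CNN, and feature maps are taken from several intermediate layers.
VGG, AlexNet, and SqueezeNet are the networks commonly used.

(2) Feature-map distance

At each layer l, the feature maps of the two images are normalized to unit length along the channel dimension and their difference is computed.
The difference is measured as a squared Euclidean (L2) distance.

(3) Feature weighting

Not every feature is equally important, so learned per-channel weights are applied at each layer to adjust their contribution.
The weights are trained on human perceptual similarity judgments so that the distance agrees better with human perception.

(4) Averaging to obtain the LPIPS value

The weighted distances are averaged over spatial positions within each layer and summed across layers to give the final LPIPS value.
A lower LPIPS value means the two images are perceptually more similar.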
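
Steps (2) to (4) can be sketched in NumPy as follows. This is only an illustration of the arithmetic, not the lpips library itself: the feature maps and channel weights are random stand-ins (a real computation takes the features from a pretrained CNN and the weights from the trained LPIPS model), and the names unit_normalize and lpips_style_distance are hypothetical.

```python
import numpy as np


def unit_normalize(features, eps=1e-10):
    """Scale the channel vector at every spatial position to unit length.

    features has shape (C, H, W).
    """
    norm = np.sqrt((features ** 2).sum(axis=0, keepdims=True))
    return features / (norm + eps)


def lpips_style_distance(feats_x, feats_y, channel_weights):
    """Sum over layers of the spatially averaged, channel-weighted squared difference."""
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, channel_weights):
        # Step (2): squared difference of unit-normalized features, shape (C, H, W)
        diff_sq = (unit_normalize(fx) - unit_normalize(fy)) ** 2
        # Step (3): weight each channel, then sum over channels, shape (H, W)
        weighted = (w[:, None, None] * diff_sq).sum(axis=0)
        # Step (4): average over spatial positions and accumulate across layers
        total += weighted.mean()
    return float(total)


# Random stand-ins for three layers of CNN features (assumption for illustration).
rng = np.random.default_rng(0)
shapes = [(64, 16, 16), (128, 8, 8), (256, 4, 4)]
feats_x = [rng.random(s) for s in shapes]
feats_y = [rng.random(s) for s in shapes]
weights = [rng.random(s[0]) for s in shapes]  # non-negative per-channel weights

dist_same = lpips_style_distance(feats_x, feats_x, weights)  # identical features give 0
dist_diff = lpips_style_distance(feats_x, feats_y, weights)  # different features give > 0
print(dist_same, dist_diff)
```

Unlike SSIM, this is a distance: 0 means identical features and larger values mean less similar images.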

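Key advantages of LPIPS

Overcomes the limits of pixel-level comparison (SSIM, PSNR)

SSIM and PSNR compare two images at the level of pixel values, whereas LPIPS compares high-level visual information and therefore tends to agree better with human perception.

Well suited to evaluating GAN and style transfer models

Images generated by a GAN or produced by style transfer can look similar to the human eye even when their pixel-level differences are large.
LPIPS uses CNN features to measure such higher-level perceptual differences effectively.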

Implementation

pip install lpips

https://github.com/richzhang/PerceptualSimilarity

The GitHub README states that input images must be normalized to the [-1, 1] range.

import lpips
import torch
from PIL import Image
import torchvision.transforms as transforms

# Load an image and convert it from a PIL image to a tensor
def load_image(image_path):
    img = Image.open(image_path).convert("RGB")
    transform = transforms.Compose([
        transforms.ToTensor(),  # convert to a tensor in [0, 1]
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # map [0, 1] to [-1, 1]
    ])
    return transform(img).unsqueeze(0)  # add a batch dimension

def load_lpips():
    # Load the LPIPS model
    return lpips.LPIPS(net='vgg')  # 'alex', 'vgg', or 'squeeze' can be selected


# Compute the LPIPS distance between two image files (they must have the same size)
def calculate_lpips(lpips_model, input_image, output_image):
    img1 = load_image(input_image)
    img2 = load_image(output_image)

    # Move the tensors and the model to the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    img1 = img1.to(device)
    img2 = img2.to(device)
    lpips_model.to(device)
    # Compute the LPIPS distance; no gradients are needed for evaluation
    with torch.no_grad():
        lpips_score = lpips_model(img1, img2)

    return lpips_score.item()  # return as a Python float

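Conclusion

When I evaluated my style transfer model with both SSIM and LPIPS, the two metrics gave very different results.

I judged LPIPS to be the more appropriate metric for making the results credible, so I will need to present the measurements together with a reasoned argument and supporting citations to convince the reviewer.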

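Limitations of SSIM

SSIM is a traditional metric of structural similarity that compares local windows of pixel values between two images.

→ Differences in pixel values lower the score

→ The result may not agree with human perception

Style transfer keeps the content of the original image while applying a new style, so the changes in color and texture introduced by the style can lower SSIM → a good style transfer result can still receive a low score.
An image generated by a GAN can look very realistic and still get a low SSIM score because it differs from the original at the pixel level.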
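
The sensitivity to pixel-level differences can be seen in a small experiment. In the sketch below, a random-noise image is used as an assumed stand-in for fine texture, and the second image is the same texture shifted by one pixel. The two would look like the same texture to a viewer, yet the local pixel statistics no longer line up, so SSIM drops sharply. This is an illustrative toy case, not a measurement from the style transfer model discussed in this post.

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
# Random-noise texture (assumption: stands in for fine texture in a stylized image).
texture = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
# The same texture shifted horizontally by one pixel (wrapping around the edge).
shifted = np.roll(texture, shift=1, axis=1)

score_identical = structural_similarity(texture, texture, data_range=255)
score_shifted = structural_similarity(texture, shifted, data_range=255)

print(f"identical: {score_identical:.3f}")
print(f"shifted by one pixel: {score_shifted:.3f}")
```

A one-pixel shift of a natural photo is penalized far less, because neighboring pixels in photos are strongly correlated. The effect is strongest for fine, high-frequency texture, which is the kind of detail style transfer tends to change.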

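Remedy

Use metrics commonly used to evaluate GAN image generation models, such as FID (Fréchet Inception Distance) and LPIPS (Learned Perceptual Image Patch Similarity). Note that FID compares the feature distributions of two sets of images, whereas LPIPS compares a pair of images.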

References

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612. https://doi.org/10.1109/TIP.2003.819861
Avanaki, A. N. (2009). Exact global histogram specification optimized for structural similarity. Optical Review, 16, 613-621. arXiv:0901.0065, DOI: 10.1007/s10043-009-0119-z
Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. arXiv preprint arXiv:1801.03924.
Python image processing: https://blog.naver.com/kyy0810/221426687033
The LPIPS metric: https://xoft.tistory.com/4
How to evaluate GANs: https://jjuon.tistory.com/33
https://github.com/richzhang/PerceptualSimilarity

