

How to Compare Images: FFmpeg/libav C++ Tutorial (Filter Graph, PSNR/SSIM)

This is a simple C++ tutorial showing how to compare two still images or videos using the PSNR and SSIM metrics. The resulting application prints a score describing how much the two inputs differ. The project uses the FFmpeg C libraries to decode the input images or videos and to compute PSNR and SSIM. The tutorial shows how to decode input images in any supported format (JPG, PNG, BMP, etc.) and how to create an FFmpeg filter graph with PSNR and SSIM nodes. The source code, including CMake settings, is available HERE. The code portions shown below may be shortened for clarity; in particular, error checks are omitted.

How to compare two images? What is PSNR and SSIM?

In one of my projects I needed to compare pairs of images within an application. For example, when using compression algorithms such as H.265 or AV1 for videos, or JPEG or WebP for still pictures, the original is often compared to the compressed-and-decompressed data. This evaluates how much two images or video frames differ, typically to measure the amount of compression artifacts. One can simply subtract the two images from each other and average the error, which is the principle behind the mean squared error (MSE). PSNR and SSIM are somewhat better metrics, as they are closer to how humans perceive differences in images. These metrics belong to the category of reference metrics, where the distorted image is compared to the original clean one. One notable metric for video compression evaluation, developed by Netflix, is called VMAF; it is also available in FFmpeg when enabled during the build process. The above-mentioned metrics analyze a pair of images and return a number representing the similarity score: the higher the number, the more similar the images. There are also no-reference metrics, which try to evaluate how many artifacts are present in an image when the original is not available for comparison. Examples of such metrics are LIQE or NIQSV+. These metrics often focus on a specific kind of artifact and are designed for specific use cases such as compression or image synthesis quality evaluation.
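To make the relationship between MSE and PSNR concrete, here is a minimal sketch that computes both for two 8-bit grayscale buffers of equal size. This is illustrative code only, not part of the tutorial's project; the function names `mse` and `psnr` are my own.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Mean squared error between two 8-bit grayscale images of equal size.
double mse(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b)
{
    double sum = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum / a.size();
}

// PSNR in decibels; 255 is the maximum possible value of an 8-bit sample.
double psnr(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b)
{
    double e = mse(a, b);
    if (e == 0.0)
        return std::numeric_limits<double>::infinity(); // identical images
    return 10.0 * std::log10(255.0 * 255.0 / e);
}
```

Note how PSNR is just MSE put on a logarithmic scale relative to the maximum sample value, which is why the two metrics rank image pairs identically.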

FFmpeg is not only a C library; it also comes with prebuilt binary programs that are very useful tools for video transcoding and editing. Given two images, PSNR and SSIM can be computed as shown below. The goal of this tutorial is to achieve the same result with custom compiled C++ code.

ffmpeg -i imageA.jpg -i imageB.jpg -filter_complex "psnr" -f null /dev/null
ffmpeg -i imageA.jpg -i imageB.jpg -filter_complex "ssim" -f null /dev/null
FFmpeg can be used as a library from C++ to perform virtually any operation on videos or still images.

How to Load and Decode an Image in FFmpeg?

First of all, the input file needs to be opened and the necessary structures initialized. FFmpeg uses several objects when processing an input multimedia file. The following function shows the standard way to open a file and prepare it for decoding. Usually, FFmpeg structures must first be allocated and then initialized with the actual data. The function first opens the file and detects its format, which is necessary for extracting (demuxing) the data packets from the input stream. Multimedia files can have multiple streams: video, audio, subtitles, etc. A video stream is requested and searched for in the input. The codec is then initialized based on the stream information; a codec is the module that can encode or decode a specific compression format.

Decoder::Decoder(std::string file)
{
    formatContext = avformat_alloc_context();
    avformat_open_input(&formatContext, file.c_str(), nullptr, nullptr);
    avformat_find_stream_info(formatContext, nullptr);
    auto videoStreamId = av_find_best_stream(formatContext, AVMEDIA_TYPE_VIDEO, -1, -1, &codec, 0); // codec is a const AVCodec * member
    codecContext = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(codecContext, formatContext->streams[videoStreamId]->codecpar);
    avcodec_open2(codecContext, codec, nullptr);
}

Now that the file is opened and prepared, it is easy to obtain its content as a frame. The compressed packet is extracted from the input file and decoded by the codec, and a frame containing the decompressed image becomes available. Note that this example shows how to extract a frame from a still image. When working with videos, the extraction needs to run in a loop, since there is a sequence of packets, similarly to this example. The ideal way is to fill the decoding queue with input packets and then retrieve the decoded frames, so that the decoder always has work to do. It often takes several packets to decode a frame, as packets depend on each other in video compression algorithms. The official recommendation for using this API is here. In this example, a still image has only one packet containing all the information needed to get the frame. An empty packet is sent to the decoder after the initial one to signal that no more data is coming. The decoder then flushes and returns the frame without waiting for more input packets. Again, the frame and packet need to be allocated first. Then the packet is read from the file and sent to the decoder, and the decoded frame is requested.

AVFrame *Decoder::getFrame()
{
    frame = av_frame_alloc();
    AVPacket *packet = av_packet_alloc();
    av_read_frame(formatContext, packet);       // read the single compressed packet
    avcodec_send_packet(codecContext, packet);  // send it to the decoder
    avcodec_send_packet(codecContext, nullptr); // empty packet: signal end of stream, flush the decoder
    avcodec_receive_frame(codecContext, frame); // retrieve the decoded frame
    av_packet_free(&packet);
    return frame;
}
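For videos, the send/receive calls above would run in a loop instead. The following is a hedged sketch of what such a loop could look like as a hypothetical extra method on the same Decoder class (the method name `decodeAll` and the callback parameter are my own; error checks are omitted as in the rest of the tutorial):

```cpp
#include <functional>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Hypothetical extension of Decoder for video streams: keep feeding packets
// and draining frames until the decoder signals end of stream.
void Decoder::decodeAll(const std::function<void(AVFrame *)> &onFrame)
{
    AVPacket *packet = av_packet_alloc();
    AVFrame *videoFrame = av_frame_alloc();
    while (av_read_frame(formatContext, packet) >= 0) {
        avcodec_send_packet(codecContext, packet);
        av_packet_unref(packet);
        // A single packet may yield zero or more frames; drain them all.
        while (avcodec_receive_frame(codecContext, videoFrame) == 0) {
            onFrame(videoFrame);
            av_frame_unref(videoFrame);
        }
    }
    avcodec_send_packet(codecContext, nullptr); // flush: no more input
    while (avcodec_receive_frame(codecContext, videoFrame) == 0) {
        onFrame(videoFrame);
        av_frame_unref(videoFrame);
    }
    av_frame_free(&videoFrame);
    av_packet_free(&packet);
}
```

The inner receive loop is what handles the packet/frame mismatch mentioned above: some packets produce no frame, and flushing at the end can produce several.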

How to Use PSNR and SSIM in Filter Graph in FFmpeg?

After obtaining the frames of both images, they need to be processed in a tree-like structure called a filter graph. FFmpeg uses this graph for various edits of the input frames. It can apply effects like Gaussian blur or sharpening, or perform operations like scaling or rotating the image. There are many available filters; among them are the PSNR and SSIM ones, which process two images and calculate the given metric. The image below shows the entire setup used in this tutorial.

This scheme shows the entire process of this tutorial. The images are opened, decoded, their frames are passed in the filter graph, and the results are extracted from the processed packets.

The filter graph first needs to be initialized and constructed. The necessary filters are looked up by their names in the FFmpeg API. The preferred pixel formats can be defined for the chosen operations. The graph is allocated, and the filters are then created with the desired parameters. The input buffers need to know the properties of the input videos or images; this can be done by taking the data from one of the inputs. The assumption in this example is that both images have the same format and resolution, so one codec context describes both. Once the filters are created, links are added between them so the whole graph is connected in such a way that the input frames go to both the PSNR and SSIM filters. Each filter has input and output sockets, and one socket cannot be connected to multiple links. That is why the split filter is used: it duplicates its input to several outputs.

Comparator::Comparator(const AVCodecContext *sampleCodecContext)
{
    const AVFilter *bufferImageA  = avfilter_get_by_name("buffer");
    const AVFilter *bufferImageB  = avfilter_get_by_name("buffer");
    const AVFilter *bufferSinkPsnr = avfilter_get_by_name("buffersink");
    const AVFilter *bufferSinkSsim = avfilter_get_by_name("buffersink");   
    const AVFilter *psnrFilter  = avfilter_get_by_name("psnr");
    const AVFilter *ssimFilter  = avfilter_get_by_name("ssim");
    const AVFilter *splitFilterImageA  = avfilter_get_by_name("split");
    const AVFilter *splitFilterImageB  = avfilter_get_by_name("split");

    enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_NONE }; // terminator only: no pixel format constraint on the sinks

    filterGraph = avfilter_graph_alloc();

    std::stringstream arguments;
    arguments << "video_size=" << sampleCodecContext->width << "x" << sampleCodecContext->height <<
              ":pix_fmt=" << sampleCodecContext->pix_fmt <<
              ":time_base=" << 1 << "/" << sampleCodecContext->time_base.den <<
              ":pixel_aspect=" << sampleCodecContext->sample_aspect_ratio.num << "/" << sampleCodecContext->sample_aspect_ratio.den;
    avfilter_graph_create_filter(&bufferImageACtx, bufferImageA, "imageAIn", arguments.str().c_str(), nullptr, filterGraph);
    avfilter_graph_create_filter(&bufferImageBCtx, bufferImageB, "imageBIn", arguments.str().c_str(), nullptr, filterGraph); 

    avfilter_graph_create_filter(&bufferSinkPsnrCtx, bufferSinkPsnr, "outPsnr", nullptr, nullptr, filterGraph);
    avfilter_graph_create_filter(&bufferSinkSsimCtx, bufferSinkSsim, "outSsim", nullptr, nullptr, filterGraph);
    avfilter_graph_create_filter(&psnrFilterCtx, psnrFilter, "psnrFilter", nullptr, nullptr, filterGraph);
    avfilter_graph_create_filter(&ssimFilterCtx, ssimFilter, "ssimFilter", nullptr, nullptr, filterGraph);

    arguments.str("2"); // reuse the stream; "2" is the number of split outputs
    avfilter_graph_create_filter(&splitFilterImageACtx, splitFilterImageA, "splitFilterImageA", arguments.str().c_str(), nullptr, filterGraph);
    avfilter_graph_create_filter(&splitFilterImageBCtx, splitFilterImageB, "splitFilterImageB", arguments.str().c_str(), nullptr, filterGraph);

    av_opt_set_int_list(bufferSinkPsnrCtx, "pix_fmts", pix_fmts, AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN);
    av_opt_set_int_list(bufferSinkSsimCtx, "pix_fmts", pix_fmts, AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN);

    avfilter_link(bufferImageACtx, 0, splitFilterImageACtx, 0);
    avfilter_link(bufferImageBCtx, 0, splitFilterImageBCtx, 0);        
    avfilter_link(splitFilterImageACtx, 0, ssimFilterCtx, 0);
    avfilter_link(splitFilterImageBCtx, 0, ssimFilterCtx, 1);        
    avfilter_link(splitFilterImageACtx, 1, psnrFilterCtx, 0);
    avfilter_link(splitFilterImageBCtx, 1, psnrFilterCtx, 1);
    avfilter_link(psnrFilterCtx, 0, bufferSinkPsnrCtx, 0);
    avfilter_link(ssimFilterCtx, 0, bufferSinkSsimCtx, 0);
   
    avfilter_graph_config(filterGraph, nullptr);
}

The graph is ready and data can be inserted into it.

void Comparator::pushImageA(AVFrame *frame)
{
    av_buffersrc_add_frame_flags(bufferImageACtx, frame, AV_BUFFERSRC_FLAG_KEEP_REF); 
}

void Comparator::pushImageB(AVFrame *frame)
{ 
    av_buffersrc_add_frame_flags(bufferImageBCtx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
}

When the graph receives the input data, it can start processing. The outputs can be obtained from the sink buffers. The output frames contain either the processed image, when visual filters are applied, or metadata with the results of analysis filters, as in this case. Again, for videos, a loop would be necessary here to process multiple frames.

float Comparator::getMetric(AVFilterContext *filterContext, std::string dataName)
{
    av_buffersink_get_frame(filterContext, resultFrame); // pull the processed frame from the sink
    auto metricData = av_dict_get(resultFrame->metadata, dataName.c_str(), nullptr, 0)->value; // the metric is stored as frame metadata
    float metricValue = std::stof(metricData);
    av_frame_unref(resultFrame);
    return metricValue;
}

void Comparator::printMetrics()
{
    std::cout << "PSNR: " << getMetric(bufferSinkPsnrCtx, "lavfi.psnr.psnr_avg") << std::endl;
    std::cout << "SSIM: " << getMetric(bufferSinkSsimCtx, "lavfi.ssim.All") << std::endl;
}
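Putting the pieces together, a main function could look roughly like the sketch below. This is not part of the tutorial's published code: the accessor `getCodecContext()` is a hypothetical getter for the Decoder's codec context, and argument validation and error handling are omitted as elsewhere.

```cpp
int main(int argc, char **argv)
{
    // Open and decode both input images (paths given on the command line).
    Decoder decoderA(argv[1]);
    Decoder decoderB(argv[2]);

    // Build the filter graph; one codec context describes both inputs,
    // since the images are assumed to share format and resolution.
    Comparator comparator(decoderA.getCodecContext());

    // Feed the decoded frames into the graph and print PSNR/SSIM.
    comparator.pushImageA(decoderA.getFrame());
    comparator.pushImageB(decoderB.getFrame());
    comparator.printMetrics();
    return 0;
}
```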

And that's it! This tutorial can serve as a nice introduction to decoding and filtering input images or videos. Let me know in the comments if you find any mistakes or want to ask anything. Hope this helps!


Keywords: how to, ffmpeg, video decoding, image similarity metrics
#ffmpeg #tutorial #cpp #programming
