Fight Fake News Images With Copy-Move Forgery Detection

Rapid advances in digital image processing software have been accompanied by a wide range of tools and techniques for image forgery. Digital images are easy to manipulate because powerful image editing and processing software is readily available. One of the most commonly used image forgery techniques is copy-move forgery. It consists of copying one region of an image and pasting it into another region of the same image, in order to maliciously hide or mask sensitive information. These manipulations leave specific artifacts in the image which are not visible to the human eye, but can be detected with specialized software.

Adobe recently presented its own detection method, which aims to address this issue by using artificial intelligence and machine learning to identify whether an image has been digitally manipulated.

Copy-Move Forgery Detection

The technique we present in this article focuses on copy-move forgery detection, where objects in a photograph are copied from one place to another.

The main goals of image forensics are:

  1. To determine the authenticity of a digital image
  2. To decide whether an image has been forged or not
  3. To mark the parts of the image which have the highest probability of having been manipulated.

Adopting a reliable algorithm which can detect altered images is the first step. From there, you can verify image credibility in fields like medical records, newspapers, or a court of law. A reliable method should be able to detect copy-move forgeries even if the copied image area has been retouched (scaled, blurred etc.) prior to being pasted into another region.

The Basics of Copy-Move Forgery Detection

Almost all forgery detection methods have something in common: each starts by dividing the image into fixed-size blocks and assumes that a forged segment is likely to be a connected component rather than a collection of individual patches of pixels. Every detection also produces some false positives, in other words incorrectly matched areas, but an image can still be classified as forged as long as the false positives stay below a certain threshold.

The algorithm we present in this article consists of finding blocks that are nearly or completely identical. The easiest way to detect these areas is an exhaustive search, but this is only feasible for very small images because it is computationally costly. Moreover, it fails if the copied region has been further processed. To make the detection more efficient, we divide the image into equally sized overlapping blocks. If the image has been forged, the detection should yield groups of blocks with matching robust features, together with a detection score.
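
As a rough illustration of the block division, the following Python sketch slides a b×b window over a grayscale image one pixel at a time. The function name `overlapping_blocks` and the toy image are assumptions made for this example, not part of the original implementation.

```python
def overlapping_blocks(image, b):
    """Yield ((x, y), block) for every b x b window, moving one pixel at a time."""
    rows, cols = len(image), len(image[0])
    for y in range(rows - b + 1):
        for x in range(cols - b + 1):
            # Slice b rows, then b columns out of each of those rows.
            block = [row[x:x + b] for row in image[y:y + b]]
            yield (x, y), block


# Toy 4 x 5 grayscale image; the pixel values are arbitrary.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
blocks = list(overlapping_blocks(image, 2))
print(len(blocks))  # (4 - 2 + 1) * (5 - 2 + 1) = 12
```

Because the window moves by a single pixel, the number of blocks grows with the image area, which is why comparing every block against every other block quickly becomes too expensive.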

The idea of robust match detection is to compare a robust representation of the analyzed blocks instead of doing a pixel-perfect comparison. This is where the Discrete Cosine Transform (DCT) comes in, since the DCT represents a signal as a sum of cosine functions of different frequencies and amplitudes. If this sounds too vague, the idea is to reduce the pixel representation of an image to its basic components by discarding the less relevant high-frequency components. Instead of looking at an image as a grid of pixels, we can interpret it as a two-dimensional signal and describe it by its frequency content.

Fig. 1: Two-dimensional DCT applied to an 8×8 grayscale image

 

Going backwards, the inverse DCT reconstructs the same image data that was fed into the DCT, up to rounding error and as long as no coefficients have been discarded or quantized.
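
The round trip can be checked with a small pure-Python sketch. It assumes the orthonormal DCT-II variant, which the article does not specify, and the helper names `dct2` and `idct2` are chosen for this example only.

```python
import math


def _alpha(k, n):
    # Orthonormal scaling factor for frequency index k.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def _cos(i, k, n):
    # Cosine basis function of frequency k sampled at position i.
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct2(block):
    """Two-dimensional DCT-II of a square block (direct O(n^4) evaluation)."""
    n = len(block)
    return [[_alpha(u, n) * _alpha(v, n) * sum(
                block[x][y] * _cos(x, u, n) * _cos(y, v, n)
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Inverse of dct2: rebuild the pixel values from the coefficients."""
    n = len(coeffs)
    return [[sum(_alpha(u, n) * _alpha(v, n) * coeffs[u][v]
                 * _cos(x, u, n) * _cos(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


# Arbitrary 8 x 8 grayscale block.
block = [[(x * 8 + y * 3) % 256 for y in range(8)] for x in range(8)]
restored = idct2(dct2(block))
max_error = max(abs(restored[x][y] - block[x][y])
                for x in range(8) for y in range(8))
print(max_error < 1e-9)  # True: only floating-point rounding error remains
```

A production implementation would normally use a fast transform from a numerical library; the direct sums here are only meant to make the definition visible.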

Because we are comparing the quantized values of the DCT coefficients instead of the pixel representation, the algorithm might find too many matching blocks (false positives). Thus, the algorithm also looks at the relative positions of each matching block pair and outputs a specific block pair only if there are many other matching pairs with the same relative position (that is, they have the same shift vector).
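
To see why quantization produces extra matches, consider this minimal sketch. The coefficient values and the quantization step of 16 are made up for illustration; the article does not state which step the implementation uses.

```python
def quantize(coeffs, step=16):
    """Divide each coefficient by the step and round to the nearest integer."""
    return tuple(round(c / step) for c in coeffs)


# Hypothetical low-frequency coefficients of two blocks that are not identical.
block_a = [812.4, -35.2, 14.9, 3.1]
block_b = [809.0, -30.7, 18.3, -2.2]

print(quantize(block_a))                       # (51, -2, 1, 0)
print(quantize(block_a) == quantize(block_b))  # True: the blocks now "match"
```

This tolerance is what makes the comparison robust to slight retouching, and it is also the reason the shift vector check is needed to weed out coincidental matches.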

The Detection Method

We can describe the copy-move forgery detection algorithm in the following steps:

  • Convert the RGB image to YUV color space.
  • Divide the R, G, B and Y channels into fixed-size blocks.
  • For each block, obtain its R, G, B and Y components.
  • For each block, calculate the DCT (Discrete Cosine Transform) coefficients of its R, G, B and Y components.
  • Extract features from the obtained DCT coefficients and save them into a matrix. Each matrix row contains the block's top-left coordinates plus the DCT coefficients. For an M × N image and block size b, the matrix has (M − b + 1)(N − b + 1) rows and 9 columns.
  • Sort the features in lexicographic order.
  • Search for similar pairs of blocks. After lexicographic sorting, identical blocks are most likely to end up as neighbors, so only neighboring rows need to be compared. If the distance between two neighboring blocks is smaller than a predefined threshold, the blocks are considered a candidate pair for the forgery; the threshold filters out false positive detections.
  • For each pair of candidates, compute the shift vector and count how many times the same shift vector occurs. If that number is greater than a predefined threshold, the corresponding regions are considered forged.
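
The color conversion in the first step can be sketched as follows. The article does not say which YUV variant is used, so this example assumes the common BT.601 weights; the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 weights (an assumption here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


# A neutral gray pixel has (almost exactly) zero chroma.
y, u, v = rgb_to_yuv(128, 128, 128)
print(round(y), round(u), round(v))  # 128 0 0
```
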
Fig. 2: Original image (left), forged image (center), analyzed image (right)
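
The remaining steps can be sketched end to end on a toy example. This is a simplified illustration under several assumptions, not the implementation from the repository: it works on a single grayscale channel, uses 4×4 blocks, keeps only four low-frequency DCT coefficients rounded to integers as features, and treats two neighboring rows as a match only when those features are equal. The names `dct_coeff`, `features` and `detect` and the threshold `MIN_VOTES` are hypothetical.

```python
import math
from collections import Counter

B = 4          # block size b (toy value)
MIN_VOTES = 5  # minimum number of pairs sharing one shift vector (toy value)


def dct_coeff(block, u, v):
    """Single orthonormal 2D DCT-II coefficient of a square block."""
    n = len(block)
    au = math.sqrt((1 if u == 0 else 2) / n)
    av = math.sqrt((1 if v == 0 else 2) / n)
    total = 0.0
    for j in range(n):      # rows
        for i in range(n):  # columns
            total += (block[j][i]
                      * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * i + 1) * v * math.pi / (2 * n)))
    return au * av * total


def features(block):
    """Rounded low-frequency coefficients, a crude stand-in for quantization."""
    return tuple(round(dct_coeff(block, u, v))
                 for u, v in ((0, 0), (0, 1), (1, 0), (1, 1)))


def detect(image, b=B, min_votes=MIN_VOTES):
    rows, cols = len(image), len(image[0])
    table = []
    for y in range(rows - b + 1):
        for x in range(cols - b + 1):
            block = [row[x:x + b] for row in image[y:y + b]]
            table.append((features(block), x, y))
    table.sort()  # lexicographic order: equal features become neighbors

    votes = Counter()
    for (f1, x1, y1), (f2, x2, y2) in zip(table, table[1:]):
        if f1 == f2:  # candidate pair
            dx, dy = x2 - x1, y2 - y1
            if (dx, dy) < (0, 0):  # count (dx, dy) and (-dx, -dy) together
                dx, dy = -dx, -dy
            votes[(dx, dy)] += 1
    # Keep only shift vectors supported by enough block pairs.
    return {shift: n for shift, n in votes.items() if n >= min_votes}


# Synthetic 16 x 16 gradient image in which every pixel value is unique.
image = [[16 * y + x for x in range(16)] for y in range(16)]
# Plant a forgery: copy the 6 x 6 patch at (x=1, y=1) to (x=9, y=8).
for j in range(6):
    for i in range(6):
        image[8 + j][9 + i] = image[1 + j][1 + i]

flagged = detect(image)
print(flagged)
```

On this synthetic image the only shift vector that collects enough votes is the planted one, (8, 7). Real photographs contain flat or repetitive areas, so a practical detector also needs a distance threshold on the features and tuned vote thresholds, as described in the steps above.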

 

The algorithm produces better results, both in terms of performance and detection accuracy, if the detection window is larger, for example 16×16 pixels. With a smaller detection window, some image parts may be considered identical even if no forgery has been applied to those zones. However, there are cases where a smaller detection window is required for better accuracy, so the right window size depends on the image and the use case.

Why Copy-Move Forgery Detection is Important in Image Processing

Today’s image manipulation techniques and software are so advanced that their results often cannot be detected by the human eye. Fake news and digitally manipulated images are widespread issues on social media. As mentioned earlier, a court of law needs to be certain that a photo being used as evidence has not been manipulated. This is where the method can help a lot, since it analyzes the raw image data rather than relying on visual inspection. Of course it has its limitations too, since it does not work out of the box for every kind of image manipulation, but for copy-move forgeries, even with slight retouching, it can be very effective.

This solution is a good foundation to filter out counterfeited images. Integrated into a robust API like Filestack and combined with their custom machine learning model service, the use cases are endless.

Please visit our GitHub for the full details on copy-move forgery detection.
