Citation
Shouxin Liu, Yihang Wang, Junfeng Guo, Hongran Zeng, In-Kwon Lee, Yushu Zhang, and Xiaowei Li, "Text-aware CLIP: A Generalized Method for Deepfake and Fake News Detection," IEEE Transactions on Multimedia (accepted), 2026.
Abstract
Existing research treats fake news detection and deepfake detection as two independent fields. This separation limits the generalizability of detection methods and leads to redundant development of similar techniques in both fields, raising the cost of developing and deploying them. We argue that fake news detection and deepfake detection both fall under the broader category of misinformation detection, and that both can be addressed with a binary classification strategy that identifies intentionally fabricated or falsified information. We therefore propose a generalized misinformation detection method that handles fake news and deepfake detection with the same model. Specifically, we exploit the structural advantages of visual-language models and the complementary nature of image-text information to design a generalized deepfake and fake news detection method built on a visual-language model. We first use the image encoder to generate deeply interacted image-text features, and then introduce learnable textual prompts into the model to obtain the visual-language model's feedback on those prompts. Next, a confidence score is computed by querying the classification label vectors against the model's feedback, and this classification confidence determines whether the information is fake or real. Finally, a classifier jointly predicts over the fused real-fake feedback to improve the accuracy and robustness of misinformation detection. Experimental results on public deepfake and fake news detection datasets show that the proposed method outperforms the baseline models.
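The confidence-calculation step described in the abstract (querying classification label vectors against the model's feedback) can be sketched as a CLIP-style similarity scoring. This is a minimal NumPy illustration only: the function name `cosine_confidence`, the toy label vectors, and the temperature value are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def cosine_confidence(fused_feature, label_vectors, temperature=0.07):
    """Score a fused image-text feature against label vectors (e.g. real/fake).

    Returns a softmax over temperature-scaled cosine similarities, standing in
    for the paper's classification confidence. All names are illustrative.
    """
    f = fused_feature / np.linalg.norm(fused_feature)
    L = label_vectors / np.linalg.norm(label_vectors, axis=1, keepdims=True)
    logits = L @ f / temperature           # cosine similarity per label
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()                 # per-label confidence, sums to 1

# Toy example: a 4-dim fused feature scored against two hypothetical labels.
feat = np.array([0.2, 0.9, -0.1, 0.3])
labels = np.array([[0.1, 1.0, 0.0, 0.2],    # hypothetical "real" vector
                   [-0.5, -0.2, 0.8, 0.0]]) # hypothetical "fake" vector
conf = cosine_confidence(feat, labels)
pred = "real" if conf[0] > conf[1] else "fake"
```

Here the fused feature aligns far more closely with the "real" label vector, so the confidence concentrates on that class; in the actual method these vectors would come from the text encoder applied to the learned prompts.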
