| With the advancement of artificial intelligence, it has become increasingly difficult for the human eye to distinguish natural images from those produced by generative models such as generative adversarial networks or diffusion models. However, existing generated-image detection methods generally suffer from insufficient generalization capability. To address this issue, this paper proposes a multi-granularity attention fusion network (MAF-Net), whose core lies in the systematic integration of information from three granularity levels: 1) At the architectural granularity, a global-local dual-branch structure is designed to model the overall semantic context and the subtle local forgery traces of an image, respectively. 2) At the feature representation granularity, the backbone network is enhanced by integrating selective kernel convolution (SKConv) to flexibly capture multi-scale forgery features, along with a lightweight efficient channel attention (ECA) module to strengthen the channel-wise response to critical forgery artifacts. 3) At the attentional granularity, a triple collaborative attention mechanism comprising spatial hard attention, self-attention, and global semantic-guided efficient interactive attention is constructed. A gating mechanism is further introduced to achieve adaptive fusion from regional focusing to cross-granularity feature integration. MAF-Net achieves average AP, ACC, and AUC scores of 99.3%, 95.1%, and 99.1%, respectively, on the ForenSynths dataset, and 99.5%, 91.3%, and 99.3%, respectively, on the GenImage dataset, demonstrating its excellent generalization capability and detection performance. |
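
The abstract does not specify the form of the gating mechanism, so the following is only a minimal sketch of one common choice: a per-channel sigmoid gate that blends a global-branch feature vector with a local-branch one. The function name `gated_fusion`, the parameters `w_global`, `w_local`, and `bias`, and the convex-combination form are all assumptions made for illustration, not the formulation used in MAF-Net.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_feat, local_feat, w_global, w_local, bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Assumed (hypothetical) form, not taken from the paper:
        gate_i  = sigmoid(w_global[i] * g_i + w_local[i] * l_i + bias[i])
        fused_i = gate_i * g_i + (1 - gate_i) * l_i
    """
    lengths = {len(global_feat), len(local_feat),
               len(w_global), len(w_local), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for g, l, wg, wl, b in zip(global_feat, local_feat, w_global, w_local, bias):
        gate = sigmoid(wg * g + wl * l + b)  # how much to trust the global branch
        fused.append(gate * g + (1.0 - gate) * l)
    return fused


if __name__ == "__main__":
    # With zero weights and biases the gate is 0.5, so the output is the mean.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

In a trained network the gate parameters would be learned, which lets each channel lean toward global semantic context or local forgery traces depending on the input.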