A paper by lab member Xiandong Wang (王贤栋) has been accepted by TCSVT! 2026-6-1


Abstract: Referring camouflaged object detection (Ref-COD) is an emerging and challenging task that aims to localize camouflaged objects in complex scenes based on a small set of referring images with salient objects. However, existing methods primarily focus on semantic alignment between the referring and camouflaged objects while overlooking scale discrepancies, leading to under-response when small references guide large objects and over-response when large references guide small ones. To overcome this limitation, we propose a novel Multiscale Interaction Network (MINet), explicitly designed to handle feature interactions across different scales in Ref-COD. MINet begins with a Dual-Source Fusion Block (DSFB) for semantic fusion between the referring and camouflaged features. Then, the Intra-scale Interaction Block (IIB) enhances local saliency within each scale by modeling contextual importance. Next, the Cross-scale Interaction Block (CIB) performs offset-guided alignment to bridge spatial gaps in multiscale feature fusion. Finally, the Cross-scale Aggregation Decoder (CAD) integrates multiscale features, effectively decoding the aggregated information to produce accurate predictions. Extensive experiments on Ref-COD datasets demonstrate that our method achieves state-of-the-art performance, highlighting the importance of scale interaction in Ref-COD.


Paper link: https://ieeexplore.ieee.org/document/11419160