Paper by lab member Wang Xiandong (王贤栋) accepted by TCSVT! 2026-6-1
Abstract—Referring camouflaged object detection (Ref-COD)
is an emerging and challenging task that aims to localize
camouflaged objects in complex scenes based on a small set of
referring images with salient objects. However, existing methods
primarily focus on semantic alignment between the referring
and camouflaged objects while overlooking scale discrepancies,
leading to under-response when small references guide large
objects and over-response when large references guide small
ones. To overcome this limitation, we propose a novel Multiscale Interaction Network (MINet), explicitly designed to handle
feature interactions across different scales in Ref-COD. MINet
begins with a Dual-Source Fusion Block (DSFB) for semantic
fusion between the referring and camouflaged features. Then, the
Intra-scale Interaction Block (IIB) enhances local saliency within
each scale by modeling contextual importance. Next, the Cross-scale Interaction Block (CIB) performs offset-guided alignment
to bridge spatial gaps in multiscale feature fusion. Finally, the
Cross-scale Aggregation Decoder (CAD) integrates multiscale
features, effectively decoding the aggregated information to
produce accurate predictions. Extensive experiments on Ref-COD
datasets demonstrate that our method achieves state-of-the-art
performance, highlighting the importance of scale interaction in
Ref-COD.
Paper link: https://ieeexplore.ieee.org/document/11419160

