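Plug-and-Play Series | AAAI 2026 | SAMC: Structure-Aware Multi-Context Block! Multi-Scale Branching and Dual-Attention Synergy for Precisely Capturing Target Structure and Multi-Dimensional Context | Code Sharing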

张开发

• 2026/5/23 9:26:02 • 15 min read

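0. Preface
This article introduces SAMC, the Structure-Aware Multi-Context Block. It combines a multi-scale parallel branching strategy with a synergistic channel-spatial dual-attention mechanism. According to the post, it is the first method in ultrasound standard-plane recognition to precisely align and deeply fuse shallow structural cues with deep semantic features. This addresses the weak feature discrimination and blurred boundary perception that arise when conventional methods ignore structural information. As a plug-and-play module, it can strengthen multi-scale structure perception and key-region feature responses in deep learning models such as CNNs, YOLO, and Transformers. The goal is for these models to keep clear structural discrimination and stable classification accuracy in challenging scenarios such as low-contrast images, blurred anatomical boundaries, or high inter-class similarity.

Contents
0. Preface
1. Introduction to SAMC Attention
2. SAMC Attention: Principle and Innovations
   Basic principle of SAMC
   Innovations of SAMC
3. Scope of Application and Module Effects
   Scope of application
   ⚡ Module effects
4. SAMC Code Implementation

1. Introduction to SAMC Attention
Ultrasound standard-plane recognition is essential for clinical tasks such as disease screening, organ assessment, and biometric measurement. However, existing methods do not exploit shallow structural information effectively. They also struggle to capture fine-grained semantic differences from contrastive samples generated by image augmentation. As a result, they recognize the structural and discriminative details of ultrasound standard planes poorly.

To address these issues, the paper proposes SEMC, a novel Structure-Enhanced Mixture-of-Experts Contrastive learning framework that combines structure-aware feature fusion with expert-guided contrastive learning. It first introduces a Semantic-Structure Fusion Module (SSFM), which aligns shallow and deep features and uses multi-scale structural information to sharpen the model's perception of fine-grained structural details. It then adds a Mixture-of-Experts Contrastive Recognition Module (MCRM), which uses a mixture-of-experts mechanism to perform hierarchical contrastive learning and classification on multi-level features. This further improves inter-class separability and recognition performance.

The authors also build a large-scale, finely annotated liver ultrasound dataset covering six standard planes. Extensive experiments on this in-house dataset and on two public datasets show that SEMC outperforms recent state-of-the-art methods on all reported metrics.
Original paper: https://arxiv.org/pdf/2511.12559
Original code: https://github.com/YanGuihao/SEMC

2. SAMC Attention: Principle and Innovations
Basic principle of SAMC
The core design idea of SAMC (Structure-Aware Multi-Context Block) is "structure first, multi-dimensional fusion." Its three-stage progressive architecture consists of multi-scale parallel convolution, dual-attention synergy, and cross-dimensional feature aggregation. Together these stages actively perceive and enhance target structure and contextual relationships in the image, producing feature representations that are highly discriminative and low in redundancy. Conventional feature extraction modules work at a single scale. SAMC instead targets heterogeneous target scales, blurred structural boundaries, and complex background interference with a complete "split, enhance, aggregate" pipeline:

(1) Multi-scale feature branching unit: a set of parallel convolution kernels of different sizes processes the input features simultaneously and splits the feature stream into several branches. The post gives 3×3, 5×5, and 7×7 as examples; the reference code in Section 4 defaults to depthwise 1×1, 3×3, and 5×5 kernels. Small kernels focus on fine-grained local details such as edges, textures, and tiny targets. Medium kernels capture regional structural relationships. Large kernels cover the overall target contour and the global context. Together they capture information across target scales and avoid the information loss caused by a single receptive field.

(2) Channel-spatial collaborative attention unit: after obtaining the multi-scale features, SAMC refines them with dual attention. First, a channel attention module applies global average pooling and global max pooling to the feature map to estimate each channel's semantic importance. The resulting channel weights are multiplied onto the original features, strengthening channels related to target structure and suppressing redundant background channels. Next, a spatial attention module takes the channel-weighted features, computes the mean and the maximum along the channel dimension, and passes both maps through a convolution layer to produce a spatial weight map. This map locates core target regions and structural boundaries and filters out irrelevant background.

(3) Cross-dimensional feature aggregation and optimization unit: the attention-enhanced multi-scale features are concatenated, and a channel shuffle operation interleaves channels across groups to promote cross-scale, cross-channel interaction. A final pointwise convolution then compresses and integrates the concatenated features. The output is a compact, structure-aware feature map that gives downstream tasks structural discrimination, contextual correlation, and computational efficiency.

The reference implementation in Section 4 orders these stages differently. SAMC.forward applies channel attention (CAB) and spatial attention (SAB) first, then the multi-scale convolution block (MSCB). By default (add=True), MSCB fuses its branch outputs by element-wise addition and uses concatenation only when add=False. Channel shuffle and the final pointwise convolution run in both cases.

Innovations of SAMC
Multi-scale branching perception: parallel multi-scale convolutions capture full-scale information, from local detail to global shape. This addresses how poorly single-scale feature extraction adapts to changes in target size.
Dual-attention synergy: channel attention and spatial attention are chained in series. Semantically important channels are strengthened first, and key spatial regions are then emphasized. This "semantics-guided spatial refinement" improves the perception of target edges and core regions.
Cross-dimensional feature fusion: channel shuffle combined with pointwise convolution promotes cross-channel interaction and removes feature redundancy, keeping computational cost low without weakening the enhancement.
Lightweight plug-and-play design: the module preserves the spatial size of its input, and with out_channels equal to in_channels its output shape matches its input shape. It can therefore be embedded directly into the C3k2 module of YOLO26, improving structure perception without significantly increasing model complexity.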
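The channel-shuffle step in stage (3) interleaves channels from different groups. The minimal sketch below runs it on a toy tensor with six channels and two groups; these sizes are illustrative assumptions. The function mirrors the reshape, transpose, and flatten steps of channel_shuffle in the Section 4 code, and each channel value equals its original index so the reordering is visible.

```python
import torch


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across `groups` via reshape -> transpose -> flatten."""
    b, c, h, w = x.size()
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.view(b, groups, c // groups, h, w)  # (B, G, C/G, H, W)
    x = x.transpose(1, 2).contiguous()        # (B, C/G, G, H, W)
    return x.view(b, c, h, w)                 # back to (B, C, H, W)


# Toy input: six 1x1 channels whose values are 0, 1, ..., 5
x = torch.arange(6, dtype=torch.float32).view(1, 6, 1, 1)
y = channel_shuffle(x, groups=2)
order = y.flatten().long().tolist()
print(order)  # [0, 3, 1, 4, 2, 5]
```

Group 0 holds channels [0, 1, 2] and group 1 holds [3, 4, 5]. After the shuffle, the channels alternate between the two groups. With groups=1, or with groups equal to the channel count, the operation leaves the order unchanged.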
3. Scope of Application and Module Effects
Scope of application
SAMC is suited to general computer vision, especially tasks that need stronger structure perception and multi-scale context modeling, such as object detection, semantic segmentation, and medical image analysis.
Why it fits: the multi-scale branching lets SAMC adapt to targets of very different sizes, from tiny lesions to large organs. The dual attention filters out complex background interference and strengthens the feature signal at target edges and core regions, which helps it tell apart similar targets that appear together. The compact aggregated features keep their discriminative power while limiting computational cost, which helps meet real-time requirements. In difficult settings such as ultrasound image analysis, with low contrast, blurred boundaries, and high inter-class similarity, SAMC can improve the model's structure perception and discrimination accuracy.

⚡ Module effects
The original paper reports the ablation study of SAMC in Table 4 (Ablation study of the proposed ACE, SAMC, and L_mc on the LP2025 dataset). The table shows how performance on the LP2025 dataset changes as each component is removed or added.
[Figure: Table 4 of the original paper, "Module effect: SOTA performance"]
Summary: according to Table 4, the baseline reaches 80.26% accuracy. Adding only the ACE module raises accuracy to 81.38% (+1.12 points), and adding SAMC on top of ACE raises it to 81.51% (+0.13 points). The post reads this as evidence that SAMC's structure-aware enhancement improves the capture of anatomical structure features and complements the ACE module.

4. SAMC Code Implementation
The official PyTorch implementation of the SAMC module is as follows:

```python
import math
from functools import partial

import torch
import torch.nn as nn
from timm.models.layers import trunc_normal_tf_
from timm.models.helpers import named_apply


def gcd(a, b):
    while b:
        a, b = b, a % b
    return a


# Other types of layers can go here (e.g., nn.Linear, etc.)
def _init_weights(module, name, scheme=''):
    if isinstance(module, (nn.Conv2d, nn.Conv3d)):
        if scheme == 'normal':
            nn.init.normal_(module.weight, std=.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'trunc_normal':
            trunc_normal_tf_(module.weight, std=.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'xavier_normal':
            nn.init.xavier_normal_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'kaiming_normal':
            nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        else:
            # EfficientNet-like initialization
            fan_out = module.kernel_size[0] * module.kernel_size[1] * module.out_channels
            fan_out //= module.groups
            nn.init.normal_(module.weight, 0, math.sqrt(2.0 / fan_out))
            if module.bias is not None:
                nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.BatchNorm2d, nn.BatchNorm3d)):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)
    elif isinstance(module, nn.LayerNorm):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)


def act_layer(act, inplace=False, neg_slope=0.2, n_prelu=1):
    # activation layer
    act = act.lower()
    if act == 'relu':
        layer = nn.ReLU(inplace)
    elif act == 'relu6':
        layer = nn.ReLU6(inplace)
    elif act == 'leakyrelu':
        layer = nn.LeakyReLU(neg_slope, inplace)
    elif act == 'prelu':
        layer = nn.PReLU(num_parameters=n_prelu, init=neg_slope)
    elif act == 'gelu':
        layer = nn.GELU()
    elif act == 'hswish':
        layer = nn.Hardswish(inplace)
    else:
        raise NotImplementedError('activation layer [%s] is not found' % act)
    return layer


def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups
    # reshape
    x = x.view(batchsize, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten
    x = x.view(batchsize, -1, height, width)
    return x


# Multi-scale depth-wise convolution (MSDC)
class MSDC(nn.Module):
    def __init__(self, in_channels, kernel_sizes, stride, activation='leakyrelu', dw_parallel=True):
        super(MSDC, self).__init__()
        self.in_channels = in_channels
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes
        self.activation = activation
        self.dw_parallel = dw_parallel

        self.dwconvs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(self.in_channels, self.in_channels, kernel_size, stride, kernel_size // 2,
                          groups=self.in_channels, bias=False),
                nn.BatchNorm2d(self.in_channels),
                act_layer(self.activation, inplace=True)
            )
            for kernel_size in self.kernel_sizes
        ])
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        # Apply the convolution layers in a loop
        outputs = []
        for dwconv in self.dwconvs:
            dw_out = dwconv(x)
            outputs.append(dw_out)
            if not self.dw_parallel:
                x = x + dw_out
        # You can return outputs based on what you intend to do with them
        return outputs


class MSCB(nn.Module):
    """
    Multi-Scale Convolution Block (MSCB): Expands channels, applies depthwise convolutions
    with different kernel sizes (MSDC), and then compresses channels to extract multi-scale features.
    """
    def __init__(self, in_channels, out_channels, stride, kernel_sizes=[1, 3, 5], expansion_factor=2,
                 dw_parallel=True, add=True, activation='leakyrelu'):
        super(MSCB, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes
        self.expansion_factor = expansion_factor
        self.dw_parallel = dw_parallel
        self.add = add
        self.activation = activation
        self.n_scales = len(self.kernel_sizes)
        assert self.stride in [1, 2]
        self.use_skip_connection = True if self.stride == 1 else False

        self.ex_channels = int(self.in_channels * self.expansion_factor)
        self.pconv1 = nn.Sequential(
            # Pointwise 1x1
            nn.Conv2d(self.in_channels, self.ex_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.ex_channels),
            act_layer(self.activation, inplace=True)
        )
        self.msdc = MSDC(self.ex_channels, self.kernel_sizes, self.stride, self.activation,
                         dw_parallel=self.dw_parallel)
        if self.add:
            self.combined_channels = self.ex_channels * 1
        else:
            self.combined_channels = self.ex_channels * self.n_scales
        self.pconv2 = nn.Sequential(
            nn.Conv2d(self.combined_channels, self.out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.out_channels),
        )
        if self.use_skip_connection and (self.in_channels != self.out_channels):
            self.conv1x1 = nn.Conv2d(self.in_channels, self.out_channels, 1, 1, 0, bias=False)
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        pout1 = self.pconv1(x)
        msdc_outs = self.msdc(pout1)
        if self.add:
            dout = 0
            for dwout in msdc_outs:
                dout = dout + dwout
        else:
            dout = torch.cat(msdc_outs, dim=1)
        dout = channel_shuffle(dout, gcd(self.combined_channels, self.out_channels))
        out = self.pconv2(dout)
        if self.use_skip_connection:
            if self.in_channels != self.out_channels:
                x = self.conv1x1(x)
            return x + out
        else:
            return out


# Multi-scale Convolution Block (MSCB)
def MSCBLayer(in_channels, out_channels, n=1, stride=1, kernel_sizes=[1, 3, 5], expansion_factor=2,
              dw_parallel=True, add=True, activation='leakyrelu'):
    """
    Create a sequence of multiple MSCB modules (an MSCB layer).
    Args:
        - in_channels: Number of input channels.
        - out_channels: Number of output channels.
        - n: Number of stacked MSCB modules.
        - stride: Stride of the first module (stride=2 can be used for downsampling).
        - kernel_sizes: List of kernel sizes for multi-scale convolutions, e.g., [1, 3, 5].
        - expansion_factor: Channel expansion factor.
        - dw_parallel: Whether to apply multi-scale depthwise convolutions in parallel
          (True for parallel, False for sequential with residual connection).
        - add: Fusion mode for multi-scale results; True for additive fusion, False for channel concatenation.
        - activation: Type of activation function, e.g., relu6.
    """
    convs = []
    mscb = MSCB(in_channels, out_channels, stride, kernel_sizes=kernel_sizes,
                expansion_factor=expansion_factor, dw_parallel=dw_parallel,
                add=add, activation=activation)
    convs.append(mscb)
    if n > 1:
        for i in range(1, n):
            mscb = MSCB(out_channels, out_channels, 1, kernel_sizes=kernel_sizes,
                        expansion_factor=expansion_factor, dw_parallel=dw_parallel,
                        add=add, activation=activation)
            convs.append(mscb)
    conv = nn.Sequential(*convs)
    return conv


class CAB(nn.Module):
    def __init__(self, in_channels, out_channels=None, ratio=16, activation='leakyrelu'):
        super(CAB, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        if self.in_channels < ratio:
            ratio = self.in_channels
        self.reduced_channels = self.in_channels // ratio
        if self.out_channels is None:
            self.out_channels = in_channels

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.activation = act_layer(activation, inplace=True)
        self.fc1 = nn.Conv2d(self.in_channels, self.reduced_channels, 1, bias=False)
        self.fc2 = nn.Conv2d(self.reduced_channels, self.out_channels, 1, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        avg_pool_out = self.avg_pool(x)
        avg_out = self.fc2(self.activation(self.fc1(avg_pool_out)))
        max_pool_out = self.max_pool(x)
        max_out = self.fc2(self.activation(self.fc1(max_pool_out)))
        out = avg_out + max_out
        return self.sigmoid(out)


class SAB(nn.Module):
    def __init__(self, kernel_size=7):
        super(SAB, self).__init__()
        assert kernel_size in (3, 7, 11), 'kernel must be 3 or 7 or 11'
        padding = kernel_size // 2
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)


class SAMC(nn.Module):
    """
    Spatial Attention Multi-scale Convolution (SAMC) Module
    Supports two input formats:
    1. 4D tensor: (B, C, H, W), standard image-feature input
    2. 3D sequence: (B, N, C) plus H, W arguments, serialized feature input
       (interface kept consistent with the CGTA module)
    """
    def __init__(self, in_channels, out_channels, kernel_sizes=[1, 3, 5], expansion_factor=2,
                 dw_parallel=True, add=True, activation='leakyrelu', cab_ratio=16):
        """
        Args:
            in_channels: number of input channels
            out_channels: number of output channels
            kernel_sizes: multi-scale kernel sizes used in MSCB (int, list, or tuple)
            expansion_factor: channel expansion factor in MSCB
            dw_parallel: whether to run the multi-scale depthwise convolutions in parallel
            add: fusion mode for multi-scale outputs; True = addition, False = channel concatenation
            activation: activation function type
            cab_ratio: channel reduction ratio of the CAB channel attention
        """
        super(SAMC, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes

        # CAB: Channel Attention Block. Its weights multiply x, so it must output
        # in_channels weights (passing out_channels would fail whenever in != out).
        self.cab = CAB(in_channels, in_channels, ratio=cab_ratio, activation=activation)
        # SAB: Spatial Attention Block
        self.sab = SAB()
        # MSCB: Multi-scale Convolution Block
        self.mscb = MSCB(in_channels, out_channels, stride=1, kernel_sizes=self.kernel_sizes,
                         expansion_factor=expansion_factor, dw_parallel=dw_parallel,
                         add=add, activation=activation)
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x, H=None, W=None):
        """
        Forward pass supporting two input formats.
        Args:
            x: input features
               - if H and W are None, x must be a 4D tensor (B, C, H, W)
               - if H and W are given, x must be a 3D tensor (B, N, C) with N = H * W
            H: spatial height (required for sequence input)
            W: spatial width (required for sequence input)
        Returns:
            Output in the same format as the input:
               - 4D input -> 4D tensor (B, C_out, H, W)
               - 3D sequence input -> 3D tensor (B, N, C_out)
        """
        # Sequence mode: H and W must be provided
        if H is not None and W is not None:
            return self.forward_seq(x, H, W)

        # Default mode: input must be a 4D tensor (B, C, H, W)
        if x.dim() != 4:
            raise ValueError(f"Expected 4D input (B, C, H, W), got {x.dim()}D tensor. "
                             f"If using sequence input, please provide H and W parameters.")

        # Standard 4D forward
        # CAB: channel attention
        cab_out = self.cab(x)
        x_cab = cab_out * x
        # SAB: spatial attention
        sab_out = self.sab(x_cab)
        x_sab = sab_out * x_cab
        # MSCB: multi-scale convolution
        out = self.mscb(x_sab)
        return out

    def forward_seq(self, x_seq, H, W):
        """
        Forward pass for sequence input.
        Args:
            x_seq: input sequence (B, N, C), where N = H * W
            H: spatial height
            W: spatial width
        Returns:
            Output sequence (B, N, C_out)
        """
        B, N, C = x_seq.shape
        # Check that N = H * W
        if N != H * W:
            raise ValueError(f"Sequence length N={N} does not match H*W={H * W}")
        # Reshape the sequence into a 4D tensor: (B, C, H, W)
        x_4d = x_seq.permute(0, 2, 1).reshape(B, C, H, W).contiguous()
        # Run the 4D forward pass
        out_4d = self.forward(x_4d)
        # Reshape the output back to sequence format: (B, N, C_out)
        out_seq = out_4d.flatten(2).transpose(1, 2).contiguous()
        return out_seq


if __name__ == "__main__":
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    print("=" * 50)
    print("Test 1: 4D tensor input mode (B, C, H, W)")
    print("=" * 50)
    # 4D tensor input test
    x_4d = torch.randn(1, 64, 32, 32).to(device)
    model = SAMC(64, 64).to(device)
    y_4d = model(x_4d)
    print("Input feature shape:", x_4d.shape)
    print("Output feature shape:", y_4d.shape)
    print()

    print("=" * 50)
    print("Test 2: 3D sequence input mode (B, N, C) + H, W")
    print("=" * 50)
    # 3D sequence input test (same interface as CGTA)
    B, H, W, C = 1, 32, 32, 64
    x_seq = torch.randn(B, H * W, C).to(device)
    y_seq = model(x_seq, H, W)
    print("Input sequence shape:", x_seq.shape)
    print("Output sequence shape:", y_seq.shape)
    print()

    print("=" * 50)
    print("Test 3: kernel_sizes given as int")
    print("=" * 50)
    # kernel_sizes as int (can happen when parsing YOLO config files)
    model_int_kernel = SAMC(64, 64, kernel_sizes=3).to(device)
    y_int_kernel = model_int_kernel(x_4d)
    print("kernel_sizes=3 test passed")
    print(f"Input: {x_4d.shape} -> Output: {y_int_kernel.shape}")
    print()

    print("=" * 50)
    print("Test 4: kernel_sizes given as tuple")
    print("=" * 50)
    # kernel_sizes as tuple
    model_tuple_kernel = SAMC(64, 64, kernel_sizes=(1, 3, 5)).to(device)
    y_tuple_kernel = model_tuple_kernel(x_4d)
    print("kernel_sizes=(1, 3, 5) test passed")
    print(f"Input: {x_4d.shape} -> Output: {y_tuple_kernel.shape}")
    print()

    print("=" * 50)
    print("Test 5: consistency between the two input modes")
    print("=" * 50)
    # Same input data for both modes
    x_4d_test = torch.randn(2, 64, 32, 32).to(device)
    model_test = SAMC(64, 64).to(device)
    model_test.eval()
    with torch.no_grad():
        # 4D-mode output
        out_4d = model_test(x_4d_test)
        # Convert to a sequence and run sequence mode
        x_seq_test = x_4d_test.flatten(2).transpose(1, 2)
        out_seq = model_test(x_seq_test, 32, 32)
        out_4d_from_seq = out_seq.transpose(1, 2).reshape(2, 64, 32, 32)
        # Compute the difference
        diff = torch.abs(out_4d - out_4d_from_seq).max().item()
        print(f"Max difference between the two modes: {diff:.2e}")
        print(f"Outputs consistent: {diff < 1e-6}")
```

Building on your own ideas, you can plug this module into any model for structural innovation. The author of this post has already embedded it into the YOLO26 model; that work is covered in the author's "YOLO Series Algorithm Improvements" and "YOLO26 Self-Developed Improvements" columns.
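The channel bookkeeping inside MSCB determines how wide the shuffled tensor is and how many groups channel_shuffle receives. The hypothetical helper mscb_channel_plan below recomputes these values for a given configuration, following the arithmetic in MSCB.__init__ and MSCB.forward:

- the expanded width after pconv1;
- the width fed to pconv2;
- the shuffle group count, gcd(combined, out);
- the weight count of the depthwise branches in MSDC.

This is a sketch under stated assumptions. The dense_weights entry is an assumed point of comparison (ordinary ex-to-ex convolutions with the same kernel sizes), and BatchNorm and pointwise-convolution parameters are not counted.

```python
from math import gcd


def mscb_channel_plan(in_channels, out_channels, kernel_sizes=(1, 3, 5),
                      expansion_factor=2, add=True):
    """Hypothetical helper mirroring the channel arithmetic of MSCB."""
    ex = int(in_channels * expansion_factor)                 # pconv1 output width
    combined = ex if add else ex * len(kernel_sizes)         # pconv2 input width
    groups = gcd(combined, out_channels)                     # groups passed to channel_shuffle
    dw_weights = sum(ex * k * k for k in kernel_sizes)       # depthwise: one k x k filter per channel
    dense_weights = sum(ex * ex * k * k for k in kernel_sizes)  # assumed dense-conv comparison
    return {
        "ex_channels": ex,
        "combined_channels": combined,
        "shuffle_groups": groups,
        "channels_per_group": combined // groups,
        "dw_weights": dw_weights,
        "dense_weights": dense_weights,
    }


for add in (True, False):
    print(f"add={add}:", mscb_channel_plan(64, 64, add=add))
```

For SAMC(64, 64) with the default settings, the channels expand to 128 and the shuffle uses 64 groups of 2 channels. The three depthwise branches hold 4,480 weights, compared with 573,440 for ordinary convolutions of the same kernel sizes. With add=False, the shuffle instead operates on 384 concatenated channels, split into 64 groups of 6.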

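The sequence input mode of SAMC depends on two reshapes being exact inverses: (B, C, H, W) to (B, N, C) and back. The small standalone sketch below shows the round trip and the resulting token order. The helper names to_seq and to_4d are hypothetical and simply copy the tensor operations used in forward_seq. In the resulting order, token index i * W + j holds the channel vector at spatial position (i, j).

```python
import torch


def to_seq(x: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, C) with N = H * W, as at the end of forward_seq."""
    return x.flatten(2).transpose(1, 2).contiguous()


def to_4d(x_seq: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """(B, N, C) -> (B, C, H, W), as at the start of forward_seq."""
    b, n, c = x_seq.shape
    if n != h * w:
        raise ValueError(f"Sequence length N={n} does not match H*W={h * w}")
    return x_seq.permute(0, 2, 1).reshape(b, c, h, w).contiguous()


x = torch.randn(2, 64, 8, 16)          # B=2, C=64, H=8, W=16 (toy sizes)
seq = to_seq(x)
print(seq.shape)                       # torch.Size([2, 128, 64])
restored = to_4d(seq, 8, 16)
print(torch.equal(restored, x))        # True
```

The round trip involves no arithmetic, so torch.equal holds exactly. The two input modes can therefore differ only through the network computation itself, which is what Test 5 in the reference code checks.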