Evolutionary Cognition of Tower Defense Agents Using Multi-Synaptic Firing Neurons

Abstract

This study builds spiking neural network agents from Multi-Synaptic Firing (MSF) neurons and lets them co-evolve in a tower defense game under a genetic algorithm combined with reward-modulated STDP (RSTDP). Each agent comprises three neural groups (motor, brain, and communication) that model perception, decision-making, and communication in biological organisms. The experiment lasted 1031.55 seconds and covered 34 generations with a population size of 16. The agents learned to collect energy and avoid tower damage: the best fitness (health + energy) reached the theoretical maximum of 200, and the average fitness stabilized at about 154. The brain group showed the highest spike rate (9.59 Hz), followed by the motor group (5.79 Hz), while the communication group remained inactive.
Brain spike rate showed a negligible negative correlation with energy (r = -0.005), and overall neural activity was not significantly associated with survival status. Evolution drove the population toward convergence, indicating stable learning. This study provides experimental evidence for the adaptability of spiking neural networks in complex tasks and sheds light on how structural differentiation of the network relates to functional specialization.

1. Introduction

Spiking Neural Networks (SNNs) have attracted wide attention in cognitive modeling and neuromorphic computing because of their biological plausibility and their ability to process temporal information. Traditional SNNs mostly use leaky integrate-and-fire (LIF) neurons. The Multi-Synaptic Firing (MSF) neuron model instead uses multiple synapses with different delays, which lets it encode richer spatiotemporal patterns and better reflects the diversity of biological synapses. This study combines MSF neurons with a Genetic Algorithm (GA) and Reward-modulated STDP (RSTDP) to build agents capable of perception, movement, decision-making, and communication. The agents evolve autonomously in a tower defense game, which makes it possible to study their adaptability in a dynamic environment and the mechanisms of neural differentiation.
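To make the multi-delay idea concrete, the following is a minimal sketch of a single neuron that receives each input through several delayed synapses. It is an illustration under stated assumptions, not the implementation used in this study: the class name MSFNeuron, the leak factor, the threshold, the reset-to-zero rule, and the discrete time step are all hypothetical, and only the delays of 1, 2, and 3 steps mirror the 1, 2, 3 ms delays given in Section 2.1.

```python
from collections import deque


class MSFNeuron:
    """Leaky integrate-and-fire unit with several delayed synapses per input.

    Illustrative only: leak, threshold, and reset rule are assumptions.
    """

    def __init__(self, weights, delays=(1, 2, 3), leak=0.5, threshold=1.0):
        # weights[i][k] is the weight of the synapse from input i with delays[k]
        self.weights = weights
        self.delays = delays
        self.leak = leak
        self.threshold = threshold
        self.v = 0.0  # membrane potential
        n_inputs = len(weights)
        # history[j] holds the input spikes from j + 1 steps ago
        self.history = deque(
            ([0] * n_inputs for _ in range(max(delays))), maxlen=max(delays)
        )

    def step(self, spikes):
        """Advance one time step; return 1 if the neuron fires, else 0."""
        current = 0.0
        for k, d in enumerate(self.delays):
            past = self.history[d - 1]  # spikes emitted d steps ago
            current += sum(w[k] * s for w, s in zip(self.weights, past))
        self.history.appendleft(list(spikes))
        self.v = self.leak * self.v + current
        if self.v >= self.threshold:
            self.v = 0.0  # reset after firing
            return 1
        return 0


# One presynaptic spike at t = 0, then silence.
neuron = MSFNeuron(weights=[[0.6, 0.6, 0.6]])
output = [neuron.step([s]) for s in (1, 0, 0, 0, 0)]
print(output)  # [0, 0, 0, 1, 0]
```

In this toy setting the single input spike arrives three times (after 1, 2, and 3 steps), and the accumulated potential crosses the threshold on the third arrival. With only one synapse of the same weight (delays=(1,)), the same spike would never make the neuron fire, which is the sense in which multiple delays let one spike carry temporal structure.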
2. Methods

2.1 Agent Architecture

Each agent comprises three neural groups:

Motor group: takes environmental sensor inputs (distance and direction of the nearest enemy, distance and direction of the nearest energy node, own health and energy) and outputs steering and speed.
Brain group: takes the state vector as input and outputs high-level actions (attack, retreat, collect, idle) for decision learning.
Communication group: receives broadcast signals from other agents and outputs its own broadcast signal for group communication.

All synapses have a fixed structure: each connection carries three fixed delays (1, 2, and 3 ms), the connection density is 30%, and initial weights are drawn at random from [-1, 1]. Neuron counts per group: motor group 12 (6 inputs, 2 outputs), brain group 20 (8 inputs, 4 outputs), communication group 8 (4 inputs, 4 outputs).

2.2 Learning Mechanisms

Online learning (RSTDP): synaptic weights are updated at every time step according to the reward signal. The reward is +10 for collecting energy and -5 for taking tower damage.
Offline evolution (GA): each generation lasts 30 seconds.
The top 50% of agents by fitness (health + energy) are selected as elites; copies of the elites, mutated with Gaussian noise (std = 0.1), replace the bottom 50%.

2.3 Experimental Environment

World size: 80×40 logical coordinates.
Agents: two teams of 8, starting at the bottom-left and top-right.
Energy nodes: 10, arranged in the center of the world.
Towers: 2, located at the center of the top and bottom boundaries.
Sensor range: 50; communication range: 10.
Total simulation time: 1031.55 seconds (34 generations).

3. Results

3.1 Fitness Evolution

Maximum fitness: 200 (full health + full energy), reached in an unspecified generation.
Average fitness: rose steadily from 152 to 154 while the standard deviation shrank (Fig. 1), indicating that the population converged and learning stabilized.

3.2 Health and Energy Dynamics

Energy dropped rapidly early in the game (as energy nodes were collected) and then leveled off.
The health of the two teams fluctuated in alternation, reflecting the dynamics of competition and tower damage.

3.3 Neural Activity

Mean spike rates: motor group 5.79 Hz, brain group 9.59 Hz, communication group 0.00 Hz (Fig. 2).
Brain group activity was markedly higher than motor group activity, consistent with its role as the decision-making core.
The communication group remained inactive, possibly because the communication range was short or because the weights never learned an effective pattern.

3.4 Correlation Analysis

Brain spike rate vs. health: nan (one of the variables was constant, so the correlation cannot be computed).
Brain spike rate vs. energy: weak negative correlation (r = -0.005, p = 0.001); brain activity is marginally lower when energy is high, but the effect is too small to be meaningful.
Broadcast signal vs. health: no data (communication group inactive).
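The nan entry in Section 3.4 follows directly from the definition of the Pearson coefficient: it divides by the product of the two standard deviations, and a constant series has zero variance. The sketch below shows this with a hand-written Pearson function. The function name pearson_r and all sample values are made up for illustration and are not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns nan when either series has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        # A constant series makes the denominator zero: r is undefined.
        return math.nan
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples: spike rate varies, health never changes.
brain_rate = [9.1, 9.8, 9.4, 10.2, 9.5]
health = [100.0, 100.0, 100.0, 100.0, 100.0]
print(pearson_r(brain_rate, health))  # nan
```

If health stays at a single value for every recorded sample, as in the hypothetical data above, no correlation with it can be estimated, regardless of how the spike rate behaves.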
4. Discussion

4.1 Fitness Improvement and Convergence

The maximum fitness reached the theoretical upper limit of 200, indicating that agents learned the optimal survival strategy of keeping full health and full energy at the same time. Average fitness improved only slightly, but its variance shrank, suggesting that the population converged under evolutionary pressure and settled on a stable strategy.

4.2 Neural Network Functional Differentiation

The brain group had the highest spike rate, consistent with its central decision-making role. The motor group ranked second and is responsible for execution. The communication group remained inactive, possibly because communication was not essential for the task, or because the communication range or weight learning was insufficient. This pattern reflects a natural functional differentiation of the network: the brain carries most of the computation, motor execution is comparatively simple, and communication is used only when needed.

4.3 Limitations and Future Directions

Communication group inactivity: future work could increase the communication range, adjust the reward design (for example, to encourage cooperation), or introduce an explicit benefit for communicating.
Weak correlation between brain activity and health/energy: this may stem from large individual differences or from external factors dominating health and energy. A finer-grained reward structure could strengthen state-action associations.
Data recording: the run covers only 34 generations; long-term evolutionary trends need to be verified over more generations.
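The shrinking variance discussed in Section 4.1 is what the elitist replacement scheme of Section 2.2 would be expected to produce: each generation, the bottom half of the population is overwritten by near-copies of the top half. A minimal sketch of one such generational step is given below. The function name next_generation, the flat list-of-floats genome, and the absence of weight clipping are assumptions; only the 50% elite fraction and the Gaussian noise with standard deviation 0.1 come from the text.

```python
import random


def next_generation(population, fitness, sigma=0.1, rng=random):
    """Keep the top half unchanged; replace the rest with mutated elite copies.

    population: list of genomes (lists of float weights)
    fitness:    list of scores, parallel to population
    """
    if len(population) < 2 or len(population) != len(fitness):
        raise ValueError("need at least two genomes and one score per genome")
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    n_elite = len(population) // 2
    elites = [list(population[i]) for i in ranked[:n_elite]]
    n_children = len(population) - n_elite
    # Each child is a copy of an elite with Gaussian noise added to every weight.
    children = [
        [w + rng.gauss(0.0, sigma) for w in elites[j % n_elite]]
        for j in range(n_children)
    ]
    return elites + children


# Hypothetical population of 16 genomes with 5 weights each.
rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(16)]
fitness = [rng.uniform(0.0, 200.0) for _ in range(16)]
new_population = next_generation(population, fitness, rng=rng)
print(len(new_population))  # 16
```

Because every child lies within a small noise radius of an elite, repeated application pulls the population toward the current best genomes, which is consistent with a small gain in mean fitness accompanied by a clear drop in spread.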
5. Conclusion

This study implemented SNN agents based on MSF neurons that evolved and learned in a tower defense game. Through the joint optimization of a genetic algorithm and RSTDP, the agents learned to collect energy and avoid tower damage, and the best agent reached the optimal fitness. Functional differentiation across the neural groups was evident: the brain group carried out most of the decision-making, the motor group executed actions, and the communication group stayed inactive, although it still lays a foundation for future studies of social behavior. The experimental results support the adaptability of MSF neurons in complex tasks and offer a new experimental paradigm for neuromorphic computing.

Acknowledgments

We thank the open-source community for tool support (the SFML graphics library, Matplotlib/Pandas for data visualization, and FFmpeg for video encoding), as well as all colleagues who took part in debugging and discussion. The code implementation and algorithm design in this study benefited from the published literature on neuromorphic computing and evolutionary robotics, for which we are also grateful.