Abstract: Polarity-dependent orientation illusions are manifested in figures in which the impression of target orientation depends not only on the geometrical relations between the elements of the figure, but also on the relations between their luminances, that is, on luminance polarities. The best-known phenomenon in this class of effects is the Münsterberg/Café Wall illusion. In this paper a considerable number of examples of this type of illusion are presented, many of which are novel variants. A two-level convolutional model of such illusions is introduced, in which the first level corresponds to the stimulus input and the second level contains units fashioned after simple cells in V1, whose spatial patterns of activity represent the model's reaction to the stimulus. The main finding of numerous simulations of the model is that figures inducing illusory impressions of tilt share a common spatial pattern of neural activation, labeled 'oblique clusters', which is absent in related non-illusory figures. Furthermore, a similar pattern is also present in simulations of figures which induce veridical impressions of tilt. The simulations suggest that the neural basis of perception of a specific degree of tilt may not be the activity of neurons tuned narrowly to that particular degree of tilt, but rather the presence of certain signature spatial patterns of activity across populations of neurons.
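As a rough illustration of the two-level architecture described above (not the authors' implementation), the second-level units can be sketched as a bank of oriented Gabor filters, a standard model of V1 simple-cell receptive fields, convolved with the stimulus and half-wave rectified. The filter parameters and the checkerboard stimulus here are illustrative assumptions:

```python
import numpy as np

def gabor(size, theta, wavelength=6.0, sigma=2.5):
    """Oriented Gabor filter: Gaussian envelope times a cosine carrier
    whose stripes are rotated by theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # carrier axis, rotated
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()                          # zero mean: no response to uniform fields

def convolve_valid(img, kern):
    """Plain 'valid' 2D correlation, written out for self-containment."""
    kh, kw = kern.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

# First level: the stimulus input (a light/dark checkerboard, 32 x 64 pixels).
stimulus = np.kron((np.indices((4, 8)).sum(0) % 2).astype(float), np.ones((8, 8)))

# Second level: one rectified activity map per preferred orientation (degrees).
responses = {d: np.maximum(convolve_valid(stimulus, gabor(9, np.deg2rad(d))), 0)
             for d in (0, 45, 90, 135)}
```

The model's "reaction to the stimulus" is then the joint spatial pattern across these orientation maps; diagnostics like the 'oblique clusters' would be read off the 45/135-degree maps.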
Heinke, Dietmar; Wachman, Peter; van Zoest, Wieske; Leek, E. Charles...
12 pages
Abstract: Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global and local shape information is computed, and integrated, early during perceptual processing. Our goal was to compare how object shape information is processed by CNNs and by human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, on discriminating geometrically possible from impossible objects. Completing this task requires a representational structure of shape that encodes both global and local object geometry, because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses of CNN image feature processing using gradient-weighted class activation mapping (Grad-CAM) showed that network classification performance was not constrained by object geometry. In contrast, when classification could be made based solely on local feature information in the line drawings, the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in the underlying image-processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.
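The Grad-CAM analysis mentioned above computes, for a chosen convolutional layer, one weight per channel by global-average-pooling the class-score gradients, then forms a ReLU-rectified weighted sum of that layer's feature maps. A minimal numpy sketch of that computation follows; the toy activation and gradient arrays stand in for values that, in a real analysis, would come from framework hooks on a trained network:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM localization map for one conv layer.

    activations: (K, H, W) feature maps A^k of the chosen layer
    gradients:   (K, H, W) d(class score)/dA^k at the same layer
    """
    # alpha_k: global-average-pool the gradients -> one weight per channel
    alpha = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps, then ReLU: only features whose increase
    # raises the class score contribute to the heatmap.
    return np.maximum(np.tensordot(alpha, activations, axes=1), 0)

rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))            # toy activations (8 channels, 7x7 maps)
G = rng.standard_normal((8, 7, 7))   # toy gradients
heatmap = grad_cam(A, G)             # (7, 7), nonnegative
```

Upsampling the heatmap to image resolution then shows which image regions drove the classification; a geometry-insensitive network shows heatmaps that ignore the global object structure.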
Abstract: Radial motion is perceived as faster than linear motion when local spatiotemporal properties are matched. This radial speed bias (RSB) is thought to occur because radial motion is partly interpreted as motion in depth. Geometry dictates that a fixed amount of radial expansion at increasing eccentricities is consistent with smaller motion in depth, so it is perhaps surprising that the impact of eccentricity on the RSB has not been examined. With this issue in mind, across three experiments we investigated the RSB as a function of eccentricity. In a 2IFC task, participants judged which of a linear (test: variable speed) or radial (reference: 2 or 4 degrees/s) stimulus appeared to move faster. Linear and radial stimuli comprised four Gabor patches arranged left, right, above, and below fixation at varying eccentricities (3.5-14 degrees). For linear stimuli, the Gabors all drifted left or right, whereas for radial stimuli the Gabors drifted towards or away from the centre. The RSB (the difference in perceived speed between matched linear and radial stimuli) was recovered from fitted psychometric functions. Across all three experiments we found that the RSB decreased with eccentricity, but this tendency was less marked beyond 7 degrees; that is, at odds with the geometry, the effect did not continue to decrease as a function of eccentricity. This was true irrespective of whether stimuli were fixed in size (Experiment 1) or varied in size to account for changes in spatial scale across the retina (Experiment 2). It was also true when we removed conflicting stereo cues via monocular viewing (Experiment 3). To further investigate our data, we extended a previous model of speed perception, which suggests that perceived motion for such stimuli reflects a balance between two opposing perceptual interpretations, one for motion in depth and the other for object deformation. We propose, in the context of this model, that our data are consistent with placing greater weight on the motion-in-depth interpretation with increasing eccentricity, and that this is why the RSB does not continue to reduce in line with purely geometric constraints.
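Recovering a bias such as the RSB from a 2IFC task typically means fitting a psychometric function to the proportion of "test faster" responses and reading off the point of subjective equality (PSE): the test speed at which linear and radial stimuli look equally fast. A minimal sketch under that general approach, with made-up response data and a cumulative-Gaussian fit by grid search (the data, grids, and reference speed are illustrative, not the authors' values or fitting method):

```python
import math

def cum_gauss(x, mu, sigma):
    """Cumulative-Gaussian psychometric function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def fit_pse(speeds, p_test_faster):
    """Least-squares grid search over (mu, sigma); mu is the PSE."""
    best = None
    for mu in (m / 100 for m in range(100, 501)):          # 1.00..5.00 deg/s
        for sigma in (s / 100 for s in range(5, 201, 5)):  # 0.05..2.00 deg/s
            err = sum((cum_gauss(x, mu, sigma) - p) ** 2
                      for x, p in zip(speeds, p_test_faster))
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]

# Toy data: proportion of trials on which the linear test (speeds in deg/s)
# appeared faster than a 2 deg/s radial reference.
speeds = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
p_faster = [0.02, 0.10, 0.30, 0.60, 0.88, 0.97]
pse, sigma = fit_pse(speeds, p_faster)
rsb = pse - 2.0   # extra linear speed needed to match the radial reference
```

A positive RSB means the radial reference is perceived as faster than a physically matched linear stimulus, consistent with the bias described above.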
Abstract: In numerous activities, humans need to attend to multiple sources of visual information at the same time. Although several recent studies provide evidence for this ability, the mechanism of multi-item attentional processing is still a matter of debate and has received little attention from previous computational models. Here, we present a neuro-computational model that specifically addresses the question of how subjects attend to two items defined by feature and location. We simulate the experiment of Adamo et al. (2010), which required subjects to use two different attentional control sets, each a combination of color and location. The structure of our model comprises two components, "attention" and "decision-making". An important aspect of our model is its dynamic equations, which allow us to simulate, at a neural level, the time course of the processes that occur during the different stages until a decision is made. We analyze in detail the conditions under which our model matches the behavioral and EEG data from human subjects. Consistent with the experimental findings, our model supports the hypothesis of attending to two control settings concurrently. In particular, our model proposes that feature-based attention initially operates in parallel across the scene, and only later in processing does selection by location take place.
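The abstract does not give the model's dynamic equations, but decision dynamics of this general kind are often modeled as noisy leaky accumulators racing to a response threshold, with the attention stage setting the drift each accumulator receives. A generic sketch under that assumption (the equation, parameters, and labels are hypothetical, not the authors' model):

```python
import random

def simulate_decision(drift_a, drift_b, leak=0.1, noise=0.3,
                      threshold=1.0, dt=0.01, max_t=5.0, seed=1):
    """Two leaky accumulators race to a threshold; returns (choice, RT).

    Euler update per step: dx = (drift - leak * x) * dt + noise * sqrt(dt) * N(0, 1)
    """
    rng = random.Random(seed)
    xa = xb = 0.0
    t = 0.0
    while t < max_t:
        xa += (drift_a - leak * xa) * dt + noise * dt ** 0.5 * rng.gauss(0, 1)
        xb += (drift_b - leak * xb) * dt + noise * dt ** 0.5 * rng.gauss(0, 1)
        t += dt
        if xa >= threshold:
            return "target", t       # item matching the attended control set
        if xb >= threshold:
            return "distractor", t
    return "none", t                 # no decision within the trial window

# Stronger attentional input to the target -> it usually wins, and faster.
choice, rt = simulate_decision(drift_a=0.8, drift_b=0.3)
```

Simulating such a race trial by trial yields both choice proportions and reaction-time distributions, which is what lets a dynamic model be compared against behavioral and EEG time courses.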