<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>neural.vision</title>
    <description>This is a blog about vision: visual neuroscience and computer vision, especially deep convolutional neural networks.
</description>
    <link>https://neural.vision/</link>
    <atom:link href="https://neural.vision/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Fri, 22 Nov 2024 12:27:26 -0500</pubDate>
    <lastBuildDate>Fri, 22 Nov 2024 12:27:26 -0500</lastBuildDate>
    <generator>Jekyll v3.0.2</generator>
    
      <item>
        <title>Neuromorphic Computing - an Edgier, Greener AI</title>
        <description>&lt;figure&gt;
&lt;img src=&quot;/images/neuromorphic-greener-edgier-ai.png&quot;
title=&quot;Server towers connected to floating brains with windmills and solar panels in the background.&quot;
alt=&quot;Neuromorphic AI might not just help bring AI to the edge, but also reduce carbon emissions. Generated by author with ImageGen 3.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Neuromorphic AI might not just help bring
AI to the edge, but also reduce carbon emissions. Generated by author
with ImageGen 3.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There are periodic proclamations of the coming neuromorphic computing
revolution. While substantial challenges remain, the field has produced
solid successes, and there continues to be steady progress in using
spiking neural network
algorithms with neuromorphic hardware. In this article, I will cover
some neuromorphic computing and engineering basics, training, the
advantages of neuromorphic systems, and the remaining challenges.&lt;/p&gt;
&lt;p&gt;The classical use case of neuromorphic systems is for edge devices
that need to perform the computation locally and are energy-limited, for
example, battery-powered devices. However, one of the recent interests
in using neuromorphic systems is to reduce energy usage at data centers,
such as the energy needed by large language models (LLMs). For example,
OpenAI signed a letter of intent to purchase $51 million of neuromorphic
chips from Rain AI in December 2023. This makes sense since OpenAI
spends a lot on inference, with one estimate of around &lt;a
href=&quot;https://www.deeplearning.ai/the-batch/openai-faces-financial-growing-pains-spending-double-its-revenue/&quot;&gt;$4
billion&lt;/a&gt; on running inference in 2024. It also appears that both
Intel’s Loihi 2 and IBM’s NorthPole (successor to TrueNorth)
neuromorphic systems are designed for use in servers.&lt;/p&gt;
&lt;p&gt;The promises of neuromorphic computing can broadly be divided into 1)
pragmatic, near-term applications that have already found success and 2)
more aspirational, wacky neuroscientist fever-dream ideas of how spiking
dynamics might endow neural networks with something closer to real
intelligence. Of course, it’s group 2 that really excites me, but I’m
going to focus on group 1 for this post. And there is no more exciting
way to start than to dive into terminology.&lt;/p&gt;
&lt;h1 id=&quot;terminology&quot;&gt;Terminology&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Neuromorphic computation&lt;/strong&gt; is often defined as
computation that is brain-inspired, but that definition leaves a lot to
the imagination. Neural networks are more neuromorphic than classical
computation, but these days neuromorphic computation is specifically
interested in using event-based spiking neural networks (SNNs) for their
energy efficiency. Even though SNNs are a type of artificial neural
network, the term “artificial neural networks” (ANNs) is reserved for
the more standard non-spiking artificial neural networks in the
neuromorphic literature. Schuman and colleagues (2022) define
neuromorphic computers as non-von Neumann computers where both processing
and memory are collocated in artificial neurons and synapses, as opposed
to von Neumann computers that separate processing and memory.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/images/neuromorphic-vs-vonneumann.png&quot;
title=&quot;Diagram comparing von Neumann computers with Neuromorphic computers.&quot;
alt=&quot;von Neumann Computers operate on digital information, have separate processors and memory, and are synchronized by clocks, while neuromorphic computers operate on event-driven spikes, combine compute and memory, and are asynchronous. Created by author with inspiration from Schuman et al 2022.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;von Neumann Computers operate on digital
information, have separate processors and memory, and are synchronized
by clocks, while neuromorphic computers operate on event-driven spikes,
combine compute and memory, and are asynchronous. Created by author with
inspiration from Schuman et al 2022.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Neuromorphic engineering&lt;/strong&gt; means designing the
hardware while “neuromorphic computation” is focused on what is being
simulated rather than what it is being simulated on. These are tightly
intertwined since the computation is dependent on the properties of the
hardware and what is implemented in hardware depends on what is
empirically found to work best.&lt;/p&gt;
&lt;p&gt;Another related term is &lt;strong&gt;NeuroAI&lt;/strong&gt;, the goal of which
is to use AI to gain a mechanistic understanding of the brain and is
more interested in biological realism. Neuromorphic computation is
interested in neuroscience as a means to an end. It views the brain as a
source of ideas that can be used to achieve objectives such as energy
efficiency and low latency in neural architectures. A decent amount of
the NeuroAI research relies on spike averages rather than spiking neural
networks, which allows closer comparison with the majority of modern ANNs
that are applied to discrete tasks.&lt;/p&gt;
&lt;h1 id=&quot;event-driven-systems&quot;&gt;Event-Driven Systems&lt;/h1&gt;
&lt;figure&gt;
&lt;img src=&quot;/images/Event-Eyeball-Camera.png&quot;
title=&quot;A distractingly disturbing picture of an eyeball looking out of a camera lens.&quot;
alt=&quot;Generated by author using ImageGen 3.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Generated by author using ImageGen
3.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Neuromorphic systems are event-based, which is a paradigm shift from
how modern ANN systems work. Even real-time ANN systems typically
process one frame at a time, with activity synchronously propagated from
one layer to the next. This means that in ANNs, neurons that carry no
information require the same processing as neurons that carry critical
information. Event-driven is a different paradigm that often starts at
the sensor and applies the most work where information needs to be
processed. ANNs rely on matrix operations that take the same amount of
time and energy regardless of the values in the matrices. Neuromorphic
systems use SNNs where the amount of work depends on the number
of spikes.&lt;/p&gt;
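&lt;p&gt;The difference in work can be made concrete with a rough operation
count. The sketch below is only illustrative: the layer size, the 5% spike
rate, and the function names are assumptions, and it counts idealized
operations rather than measuring any real hardware.&lt;/p&gt;

```python
import numpy as np

def dense_layer_ops(weights):
    """Multiply-accumulates for a dense layer: every weight is used, whatever the input."""
    return weights.size

def event_driven_ops(weights, spikes):
    """Accumulates for an event-driven layer: only rows of inputs that spiked are touched."""
    n_outputs = weights.shape[1]
    return int(np.count_nonzero(spikes)) * n_outputs

def event_driven_input_current(weights, spikes):
    """Sum the weight rows of the inputs that spiked (no multiplications needed)."""
    active = np.flatnonzero(spikes)
    return weights[active].sum(axis=0)

rng = np.random.default_rng(0)
weights = rng.normal(size=(1000, 100))   # 1000 inputs -> 100 outputs
spikes = np.zeros(1000)
spikes[:50] = 1.0                        # assume only 5% of inputs spike in this step

print(dense_layer_ops(weights))           # 100000
print(event_driven_ops(weights, spikes))  # 5000
# Both routes deliver the same input current to the next layer.
assert np.allclose(spikes @ weights, event_driven_input_current(weights, spikes))
```

&lt;p&gt;With binary spikes the event-driven route also replaces each
multiply-accumulate with a plain addition, which is part of where the
energy savings are expected to come from.&lt;/p&gt;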
&lt;p&gt;A traditional deployed ANN would often be connected to a camera that
synchronously records a frame in a single exposure. The ANN then
processes the frame. The results of the frame might then be fed into a
tracking algorithm and further processed.&lt;/p&gt;
&lt;p&gt;Event-driven systems may start at the sensor with an event camera.
Each pixel sends updates asynchronously whenever a change crosses a
threshold. So when there is movement in a scene that is otherwise
stationary, the pixels that correspond to the movement send events or
spikes immediately without waiting for a synchronization signal. The
event signals can be sent within tens of microseconds, while a
traditional camera might collect at 24 Hz and could introduce a latency
that’s in the range of tens of milliseconds. In addition to receiving
the information sooner, the information in the event-based system would
be sparser and would focus on the movement. The traditional system would
have to process the entire scene through each network layer
successively.&lt;/p&gt;
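&lt;p&gt;As a rough illustration of the thresholding idea, the sketch below
emits an event for each pixel whose brightness changed by more than a
threshold between two snapshots. This is a simplification: a real event
camera does this per pixel, asynchronously, in analog circuitry. The use
of log intensity, the threshold value, and the function name are
assumptions made for the example.&lt;/p&gt;

```python
import numpy as np

def frames_to_events(prev_frame, new_frame, threshold=0.2):
    """Return (rows, cols, polarity) for pixels whose log intensity changed by more than threshold."""
    eps = 1e-6  # avoids log(0) for completely dark pixels
    delta = np.log(new_frame + eps) - np.log(prev_frame + eps)
    rows, cols = np.nonzero(np.abs(delta) > threshold)
    polarity = np.sign(delta[rows, cols]).astype(int)  # +1 brighter, -1 darker
    return rows, cols, polarity

prev_frame = np.full((4, 4), 0.5)
new_frame = prev_frame.copy()
new_frame[1, 2] = 1.0   # one pixel gets brighter
new_frame[3, 0] = 0.1   # one pixel gets darker

rows, cols, polarity = frames_to_events(prev_frame, new_frame)
for r, c, p in zip(rows, cols, polarity):
    print(f"event at ({int(r)}, {int(c)}) polarity {int(p):+d}")
print(f"{len(rows)} of {new_frame.size} pixels produced an event")
```

&lt;p&gt;Only 2 of the 16 pixels generate any output here; the unchanged
background produces nothing for downstream layers to process.&lt;/p&gt;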
&lt;h1 id=&quot;learning-in-spiking-neural-networks&quot;&gt;Learning in Spiking Neural
Networks&lt;/h1&gt;
&lt;figure&gt;
&lt;img src=&quot;/images/Training-Student-Neurons.png&quot;
title=&quot;An image of a teacher teaching a class of neurons the difference between cats and dogs.&quot;
alt=&quot;One way to teach a spiking neural network is to have a teacher [ANN]. Generated by author with ImageGen 3.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;One way to teach a spiking neural network
is to have a teacher [ANN]. Generated by author with ImageGen
3.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;One of the major challenges of SNNs is training them. Backpropagation
algorithms and stochastic gradient descent are the go-to solutions for
training ANNs, however, these methods run into difficulty with SNNs. The
best way to train SNNs is not yet established and the following methods
are some of the more common approaches that are used:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;ANN to SNN conversion&lt;/li&gt;
&lt;li&gt;Backpropagation-like&lt;/li&gt;
&lt;li&gt;Synaptic plasticity&lt;/li&gt;
&lt;li&gt;Evolutionary&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;ann-to-snn-conversion&quot;&gt;ANN to SNN conversion&lt;/h2&gt;
&lt;p&gt;One method of creating SNNs is to bypass training the SNNs directly
and instead train ANNs. This approach limits the types of SNNs and
hardware that can be used. For example, Sengupta et al (2019) converted
VGG and ResNets to SNNs using an integrate-and-fire (IF) neuron that
does not have a leak or refractory period. They introduce a novel
weight-normalization technique to perform the conversion, which involves
setting the firing threshold of each neuron based on its pre-synaptic
weights. Dr. Priyadarshini Panda goes into more detail in her &lt;a
href=&quot;https://youtu.be/7TybETlCslM?t=3077&amp;amp;si=gK1efoiOx6SVpYfU&quot;&gt;ESWEEK
2021 SNN Talk&lt;/a&gt;.&lt;/p&gt;
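&lt;p&gt;To see why conversion is possible at all, it helps to notice that the
firing rate of an IF neuron with no leak and no refractory period behaves
like a ReLU of its input. The sketch below is a toy demonstration of that
relationship, not the weight-normalization procedure from Sengupta et al.;
the reset-by-subtraction rule, the threshold, and the number of time steps
are assumptions.&lt;/p&gt;

```python
def if_neuron_spike_count(input_current, threshold=1.0, n_steps=100):
    """Integrate-and-fire neuron with no leak and no refractory period."""
    membrane = 0.0
    spikes = 0
    for _ in range(n_steps):
        membrane += input_current
        if membrane >= threshold:
            spikes += 1
            membrane -= threshold   # reset by subtraction keeps the leftover charge
    return spikes

for current in (-0.5, 0.0, 0.25, 0.5):
    rate = if_neuron_spike_count(current) / 100
    print(current, rate, max(current, 0.0))   # firing rate tracks ReLU(current)
```

&lt;p&gt;Raising the threshold lowers the firing rate for the same input, which
gives some intuition for why choosing per-neuron thresholds matters so much
during conversion.&lt;/p&gt;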
&lt;p&gt;Benefits:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Enables deep SNNs.&lt;/li&gt;
&lt;li&gt;Allows reuse of deep ANN knowledge, such as training, architecture,
etc.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Drawbacks:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Limits architectures to those suited to ANNs and the conversion
procedures.&lt;/li&gt;
&lt;li&gt;Network doesn’t learn to take advantage of SNN properties, which can
lead to lower accuracy and longer latency.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2
id=&quot;backpropagation-like-approaches-and-surrogate-gradient-descent&quot;&gt;Backpropagation-like
approaches and surrogate gradient descent&lt;/h2&gt;
&lt;p&gt;The most common methods currently used to train SNNs are
backpropagation-like approaches. Standard backpropagation does not work
to train SNNs because 1) the spiking threshold function’s gradient is
zero except at the threshold, where it is undefined, and 2) the credit
assignment problem needs to be solved in the temporal dimension in
addition to the spatial one (or color, etc.).&lt;/p&gt;
&lt;p&gt;In ANNs, the most common activation function is the ReLU. For SNNs,
the neuron will fire if the membrane potential is above some threshold,
otherwise, it will not fire. This is called a Heaviside function. You
could use a sigmoid function instead, but then it would no longer be a
spiking neural network. The surrogate gradient solution is to use the
standard threshold function in the forward pass, but then use the
derivative from a “smoothed” version of the Heaviside function, such as
the sigmoid function, in the backward pass (Neftci et al. 2019, Bohte,
2011).&lt;/p&gt;
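&lt;p&gt;A minimal NumPy sketch of the two passes is shown below. The function
names, the threshold of 1.0, and the sigmoid slope of 5.0 are assumptions
for illustration; deep learning frameworks would wire the same pair of
functions into their automatic differentiation machinery.&lt;/p&gt;

```python
import numpy as np

def spike_forward(membrane, threshold=1.0):
    """Forward pass: Heaviside step, 1 if the membrane potential is above threshold."""
    return (membrane > threshold).astype(float)

def spike_surrogate_grad(membrane, threshold=1.0, slope=5.0):
    """Backward pass: derivative of a sigmoid centred on the threshold."""
    s = 1.0 / (1.0 + np.exp(-slope * (membrane - threshold)))
    return slope * s * (1.0 - s)

membrane = np.array([0.0, 0.9, 1.1, 2.0])
print(spike_forward(membrane))         # [0. 0. 1. 1.]
print(spike_surrogate_grad(membrane))  # largest near the threshold, small far from it
```

&lt;p&gt;The forward pass still produces binary spikes, so the network remains
a spiking network, while the backward pass supplies a usable gradient for
neurons whose membrane potential is close to the threshold.&lt;/p&gt;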
&lt;p&gt;Advantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Connects to well-known methods.&lt;/li&gt;
&lt;li&gt;Compared to conversion, can result in a more energy efficient
network (Li et al 2022)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Disadvantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Can be computationally intensive to solve both spatially and through
time&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;synaptic-plasticity&quot;&gt;Synaptic Plasticity&lt;/h2&gt;
&lt;p&gt;Spike-timing-dependent plasticity (STDP) is the most well-known form
of synaptic plasticity. In most cases, STDP increases the strength of a
synapse when a presynaptic (input) spike comes immediately before the
postsynaptic spike. Early models have shown promise with STDP on simple
unsupervised tasks, although getting it to work well for more complex
models and tasks has proven more difficult.&lt;/p&gt;
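&lt;p&gt;A common textbook formulation is pair-based STDP with an exponential
time window, sketched below. The amplitudes and time constant are
illustrative assumptions, and the weakening branch (post before pre) is
part of that standard model rather than something described above.&lt;/p&gt;

```python
import math

def stdp_weight_change(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:      # pre fired before post: strengthen the synapse
        return a_plus * math.exp(-dt / tau)
    if 0 > dt:      # post fired before pre: weaken the synapse
        return -a_minus * math.exp(dt / tau)
    return 0.0

print(stdp_weight_change(t_pre=10.0, t_post=12.0))  # pre just before post: large increase
print(stdp_weight_change(t_pre=10.0, t_post=40.0))  # pre long before post: small increase
print(stdp_weight_change(t_pre=12.0, t_post=10.0))  # post before pre: decrease
```

&lt;p&gt;The rule only needs the spike times of the two neurons that the
synapse connects, which is why it is attractive for local, on-chip
learning.&lt;/p&gt;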
&lt;p&gt;Other biological learning mechanisms include the pruning and creation
of both neurons and synapses, homeostatic plasticity, neuromodulators,
astrocytes, and evolution. There is even some recent evidence that some
primitive types of knowledge can be passed down by epigenetics.&lt;/p&gt;
&lt;p&gt;Advantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Unsupervised&lt;/li&gt;
&lt;li&gt;Can take advantage of temporal properties&lt;/li&gt;
&lt;li&gt;Biologically inspired&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Disadvantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Synaptic plasticity is not well understood, especially at different
timescales&lt;/li&gt;
&lt;li&gt;Difficult to get to work with non-trivial networks&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;evolutionary-optimization&quot;&gt;Evolutionary Optimization&lt;/h2&gt;
&lt;p&gt;Evolutionary optimization is another approach that has some cool
applications and works well with small networks. Dr. Catherine Schuman
is a leading expert and she gave a fascinating talk on neuromorphic
computing to the ICS lab that is available on YouTube.&lt;/p&gt;
&lt;p&gt;Advantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Applicable to many tasks, architectures, and devices.&lt;/li&gt;
&lt;li&gt;Can learn topology and parameters (requiring less knowledge of the
problem).&lt;/li&gt;
&lt;li&gt;Learns small networks which results in lower latency.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Disadvantages:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Not effective for problems that require deep or large
architectures.&lt;/li&gt;
&lt;/ol&gt;
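&lt;p&gt;To give a flavor of evolutionary optimization, here is a toy sketch
that evolves the two input weights of a single integrate-and-fire neuron
so that its spike counts match made-up targets. Everything here (the task,
the mutation size, the population size) is a hypothetical example; real
systems also evolve topology, which this sketch does not attempt.&lt;/p&gt;

```python
import random

def spike_count(weights, inputs, threshold=1.0, n_steps=20):
    """Spikes emitted by one integrate-and-fire neuron driven by constant inputs."""
    drive = sum(w * x for w, x in zip(weights, inputs))
    membrane, spikes = 0.0, 0
    for _ in range(n_steps):
        membrane += drive
        if membrane >= threshold:
            spikes += 1
            membrane -= threshold
    return spikes

# Hypothetical task: target spike counts for three input patterns.
PATTERNS = [((1.0, 0.0), 10), ((0.0, 1.0), 5), ((1.0, 1.0), 15)]

def error(weights):
    """Total absolute difference between produced and target spike counts."""
    return sum(abs(spike_count(weights, x) - target) for x, target in PATTERNS)

def evolve(generations=200, children=10, sigma=0.1, seed=0):
    """Mutate the best weights with Gaussian noise and keep any child that is no worse."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    history = [error(best)]
    for _ in range(generations):
        for _ in range(children):
            child = [w + rng.gauss(0, sigma) for w in best]
            if error(best) >= error(child):
                best = child
        history.append(error(best))
    return best, history

best, history = evolve()
print("initial error:", history[0], "final error:", history[-1])
```

&lt;p&gt;No gradients are needed, so the non-differentiable spike is not a
problem, but the number of evaluations grows quickly with the number of
parameters, which is consistent with this approach being used mostly for
small networks.&lt;/p&gt;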
&lt;h1 id=&quot;advantages-of-neuromorphic-systems&quot;&gt;Advantages of Neuromorphic
Systems&lt;/h1&gt;
&lt;h2 id=&quot;energy-efficiency&quot;&gt;Energy Efficiency&lt;/h2&gt;
&lt;p&gt;Neuromorphic systems have two main advantages: 1) energy efficiency
and 2) low latency. There are a lot of reasons to be excited about the
energy efficiency. For example, Intel &lt;a
href=&quot;https://www.intel.com/content/www/us/en/newsroom/news/intel-builds-worlds-largest-neuromorphic-system.html#gs.gq485y&quot;&gt;claimed&lt;/a&gt;
that their Loihi 2 Neural Processing Unit (NPU) can use 100 times less
energy while being as much as 50 times faster than conventional ANNs.
Chris Eliasmith compared the energy efficiency of an SNN on neuromorphic
hardware with an ANN with the same architecture on standard hardware in
&lt;a href=&quot;https://www.youtube.com/watch?v=PeW-TN3P1hk&amp;amp;t=1308s&quot;&gt;a
presentation available on YouTube&lt;/a&gt;. He found that the SNN is 100
times more energy efficient on Loihi compared to the ANN on a standard
NVIDIA GPU and 20 times more efficient than the ANN on an NVIDIA Jetson
GPU. It is 5-7 times more energy efficient than the Intel Neural Compute
Stick (NCS) and NCS 2. At the same time the SNN achieves a 93.8%
accuracy compared to the 92.7% accuracy of the ANN.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/images/Eliasmith-traditional-vs-neuromorphic-energy.png&quot;
title=&quot;Barplot comparing ANNs running on traditional GPUs and CPUs with an SNN running on an Intel Loihi.&quot;
alt=&quot;Figure recreated by author from Chris Eliasmith’s slides at https://www.youtube.com/watch?v=PeW-TN3P1hk&amp;amp;t=1308s which shows the neuromorphic processor being 5-100x more efficient while achieving a similar accuracy.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure recreated by author from Chris
Eliasmith’s slides at
https://www.youtube.com/watch?v=PeW-TN3P1hk&amp;amp;t=1308s which shows the
neuromorphic processor being 5-100x more efficient while achieving a
similar accuracy.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Neuromorphic chips are more energy efficient and allow complex deep
learning models to be deployed on low-energy edge devices. In October
2024, BrainChip introduced the Akida Pico NPU which uses less than 1 mW
of power, and Intel Loihi 2 NPU uses 1 W. That’s a lot less power than
NVIDIA Jetson modules, which are often used for embedded ANNs and use
between 10 and 50 watts, or server GPUs, which can use around 100 watts.&lt;/p&gt;
&lt;p&gt;Comparing the energy efficiency of ANNs and SNNs is difficult
because 1) energy efficiency is dependent on hardware, 2) SNNs and ANNs
can use different architectures, and 3) they are suited to different
problems. Additionally, the energy used by SNNs scales with the number
of spikes, so the number of spikes needs to be minimized to achieve the
best energy efficiency.&lt;/p&gt;
&lt;p&gt;Theoretical analysis is often used to estimate the energy needed by
SNNs and ANNs, however, this doesn’t take into account all of the
differences between the CPUs and GPUs used for ANNs and the neuromorphic
chips for SNNs.&lt;/p&gt;
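&lt;p&gt;A back-of-the-envelope version of such a theoretical estimate is
sketched below. The per-operation energies are figures commonly quoted for
32-bit floating point arithmetic in 45 nm CMOS and are assumptions here,
not measurements of any chip mentioned in this post; the spike rates and
time steps are made up.&lt;/p&gt;

```python
# Assumed per-operation energies in picojoules (45 nm CMOS, 32-bit floating point).
E_MAC_PJ = 4.6   # multiply-accumulate, used by an ANN for every synapse
E_AC_PJ = 0.9    # accumulate only, used by an SNN when a spike arrives

def ann_energy_pj(n_synaptic_ops):
    """Energy of one ANN forward pass: every synapse costs one multiply-accumulate."""
    return n_synaptic_ops * E_MAC_PJ

def snn_energy_pj(n_synaptic_ops, spike_rate, n_time_steps):
    """Energy of an SNN: spike_rate is the average fraction of neurons spiking per time step."""
    return n_synaptic_ops * spike_rate * n_time_steps * E_AC_PJ

def break_even_spikes_per_synapse():
    """Average spikes per synapse (rate x time steps) at which the SNN stops being cheaper."""
    return E_MAC_PJ / E_AC_PJ   # about 5.1 with the values assumed above

ops = 1_000_000
print("ANN:", ann_energy_pj(ops), "pJ")
print("SNN, 10 steps:", snn_energy_pj(ops, spike_rate=0.1, n_time_steps=10), "pJ")
print("SNN, 100 steps:", snn_energy_pj(ops, spike_rate=0.1, n_time_steps=100), "pJ")
print("break-even:", break_even_spikes_per_synapse())
```

&lt;p&gt;Under these assumptions the SNN wins with 10 time steps and loses with
100, which shows why both the spike rate and the number of time steps have
to be kept low. The estimate ignores memory access and data movement,
which is one of the differences such analyses tend to leave out.&lt;/p&gt;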
&lt;p&gt;Looking into nature can give us an idea of what might be possible in
the future and Mike Davies provided a great anecdote in an Intel &lt;a
href=&quot;https://www.youtube.com/watch?v=6Dcs6fQglRA&quot;&gt;Architecture All
Access YouTube video&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Consider the capabilities of a tiny cockatiel parrot brain, a
two-gram brain running on about 50 mW of power. This brain enables the
cockatiel to fly at speeds up to 20 mph, to navigate unknown
environments while foraging for food, and even to learn to manipulate
objects as tools and utter human words.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In current neural networks, there is a lot of wasted computation. For
example, an image encoder takes the same amount of time encoding a blank
page as a cluttered page in a “Where’s Waldo?” book. In spiking neural
networks, very few units would activate on a blank page and very little
computation would be used, while a page containing a lot of features
would fire a lot more units and use a lot more computation. In real
life, there are often regions in the visual field that contain more
features and require more processing than other regions that contain
fewer features, like a clear sky. In either case, SNNs only perform work
when work needs to be performed, whereas ANNs depend on matrix
multiplications that are difficult to use sparsely.&lt;/p&gt;
&lt;p&gt;This in itself is exciting. A lot of deep learning currently involves
uploading massive amounts of audio or video to the cloud, where the data
is processed in massive data centers, spending a lot of energy on the
computation and cooling the computational devices, and then the results
are returned. With edge computing, you can have more secure and more
responsive voice recognition or video recognition, that you can keep on
your local device, with orders of magnitude less energy consumption.&lt;/p&gt;
&lt;h2 id=&quot;low-latency&quot;&gt;Low Latency&lt;/h2&gt;
&lt;p&gt;When a pixel receptor of an event camera changes by some threshold,
it can send an event or spike within microseconds. It doesn’t need to
wait for a shutter or synchronization signal to be sent. This benefit is
seen throughout the event-based architecture of SNNs. Units can send
events immediately, rather than waiting for a synchronization signal.
This makes neuromorphic computers much faster, in terms of latency, than
ANNs. Hence, neuromorphic processing is better than ANNs for real-time
applications that can benefit from low latency. This benefit is reduced
if the problem allows for batching and you are measuring speed by
throughput since ANNs can take advantage of batching more easily.
However, in real-time processing, such as robotics or user interfacing,
latency is more important.&lt;/p&gt;
&lt;h1 id=&quot;disadvantages-and-challenges&quot;&gt;Disadvantages and Challenges&lt;/h1&gt;
&lt;h2 id=&quot;everything-everywhere-all-at-once&quot;&gt;Everything Everywhere All at
Once&lt;/h2&gt;
&lt;p&gt;One of the challenges is that neuromorphic computing and engineering
are progressing at multiple levels at the same time. The details of the
models depend on the hardware implementation and empirical results with
actualized models guide the development of the hardware. Intel
discovered this with their Loihi 1 chips and built more flexibility into
their Loihi 2 chips, however, there will always be tradeoffs and there
are still many advances to be made on both the hardware and software
side.&lt;/p&gt;
&lt;h2 id=&quot;limited-availability-of-commercial-hardware&quot;&gt;Limited
Availability of Commercial Hardware&lt;/h2&gt;
&lt;p&gt;Hopefully, this will change soon, but commercial hardware isn’t very
available. BrainChip’s Akida was the first neuromorphic chip to be
commercially available, although &lt;a
href=&quot;https://open-neuromorphic.org/neuromorphic-computing/hardware/akida-brainchip/#neurons-and-synapses&quot;&gt;apparently,
it does not even support&lt;/a&gt; the standard leaky-integrate and fire (LIF)
neuron. SpiNNaker boards used to be for sale, which was part of the EU
Human Brain Project but are &lt;a
href=&quot;https://apt.cs.manchester.ac.uk/projects/SpiNNaker/&quot;&gt;no longer
available&lt;/a&gt;. Intel makes Loihi 2 chips available to some academic
researchers via the &lt;a
href=&quot;https://intel-ncl.atlassian.net/wiki/spaces/INRC/pages/1784807425/Join+the+INRC&quot;&gt;Intel
Neuromorphic Research Community (INRC)&lt;/a&gt; program.&lt;/p&gt;
&lt;h2 id=&quot;datasets&quot;&gt;Datasets&lt;/h2&gt;
&lt;p&gt;There are far fewer neuromorphic datasets than traditional
datasets, and they can be much larger. Some of the common smaller computer
vision datasets, such as MNIST (N-MNIST, Orchard et al 2015) and CIFAR-10
(CIFAR10-DVS, Li et al 2017), have been converted to event streams
by displaying the images and recording them using event-based cameras.
The images are collected with movement (or “saccades”) to increase the
number of spikes for processing. With larger datasets, such as
ES-ImageNet (Lin et al 2021), simulation of event cameras has been
used.&lt;/p&gt;
&lt;p&gt;The dataset derived from static images might be useful in comparing
SNNs with conventional ANNs and might be useful as part of the training
or evaluation pipeline, however, SNNs are naturally temporal, and using
them for static inputs does not make a lot of sense if you want to take
advantage of SNNs’ temporal properties. Some of the datasets that take
advantage of these properties of SNNs include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DvsGesture (Amir et al. 2017) - a dataset of people performing a
set of 11 hand and arm gestures&lt;/li&gt;
&lt;li&gt;Bullying10K (Dong et al. 2023) - a privacy-preserving dataset for
bullying recognition&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Synthetic data can be generated from standard visible camera data
without the use of expensive event camera data collections, however
these won’t exhibit the high dynamic range and frame rate that event
cameras would capture.&lt;/p&gt;
&lt;p&gt;Tonic is an example Python library that makes it easy to access at
least some of these event-based datasets. The datasets themselves can
take up a lot more space than traditional datasets. For example, the
training images for MNIST are around 10 MB, while in N-MNIST, they are
almost 1 GB.&lt;/p&gt;
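&lt;p&gt;Event data is usually stored as a long list of individual events
rather than as images, which is part of why the files are larger. The
sketch below bins a handful of made-up events into frames so they can be
viewed or fed to a frame-based model. It uses plain NumPy and is not
Tonic’s own code; the field names and the polarity encoding are
assumptions about how the events are laid out.&lt;/p&gt;

```python
import numpy as np

# Assumed event layout: one record per event with pixel coordinates, a
# timestamp in microseconds, and a polarity (1 = brighter, 0 = darker).
EVENT_DTYPE = np.dtype([("x", np.int16), ("y", np.int16), ("t", np.int64), ("p", np.int8)])

def events_to_frames(events, height, width, time_window_us):
    """Count events per pixel and polarity inside consecutive time windows."""
    bins = (events["t"] - events["t"].min()) // time_window_us
    frames = np.zeros((int(bins.max()) + 1, 2, height, width), dtype=np.int32)
    # np.add.at accumulates correctly even when several events hit the same pixel.
    np.add.at(frames, (bins, events["p"], events["y"], events["x"]), 1)
    return frames

events = np.array(
    [(3, 1, 0, 1), (3, 1, 400, 1), (0, 2, 900, 0), (5, 5, 1500, 1)],
    dtype=EVENT_DTYPE,
)
frames = events_to_frames(events, height=8, width=8, time_window_us=1000)
print(frames.shape)        # (2, 2, 8, 8): time windows, polarities, rows, columns
print(frames[0, 1, 1, 3])  # 2 events at x=3, y=1 in the first window
```

&lt;p&gt;Binning like this throws away the fine timing inside each window, so
it is a convenience for inspection and comparison rather than a way to
exploit the temporal precision of the events.&lt;/p&gt;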
&lt;p&gt;Another thing to take into account is that visualizing the datasets
can be difficult. Even the datasets derived from static images can be
difficult to match with the original input images. Also, the benefit of
using real data is typically to avoid a gap between training and
inference, so it would seem that the benefit of using these datasets
would depend on their similarity to the cameras used during deployment
or testing.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;figure&gt;
&lt;img src=&quot;/images/Neuromorphic-Computers-are-the-Wave-of-the-Future.png&quot;
title=&quot;A retro computer waving with the text &amp;quot;Neuromorphic Computers are the Wave of the Future!&amp;quot;&quot;
alt=&quot;Neuromorphic Computers are the Wave of the Future! Created by author with ImageFx and GIMP.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Neuromorphic Computers are the Wave of
the Future! Created by author with ImageFx and GIMP.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We are in an exciting time with neuromorphic computation. There are
still challenges for adoption, but there are proven cases where
neuromorphic systems are more energy efficient, especially compared to
standard server GPUs, while having lower latency and similar accuracy to
traditional ANNs. A lot of
companies, including Intel, IBM, Qualcomm, Analog Devices, Rain AI, and
BrainChip have been investing in neuromorphic systems. BrainChip is the
first company to make their neuromorphic chips commercially available
while both Intel and IBM are on the second generations of their research
chips (Loihi 2 and NorthPole respectively).&lt;/p&gt;
&lt;h1 id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J., Di Nolfo,
C., Nayak, T., Andreopoulos, A., Garreau, G., Mendoza, M., Kusnitz, J.,
Debole, M., Esser, S., Delbruck, T., Flickner, M., &amp;amp; Modha, D.
(2017). &lt;em&gt;A Low Power, Fully Event-Based Gesture Recognition
System&lt;/em&gt;. 7243–7252. &lt;a
href=&quot;https://openaccess.thecvf.com/content_cvpr_2017/html/Amir_A_Low_Power_CVPR_2017_paper.html&quot;&gt;https://openaccess.thecvf.com/content_cvpr_2017/html/Amir_A_Low_Power_CVPR_2017_paper.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Bohte, S. M. (2011). Error-Backpropagation in Networks of
Fractionally Predictive Spiking Neurons. In &lt;em&gt;Artificial Neural
Networks and Machine Learning&lt;/em&gt; &lt;a
href=&quot;https://doi.org/10.1007/978-3-642-21735-7_8&quot;&gt;https://doi.org/10.1007/978-3-642-21735-7_8&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dong, Y., Li, Y., Zhao, D., Shen, G., &amp;amp; Zeng, Y. (2023).
Bullying10K: A Large-Scale Neuromorphic Dataset towards
Privacy-Preserving Bullying Recognition. &lt;em&gt;Advances in Neural
Information Processing Systems&lt;/em&gt;, &lt;em&gt;36&lt;/em&gt;, 1923–1937.&lt;/li&gt;
&lt;li&gt;Li, C., Ma, L., &amp;amp; Furber, S. (2022). Quantization Framework for
Fast Spiking Neural Networks. &lt;em&gt;Frontiers in Neuroscience&lt;/em&gt;,
&lt;em&gt;16&lt;/em&gt;. &lt;a
href=&quot;https://doi.org/10.3389/fnins.2022.918793&quot;&gt;https://doi.org/10.3389/fnins.2022.918793&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Li, H., Liu, H., Ji, X., Li, G., &amp;amp; Shi, L. (2017). CIFAR10-DVS:
An Event-Stream Dataset for Object Classification. &lt;em&gt;Frontiers in
Neuroscience&lt;/em&gt;, &lt;em&gt;11&lt;/em&gt;. &lt;a
href=&quot;https://doi.org/10.3389/fnins.2017.00309&quot;&gt;https://doi.org/10.3389/fnins.2017.00309&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Lin, Y., Ding, W., Qiang, S., Deng, L., &amp;amp; Li, G. (2021).
ES-ImageNet: A Million Event-Stream Classification Dataset for Spiking
Neural Networks. &lt;em&gt;Frontiers in Neuroscience&lt;/em&gt;, &lt;em&gt;15&lt;/em&gt;.
&lt;a
href=&quot;https://doi.org/10.3389/fnins.2021.726582&quot;&gt;https://doi.org/10.3389/fnins.2021.726582&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Neftci, E. O., Mostafa, H., &amp;amp; Zenke, F. (2019). Surrogate
Gradient Learning in Spiking Neural Networks: Bringing the Power of
Gradient-Based Optimization to Spiking Neural Networks. &lt;em&gt;IEEE Signal
Processing Magazine&lt;/em&gt;. &lt;a
href=&quot;https://doi.org/10.1109/MSP.2019.2931595&quot;&gt;https://doi.org/10.1109/MSP.2019.2931595&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Orchard, G., Jayawant, A., Cohen, G. K., &amp;amp; Thakor, N. (2015).
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using
Saccades. &lt;em&gt;Frontiers in Neuroscience&lt;/em&gt;, &lt;em&gt;9&lt;/em&gt;. &lt;a
href=&quot;https://doi.org/10.3389/fnins.2015.00437&quot;&gt;https://doi.org/10.3389/fnins.2015.00437&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Schuman, C. D., Kulkarni, S. R., Parsa, M., Mitchell, J. P., Date,
P., &amp;amp; Kay, B. (2022). Opportunities for neuromorphic computing
algorithms and applications. &lt;em&gt;Nature Computational Science&lt;/em&gt;,
&lt;em&gt;2&lt;/em&gt;(1), 10–19. &lt;a
href=&quot;https://doi.org/10.1038/s43588-021-00184-y&quot;&gt;https://doi.org/10.1038/s43588-021-00184-y&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sengupta, A., Ye, Y., Wang, R., Liu, C., &amp;amp; Roy, K. (2019). Going
Deeper in Spiking Neural Networks: VGG and Residual Architectures.
&lt;em&gt;Frontiers in Neuroscience&lt;/em&gt;, &lt;em&gt;13&lt;/em&gt;. &lt;a
href=&quot;https://doi.org/10.3389/fnins.2019.00095&quot;&gt;https://doi.org/10.3389/fnins.2019.00095&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;resources&quot;&gt;Resources&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://open-neuromorphic.org&quot;&gt;Open Neuromorphic (ONM)
Collective&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a
href=&quot;https://github.com/uzh-rpg/event-based_vision_resources&quot;&gt;Event-Based
Vision Resources&lt;/a&gt; - Upcoming workshops, papers, companies,
neuromorphic systems, etc.&lt;/li&gt;
&lt;li&gt;Talks on YouTube
&lt;ul&gt;
&lt;li&gt;Neuromorphic Computing from the Computer Science Perspective video
from ICAS Lab with Dr Catherine Schuman&lt;/li&gt;
&lt;li&gt;Cosyne 2022 Tutorial on Spiking Neural Networks&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=7TybETlCslM&quot;&gt;ESWEEK 2021
Dr. Priyadarshini Panda’s SNN Talk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Intel Architecture All Access: Neuromorphic Computing &lt;a
href=&quot;https://www.youtube.com/watch?v=6Dcs6fQglRA&quot;&gt;Part 1&lt;/a&gt; and &lt;a
href=&quot;https://www.youtube.com/watch?v=XWds3FIVm0U&quot;&gt;Part 2&lt;/a&gt; by Mike
Davies.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=PeW-TN3P1hk&quot;&gt;Spiking Neural
Networks for More Efficient AI Algorithms Talk by Professor Chris
Eliasmith at University of Waterloo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Fri, 22 Nov 2024 00:00:00 -0500</pubDate>
        <link>https://neural.vision/blog/neuroai/Neuromorphic-Computing-Greener-Edgier-AI/</link>
        <guid isPermaLink="true">https://neural.vision/blog/neuroai/Neuromorphic-Computing-Greener-Edgier-AI/</guid>
        
        <category>neuroscience</category>
        
        <category>deep-learning</category>
        
        <category>neuromorphic</category>
        
        
        <category>NeuroAI</category>
        
      </item>
    
      <item>
        <title>CLIP, LLaVA, and the Brain - What the brain can teach us about visual processing</title>
        <description>&lt;p&gt;How do recent artificial neural networks, like the CLIP &lt;span
class=&quot;citation&quot; data-cites=&quot;CLIP2021&quot;&gt;(&lt;a href=&quot;#ref-CLIP2021&quot;
role=&quot;doc-biblioref&quot;&gt;Radford et al. 2021&lt;/a&gt;)&lt;/span&gt; and LLaVA &lt;span
class=&quot;citation&quot; data-cites=&quot;LLaVA2023&quot;&gt;(&lt;a href=&quot;#ref-LLaVA2023&quot;
role=&quot;doc-biblioref&quot;&gt;Liu et al. 2023&lt;/a&gt;)&lt;/span&gt; transformer networks,
compare to the brain? Is there similarity between the attention in these
networks and that in the brain? In this article I look at these
transformer architectures with an eye on the similarities and differences
with the mammalian brain and visual system.&lt;/p&gt;
&lt;p&gt;I come to the conclusion that the processing that vision
transformers, CLIP, and LLaVA perform is analogous to a type of
computation called pre-attentive visual processing. This processing is
done in the initial feedforward visual responses to a stimulus before
any recurrence. Although a lot can be accomplished in a feedforward way,
studies have shown that feedforward pre-attentive processing in the
brain does have difficulty with:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Distinguishing the identity or characteristics of similar types of
objects, especially when objects are close together or cluttered or the
objects are unnatural or artificial &lt;span class=&quot;citation&quot;
data-cites=&quot;vanrullenPowerFeedforwardSweep2007&quot;&gt;(&lt;a
href=&quot;#ref-vanrullenPowerFeedforwardSweep2007&quot;
role=&quot;doc-biblioref&quot;&gt;VanRullen 2007&lt;/a&gt;)&lt;/span&gt;.&lt;/li&gt;
&lt;li&gt;More complex tasks such as counting or maze or curve tracing
tasks.&lt;/li&gt;
&lt;li&gt;Perceiving objects that are more difficult to see, such as where it
is difficult to perceive the boundaries of the objects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In contrast to feedforward-only processing, one of the things that
really stands out about the brain is the richness of the interactions
between areas, which I will discuss in more detail in the next section.&lt;/p&gt;
&lt;h2 id=&quot;bidirectional-activity-in-the-brain&quot;&gt;Bidirectional Activity in
the Brain&lt;/h2&gt;
&lt;p&gt;In most current deep learning architectures, activity is propagated
in a single direction. For example, an image might be given as input to
a network and then propagated from layer to layer until you get to a
classification as the output.&lt;/p&gt;
&lt;figure id=&quot;fig:ffbb&quot;&gt;
&lt;img src=&quot;/images/Bidirectionality-in-the-Brain.png&quot;
alt=&quot;Figure 1: A simplified diagram showing some of the feed-forward and feedback connections in the Macaque brain. The areas that are earlier (or lower-level) are more white, while the areas that are later (or higher-level) are more blue.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure 1: A simplified diagram showing
some of the feed-forward and feedback connections in the Macaque brain.
The areas that are earlier (or lower-level) are more white, while the
areas that are later (or higher-level) are more blue.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The brain is much more interesting than these feedforward models. In
the visual system, a stimulus will propagate from lower to higher level
areas in a feedforward-like fashion, but then the higher level areas
will also influence the lower level areas as shown in Figure 1.&lt;/p&gt;
&lt;p&gt;Some of this feedback is the conscious top-down attention that allows
us to allocate more resources to objects and features of interest and
allows us to disambiguate stimuli that are either complex or ambiguous.
Another part of this feedback is automatic and allows higher level areas
to infuse the lower level areas with information that could not be known
in just the feedforward manner.&lt;/p&gt;
&lt;p&gt;The conscious top-down attention is thought to support consciousness
of visual stimuli. Without conscious access to lower level areas that
encode borders and edges, we wouldn’t have as spatially precise
perception of borders. Tasks such as mentally tracing a curve or solving
a maze would become impossible.&lt;/p&gt;
&lt;p&gt;One example of the automatic unconscious feedback is border-ownership
coding, which is seen in about half of the orientation-selective neurons in
visual area V2 &lt;span class=&quot;citation&quot;
data-cites=&quot;Zhou2000 Williford2013&quot;&gt;(&lt;a href=&quot;#ref-Zhou2000&quot;
role=&quot;doc-biblioref&quot;&gt;Zhou, Friedman, and von der Heydt 2000&lt;/a&gt;; &lt;a
href=&quot;#ref-Williford2013&quot; role=&quot;doc-biblioref&quot;&gt;Williford and von der
Heydt 2013&lt;/a&gt;)&lt;/span&gt;. These neurons will encode local information in
about 40 ms and, as early as 10 ms after this initial response, will
start to incorporate global context to resolve occlusions, holding the
information needed to know which objects are creating borders by
occluding their backgrounds.&lt;/p&gt;
&lt;p&gt;Another example of this unconscious feedback was shown in &lt;span
class=&quot;citation&quot; data-cites=&quot;Poort2012&quot;&gt;Poort et al. (&lt;a
href=&quot;#ref-Poort2012&quot; role=&quot;doc-biblioref&quot;&gt;2012&lt;/a&gt;)&lt;/span&gt; using
images like the one in Figure 2. In the Macaque early visual cortex (V1),
neurons tend to initially (within 50-75 ms of stimulus presentation)
encode only the local features within their receptive fields (e.g. green
square). However, after around 75 ms, they receive feedback from higher
level areas and tend to respond more strongly when that texture belongs
to a figure, such as the texture-defined figure in Figure 2. This happens
even when attention is drawn away from the figure; however, if the
monkey is paying attention to the figure, the neurons tend to respond
even more.&lt;/p&gt;
&lt;figure id=&quot;fig:fdct&quot;&gt;
&lt;img src=&quot;/images/curve-tracing-texture-task.png&quot;
alt=&quot;Figure 2: Image from (Poort et al. 2012). Shapes that are defined only by texture, like the above, can be difficult to see in a pure “feed-forward” manner. The biological visual system is able to recognize shapes like these through the interaction of lower and higher level areas, including top-down attention and subconscious processes.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure 2: Image from &lt;span
class=&quot;citation&quot; data-cites=&quot;Poort2012&quot;&gt;(&lt;a href=&quot;#ref-Poort2012&quot;
role=&quot;doc-biblioref&quot;&gt;Poort et al. 2012&lt;/a&gt;)&lt;/span&gt;. Shapes that are
defined only by texture, like the above, can be difficult to see in a
pure “feed-forward” manner. The biological visual system is able to
recognize shapes like these through the interaction of lower and higher
level areas, including top-down attention and subconscious
processes.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;One way to look at this bidirectional interaction is that at any
given time, each neuron greedily uses all available predictive signals.
Even higher level areas can be informative.&lt;/p&gt;
&lt;h2 id=&quot;transformers&quot;&gt;Transformers&lt;/h2&gt;
&lt;p&gt;With all the talk about attention with the introduction of
transformers &lt;span class=&quot;citation&quot;
data-cites=&quot;vaswaniAttentionAllYou2017&quot;&gt;(&lt;a
href=&quot;#ref-vaswaniAttentionAllYou2017&quot; role=&quot;doc-biblioref&quot;&gt;Vaswani et
al. 2017&lt;/a&gt;)&lt;/span&gt; and with the ability to generate sentences one word
at a time, you might be led to believe that transformers have
recurrence. However, there is no “state” that is kept between the steps
of the transformer, except for the previous output. So at best the
recurrence is very limited, and there is none of the bidirectionality
that is ubiquitous in the brain. Transformers do allow for multi-headed
attention, which could be interpreted as being able to attend to
multiple things simultaneously. In the original paper, the transformer
used 8 attention heads. Image transformers can be seen as analogous to
pre-attentive feedforward processing with some modifications, like with
the multiple attention heads.&lt;/p&gt;
&lt;h2 id=&quot;clip&quot;&gt;CLIP&lt;/h2&gt;
&lt;figure id=&quot;fig:clip-training&quot;&gt;
&lt;img src=&quot;/images/clip-training.png&quot;
alt=&quot;Figure 3: Image from Radford et al. (2021) depicting how CLIP is trained. I_1 and T_1 are the encodings of image 1 and the corresponding caption. A contrastive learning loss is used to make the I_i and T_j more similar when i=j and more dissimilar when i≠j. Weights are trained from scratch.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure 3: Image from &lt;span
class=&quot;citation&quot; data-cites=&quot;CLIP2021&quot;&gt;Radford et al. (&lt;a
href=&quot;#ref-CLIP2021&quot; role=&quot;doc-biblioref&quot;&gt;2021&lt;/a&gt;)&lt;/span&gt; depicting how
CLIP is trained. &lt;span class=&quot;math inline&quot;&gt;\(I_1\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(T_1\)&lt;/span&gt; are the encodings of image 1 and the
corresponding caption. A contrastive learning loss is used to make the
&lt;span class=&quot;math inline&quot;&gt;\(I_i\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(T_j\)&lt;/span&gt; more similar when &lt;span
class=&quot;math inline&quot;&gt;\(i=j\)&lt;/span&gt; and more dissimilar when &lt;span
class=&quot;math inline&quot;&gt;\(i≠j\)&lt;/span&gt;. Weights are trained from
scratch.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;CLIP was introduced by OpenAI in the &lt;span class=&quot;citation&quot;
data-cites=&quot;CLIP2021&quot;&gt;Radford et al. (&lt;a href=&quot;#ref-CLIP2021&quot;
role=&quot;doc-biblioref&quot;&gt;2021&lt;/a&gt;)&lt;/span&gt; paper “Learning Transferable
Visual Models from Natural Language Supervision”. The idea behind CLIP
is pretty simple and is shown in Figure 3. It takes a bunch of image and
caption pairs from the Internet and feeds the image to an image encoder
and the text to a text encoder. It then uses a loss that brings the
encoding of the image and the encoding of the text closer together when
they are in the same pair, otherwise the loss increases the distance of
the encodings. This is what CLIP gives you: the ability to compare the
similarity between text and images. One way this can be used is for
zero-shot classification, as shown in Figure 4. CLIP does not, by
itself, generate text descriptions from images.&lt;/p&gt;
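&lt;p&gt;The loss described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the embeddings are toy lists rather than encoder outputs, and the symmetric cross-entropy form with a temperature of 0.07 is my reading of the pseudocode in the CLIP paper, not OpenAI’s implementation.&lt;/p&gt;

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; pair i is (image_emb[i], text_emb[i])."""
    images = [normalize(v) for v in image_emb]
    texts = [normalize(v) for v in text_emb]
    # logits[i][j]: scaled cosine similarity of image i and caption j.
    logits = [[dot(i, t) / temperature for t in texts] for i in images]
    n = len(logits)
    # Each image should pick out its own caption, and vice versa.
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    loss_texts = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_images + loss_texts) / 2

# Assumed toy embeddings: matched pairs give a low loss, swapped pairs a high one.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

&lt;p&gt;Minimizing this value pulls the encodings of matching image and caption pairs together and pushes the non-matching combinations in the batch apart.&lt;/p&gt;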
&lt;p&gt;The image encoder and text encoder are independent, meaning that
there is no way for task-driven modulation to influence the image
encoding. This means that the image encoder has to encode everything
that could be potentially relevant to the task. Typically the resolution
of the input image is pretty small, which helps prevent the computation
and memory requirements from exploding.&lt;/p&gt;
&lt;figure id=&quot;fig:clip-zero-shot&quot;&gt;
&lt;img src=&quot;/images/clip-zero-shot-prediction.png&quot;
alt=&quot;Figure 4: Image from Radford et al. (2021) depicting how CLIP can be used for zero-shot classification. Text encodings are generated for each class T_1\ldots T_N. The image is then encoded and the similarity is measured with the generated text encodings. The most similar text encoding is the chosen class.&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure 4: Image from &lt;span
class=&quot;citation&quot; data-cites=&quot;CLIP2021&quot;&gt;Radford et al. (&lt;a
href=&quot;#ref-CLIP2021&quot; role=&quot;doc-biblioref&quot;&gt;2021&lt;/a&gt;)&lt;/span&gt; depicting how
CLIP can be used for zero-shot classification. Text encodings are
generated for each class &lt;span class=&quot;math inline&quot;&gt;\(T_1\ldots
T_N\)&lt;/span&gt;. The image is then encoded and the similarity is measured
with the generated text encodings. The most similar text encoding is the
chosen class.&lt;/figcaption&gt;
&lt;/figure&gt;
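&lt;p&gt;The zero-shot procedure in Figure 4 amounts to a nearest-neighbour lookup over text encodings. Below is a rough sketch, assuming the encodings have already been computed; the three-dimensional vectors, the prompts, and the function names are made up for illustration and do not come from a real CLIP model.&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, class_text_embs):
    """Return the class prompt whose text encoding is most similar to the image."""
    scores = {name: cosine(image_emb, emb) for name, emb in class_text_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical encodings; in practice these would come from the two encoders.
class_text_embs = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
prediction = zero_shot_classify([0.8, 0.3, 0.1], class_text_embs)
```

&lt;p&gt;Note that the image encoding is computed once and never changes with the candidate classes, which is the independence between the encoders discussed above.&lt;/p&gt;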
&lt;h2 id=&quot;llava&quot;&gt;LLaVA&lt;/h2&gt;
&lt;figure id=&quot;fig:llava&quot;&gt;
&lt;img src=&quot;/images/LLaVA-Architecture.png&quot;
alt=&quot;Figure 5: LLaVA architecture from Liu et al. (2023). \mathrm X_v: image, \mathrm X_c : caption, \mathrm X_q : question derived from \mathrm X_c using GPT4&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Figure 5: LLaVA architecture from &lt;span
class=&quot;citation&quot; data-cites=&quot;LLaVA2023&quot;&gt;Liu et al. (&lt;a
href=&quot;#ref-LLaVA2023&quot; role=&quot;doc-biblioref&quot;&gt;2023&lt;/a&gt;)&lt;/span&gt;. &lt;span
class=&quot;math inline&quot;&gt;\(\mathrm X_v\)&lt;/span&gt;: image, &lt;span
class=&quot;math inline&quot;&gt;\(\mathrm X_c\)&lt;/span&gt; : caption, &lt;span
class=&quot;math inline&quot;&gt;\(\mathrm X_q\)&lt;/span&gt; : question derived from &lt;span
class=&quot;math inline&quot;&gt;\(\mathrm X_c\)&lt;/span&gt; using GPT4&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Large Language and Vision Assistant (LLaVA) &lt;span class=&quot;citation&quot;
data-cites=&quot;LLaVA2023&quot;&gt;(&lt;a href=&quot;#ref-LLaVA2023&quot;
role=&quot;doc-biblioref&quot;&gt;Liu et al. 2023&lt;/a&gt;)&lt;/span&gt; is a large language and
vision architecture that builds on CLIP to add the ability to describe
and answer questions about images. This type of architecture is
interesting to me because it can attempt tasks that are similar to those
used in neuroscience and psychology.&lt;/p&gt;
&lt;p&gt;LLaVA uses the ViT-L/14 vision transformer trained with CLIP as its
image encoder (Figure 5). To convert the image encodings into tokens, the
first paper uses a single linear projection matrix &lt;span
class=&quot;math inline&quot;&gt;\(W\)&lt;/span&gt;. The tokens
calculated from the images &lt;span class=&quot;math inline&quot;&gt;\(H_v\)&lt;/span&gt; and
the tokens from the text instructions &lt;span
class=&quot;math inline&quot;&gt;\(H_q\)&lt;/span&gt; are provided as input. LLaVA can then
generate the language response &lt;span class=&quot;math inline&quot;&gt;\(X_a\)&lt;/span&gt;
one token at a time, each time appending the response so far as the
input to the next iteration.&lt;/p&gt;
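&lt;p&gt;To make the data flow concrete, here is a toy sketch of the projection and the token-by-token generation loop. Everything in it is an assumption for illustration: the helper names, the tiny matrices, the name Z_v for the image encoder output, and the stand-in “language model” that simply emits a fixed token. It is not LLaVA’s code.&lt;/p&gt;

```python
def project(features, weights):
    """Map each feature row (length d_in) to a token (length d_out)."""
    return [[sum(f * w for f, w in zip(row, column)) for column in zip(*weights)]
            for row in features]

def generate(image_tokens, instruction_tokens, next_token, stop, max_new_tokens=16):
    """Produce a response one token at a time, re-feeding the response so far."""
    response = []
    for _ in range(max_new_tokens):
        # The image tokens are fixed; only the response grows between steps.
        token = next_token(image_tokens + instruction_tokens + response)
        if token == stop:
            break
        response.append(token)
    return response

# Toy values: two image features of size 2 projected to tokens of size 3.
Z_v = [[1.0, 2.0], [3.0, 4.0]]              # assumed image encoder output
W = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]      # assumed projection matrix
H_v = project(Z_v, W)                       # image tokens
H_q = ["describe"]                          # stand-in for instruction tokens

def toy_next_token(sequence):
    """Hypothetical model: emits "tok" until the input holds 5 items."""
    return "stop" if len(sequence) == 5 else "tok"

X_a = generate(H_v, H_q, toy_next_token, stop="stop")
```

&lt;p&gt;The only thing carried from one step to the next is the growing response; the image tokens are computed once, before any instruction is seen.&lt;/p&gt;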
&lt;p&gt;I won’t go into the details of how LLaVA is trained, but it is
interesting how they use ChatGPT to expand the caption (&lt;span
class=&quot;math inline&quot;&gt;\(\mathrm X_c\)&lt;/span&gt; in Figure 5) to form
instructions (&lt;span class=&quot;math inline&quot;&gt;\(\mathrm H_q\)&lt;/span&gt;) and
responses (used to train &lt;span class=&quot;math inline&quot;&gt;\(\mathrm
X_a\)&lt;/span&gt;) about an image, and how they make use of bounding box
information.&lt;/p&gt;
&lt;p&gt;In version 1.5 of LLaVA &lt;span class=&quot;citation&quot;
data-cites=&quot;liuImprovedBaselinesVisual2024&quot;&gt;(&lt;a
href=&quot;#ref-liuImprovedBaselinesVisual2024&quot; role=&quot;doc-biblioref&quot;&gt;Liu et
al. 2024&lt;/a&gt;)&lt;/span&gt;, some of the improvements they made include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The linear projection matrix &lt;span class=&quot;math inline&quot;&gt;\(\mathrm
W\)&lt;/span&gt; is replaced with a multilayer perceptron&lt;/li&gt;
&lt;li&gt;The image resolution is increased by using an image encoder that
takes images of size 336x336 pixels and by splitting the images into
grids that are encoded separately.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Task-driven attention in the brain is able to dynamically allocate
resources to the object, location, or features of interest, which can
allow processing of information that could otherwise be overwhelmed by
clutter or other objects. In LLaVA, the image encoder is independent of
the text instructions, so to be successful it needs to make sure any
potentially useful information is stored in the image tokens (&lt;span
class=&quot;math inline&quot;&gt;\(\mathrm H_v\)&lt;/span&gt;).&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Since LLaVA and CLIP lack bidirectional processing, the processing
that they do is limited. This is especially true for image processing,
since image processing is done independently of the text instructions.
Most convolutional neural networks also share these limitations. This
leads me to my conjecture:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Conjecture: Most convolutional, vision transformer, and multimodal
transformer networks are restricted to something like pre-attentive
feedforward visual processing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is not necessarily a criticism as much as an insight that can be
informative. Feedforward processing can do a lot and is fast. However,
it is not as dynamic in how it allocates resources, which can lead to
informational bottlenecks in cluttered scenes, and it is unable to
encode enough information for complex tasks without an explosion in the
size of the encodings.&lt;/p&gt;
&lt;p&gt;There are some networks that are not limited to pre-attentive
feedforward processing, but currently most of these architectures lag
behind transformers. These include long short-term memory models
(LSTMs) and, more recently, the Mamba architecture, which has several
benefits over transformers &lt;span class=&quot;citation&quot;
data-cites=&quot;guMamba2024&quot;&gt;(&lt;a href=&quot;#ref-guMamba2024&quot;
role=&quot;doc-biblioref&quot;&gt;Gu and Dao 2024&lt;/a&gt;)&lt;/span&gt;. Extended LSTMs &lt;span
class=&quot;citation&quot; data-cites=&quot;beckXLSTM2024 alkinVisionLSTM2024&quot;&gt;(&lt;a
href=&quot;#ref-beckXLSTM2024&quot; role=&quot;doc-biblioref&quot;&gt;Beck et al. 2024&lt;/a&gt;; &lt;a
href=&quot;#ref-alkinVisionLSTM2024&quot; role=&quot;doc-biblioref&quot;&gt;Alkin et al.
2024&lt;/a&gt;)&lt;/span&gt; have been proposed that help make up some of the ground
between transformers and LSTMs.&lt;/p&gt;
&lt;h1 class=&quot;unnumbered&quot; id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;div id=&quot;refs&quot; class=&quot;references csl-bib-body hanging-indent&quot;
data-entry-spacing=&quot;0&quot; role=&quot;list&quot;&gt;
&lt;div id=&quot;ref-alkinVisionLSTM2024&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Alkin, Benedikt, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, and
Johannes Brandstetter. 2024. &lt;span&gt;“Vision-&lt;span&gt;LSTM&lt;/span&gt;: &lt;span
class=&quot;nocase&quot;&gt;xLSTM&lt;/span&gt; as &lt;span&gt;Generic Vision
Backbone&lt;/span&gt;.”&lt;/span&gt; June 6, 2024. &lt;a
href=&quot;http://arxiv.org/abs/2406.04303&quot;&gt;http://arxiv.org/abs/2406.04303&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-beckXLSTM2024&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Beck, Maximilian, Korbinian Pöppel, Markus Spanring, Andreas Auer,
Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes
Brandstetter, and Sepp Hochreiter. 2024. &lt;span&gt;“&lt;span
class=&quot;nocase&quot;&gt;xLSTM&lt;/span&gt;: &lt;span&gt;Extended Long Short-Term
Memory&lt;/span&gt;.”&lt;/span&gt; May 7, 2024. &lt;a
href=&quot;http://arxiv.org/abs/2405.04517&quot;&gt;http://arxiv.org/abs/2405.04517&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-guMamba2024&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Gu, Albert, and Tri Dao. 2024. &lt;span&gt;“Mamba: &lt;span&gt;Linear-Time Sequence
Modeling&lt;/span&gt; with &lt;span&gt;Selective State Spaces&lt;/span&gt;.”&lt;/span&gt; May
31, 2024. &lt;a
href=&quot;http://arxiv.org/abs/2312.00752&quot;&gt;http://arxiv.org/abs/2312.00752&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-liuImprovedBaselinesVisual2024&quot; class=&quot;csl-entry&quot;
role=&quot;listitem&quot;&gt;
Liu, Haotian, Chunyuan Li, Yuheng Li, and Yong Jae Lee. 2024.
&lt;span&gt;“Improved Baselines with Visual Instruction Tuning.”&lt;/span&gt; In
&lt;em&gt;Proceedings of the &lt;span&gt;IEEE&lt;/span&gt;/&lt;span&gt;CVF Conference&lt;/span&gt; on
&lt;span&gt;Computer Vision&lt;/span&gt; and &lt;span&gt;Pattern Recognition&lt;/span&gt;&lt;/em&gt;,
26296–306. &lt;a
href=&quot;https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Improved_Baselines_with_Visual_Instruction_Tuning_CVPR_2024_paper.html&quot;&gt;https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Improved_Baselines_with_Visual_Instruction_Tuning_CVPR_2024_paper.html&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-LLaVA2023&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Liu, Haotian, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023.
&lt;span&gt;“Visual &lt;span&gt;Instruction Tuning&lt;/span&gt;.”&lt;/span&gt; December 11,
2023. &lt;a
href=&quot;https://doi.org/10.48550/arXiv.2304.08485&quot;&gt;https://doi.org/10.48550/arXiv.2304.08485&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-Poort2012&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Poort, Jasper, Florian Raudies, Aurel Wannig, Victor A F Lamme, Heiko
Neumann, and Pieter R Roelfsema. 2012. &lt;span&gt;“The Role of Attention in
Figure-Ground Segregation in Areas &lt;span&gt;V1&lt;/span&gt; and &lt;span&gt;V4&lt;/span&gt;
of the Visual Cortex.”&lt;/span&gt; &lt;em&gt;Neuron&lt;/em&gt; 75 (1): 143–56. &lt;a
href=&quot;https://doi.org/10.1016/j.neuron.2012.04.032&quot;&gt;https://doi.org/10.1016/j.neuron.2012.04.032&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-CLIP2021&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, and Jack
Clark. 2021. &lt;span&gt;“Learning Transferable Visual Models from Natural
Language Supervision.”&lt;/span&gt; In &lt;em&gt;International Conference on Machine
Learning&lt;/em&gt;, 8748–63. PMLR. &lt;a
href=&quot;http://proceedings.mlr.press/v139/radford21a&quot;&gt;http://proceedings.mlr.press/v139/radford21a&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-vanrullenPowerFeedforwardSweep2007&quot; class=&quot;csl-entry&quot;
role=&quot;listitem&quot;&gt;
VanRullen, Rufin. 2007. &lt;span&gt;“The Power of the Feed-Forward
Sweep.”&lt;/span&gt; &lt;em&gt;Advances in Cognitive Psychology&lt;/em&gt; 3 (1-2): 167.
&lt;a
href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2864977/&quot;&gt;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2864977/&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-vaswaniAttentionAllYou2017&quot; class=&quot;csl-entry&quot;
role=&quot;listitem&quot;&gt;
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
&lt;span&gt;“Attention Is All You Need.”&lt;/span&gt; &lt;em&gt;Advances in Neural
Information Processing Systems&lt;/em&gt; 30. &lt;a
href=&quot;https://proceedings.neurips.cc/paper/7181-attention-is-all&quot;&gt;https://proceedings.neurips.cc/paper/7181-attention-is-all&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-Williford2013&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Williford, Jonathan R., and Rudiger von der Heydt. 2013.
&lt;span&gt;“Border-Ownership Coding.”&lt;/span&gt; &lt;em&gt;Scholarpedia&lt;/em&gt; 8 (10):
30040. &lt;a
href=&quot;http://scholarpedia.org/article/Border-ownership_coding&quot;&gt;http://scholarpedia.org/article/Border-ownership_coding&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-Zhou2000&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Zhou, H., H. S. Friedman, and R. von der Heydt. 2000. &lt;span&gt;“Coding of
Border Ownership in Monkey Visual Cortex.”&lt;/span&gt; &lt;em&gt;The Journal of
Neuroscience&lt;/em&gt; 20 (17): 6594–6611.
&lt;/div&gt;
&lt;/div&gt;
</description>
        <pubDate>Wed, 19 Jun 2024 00:00:00 -0400</pubDate>
        <link>https://neural.vision/blog/neuroai/CLIP-LLaVA-and-the-Brain/</link>
        <guid isPermaLink="true">https://neural.vision/blog/neuroai/CLIP-LLaVA-and-the-Brain/</guid>
        
        <category>neuroscience</category>
        
        <category>deep-learning</category>
        
        <category>transformers</category>
        
        <category>CLIP</category>
        
        <category>LLaVA</category>
        
        <category>attention</category>
        
        <category>bidirectionality</category>
        
        <category>recurrence</category>
        
        
        <category>NeuroAI</category>
        
      </item>
    
      <item>
        <title>Review of Ioffe &amp; Szegedy 2015 *Batch normalization*</title>
        <description>&lt;p&gt;Normalization of training inputs has long been shown to increase the
speed of learning in networks. The paper &lt;span class=&quot;citation&quot;
data-cites=&quot;ioffe_batch_2015&quot;&gt;(&lt;a href=&quot;#ref-ioffe_batch_2015&quot;
role=&quot;doc-biblioref&quot;&gt;Ioffe and Szegedy 2015&lt;/a&gt;)&lt;/span&gt; introduces a
major improvement in deep learning, batch normalization (BN), which
extends this idea by normalizing the activity &lt;strong&gt;within&lt;/strong&gt;
the network, across mini-batches (batches of training examples).&lt;/p&gt;
&lt;p&gt;BN has been gaining a lot of traction in the academic literature, for
example being used to improve segmentation &lt;span class=&quot;citation&quot;
data-cites=&quot;hong_decoupled_2015&quot;&gt;(&lt;a href=&quot;#ref-hong_decoupled_2015&quot;
role=&quot;doc-biblioref&quot;&gt;Hong, Noh, and Han 2015&lt;/a&gt;)&lt;/span&gt; and variational
autoencoders &lt;span class=&quot;citation&quot; data-cites=&quot;sonderby_how_2016&quot;&gt;(&lt;a
href=&quot;#ref-sonderby_how_2016&quot; role=&quot;doc-biblioref&quot;&gt;Sønderby et al.
2016&lt;/a&gt;)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The authors state that adding BN allows a version of the Inception
image classification model to learn with 14 times fewer training steps,
when additional modifications are made in order to take advantage of BN.
One of the modifications is removing the Dropout layers, because BN acts
as a regularizer and actually eliminates the need for Dropout. It also
allows for the learning rate to be increased. It does all this while
actually adding a small number of parameters to be learned during
training. A non-batch version of BN may even have a biological homolog:
homeostatic plasticity.&lt;/p&gt;
&lt;p&gt;BN separates the learning of the overall distribution of the activity
of the neuron from the learning of the specific synaptic weights. For
each “activation” &lt;span class=&quot;math inline&quot;&gt;\(x^{(k)}\)&lt;/span&gt;, the
mean and spread of the distribution of the activation are given by the
learned parameters &lt;span class=&quot;math inline&quot;&gt;\(\beta^{(k)}\)&lt;/span&gt; and
&lt;span class=&quot;math inline&quot;&gt;\(\gamma^{(k)}\)&lt;/span&gt; respectively.&lt;/p&gt;
&lt;h1 id=&quot;details&quot;&gt;Details&lt;/h1&gt;
&lt;p&gt;The original paper &lt;span class=&quot;citation&quot;
data-cites=&quot;ioffe_batch_2015&quot;&gt;(&lt;a href=&quot;#ref-ioffe_batch_2015&quot;
role=&quot;doc-biblioref&quot;&gt;Ioffe and Szegedy 2015&lt;/a&gt;)&lt;/span&gt; states that the
normalization should be done per activation &lt;span
class=&quot;math inline&quot;&gt;\(k\)&lt;/span&gt;. In the initial part of the paper the
definition of activation is left open. In their experiments, however,
they do the normalization across each feature map (across batches
&lt;strong&gt;and&lt;/strong&gt; locations, for a specific feature).&lt;/p&gt;
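&lt;p&gt;As a rough illustration of normalizing per feature map, the sketch below pools the values of each channel over both the batch and the spatial locations before standardizing. The function name and the nested-list layout [batch][channel][row][col] are assumptions made for this example; this is not the paper’s or Caffe’s code, and the learned scale and shift are left out.&lt;/p&gt;

```python
import math

def normalize_feature_maps(x, eps=1e-5):
    """Standardize x, a nested list indexed as [batch][channel][row][col]."""
    n_channels = len(x[0])
    out = [[None] * n_channels for _ in x]
    for c in range(n_channels):
        # Pool every value of channel c over all samples and locations.
        values = [v for sample in x for row in sample[c] for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        scale = math.sqrt(var + eps)
        for n, sample in enumerate(x):
            out[n][c] = [[(v - mu) / scale for v in row] for row in sample[c]]
    return out

# Two samples, two feature maps, each map of size 1x2.
x = [[[[1.0, 3.0]], [[10.0, 10.0]]],
     [[[5.0, 7.0]], [[20.0, 20.0]]]]
x_hat = normalize_feature_maps(x)
```

&lt;p&gt;Each feature map gets a single mean and variance, shared by all of its locations, so the number of statistics grows with the number of feature maps rather than with the image size.&lt;/p&gt;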
&lt;h2 id=&quot;batch-normalization-step&quot;&gt;Batch normalization step&lt;/h2&gt;
&lt;p&gt;For now, this section just regurgitates some of the basic information
from the original paper.&lt;/p&gt;
&lt;p&gt;Let &lt;span class=&quot;math inline&quot;&gt;\(x^{(k)}_i\)&lt;/span&gt; be a specific
activation &lt;span class=&quot;math inline&quot;&gt;\(k\)&lt;/span&gt; for a given input
&lt;span class=&quot;math inline&quot;&gt;\(i\)&lt;/span&gt;. Batch normalization then normalizes
this activation over all the inputs of the batch (mini-batch) of inputs
&lt;span class=&quot;math inline&quot;&gt;\(i \in \{ 1 \ldots m \}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;BN normalizes the data so that its mean and variance are set by
parameters learned during training. This is done by first standardizing
the data to zero mean and unit variance (&lt;span
class=&quot;math inline&quot;&gt;\(\mu=0\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(\sigma=1\)&lt;/span&gt;), and then scaling by &lt;span
class=&quot;math inline&quot;&gt;\(\gamma^{(k)}\)&lt;/span&gt; and adding the offset &lt;span
class=&quot;math inline&quot;&gt;\(\beta^{(k)}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Let &lt;span class=&quot;math inline&quot;&gt;\(\mathcal B = \left\{ x_{1 \ldots
m}\right\}\)&lt;/span&gt; be a given batch.&lt;/p&gt;
&lt;p&gt;Let &lt;span class=&quot;math inline&quot;&gt;\(\mu^{(k)}_{\mathcal B}\)&lt;/span&gt; and
&lt;span class=&quot;math inline&quot;&gt;\(\left(\sigma^{(k)}_{\mathcal B}\right)^2\)&lt;/span&gt;
be the mean and variance of a given activation, &lt;span
class=&quot;math inline&quot;&gt;\(k\)&lt;/span&gt;, across the batch of training
inputs.&lt;/p&gt;
&lt;p&gt;The normalization / whitening step is then:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\hat x^{(k)}_i = \frac
{x^{(k)}_i - \mu^{(k)}_{\mathcal B}}
{\sqrt{\left(\sigma^{(k)}_{\mathcal B}\right)^2 + \epsilon}}.
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;And then there is the re-scaling and shifting step:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
y^{(k)}_i = \gamma^{(k)} \hat x^{(k)}_i + \beta^{(k)},
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;where &lt;span class=&quot;math inline&quot;&gt;\(\gamma^{(k)}\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(\beta^{(k)}\)&lt;/span&gt;, once again, are learned
parameters.&lt;/p&gt;
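&lt;p&gt;As a minimal sketch of the two steps above for a single activation, the plain-Python function below standardizes a mini-batch and then applies the scale and shift. The function name, the default values, and the toy batch are assumptions for illustration; this is not the paper’s or Caffe’s implementation, and it only covers the training-time computation on one batch.&lt;/p&gt;

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a mini-batch, then scale and shift."""
    m = len(batch)
    mu = sum(batch) / m                                # batch mean
    var = sum((x - mu) ** 2 for x in batch) / m        # biased batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]

# Assumed toy batch of four inputs with hand-picked gamma and beta.
y = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
```

&lt;p&gt;After the transform, the outputs have a mean of about beta and a standard deviation of about gamma, regardless of the scale of the inputs.&lt;/p&gt;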
&lt;h2 id=&quot;discussion-on-caffes-implementation&quot;&gt;Discussion on Caffe’s
Implementation&lt;/h2&gt;
&lt;p&gt;There is an interesting discussion on Caffe’s implementation in the
pull request (PR):&lt;/p&gt;
&lt;p&gt;https://github.com/BVLC/caffe/pull/3229&lt;/p&gt;
&lt;h1 id=&quot;modifying-models-for-bn&quot;&gt;Modifying models for BN&lt;/h1&gt;
&lt;p&gt;Adding BN by itself can speed up training. However, in order to fully
take advantage of BN, additional steps need to be taken. The authors’
suggestions include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increase the learning rate (how much?).&lt;/li&gt;
&lt;li&gt;Remove Dropout.&lt;/li&gt;
&lt;li&gt;Reduce the L&lt;sub&gt;2&lt;/sub&gt; weight regularization by a factor of
5.&lt;/li&gt;
&lt;li&gt;Accelerate the learning rate decay by a factor of 6.&lt;/li&gt;
&lt;li&gt;Perform “within-shard” shuffling - although I don’t know what this
is.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 class=&quot;unnumbered&quot; id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;div id=&quot;refs&quot; class=&quot;references csl-bib-body hanging-indent&quot;
data-entry-spacing=&quot;0&quot; role=&quot;list&quot;&gt;
&lt;div id=&quot;ref-hong_decoupled_2015&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Hong, Seunghoon, Hyeonwoo Noh, and Bohyung Han. 2015. &lt;span&gt;“Decoupled
&lt;span&gt;Deep&lt;/span&gt; &lt;span&gt;Neural&lt;/span&gt; &lt;span&gt;Network&lt;/span&gt; for
&lt;span&gt;Semi&lt;/span&gt;-Supervised &lt;span&gt;Semantic&lt;/span&gt;
&lt;span&gt;Segmentation&lt;/span&gt;.”&lt;/span&gt; In &lt;em&gt;Advances in
&lt;span&gt;Neural&lt;/span&gt; &lt;span&gt;Information&lt;/span&gt; &lt;span&gt;Processing&lt;/span&gt;
&lt;span&gt;Systems&lt;/span&gt; 28&lt;/em&gt;, edited by C. Cortes, N. D. Lawrence, D. D.
Lee, M. Sugiyama, and R. Garnett, 1495–1503. Curran Associates, Inc. &lt;a
href=&quot;http://papers.nips.cc/paper/5858-decoupled-deep-neural-network-for-semi-supervised-semantic-segmentation.pdf&quot;&gt;http://papers.nips.cc/paper/5858-decoupled-deep-neural-network-for-semi-supervised-semantic-segmentation.pdf&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-ioffe_batch_2015&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Ioffe, Sergey, and Christian Szegedy. 2015. &lt;span&gt;“Batch
&lt;span&gt;Normalization&lt;/span&gt;: &lt;span&gt;Accelerating&lt;/span&gt; &lt;span&gt;Deep&lt;/span&gt;
&lt;span&gt;Network&lt;/span&gt; &lt;span&gt;Training&lt;/span&gt; by &lt;span&gt;Reducing&lt;/span&gt;
&lt;span&gt;Internal&lt;/span&gt; &lt;span&gt;Covariate&lt;/span&gt; &lt;span&gt;Shift&lt;/span&gt;.”&lt;/span&gt;
&lt;em&gt;arXiv:1502.03167 [Cs]&lt;/em&gt;, February. &lt;a
href=&quot;http://arxiv.org/abs/1502.03167&quot;&gt;http://arxiv.org/abs/1502.03167&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-sonderby_how_2016&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Sønderby, Casper Kaae, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby,
and Ole Winther. 2016. &lt;span&gt;“How to &lt;span&gt;Train&lt;/span&gt;
&lt;span&gt;Deep&lt;/span&gt; &lt;span&gt;Variational&lt;/span&gt; &lt;span&gt;Autoencoders&lt;/span&gt; and
&lt;span&gt;Probabilistic&lt;/span&gt; &lt;span&gt;Ladder&lt;/span&gt;
&lt;span&gt;Networks&lt;/span&gt;.”&lt;/span&gt; &lt;em&gt;arXiv:1602.02282 [Cs, Stat]&lt;/em&gt;,
February. &lt;a
href=&quot;http://arxiv.org/abs/1602.02282&quot;&gt;http://arxiv.org/abs/1602.02282&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 16 Apr 2016 00:00:00 -0400</pubDate>
        <link>https://neural.vision/blog/article-reviews/deep-learning/ioffe-batch-2015/</link>
        <guid isPermaLink="true">https://neural.vision/blog/article-reviews/deep-learning/ioffe-batch-2015/</guid>
        
        <category>batch-normalization</category>
        
        <category>deep-learning</category>
        
        
        <category>article-reviews</category>
        
        <category>deep-learning</category>
        
      </item>
    
      <item>
        <title>Review of Ichida, Schwabe, Bressloff, &amp; Angelucci (2007) &#39;Response Facilitation From the “Suppressive” Receptive Field Surround of Macaque V1 Neurons&#39;</title>
        <description>&lt;h1 id=&quot;overview&quot;&gt;Overview&lt;/h1&gt;
&lt;p&gt;The extraclassical surround (ECS) generally suppresses the firing
rate of visual neurons in the primary visual cortex (V1), especially
when the surround stimulus has the same orientation (iso-oriented).
However, it has been shown that the ECS can actually enhance the firing
rate when the stimulus has a low contrast. In &lt;span class=&quot;citation&quot;
data-cites=&quot;ichida_response_2007&quot;&gt;(&lt;a href=&quot;#ref-ichida_response_2007&quot;
role=&quot;doc-biblioref&quot;&gt;Ichida et al. 2007&lt;/a&gt;)&lt;/span&gt;, the authors test a
prediction from a model they have published &lt;span class=&quot;citation&quot;
data-cites=&quot;schwabe_role_2006&quot;&gt;(&lt;a href=&quot;#ref-schwabe_role_2006&quot;
role=&quot;doc-biblioref&quot;&gt;Schwabe et al. 2006&lt;/a&gt;)&lt;/span&gt;: that the far ECS,
and not just the immediate ECS, can enhance the response. They find that
the far ECS can indeed enhance the response, but only when the immediate
ECS does not contain the iso-oriented stimulus.&lt;/p&gt;
&lt;h1 id=&quot;methods&quot;&gt;Methods&lt;/h1&gt;
&lt;p&gt;The authors define the classical receptive field (CRF, called the
minimum response field in the paper) as the region that can be driven
using a small, high-contrast 0.1° grating. The size of the CRF depends on
the contrast of the stimulus: it is larger for low-contrast stimuli than
for high-contrast stimuli. The authors define the immediate ECS as the
area beyond the high-contrast CRF where a low-contrast stimulus would
increase the response.&lt;/p&gt;
&lt;p&gt;Instead of using the CRF, they used what they call the high-contrast
and low-contrast summation RFs. They first found the CRF (minimum
response field) using a 0.1° grating and used it to center a
high-contrast grating patch. They then varied the size of the patch to
find the size that optimally stimulated the cell. They called this the
high-contrast summation RF (SRF_high), or simply the RF center. They used
the same protocol with a low-contrast grating to find the low-contrast
summation RF (SRF_low).&lt;/p&gt;
&lt;p&gt;They called the region between SRF_high and SRF_low the near
surround. The far surround’s outer diameter was set to 14°. The inner
diameter varied but was no smaller than the SRF_low.&lt;/p&gt;
&lt;h1 class=&quot;unnumbered&quot; id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;div id=&quot;refs&quot; class=&quot;references csl-bib-body hanging-indent&quot;
data-entry-spacing=&quot;0&quot; role=&quot;list&quot;&gt;
&lt;div id=&quot;ref-ichida_response_2007&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Ichida, Jennifer M., Lars Schwabe, Paul C. Bressloff, and Alessandra
Angelucci. 2007. &lt;span&gt;“Response &lt;span&gt;Facilitation&lt;/span&gt;
&lt;span&gt;From&lt;/span&gt; the &lt;span&gt;‘&lt;span&gt;Suppressive&lt;/span&gt;’&lt;/span&gt;
&lt;span&gt;Receptive&lt;/span&gt; &lt;span&gt;Field&lt;/span&gt; &lt;span&gt;Surround&lt;/span&gt; of
&lt;span&gt;Macaque&lt;/span&gt; &lt;span&gt;V&lt;/span&gt;1 &lt;span&gt;Neurons&lt;/span&gt;.”&lt;/span&gt;
&lt;em&gt;Journal of Neurophysiology&lt;/em&gt; 98 (4): 2168–81. &lt;a
href=&quot;https://doi.org/10.1152/jn.00298.2007&quot;&gt;https://doi.org/10.1152/jn.00298.2007&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&quot;ref-schwabe_role_2006&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Schwabe, Lars, Klaus Obermayer, Alessandra Angelucci, and Paul C.
Bressloff. 2006. &lt;span&gt;“The &lt;span&gt;Role&lt;/span&gt; of &lt;span&gt;Feedback&lt;/span&gt;
in &lt;span&gt;Shaping&lt;/span&gt; the &lt;span&gt;Extra&lt;/span&gt;-&lt;span&gt;Classical&lt;/span&gt;
&lt;span&gt;Receptive&lt;/span&gt; &lt;span&gt;Field&lt;/span&gt; of &lt;span&gt;Cortical&lt;/span&gt;
&lt;span&gt;Neurons&lt;/span&gt;: &lt;span&gt;A&lt;/span&gt; &lt;span&gt;Recurrent&lt;/span&gt;
&lt;span&gt;Network&lt;/span&gt; &lt;span&gt;Model&lt;/span&gt;.”&lt;/span&gt; &lt;em&gt;J. Neurosci.&lt;/em&gt;
26 (36): 9117–29. &lt;a
href=&quot;https://doi.org/10.1523/JNEUROSCI.1253-06.2006&quot;&gt;https://doi.org/10.1523/JNEUROSCI.1253-06.2006&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 19 Mar 2016 00:00:00 -0400</pubDate>
        <link>https://neural.vision/blog/article-reviews/visual-neuroscience/ichida-response-2007/</link>
        <guid isPermaLink="true">https://neural.vision/blog/article-reviews/visual-neuroscience/ichida-response-2007/</guid>
        
        <category>neuroscience</category>
        
        <category>surround-suppression</category>
        
        <category>surround-facilitation</category>
        
        <category>extraclassical-surround</category>
        
        <category>Macaque</category>
        
        
        <category>article-reviews</category>
        
        <category>visual-neuroscience</category>
        
      </item>
    
      <item>
        <title>Review of Zoccolan, Cox, &amp; DiCarlo (2005) &#39;Multiple Object Response Normalization in Monkey Inferotemporal Cortex&#39;</title>
        <description>&lt;p&gt;Even though the neurons in inferotemporal cortex (IT) have very large
receptive fields, it is tempting to believe that the neurons would be
able to distinguish objects presented within their receptive fields. For
example, if a neuron responds to objects A and B at different rates,
perhaps the neuron should give the maximum of these two rates when both
stimuli are presented within its receptive field. The study &lt;span
class=&quot;citation&quot; data-cites=&quot;zoccolan_multiple_2005&quot;&gt;(&lt;a
href=&quot;#ref-zoccolan_multiple_2005&quot; role=&quot;doc-biblioref&quot;&gt;Zoccolan, Cox,
and DiCarlo 2005&lt;/a&gt;)&lt;/span&gt; shows that this is not the case and, when
presented with two objects, most IT neurons’ responses are the mean of
the firing rates when the objects are presented separately - at least
for short presentation times and when the objects are not attended.&lt;/p&gt;
&lt;p&gt;There is a lot more to this paper than what I will cover in this
review / note. I hope to add more in the future, but the most important
points are straightforward. They use simple artificial shapes on a plain
background. The first results show that, across the population, the cells’
responses to the presentation of multiple objects cluster around the
mean of their responses when the objects are presented separately.
There is a slight tendency to fire at a rate above the
average, but the lack of scatter is rather amazing. There is a line in
Figures 1C and 1D for the sum of the responses, and very few of the cells
fall on or above this line.&lt;/p&gt;
&lt;p&gt;They then show that the responses to the combined object displays are
much better described by the mean of the responses to the individual
object displays than by a max model, at least on average across the cell
population. There is a lot of spread in these results, leaving open the
possibility that some neurons give a response that is the maximum of the
responses to the two objects presented separately (or an even higher
response).&lt;/p&gt;
&lt;div id=&quot;refs&quot; class=&quot;references csl-bib-body hanging-indent&quot;
data-entry-spacing=&quot;0&quot; role=&quot;list&quot;&gt;
&lt;div id=&quot;ref-zoccolan_multiple_2005&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Zoccolan, Davide, David D. Cox, and James J. DiCarlo. 2005.
&lt;span&gt;“Multiple &lt;span&gt;Object&lt;/span&gt; &lt;span&gt;Response&lt;/span&gt;
&lt;span&gt;Normalization&lt;/span&gt; in &lt;span&gt;Monkey&lt;/span&gt;
&lt;span&gt;Inferotemporal&lt;/span&gt; &lt;span&gt;Cortex&lt;/span&gt;.”&lt;/span&gt; &lt;em&gt;J.
Neurosci.&lt;/em&gt; 25 (36): 8150–64. &lt;a
href=&quot;https://doi.org/10.1523/JNEUROSCI.2058-05.2005&quot;&gt;https://doi.org/10.1523/JNEUROSCI.2058-05.2005&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
</description>
        <pubDate>Mon, 22 Feb 2016 00:00:00 -0500</pubDate>
        <link>https://neural.vision/blog/article-reviews/visual-neuroscience/zoccolan-multiple-2005/</link>
        <guid isPermaLink="true">https://neural.vision/blog/article-reviews/visual-neuroscience/zoccolan-multiple-2005/</guid>
        
        <category>neuroscience</category>
        
        <category>IT</category>
        
        <category>macaque</category>
        
        
        <category>article-reviews</category>
        
        <category>visual-neuroscience</category>
        
      </item>
    
      <item>
        <title>Review of Liu, Hashemi-Nezhad, &amp; Lyon (2015) &#39;Contrast invariance of orientation tuning in cat primary visual cortex neurons depends on stimulus size&#39;</title>
        <description>&lt;h1 id=&quot;overview&quot;&gt;Overview&lt;/h1&gt;
&lt;p&gt;There are two main findings from &lt;span class=&quot;citation&quot;
data-cites=&quot;liu_contrast_2015&quot;&gt;(&lt;a href=&quot;#ref-liu_contrast_2015&quot;
role=&quot;doc-biblioref&quot;&gt;Liu, Hashemi-Nezhad, and Lyon 2015&lt;/a&gt;)&lt;/span&gt; in
the primary visual cortex (V1) of anesthetized cats. First,
contrast-invariant orientation tuning depends on having a stimulus that
extends beyond the CRF. If the stimulus is optimized for the CRF, then
the tuning width decreases with lower contrast (illustrated in Figure 3
of the paper). The orientation tuning profile is contrast invariant when
the stimulus extends into the surround, but not when it only covers the
CRF.&lt;/p&gt;
&lt;p&gt;The second main finding (illustrated in Figure 4 of the paper) is
that contrast invariance appears with the large stimulus because the
tuning width &lt;em&gt;decreases&lt;/em&gt; for the high-contrast stimulus when the
surround stimulus is added to the CRF stimulus. The tuning width for the
low-contrast condition stays the same on average with or without the
stimulus in the surround (although individual cells may be facilitated
or suppressed).&lt;/p&gt;
&lt;p&gt;The results of &lt;span class=&quot;citation&quot;
data-cites=&quot;liu_contrast_2015&quot;&gt;(&lt;a href=&quot;#ref-liu_contrast_2015&quot;
role=&quot;doc-biblioref&quot;&gt;Liu, Hashemi-Nezhad, and Lyon 2015&lt;/a&gt;)&lt;/span&gt; are
difficult to reconcile with classical results and, for me, indicate that
a better measure of contrast-invariant orientation tuning is needed.
Anyone interested in this feature should definitely read this
paper.&lt;/p&gt;
&lt;h1 id=&quot;stimulus-and-methods&quot;&gt;Stimulus and Methods&lt;/h1&gt;
&lt;p&gt;For the main experiment, they have two contrast conditions (low and
high) that are defined for each neuron and two size conditions (CRF and
CRF+ECS) that are defined for each contrast (and neuron). The smaller of
the two sizes, the CRF / patch condition, is defined as the size that
produces the largest response from the cell. The larger size, the
CRF+ECS (extraclassical surround) condition, is defined by the size that
produces the maximum suppression.&lt;/p&gt;
&lt;p&gt;The paper almost exclusively reports the half-width at half-height
(HWHH): half the width of the (fitted) orientation tuning curve,
measured at half of that curve’s maximum response.&lt;/p&gt;
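&lt;p&gt;As a worked illustration of this definition, suppose the fitted
tuning curve is a Gaussian with peak response &lt;span
class=&quot;math inline&quot;&gt;\(A\)&lt;/span&gt;, preferred orientation &lt;span
class=&quot;math inline&quot;&gt;\(\theta_0\)&lt;/span&gt;, and width parameter &lt;span
class=&quot;math inline&quot;&gt;\(\sigma\)&lt;/span&gt;. This is an assumption made here
for illustration only; the fitting function used in the paper may differ
and may include a baseline term. The HWHH is then the distance from &lt;span
class=&quot;math inline&quot;&gt;\(\theta_0\)&lt;/span&gt; at which the response falls to
half of its peak:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\begin{aligned}
R(\theta) &amp;amp;= A \exp\left(-\frac{(\theta-\theta_0)^2}{2\sigma^2}\right), \\
R(\theta_0 \pm \mathrm{HWHH}) &amp;amp;= \frac{A}{2}
\quad\Rightarrow\quad
\mathrm{HWHH} = \sigma\sqrt{2\ln 2} \approx 1.18\,\sigma.
\end{aligned}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Under this assumption, a narrower tuning curve (smaller &lt;span
class=&quot;math inline&quot;&gt;\(\sigma\)&lt;/span&gt;) gives a proportionally smaller
HWHH.&lt;/p&gt;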
&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;
&lt;p&gt;The paper states in the discussion that most other papers on this
topic did not use an optimally sized stimulus, which is why they report
different results. They do point out that Finn et al. (2007)
did use a similar CRF condition, but reported different results,
presumably because they used patch clamping. In Supplemental Fig. 3 of
Finn et al., there are some extracellularly recorded neurons that
reportedly are more consistent (I haven’t checked these results).&lt;/p&gt;
&lt;p&gt;Deep anesthesia is known to change the properties of the ECS of early
visual neurons. It is unclear to me how much the results from
anesthetized animals can be generalized to the normal awake state.&lt;/p&gt;
&lt;div id=&quot;refs&quot; class=&quot;references csl-bib-body hanging-indent&quot;
data-entry-spacing=&quot;0&quot; role=&quot;list&quot;&gt;
&lt;div id=&quot;ref-liu_contrast_2015&quot; class=&quot;csl-entry&quot; role=&quot;listitem&quot;&gt;
Liu, Yong-Jun, Maziar Hashemi-Nezhad, and David C. Lyon. 2015.
&lt;span&gt;“Contrast Invariance of Orientation Tuning in Cat Primary Visual
Cortex Neurons Depends on Stimulus Size.”&lt;/span&gt; &lt;em&gt;J Physiol&lt;/em&gt; 593
(19): 4485–98. &lt;a
href=&quot;https://doi.org/10.1113/JP271180&quot;&gt;https://doi.org/10.1113/JP271180&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 20 Feb 2016 00:00:00 -0500</pubDate>
        <link>https://neural.vision/blog/article-reviews/visual-neuroscience/liu-contrast-2015/</link>
        <guid isPermaLink="true">https://neural.vision/blog/article-reviews/visual-neuroscience/liu-contrast-2015/</guid>
        
        <category>neuroscience</category>
        
        <category>contrast-invariant-orientation-selectivity</category>
        
        <category>cats</category>
        
        
        <category>article-reviews</category>
        
        <category>visual-neuroscience</category>
        
      </item>
    
      <item>
        <title>Change konsole appearance during SSH</title>
        <description>&lt;p&gt;&lt;em&gt;Everyone&lt;/em&gt; knows that feeling: when you have many consoles
open at the same time, connected via ssh to various servers. In this post
I’m going to show a simple trick that changes the konsole
background whenever you ssh to a server and changes it back when you
log out - well, at least if you are using KDE (or have konsole
installed).&lt;/p&gt;
&lt;p&gt;For example, I have a virtual linux system that I call “Puffin”. I’ve
created an alias “ssh-puffin” to login via ssh.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;https://neural.vision/images/konsole-ssh-awesomeness-1.png&quot;
alt=&quot;Before ssh session&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;Before ssh session&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I have set up this alias to change the background of konsole:&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;https://neural.vision/images/konsole-ssh-awesomeness-2.png&quot;
alt=&quot;During ssh session&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;During ssh session&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;And then, after I log out, the konsole switches back to the local
profile (and gives a warm and fuzzy welcome-back message).&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;https://neural.vision/images/konsole-ssh-awesomeness-3.png&quot;
alt=&quot;After ssh session&quot; /&gt;
&lt;figcaption aria-hidden=&quot;true&quot;&gt;After ssh session&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;step-1-add-konsole-profiles&quot;&gt;Step 1: Add konsole profile(s)&lt;/h2&gt;
&lt;p&gt;Create konsole profiles and corresponding color schemes for your
local system (“Local”) and remote systems (“Puffin”). You only
really need to create the color schemes, but I always create a separate
profile with the same name. To do this, open Settings in a konsole
window and select “Manage Profiles”. You can access the color schemes
by clicking Edit (or New) and then clicking Appearance.&lt;/p&gt;
&lt;p&gt;I created the Puffin background with GIMP using layers and an &lt;a
href=&quot;https://commons.wikimedia.org/wiki/File:Papageitaucher_Fratercula_arctica.jpg&quot;&gt;image
from Wikimedia Commons&lt;/a&gt; by &lt;a
href=&quot;https://commons.wikimedia.org/wiki/User:Richard_Bartz&quot;&gt;Richard
Bartz&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can, of course, change the console appearance in other ways.&lt;/p&gt;
&lt;h2 id=&quot;step-2-modify-.bashrc&quot;&gt;Step 2: Modify .bashrc&lt;/h2&gt;
&lt;p&gt;Add the following to your .bashrc file:&lt;/p&gt;
&lt;pre class=&quot;ssh&quot;&gt;&lt;code&gt;alias resetcolors=&amp;quot;konsoleprofile colors=Local&amp;quot;
alias ssh-puffin=&amp;quot;konsoleprofile colors=Puffin; ssh puffin; resetcolors; echo &amp;#39;Welcome back&amp;#39;&amp;quot;&amp;#39;!&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you have many remote servers, you may want to add your .bashrc
file to GitHub or the cloud™.&lt;/p&gt;
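&lt;p&gt;With several servers, one alias per host gets repetitive. Here is a
minimal sketch of a more general helper for your .bashrc. It assumes that
every host has a konsole color scheme with exactly the same name as the
host alias, and the function name &lt;code&gt;ssh_colored&lt;/code&gt; is
hypothetical (it is not part of konsole or ssh):&lt;/p&gt;
&lt;pre class=&quot;sourceCode bash&quot;&gt;&lt;code class=&quot;sourceCode bash&quot;&gt;# Hypothetical helper: assumes a konsole color scheme named exactly like the host.
ssh_colored() {
    konsoleprofile &amp;quot;colors=$1&amp;quot;   # switch to the scheme named after the host
    ssh &amp;quot;$@&amp;quot;                       # pass the host (and any remote command) to ssh
    konsoleprofile colors=Local    # restore the local scheme after the session ends
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You would then connect with &lt;code&gt;ssh_colored puffin&lt;/code&gt;,
provided a color scheme called “puffin” exists.&lt;/p&gt;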
&lt;h2 id=&quot;step-3-enjoy-awesomeness&quot;&gt;Step 3: Enjoy awesomeness&lt;/h2&gt;
&lt;p&gt;After reloading &lt;code&gt;.bashrc&lt;/code&gt;, you can then log into the
server using your alias.&lt;/p&gt;
&lt;h2 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h2&gt;
&lt;p&gt;I first figured out how to do this from a &lt;a
href=&quot;https://abdussamad.com/archives/503-Changing-Konsole-colors-in-KDE.html&quot;&gt;blog
post by Abdussamad&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Fri, 01 Jan 2016 22:19:00 -0500</pubDate>
        <link>https://neural.vision/blog/linux/konsole-ssh-awesomeness/</link>
        <guid isPermaLink="true">https://neural.vision/blog/linux/konsole-ssh-awesomeness/</guid>
        
        <category>KDE</category>
        
        <category>linux</category>
        
        
        <category>linux</category>
        
      </item>
    
      <item>
        <title>Backpropagation with shared weights in convolutional neural networks</title>
        <description>&lt;p&gt;The success of deep convolutional neural networks would not be
possible without weight sharing - the same weights being applied to
different neuronal connections. However, this property also makes them
more complicated. This post aims to give an intuition of how
backpropagation works with weight sharing. For a more well-rounded
introduction to backpropagation of convolutional neural networks, see
Andrew Gibiansky’s &lt;a
href=&quot;http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/&quot;&gt;blog
post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Backpropagation is used to calculate how the error in a neural
network changes with respect to changes in a weight &lt;span
class=&quot;math inline&quot;&gt;\(w\)&lt;/span&gt; in that neural network. In other words,
it calculates:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[\frac{\partial E}{\partial w},
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;where &lt;span class=&quot;math inline&quot;&gt;\(E\)&lt;/span&gt; is the error and &lt;span
class=&quot;math inline&quot;&gt;\(w\)&lt;/span&gt; is a weight.&lt;/p&gt;
&lt;p&gt;For traditional feed-forward neural networks, each connection between
two neurons has its own weight and the calculation of the
backpropagation is generally straightforward using the chain rule. For
example, if you know how the error changes with respect to the node &lt;span
class=&quot;math inline&quot;&gt;\(y_i\)&lt;/span&gt; (i.e., &lt;span
class=&quot;math inline&quot;&gt;\(\frac{\partial E}{\partial y_i}\)&lt;/span&gt;), then
calculating the contribution of the pre-synaptic weights of that node is
simply:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[\frac{\partial E}{\partial
w}=\frac{\partial E}{\partial y_i}\frac{\partial y_i}{\partial w}.
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This is complicated in convolutional neural networks because the
weight &lt;span class=&quot;math inline&quot;&gt;\(w\)&lt;/span&gt; is used for multiple nodes
(often, most or all nodes in the same layer).&lt;/p&gt;
&lt;h1 id=&quot;handling-shared-weights&quot;&gt;Handling shared weights&lt;/h1&gt;
&lt;p&gt;In classical convolutional neural networks, shared weights are
handled by summing together each instance in which the weight appears in
the backpropagation derivation, instead of, for example, taking the
average of each occurrence. So, if layer &lt;span
class=&quot;math inline&quot;&gt;\(y^l\)&lt;/span&gt; is the layer “post-synaptic” to the
weight &lt;span class=&quot;math inline&quot;&gt;\(w\)&lt;/span&gt; and we have calculated the
effect of that layer on the error (&lt;span class=&quot;math inline&quot;&gt;\(\frac{\partial
E}{\partial y^l}\)&lt;/span&gt;), then the gradient with respect to the weight
is:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[\frac{\partial E}{\partial
w}=\sum_i\frac{\partial E}{\partial y^l_i} \frac{\partial
y^l_i}{\partial w},
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;where &lt;span class=&quot;math inline&quot;&gt;\(i\)&lt;/span&gt; specifies the node
within layer &lt;span class=&quot;math inline&quot;&gt;\(l\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So why is summation the correct operation? In essence, it is because
when the paths from a weight (applied at different locations) merge,
their contributions to the derivative add. This is the multivariable
chain rule, and it holds however the paths merge. Convolution and fully
connected layers merge the paths with a weighted sum (the dot product).
Max pooling passes the gradient only along the path that produced the
maximum, so the other paths simply contribute zero to the sum.&lt;/p&gt;
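&lt;p&gt;To make the summation concrete, here is a minimal two-path sketch.
The symbols &lt;span class=&quot;math inline&quot;&gt;\(a\)&lt;/span&gt;, &lt;span
class=&quot;math inline&quot;&gt;\(b\)&lt;/span&gt;, &lt;span
class=&quot;math inline&quot;&gt;\(u\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(v\)&lt;/span&gt; are introduced only for this
illustration: the same weight &lt;span class=&quot;math inline&quot;&gt;\(w\)&lt;/span&gt;
scales two different inputs, and the error depends on both results.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\begin{aligned}
u &amp;amp;= w a, \qquad v = w b, \qquad E = E(u, v), \\
\frac{\partial E}{\partial w} &amp;amp;= \frac{\partial E}{\partial u}\frac{\partial u}{\partial w}
 + \frac{\partial E}{\partial v}\frac{\partial v}{\partial w}
 = \frac{\partial E}{\partial u}\, a + \frac{\partial E}{\partial v}\, b.
\end{aligned}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Averaging the two terms instead would give only half of the true
derivative.&lt;/p&gt;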
&lt;h1 id=&quot;simple-example&quot;&gt;Simple example&lt;/h1&gt;
&lt;p&gt;Let’s take a very simple convolutional network.&lt;/p&gt;
&lt;p&gt;Let layer &lt;span class=&quot;math inline&quot;&gt;\(y^0\)&lt;/span&gt; be a 1D input
layer and &lt;span class=&quot;math inline&quot;&gt;\([w_0, 0, 0]\)&lt;/span&gt; a kernel that
is applied to this layer. For simplicity, let’s only have a
single kernel. Then:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
x^1_{i}=w_0 y^0_{i}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;An activation function is then applied to this result: &lt;span
class=&quot;math inline&quot;&gt;\(y^1_i=h(x^1_{i})\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;For the next convolutional layer, let’s say that the kernel &lt;span
class=&quot;math inline&quot;&gt;\([w_1,w_2,w_3]\)&lt;/span&gt; is applied. Then:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\begin{aligned}
x^2_{i}&amp;amp;=\sum_{a=1}^3 w_a y^1_{i+a-1} \\
       &amp;amp;= w_1 y^1_i + w_2 y^1_{i+1} + w_3 y^1_{i+2} \\
       &amp;amp;= w_1 h\left(w_0 y^0_{i}\right) + w_2 h\left(w_0
y^0_{i+1}\right) + w_3 h\left(w_0 y^0_{i+2}\right). \\
\end{aligned}
\]&lt;/span&gt; and &lt;span class=&quot;math display&quot;&gt;\[
y^2_{i} = h(x^2_{i}).
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;So we are interested in &lt;span class=&quot;math inline&quot;&gt;\(\frac{\partial
E}{\partial w_0}\)&lt;/span&gt;. Let’s say that the error is only affected by
the &lt;span class=&quot;math inline&quot;&gt;\(j\)&lt;/span&gt;th node of the output: &lt;span
class=&quot;math inline&quot;&gt;\(y^2_{j}\)&lt;/span&gt;. Then:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[\frac{\partial E}{\partial w_0} =
\frac{\partial E}{\partial y^2_{j}}\frac{\partial y^2_{j}}{\partial
x^2_j}\frac{\partial x^2_{j}}{\partial w_0}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Assume that we have &lt;span class=&quot;math inline&quot;&gt;\(\frac{\partial
E}{\partial y^2_{j}}\)&lt;/span&gt; and &lt;span
class=&quot;math inline&quot;&gt;\(\frac{\partial y^2_{j}}{\partial x^2_j}\)&lt;/span&gt;,
then we only need to solve for &lt;span
class=&quot;math inline&quot;&gt;\(\frac{\partial x^2_{j}}{\partial
w_0}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\begin{aligned}
\frac{\partial x^2_{j}}{\partial w_0}&amp;amp;=\frac{\partial}{\partial w_0}
\left(\sum_{a=1}^3 w_a y^1_{j+a-1}\right)\\
&amp;amp;= \sum_{a=1}^3 w_a \frac{\partial}{\partial w_0} \left(
y^1_{j+a-1}\right)\\
&amp;amp;= \sum_{a=1}^3 w_a \frac{\partial}{\partial w_0} \left( h\left(w_0
y^0_{j+a-1}\right)\right)\\
       &amp;amp;= w_1 \frac{\partial}{\partial w_0} h\left(w_0
y^0_{j}\right) +
       w_2 \frac{\partial}{\partial w_0} h\left(w_0 y^0_{j+1}\right) +
       w_3 \frac{\partial}{\partial w_0} h\left(w_0 y^0_{j+2}\right). \\
\end{aligned}
\]&lt;/span&gt;&lt;/p&gt;
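&lt;p&gt;Carrying the chain rule one step further gives a closed form. This
sketch assumes that &lt;span class=&quot;math inline&quot;&gt;\(h\)&lt;/span&gt; is
differentiable and writes its derivative as &lt;span
class=&quot;math inline&quot;&gt;\(h^{\prime}\)&lt;/span&gt;, a notation introduced here
rather than in the derivation above:&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;\[
\begin{aligned}
\frac{\partial}{\partial w_0} h\left(w_0 y^0_{k}\right) &amp;amp;= h^{\prime}\left(w_0 y^0_{k}\right) y^0_{k}, \\
\frac{\partial x^2_{j}}{\partial w_0} &amp;amp;= \sum_{a=1}^3 w_a\, h^{\prime}\left(w_0 y^0_{j+a-1}\right) y^0_{j+a-1}, \\
\frac{\partial E}{\partial w_0} &amp;amp;= \frac{\partial E}{\partial y^2_{j}}\,
h^{\prime}\left(x^2_{j}\right) \sum_{a=1}^3 w_a\, h^{\prime}\left(w_0 y^0_{j+a-1}\right) y^0_{j+a-1}.
\end{aligned}
\]&lt;/span&gt;&lt;/p&gt;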
&lt;p&gt;Notice that each occurrence of &lt;span
class=&quot;math inline&quot;&gt;\(w_0\)&lt;/span&gt; is differentiated separately and the
results are summed, which is why backpropagation sums the gradients of
shared weights in convolutional networks.&lt;/p&gt;
</description>
        <pubDate>Wed, 23 Dec 2015 00:00:00 -0500</pubDate>
        <link>https://neural.vision/blog/deep-learning/backpropagation-with-shared-weights/</link>
        <guid isPermaLink="true">https://neural.vision/blog/deep-learning/backpropagation-with-shared-weights/</guid>
        
        <category>backpropagation</category>
        
        <category>dcnn</category>
        
        <category>vision</category>
        
        <category>deep-learning</category>
        
        
        <category>deep-learning</category>
        
      </item>
    
      <item>
        <title>Passwordless ssh authentication!</title>
        <description>&lt;p&gt;On your local system, check to see if you have the following
files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~/.ssh/id_rsa&lt;/li&gt;
&lt;li&gt;~/.ssh/id_rsa.pub&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If not, type:&lt;/p&gt;
&lt;div class=&quot;sourceCode&quot; id=&quot;cb1&quot;&gt;&lt;pre
class=&quot;sourceCode bash&quot;&gt;&lt;code class=&quot;sourceCode bash&quot;&gt;&lt;span id=&quot;cb1-1&quot;&gt;&lt;a href=&quot;#cb1-1&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;&lt;span class=&quot;fu&quot;&gt;ssh-keygen&lt;/span&gt; &lt;span class=&quot;at&quot;&gt;-t&lt;/span&gt; rsa&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then follow the instructions. Note that &lt;code&gt;ssh-agent&lt;/code&gt; can
hold your unlocked key, so that you only need to enter your passphrase
once per session.&lt;/p&gt;
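&lt;p&gt;A minimal sketch of that &lt;code&gt;ssh-agent&lt;/code&gt; step, assuming a
Bourne-style shell such as bash and the default key path listed above.
Many desktop sessions already start an agent, in which case the first
line is unnecessary:&lt;/p&gt;
&lt;pre class=&quot;sourceCode bash&quot;&gt;&lt;code class=&quot;sourceCode bash&quot;&gt;eval &amp;quot;$(ssh-agent -s)&amp;quot;   # start an agent and export its environment variables
ssh-add ~/.ssh/id_rsa      # asks for the passphrase once and keeps the unlocked key in memory
ssh-add -l                 # list the keys the agent currently holds&lt;/code&gt;&lt;/pre&gt;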
&lt;p&gt;After you have generated your private and public keys, you want to
give your remote system the public key:&lt;/p&gt;
&lt;div class=&quot;sourceCode&quot; id=&quot;cb2&quot;&gt;&lt;pre
class=&quot;sourceCode bash&quot;&gt;&lt;code class=&quot;sourceCode bash&quot;&gt;&lt;span id=&quot;cb2-1&quot;&gt;&lt;a href=&quot;#cb2-1&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;&lt;span class=&quot;ex&quot;&gt;ssh-copy-id&lt;/span&gt; &lt;span class=&quot;at&quot;&gt;-i&lt;/span&gt; ~/.ssh/id_rsa.pub username@remote.system&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After entering your password, you’re done!&lt;/p&gt;
&lt;p&gt;Reference: &lt;a
href=&quot;http://www.debian-administration.org/articles/152&quot;
class=&quot;uri&quot;&gt;http://www.debian-administration.org/articles/152&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 21 Dec 2015 16:15:37 -0500</pubDate>
        <link>https://neural.vision/blog/linux/passwordless-ssh/</link>
        <guid isPermaLink="true">https://neural.vision/blog/linux/passwordless-ssh/</guid>
        
        <category>linux</category>
        
        
        <category>linux</category>
        
      </item>
    
  </channel>
</rss>
