Capsule Neural Networks can be seen as an enhancement of Convolutional Neural Networks. To understand capsule networks, let's first recap convolutional neural networks (CNNs). In a CNN, the initial layers detect simple features like edges, curves, color gradients etc. Deeper convolutional layers combine these simple features into progressively more complex ones. In doing so, however, a CNN does not account for the orientation and relative spatial relationships between the features or components. So, sometimes, a CNN can be easily tricked.
For example, in face recognition, a CNN does not check the placement of the eyes, nose, mouth, lips etc. Even if the lips are next to the eyes or the eyes are below the mouth, it will still consider the image a face: as long as all the components of a face are present, it accepts it as a face without regard to the orientation and placement of those components. Capsule networks address this.
I have written a separate post on CNNs. Please go through it for detailed information.
Pooling layer problem in CNN: Pooling layers down-sample the data, and in doing so a lot of information is lost. These layers reduce the spatial resolution, so their outputs are invariant to small changes in the inputs. This is a problem when detailed information must be preserved throughout the network. With CapsNets, detailed pose information (such as precise object position, rotation, thickness, skew, size, and so on) is preserved throughout the network. Small changes to the inputs result in correspondingly small changes to the outputs, so the information is preserved. This property is called "equivariance."
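To make the information loss concrete, here is a tiny NumPy sketch (the patch values are made up for illustration): two different patches max-pool to the same output, so the layers downstream cannot tell where the feature actually was.

```python
import numpy as np

# Two different 2x2 patches that max-pool to the same value:
# pooling keeps "the feature is here somewhere" but throws away
# exactly where, which is the invariance (and information loss)
# described above.
patch_a = np.array([[0.9, 0.0],
                    [0.0, 0.0]])
patch_b = np.array([[0.0, 0.0],
                    [0.0, 0.9]])

print(patch_a.max(), patch_b.max())  # 0.9 0.9 -> position information lost
```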
Capsule: Hinton has argued that the human brain is organized into modules, which he calls capsules. Building on this idea, he put forward the concept of a capsule in neural networks. A capsule can be considered as a group of neurons. We can add neurons to a capsule to capture different dimensions of an entity in an image, such as scale, stroke thickness, width, skew, and translation. Its output can encode instantiation parameters such as pose, hue, albedo, texture, deformation, velocity, and the location of the object.
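As a rough illustration (the 8-dimensional capsule and the values below are assumptions for this sketch, not part of any fixed architecture), a capsule's output vector can be read as follows: its length says how likely the entity is to be present, and its direction encodes the entity's instantiation parameters.

```python
import numpy as np

# Hypothetical 8-D capsule output: the vector's length indicates how
# confident the capsule is that its entity is present (the squashing
# function, below, keeps this length in [0, 1)), and its direction
# encodes instantiation parameters such as scale, skew, and translation.
capsule_output = np.array([0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5])

presence = np.linalg.norm(capsule_output)  # confidence that the entity exists
pose = capsule_output / presence           # unit direction: the entity's "pose"

print(round(presence, 3))  # 0.775 -> moderately confident detection
```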
Dynamic Routing Algorithm: The human brain is believed to have a mechanism for routing information between capsules. By analogy, Hinton and his colleagues proposed the dynamic routing algorithm. This algorithm allows capsules to communicate with each other: each lower-level capsule sends its output to the higher-level capsules that agree with its predictions. For more details, please visit this article:
Dynamic Routing Between Capsules
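To give a feel for how routing-by-agreement works, here is a minimal NumPy sketch of the procedure described in that paper. The shapes follow its MNIST setup (1152 primary capsules routed to ten 16-dimensional digit capsules), and the random prediction vectors are placeholders for what a real network would compute.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linear squashing: preserves direction, maps length into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement (Sabour, Frosst, Hinton 2017).

    u_hat: prediction vectors from lower-level capsules,
           shape (num_lower, num_higher, dim_higher).
    Returns the higher-level capsule outputs, shape (num_higher, dim_higher).
    """
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))        # routing logits, start at zero
    for _ in range(num_iterations):
        # Coupling coefficients: each lower capsule distributes its
        # output over the higher capsules via a softmax.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum of predictions
        v = squash(s)                            # candidate higher-level outputs
        b += (u_hat * v[None]).sum(axis=-1)      # reward predictions that agree
    return v

# Placeholder predictions: 1152 primary capsules -> 10 digit capsules of
# 16 dimensions, as in the paper's MNIST architecture.
u_hat = np.random.default_rng(0).normal(size=(1152, 10, 16))
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))  # each output length lies in [0, 1)
```

Predictions that agree with the emerging consensus get their routing logits increased, so after a few iterations each lower-level capsule sends its output mostly to the higher-level capsule it agrees with.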
Squashing Function: Instead of a scalar non-linearity such as ReLU, Hinton and his colleagues proposed a novel vector "squashing" function. It normalizes the magnitude of a capsule's output vector so that its length falls between 0 and 1 while preserving its direction. The agreement between these squashed outputs and the lower-level predictions tells the network how to route data through the various capsules, which are trained to learn different concepts.
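A minimal sketch of that squashing function (the formula is from the paper above; the test vectors are arbitrary):

```python
import numpy as np

def squash(s, eps=1e-8):
    # v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    # Direction is preserved; the length is mapped into [0, 1).
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

print(np.linalg.norm(squash(np.array([0.1, 0.0]))))   # ~0.0099: short vectors shrink toward 0
print(np.linalg.norm(squash(np.array([10.0, 0.0]))))  # ~0.9901: long vectors saturate near 1
```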
Limitations of Capsule Neural Networks
1. Compared to a CNN, a capsule network is slower to train because of its computational complexity (the iterative routing adds cost).
2. It has been tested on the MNIST dataset, but how it will behave on larger, more complex datasets is still unknown.
3. The concept is still under research, so it has a lot of scope for improvement.
I would suggest going through this PDF for more details on Capsule Neural Networks.