Spatial Pooling for Torch7

This repository provides implementations of several spatial pooling modules used for weakly supervised learning of deep ConvNets.

Installation

$ git clone https://github.com/durandtibo/spatial-pooling.torch.git
$ cd spatial-pooling.torch
$ luarocks make rocks/spatial-pooling-scm-1.rockspec

To test the installation, run:

$ th test/test.lua

Modules

GlobalMaxPooling (GMP)

Global Max Pooling is a spatial pooling strategy used in "Is object localization for free? – Weakly-supervised learning with convolutional neural networks".

module = nn.GlobalMaxPooling()

Applies a 2D max-pooling operation over the whole image. The number of output features is equal to the number of input planes.

If the input image is a 4D tensor nBatchImage x nInputPlane x w x h, the output image size will be nBatchImage x nInputPlane x 1 x 1 where w and h are spatial image dimensions.

If the input image is a 3D tensor nInputPlane x w x h, the output image size will be nInputPlane x 1 x 1 where w and h are spatial image dimensions.

Equation

We denote by z^c the c-th input map and by s^c the c-th output map.

s^c = max_{i,j} z^c_{i,j}
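
As a quick illustration of the equation above, here is a minimal plain-Python sketch (not the Torch module itself). It treats a 3D input as a list of 2D maps; the function name `global_max_pooling` and the sample values are hypothetical and chosen only for this example.

```python
def global_max_pooling(planes):
    """Return one value per input plane: the maximum over all spatial positions.

    `planes` is a list of 2D maps (each a list of rows), a plain-Python
    stand-in for a 3D tensor nInputPlane x w x h.
    """
    return [max(max(row) for row in plane) for plane in planes]


# Two input planes of size 2 x 2 (illustrative values only).
z = [
    [[1.0, 5.0], [3.0, 2.0]],
    [[-4.0, -1.0], [-2.0, -3.0]],
]
s = global_max_pooling(z)
print(s)  # [5.0, -1.0]
```

The output has one entry per plane, which mirrors the nInputPlane x 1 x 1 output size described above.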

References

@inproceedings{Oquab_DeepMIL_CVPR15,
author = "Oquab, M. and Bottou, L. and Laptev, I. and Sivic, J.",
title = "Is object localization for free? – Weakly-supervised learning with convolutional neural networks",
booktitle =  "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
year = "2015"
}

GlobalkMaxPooling

GlobalkMaxPooling is a generalization of GlobalMaxPooling to multiple maxima. This spatial pooling strategy is inspired by the Top Instances model from "Multiple Instance Learning for Soft Bags via Top Instances".

module = nn.GlobalkMaxPooling(kMax)

Applies a 2D k-max-pooling operation over the whole image. The number of output features is equal to the number of input planes.

The parameter is the following:

  • kMax: The number of top instances. kMax can define either the number of selected regions (kMax >= 1) or the proportion of selected regions (0 < kMax < 1). If kMax <= 0, all the regions are selected. Default is kMax = 1.

If the input image is a 4D tensor nBatchImage x nInputPlane x w x h, the output image size will be nBatchImage x nInputPlane x 1 x 1 where w and h are spatial image dimensions.

If the input image is a 3D tensor nInputPlane x w x h, the output image size will be nInputPlane x 1 x 1 where w and h are spatial image dimensions.

Equation

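We denote by z^c the c-th input map and by s^c the c-th output map.

s^c = max_{h in H_kMax} (1 / kMax) sum_{i,j} h_{i,j} z^c_{i,j}

where H_k is the set of binary selection masks: each h in H_k satisfies h_{i,j} in {0, 1} and sum_{i,j} h_{i,j} = k (in this equation h denotes a mask, not the image height). In other words, s^c is the average of the kMax largest values of z^c.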

Special cases
  • GlobalMaxPooling: if kMax = 1
  • GlobalAveragePooling: if kMax = h x w
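
The maximization over binary masks amounts to keeping the kMax largest values of each map and averaging them. The plain-Python sketch below illustrates this under stated assumptions: the function name `global_kmax_pooling` is hypothetical, only integer values of kMax are handled (the rounding rule for proportions is not specified here, so it is left out), and a kMax larger than the number of regions is clamped, which is an assumption of this sketch.

```python
def global_kmax_pooling(planes, k_max=1):
    """Average of the k_max largest values of each plane.

    k_max >= 1 selects that many regions; k_max <= 0 selects all regions.
    Proportions (0 < k_max < 1) are rejected: their rounding rule is an
    unknown in this sketch.
    """
    if 0 < k_max < 1:
        raise ValueError("proportions are not handled in this sketch")
    pooled = []
    for plane in planes:
        # Flatten the map and sort from largest to smallest.
        values = sorted((v for row in plane for v in row), reverse=True)
        # Assumption: clamp k to the number of available regions.
        k = len(values) if k_max <= 0 else min(int(k_max), len(values))
        pooled.append(sum(values[:k]) / k)
    return pooled


z = [[[1.0, 5.0], [3.0, 2.0]]]  # one 2 x 2 plane, illustrative values
print(global_kmax_pooling(z, 1))  # [5.0]   same as the maximum
print(global_kmax_pooling(z, 2))  # [4.0]   mean of 5.0 and 3.0
print(global_kmax_pooling(z, 4))  # [2.75]  same as the average
```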

GlobalAveragePooling (GAP)

Global Average Pooling is a spatial pooling strategy used in "Learning Deep Features for Discriminative Localization".

module = nn.GlobalAveragePooling()

Applies a 2D average-pooling operation over the whole image. The number of output features is equal to the number of input planes.

If the input image is a 4D tensor nBatchImage x nInputPlane x w x h, the output image size will be nBatchImage x nInputPlane x 1 x 1 where w and h are spatial image dimensions.

If the input image is a 3D tensor nInputPlane x w x h, the output image size will be nInputPlane x 1 x 1 where w and h are spatial image dimensions.

Equation

We denote by z^c the c-th input map and by s^c the c-th output map.

s^c = 1 / (h * w) sum_{i,j} z^c_{i,j}
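
The equation above can be sketched in a few lines of plain Python. This is an illustration of the formula, not the Torch module; the function name `global_average_pooling` and the sample values are hypothetical.

```python
def global_average_pooling(planes):
    """Return one value per input plane: the mean over all spatial positions."""
    pooled = []
    for plane in planes:
        values = [v for row in plane for v in row]  # flatten the 2D map
        pooled.append(sum(values) / len(values))    # divide by h * w
    return pooled


z = [
    [[1.0, 5.0], [3.0, 2.0]],
    [[-4.0, -1.0], [-2.0, -3.0]],
]
print(global_average_pooling(z))  # [2.75, -2.5]
```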

References

@inproceedings{Zhou_2016_CVPR,
author = {Zhou, Bolei and Khosla, Aditya and Lapedriza, Agata and Oliva, Aude and Torralba, Antonio},
title = {{Learning Deep Features for Discriminative Localization}},
booktitle = {CVPR},
year = {2016}
}

LogSumExpPooling

LogSumExpPooling is a spatial pooling strategy used in "From Image-level to Pixel-level Labeling with Convolutional Networks".

module = nn.LogSumExpPooling(beta)

Applies a 2D LogSumExp-pooling operation over the whole image. The number of output features is equal to the number of input planes.

The parameter is the following:

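  • beta: The inverse-temperature parameter, which controls how close the pooling is to a maximum (large beta) or to an average (small beta). Default is beta = 1.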

If the input image is a 4D tensor nBatchImage x nInputPlane x w x h, the output image size will be nBatchImage x nInputPlane x 1 x 1 where w and h are spatial image dimensions.

If the input image is a 3D tensor nInputPlane x w x h, the output image size will be nInputPlane x 1 x 1 where w and h are spatial image dimensions.

Equation

We denote by z^c the c-th input map and by s^c the c-th output map.

s^c = (1 / beta) * log( 1 / (h * w) * sum_{i,j} exp(beta * z^c_{i,j}) )

Special cases
  • GlobalMaxPooling: in the limit beta -> +inf
  • GlobalAveragePooling: in the limit beta -> 0
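
The two special cases can be checked numerically. The plain-Python sketch below assumes the LogSumExp form from the cited paper, s^c = (1 / beta) * log(mean of exp(beta * z^c)), where the 1 / beta factor is what makes both limits hold. The function name `log_sum_exp_pooling` is hypothetical, and subtracting the maximum before exponentiating is a standard trick to avoid overflow, not something taken from the module's source.

```python
import math


def log_sum_exp_pooling(planes, beta=1.0):
    """Compute (1 / beta) * log(mean(exp(beta * z))) per plane, for beta > 0."""
    if beta <= 0:
        raise ValueError("this sketch only handles beta > 0")
    pooled = []
    for plane in planes:
        values = [v for row in plane for v in row]
        m = max(values)
        # Shift by the maximum so every exponent is <= 0 (no overflow).
        mean_exp = sum(math.exp(beta * (v - m)) for v in values) / len(values)
        pooled.append(m + math.log(mean_exp) / beta)
    return pooled


z = [[[1.0, 5.0], [3.0, 2.0]]]  # one 2 x 2 plane, illustrative values
print(log_sum_exp_pooling(z, beta=1e-6))   # close to the average, 2.75
print(log_sum_exp_pooling(z, beta=100.0))  # close to the maximum, 5.0
```

For intermediate values of beta the result lies between the average and the maximum of the map.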

References

@inproceedings{pinheiro_weak_seg_cvpr15,
author = {Pinheiro, Pedro O. and Collobert, Ronan},
title = {{From Image-level to Pixel-level Labeling with Convolutional Networks}},
booktitle = {CVPR},
year = {2015}
}

WeldonPooling

WeldonPooling is a spatial pooling module used in "WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks".

module = nn.WeldonPooling(kMax, kMin)

Applies a 2D WELDON-pooling operation over the whole image. The number of output features is equal to the number of input planes.

The parameters are the following:

  • kMax: The number of top instances. It is possible to define the number of selected regions (kMax >= 1) or the proportion of selected regions (0 <= kMax < 1). If kMax < 0, kMax is set to 0. Default is kMax = 1.
  • kMin: The number of low instances. It is possible to define the number of selected regions (kMin >= 1) or the proportion of selected regions (0 <= kMin < 1). If kMin < 0, kMin is set to 0. Default is kMin = 1.

If the input image is a 4D tensor nBatchImage x nInputPlane x w x h, the output image size will be nBatchImage x nInputPlane x 1 x 1 where w and h are spatial image dimensions.

If the input image is a 3D tensor nInputPlane x w x h, the output image size will be nInputPlane x 1 x 1 where w and h are spatial image dimensions.

Equation

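We denote by z^c the c-th input map and by s^c the c-th output map.

s^c = max_{h in H_kMax} (1 / kMax) sum_{i,j} h_{i,j} z^c_{i,j} + min_{h in H_kMin} (1 / kMin) sum_{i,j} h_{i,j} z^c_{i,j}

where H_k is the set of binary selection masks: each h in H_k satisfies h_{i,j} in {0, 1} and sum_{i,j} h_{i,j} = k (in this equation h denotes a mask, not the image height). In other words, s^c is the average of the kMax largest values of z^c plus the average of the kMin smallest values. When kMax or kMin is 0, the corresponding term is dropped, which is what the special cases below rely on.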

Special cases
  • GlobalMaxPooling: if kMax = 1 and kMin = 0
  • GlobalAveragePooling: if kMax = h x w and kMin = 0
  • MantraPooling: if kMax = 1 and kMin = 1
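
A plain-Python sketch of the equation and its special cases follows. It is an illustration under stated assumptions rather than the module's implementation: the names `weldon_pooling` and `_count` are hypothetical, only integer kMax and kMin are handled, counts are clamped to the number of regions, and a count of 0 drops the corresponding term (as the GlobalMaxPooling case with kMin = 0 suggests).

```python
def _count(k, n):
    """Turn an integer parameter into a region count between 0 and n."""
    if 0 < k < 1:
        raise ValueError("proportions are not handled in this sketch")
    return min(max(int(k), 0), n)


def weldon_pooling(planes, k_max=1, k_min=1):
    """Mean of the k_max highest values plus mean of the k_min lowest values."""
    pooled = []
    for plane in planes:
        values = sorted(v for row in plane for v in row)  # ascending order
        n = len(values)
        k_top, k_low = _count(k_max, n), _count(k_min, n)
        # Assumption: a count of 0 contributes nothing to the sum.
        top = sum(values[n - k_top:]) / k_top if k_top > 0 else 0.0
        low = sum(values[:k_low]) / k_low if k_low > 0 else 0.0
        pooled.append(top + low)
    return pooled


z = [[[1.0, 5.0], [3.0, 2.0]]]  # one 2 x 2 plane, illustrative values
print(weldon_pooling(z, 1, 0))  # [5.0]   same as global max pooling
print(weldon_pooling(z, 4, 0))  # [2.75]  same as global average pooling
print(weldon_pooling(z, 1, 1))  # [6.0]   maximum plus minimum
print(weldon_pooling(z, 2, 2))  # [5.5]   mean(5, 3) + mean(1, 2)
```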

References

@inproceedings{Durand_WELDON_CVPR_2016,
author = {Durand, Thibaut and Thome, Nicolas and Cord, Matthieu},
title = {{WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks}},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016}
}

Licence

MIT License