Learning to Promote Saliency Detectors
https://github.com/lartpang/M…
Abbreviations:
- SD: Saliency Detection
- ZSL: Zero-Shot Learning
Key points:
- Rather than training a DNN that maps images directly to labels, the DNN is fitted as an embedding function that maps pixels, together with the attributes of salient/background regions, into a metric space. The attributes of the salient/background regions are mapped to anchors in that space. A nearest-neighbor (NN) classifier is then constructed in the space, assigning each pixel the label of its nearest anchor.
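The nearest-anchor step can be sketched as follows (a minimal NumPy sketch; the names `nn_classify`, `pixel_emb`, the 2-D embedding space and the toy values are all invented for illustration, not from the paper):

```python
import numpy as np

def nn_classify(pixel_emb, anchors, labels):
    """Assign to each pixel the label of its nearest anchor in the metric space.

    pixel_emb: (N, D) pixel embeddings; anchors: (K, D); labels: (K,).
    """
    # pairwise squared Euclidean distances, shape (N, K)
    d2 = ((pixel_emb[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    # each pixel takes the label of its closest anchor
    return labels[d2.argmin(axis=1)]

# toy example: 2-D embedding space, one salient anchor, one background anchor
anchors = np.array([[1.0, 1.0],     # anchor for the salient region
                    [-1.0, -1.0]])  # anchor for the background region
labels = np.array([1, 0])           # 1 = salient, 0 = background
pixels = np.array([[0.9, 1.2], [-0.8, -1.1], [0.2, 0.1]])
print(nn_classify(pixels, anchors, labels))  # → [1 0 1]
```

In the paper the anchors come from embedding the approximate salient/background regions of a given image, so the classifier is image-specific even though the embedding function is shared.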
Means of preserving resolution:
- Remove the pooling layers of the last two convolution blocks, and use dilated convolutions to maintain the receptive field of the convolution filters.
- Add a sub-pixel convolution layer after each convolution block of the VGG feature extractor to upsample each block's feature map to the input image size.
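Sub-pixel convolution upsamples by trading channels for space: an ordinary convolution first produces C·r² channels, then a pixel-shuffle step rearranges (C·r², H, W) into (C, rH, rW). A NumPy sketch of the rearrangement (channel ordering as in PyTorch's `nn.PixelShuffle`; the toy tensor is made up):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    # split channels into (C, r, r), then interleave the two r-axes
    # into the spatial dimensions
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 channels, upscale factor 2 -> 1 channel at twice the resolution
x = np.arange(16, dtype=float).reshape(4, 2, 2)
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Each 2x2 output block is assembled from the same spatial position of the 4 input channels, which is why the preceding convolution must emit r² times the desired channel count.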
An iterative training/testing strategy is used.
- How the number of training iterations is determined is not stated.
- The number of testing iterations is set by hand.
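Schematically, iterative testing amounts to feeding the current saliency map back in as the next round's approximate salient/background regions. A toy sketch (the `refine` stand-in and all values are invented; in the paper one round would be an embedding pass plus nearest-anchor classification, and `T` is the hand-picked iteration count):

```python
import numpy as np

def refine(saliency, image):
    """Stand-in for one network pass: pushes values toward 0 or 1.
    In the paper this would embed pixels together with the current
    salient/background regions and reclassify each pixel."""
    return np.clip(saliency + 0.25 * np.sign(saliency - 0.5), 0.0, 1.0)

def iterative_test(image, init_saliency, T=3):
    """Refine an initial saliency map for a manually chosen number of rounds."""
    s = init_saliency
    for _ in range(T):   # T is hand-tuned; gains saturate after a few rounds
        s = refine(s, image)
    return s

init = np.array([[0.6, 0.2], [0.9, 0.45]])
print(iterative_test(None, init, T=3))  # → [[1. 0.] [1. 0.]]
```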
Some thoughts:
Similar to R3Net, the number of times the appended structure is applied is fixed only after repeated rounds of testing, and the related experiments show that the gains saturate after a certain number of iterations. At best, the method proposed here offers a modest boost on top of existing networks.
As noted here, this resembles a ZSL approach: it takes the results produced by existing SD algorithms ("past knowledge"), and the added structure repeatedly exploits that past knowledge to promote the final "post-processed" result ("to promote existing saliency detectors").
Some questions:
How can this method be applied to an existing architecture? How should the existing architecture be modified?
When training the modified structure, should pixel labels in the ground truth also be randomly flipped, as in the paper?
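The flipping in question can be sketched as follows (my guess at the mechanics, not the paper's exact recipe: flip a fraction `p` of binary ground-truth pixel labels so that training sees inputs resembling the noisy initial saliency maps used at test time):

```python
import numpy as np

def flip_labels(gt, p, rng):
    """Randomly flip a fraction p of binary ground-truth pixel labels (0/1)."""
    mask = rng.random(gt.shape) < p   # pixels selected for flipping
    return np.where(mask, 1 - gt, gt)

rng = np.random.default_rng(0)
gt = np.zeros((8, 8), dtype=int)
noisy = flip_labels(gt, 0.2, rng)
print(noisy.sum())  # roughly 0.2 * 64 pixels flipped to 1
```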
http://www.vartang.com/2013/0…) on an image graph model, where the saliency of each region is defined as its absorbed time from boundary nodes.
- Yang et al. [32] rank the similarity of image regions with foreground cues or background cues via graph-based manifold ranking.
Since the conventional methods are neither robust in complex scenes nor capable of capturing semantic objects, deep neural networks (DNNs) are introduced to overcome these drawbacks.
- Li et al. [16] train CNNs with fully connected layers to predict the saliency value of each superpixel, and enhance the spatial coherence of their saliency results using a refinement method.
- Li et al. [18] propose an FCN trained under a multi-task learning framework for saliency detection.
- Zhang et al. [34] present a generic framework to aggregate multi-level convolutional features for saliency detection.
Although the proposed method is also based on DNNs, the main difference between ours and these methods is that they learn a general model that directly maps images to labels, while our method learns a general embedding function as well as an image-specific NN classifier.
TD
Top-down (TD) saliency aims at finding salient regions specified by a task, and is usually formulated as a supervised learning problem.
- Yang and Yang [33] propose a supervised top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary.
- Gao et al. [9] introduce a top-down saliency algorithm that selects discriminant features from a pre-defined filter bank.
TD+BU
Integration of TD and BU saliency has been exploited by some methods.
- Borji [3] combines low-level features and saliency maps of previous bottom-up models with top-down cognitive visual features to predict fixations.
- Tong et al. [26] propose a top-down learning approach in which the algorithm is bootstrapped with training samples generated by a bottom-up model, so as to exploit the strengths of both bottom-up contrast-based saliency models and top-down learning methods.
Our method can also be viewed as an integration of TD and BU saliency. Although both our method and that of Tong et al. [26] formulate the problem as top-down saliency detection specified by initial saliency maps, there are certain differences between the two.
- First, Tong's method trains a strong model via bootstrap learning with training samples generated by a weak model. In contrast, our method maps pixels and the approximate salient/background regions into a learned metric space, which is related to zero-shot learning.
- Second, thanks to deep learning, our method is capable of capturing semantically salient regions and does well on complex scenes, while Tong’s method uses hand-crafted features and heuristic priors, which are less robust.
- Third, our method produces pixel-level results, while Tong’s method computes saliency value of each image region to assemble a saliency map, which tends to be coarser.
The Proposed Method
Related links:
- https://zhuanlan.zhihu.com/p/…
- What is embedding | embedded space | feature embedding in deep neural architectures?: https://www.quora.com/What-is…
- Can anyone explain word embedding? – 寒蝉鸣泣's answer – Zhihu: https://www.zhihu.com/questio…
- Sub-pixel Convolution: https://blog.csdn.net/leviopk…