转自 | 极视平台 来源 | https://zhuanlan.zhihu.com/p/31426458 作者 | 白裳 本文来自知乎专栏,仅供学习参考使用,著作权归作者所有。如有侵权,请私信删除。
目录
1 Conv layers
2 Region Proposal Networks(RPN)
- 2.1 多通道图像卷积基础知识介绍
- 2.2 anchors
- 2.3 softmax判定positive与negative
- 2.4 bounding box regression原理
- 2.5 对proposals进行bounding box regression
- 2.6 Proposal Layer
3 RoI pooling
- 3.1 为何需要RoI Pooling
- 3.2 RoI Pooling原理
4 Classification
5 Faster RCNN训练
- 5.1 训练RPN网络
- 5.2 通过训练好的RPN网络收集proposals
- 5.3 训练Faster RCNN网络
Questions and Answer
-
Conv layers。作为一种CNN网络目标检测方法,Faster RCNN首先使用一组基础的conv+relu+pooling层提取image的feature maps。该feature maps被共享用于后续RPN层和全连接层。 -
Region Proposal Networks。RPN网络用于生成region proposals。该层通过softmax判断anchors属于positive或者negative,再利用bounding box regression修正anchors获得精确的proposals。 -
Roi Pooling。该层收集输入的feature maps和proposals,综合这些信息后提取proposal feature maps,送入后续全连接层判定目标类别。 -
Classification。利用proposal feature maps计算proposal的类别,同时再次bounding box regression获得检测框最终的精确位置。
1 Conv layers
-
所有的conv层都是:kernel_size=3,pad=1,stride=1 -
所有的pooling层都是:kernel_size=2,pad=1,stride=1,kernel_size=2,pad=0,stride=2
2 Region Proposal Networks(RPN)
2.1 多通道图像卷积基础知识介绍
-
对于单通道图像+单卷积核做卷积,第一章中的图3已经展示了; -
对于多通道图像+多卷积核做卷积,计算方式如下:
2.2 anchors
-
在原文中使用的是ZF model中,其Conv Layers中最后的conv5层num_output=256,对应生成256张特征图,所以相当于feature map每个点都是256-dimensions
-
在conv5之后,做了rpn_conv/3x3卷积且num_output=256,相当于每个点又融合了周围3x3的空间信息(猜测这样做也许更鲁棒?反正我没测试),同时256-d不变(如图4和图7中的红框)
-
假设在conv5 feature map中每个点上有k个anchor(默认k=9),而每个anhcor要分positive和negative,所以每个点由256d feature转化为cls=2k scores;而每个anchor都有(x, y, w, h)对应4个偏移量,所以reg=4k coordinates
-
补充一点,全部anchors拿去训练太多了,训练程序会在合适的anchors中随机选取128个postive anchors+128个negative anchors进行训练(什么是合适的anchors下文5.1有解释)
2.3 softmax判定positive与negative
}
blob=[batch_size, channel,height,width]
2.4 bounding box regression原理
-
给定anchor 和
-
寻找一种变换F,使得: ,其中
-
先做平移
-
再做缩放
说完原理,对应于Faster RCNN原文,positive anchor与ground truth之间的平移量 与尺度因子 如下:
2.5 对proposals进行bounding box regression
<section style="margin-right: 8px;margin-left: 8px;"><br /></section>
-
大小为 50*38*2k 的positive/negative softmax分类特征矩阵
-
大小为 50*38*4k 的regression坐标回归特征矩阵
2.6 Proposal Layer
<section style="margin-right: 8px;margin-left: 8px;"><br /></section>
<section style="margin-right: 8px;margin-left: 8px;"><br /></section>
首先解释im_info。对于一副任意大小PxQ图像,传入Faster RCNN前首先reshape到固定MxN,im_info=[M, N, scale_factor]则保存了此次缩放的所有信息。然后经过Conv Layers,经过4次pooling变为WxH=(M/16)x(N/16)大小,其中feature_stride=16则保存了该信息,用于计算anchor偏移量。
-
按照输入的positive softmax scores由大到小排序anchors,提取前pre_nms_topN(e.g. 6000)个anchors,即提取修正位置后的positive anchors
-
限定超出图像边界的positive anchors为图像边界,防止后续roi pooling时proposal超出图像边界(见文章底部QA部分图21)
-
剔除尺寸非常小的positive anchors
-
对剩余的positive anchors进行NMS(nonmaximum suppression)
-
Proposal Layer有3个输入:positive和negative anchors分类器结果rpn_cls_prob_reshape,对应的bbox reg的(e.g. 300)结果作为proposal输出
3 RoI pooling
-
原始的feature maps
-
RPN输出的proposal boxes(大小各不相同)
-
从图像中crop一部分传入网络
-
将图像warp成需要的大小后传入网络
3.2 RoI Pooling原理
-
由于proposal是对应MXN尺度的,所以首先使用spatial_scale参数将其映射回(M/16)X(N/16)大小的feature map尺度;
-
再将每个proposal对应的feature map区域水平分为 的网格;
-
对网格的每一份都进行max pooling处理。
4 Classification
-
通过全连接和softmax对proposals进行分类,这实际上已经是识别的范畴了
-
再次对proposals进行bounding box regression,获取更高精度的rect box
5 Faster RCNN训练
-
在已经训练好的model上,训练RPN网络,对应stage1_rpn_train.pt
-
利用步骤1中训练好的RPN网络,收集proposals,对应rpn_test.pt
-
第一次训练Fast RCNN网络,对应stage1_fast_rcnn_train.pt
-
第二训练RPN网络,对应stage2_rpn_train.pt
-
再次利用步骤4中训练好的RPN网络,收集proposals,对应rpn_test.pt
-
第二次训练Fast RCNN网络,对应stage2_fast_rcnn_train.pt
5.1 训练RPN网络
-
cls loss,即rpn_cls_loss层计算的softmax loss,用于分类anchors为positive与negative的网络训练
-
reg loss,即rpn_loss_bbox层计算的soomth L1 loss,用于bounding box regression网络训练。注意在该loss中乘了 ,相当于只关心positive anchors的回归(其实在回归中也完全没必要去关心negative)。
-
在RPN训练阶段,rpn-data(python AnchorTargetLayer)层会按照和test阶段Proposal层完全一样的方式生成Anchors用于训练
-
对于rpn_loss_cls,输入的rpn_cls_scors_reshape和rpn_labels分别对应 与 , 参数隐含在 与 的caffe blob的大小中
-
对于rpn_loss_bbox,输入的rpn_bbox_pred和rpn_bbox_targets分别对应 与 ,rpn_bbox_inside_weigths对应 ,rpn_bbox_outside_weigths未用到(从soomth_L1_Loss layer代码中可以看到),而 同样隐含在caffe blob大小中
5.2 通过训练好的RPN网络收集proposals
-
将提取的proposals作为rois传入网络,如图21蓝框
-
计算bbox_inside_weights+bbox_outside_weights,作用与RPN一样,传入soomth_L1_loss layer,如图21绿框
Q&A
-
为什么Anchor坐标中有负数?
-
Anchor到底与网络输出如何对应?
-
为何有ROI Pooling还要把输入图片resize到固定大小的MxN
参考文献:http://www.telesens.co/2018/03/11/object-detection-and-classification-using-r-cnns/
<pre style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-right: 8px;margin-left: 8px;max-width: 100%;white-space: normal;color: rgb(0, 0, 0);font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;letter-spacing: 0.544px;text-align: center;widows: 1;line-height: 1.75em;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;box-sizing: border-box !important;overflow-wrap: break-word !important;"></span></strong></span></strong></section><section style="margin-right: 8px;margin-left: 8px;max-width: 100%;white-space: normal;color: rgb(0, 0, 0);font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;letter-spacing: 0.544px;text-align: center;widows: 1;line-height: 1.75em;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;box-sizing: border-box !important;overflow-wrap: break-word !important;">—</span></strong>完<strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;box-sizing: border-box !important;overflow-wrap: break-word !important;">—</span></strong></span></strong></span></strong></section><section style="max-width: 100%;white-space: normal;font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;letter-spacing: 0.544px;text-align: center;widows: 1;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section powered-by="xiumi.us" style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-top: 15px;margin-bottom: 25px;max-width: 100%;opacity: 0.8;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section powered-by="xiumi.us" style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-top: 15px;margin-bottom: 25px;max-width: 100%;opacity: 0.8;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-right: 8px;margin-bottom: 15px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;color: rgb(127, 127, 127);font-size: 12px;font-family: sans-serif;line-height: 25.5938px;letter-spacing: 3px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(0, 0, 0);box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;font-size: 16px;font-family: 微软雅黑;caret-color: red;box-sizing: border-box !important;overflow-wrap: break-word !important;">为您推荐</span></strong></span></section><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-size: 12px;font-family: sans-serif;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;">“12306”的架构到底有多牛逼?</span></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-size: 12px;font-family: sans-serif;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;">中国程序员34岁生日当天在美国遭抢笔记本电脑,追击歹徒被拖行后身亡,为什么会发生此类事件?</span></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-size: 12px;font-family: sans-serif;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;-webkit-tap-highlight-color: rgba(0, 0, 0, 0);cursor: pointer;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">阿里如何抗住90秒100亿?</span><span style="max-width: 100%;-webkit-tap-highlight-color: rgba(0, 0, 0, 0);cursor: pointer;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">看这篇你就明白了!</span></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;font-family: sans-serif;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(87, 107, 149);box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">60个Chrome神器插件大收集:</span><span style="max-width: 100%;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">助你快速成为老司机,一键分析网站技术栈</span></span><br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-size: 12px;font-family: sans-serif;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;-webkit-tap-highlight-color: rgba(0, 0, 0, 0);cursor: pointer;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">深度学习必懂的13种概率分布</span></p></section></section></section></section></section></section></section></section>
本篇文章来源于: 深度学习这件小事
本文为原创文章,版权归知行编程网所有,欢迎分享本文,转载请保留出处!
内容反馈