PRINTING AND DIGITAL MEDIA TECHNOLOGY STUDY, No. 2, April 2024 (Total No. 229)
RESEARCH PAPERS

Research on Multi-Face Recognition Based on MTCNN Algorithm

YANG Wen-peng 1, SI Zhan-jun 1,2*
(1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China; 2. College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin 300457, China)

Abstract: With the rapid development of artificial intelligence in the field of computer vision, more and more classical artificial intelligence algorithms are being applied to multi-face recognition research. Among them, the MTCNN algorithm performs well in multi-face recognition, but there is still considerable room for improvement in recognition accuracy. In this study, based on the classical MTCNN algorithm framework, its sub-algorithm NMS was evaluated and improved. The performance differences between the NMS algorithm and the improved NMS algorithm were compared and theoretically analyzed in each cascade network: P-Net, R-Net, and O-Net. The improved algorithm was evaluated along multiple dimensions, combining subjective and objective assessment with horizontal and longitudinal comparison. The results showed that the model designed in this paper achieves a face recognition accuracy of 94.56% on the LFW dataset. It can provide a reference for multi-face recognition.

Key words: Multi-face recognition; MTCNN algorithm; Algorithm optimization

CLC number: TP39; Document code: A; Article ID: 2097-2474(2024)02-116-07; DOI: 10.19370/10-1886/ts.2024.02.013
Received: 2023-06-25; Revised: 2023-08-31; * Corresponding author
Citation: YANG Wen-peng, SI Zhan-jun. Research on Multi-Face Recognition Based on MTCNN Algorithm [J]. Printing and Digital Media Technology Study, 2024, (2): 116-122.

0 Introduction

Face recognition has been a direction of
research in both academia and industry. Multi-face recognition faces additional challenges, such as overlap caused by densely packed faces, variable face size and resolution, and differing poses [1-3]. Rowley [4-5] proposed a solution that trained a neural network as a binary classifier to detect whether an image contains a face. The Deformable Parts Model (DPM) [6], a component-based face recognition algorithm, performed well on faces with complex poses. Influenced by this, subsequent works [7-8] focused on combining multiple models to obtain diverse features and improve face recognition performance. However, these algorithms all train classifiers on manually designed features and rely on local feature extraction from the face, so they still cannot handle multi-face recognition in complex scenes.

In recent years, deep learning has been widely used in face recognition owing to its superior performance. AlexNet [9], a network with five convolutional layers and three fully connected layers, achieved good recognition results in related competitions. Building on the Viola-Jones method, later works [10-11] proposed the idea of cascading to train CNNs, which gave the overall model strong discriminative ability and recognition performance. Subsequently, the more efficient R-CNN [12] series of target recognition networks was proposed, including Fast R-CNN [13] and Faster R-CNN [6], which further improved target recognition performance by extracting the feature vectors of candidate regions with a CNN. With the public availability of the WIDER FACE dataset, a large number of multi-face scenarios appeared, and conventional target recognition networks often fail to achieve good recognition results on them. Therefore, more networks optimized for face recognition have been proposed, focusing on the small-sized and multi-angle faces that multi-face scenarios bring. DenseBox [14] used a fully convolutional network to predict the target position coordinates and the target category simultaneously, and employed a multi-scale fusion strategy to give good recognition of small-sized faces.

MTCNN (Multi-task Cascaded Convolutional Networks) is a face recognition algorithm built on multi-task convolutional neural networks. It uses a cascade approach with an image pyramid to detect faces of varying sizes, and it combines three sub-networks into a deep convolutional network that predicts faces and the locations of their features from coarse to fine. MTCNN is a popular face detection and alignment algorithm known for its high accuracy and efficiency. However, its performance may degrade on low-resolution or noisy images, as well as under challenging lighting conditions or occlusion. The algorithm may struggle to accurately detect faces in
such scenarios. MTCNN may also produce false positive detections, where non-face regions are incorrectly identified as faces, which reduces the overall accuracy of the algorithm. In this study, to address these shortcomings, some algorithms within MTCNN were optimized for overall enhancement, and the rationality and effectiveness of the optimization were demonstrated through experiments. The algorithm of this study can provide a reference for multi-face recognition.

1 Research Method

1.1 MTCNN Algorithm

MTCNN is a multi-task neural network model for face recognition tasks proposed by the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, in 2016. It employs three cascaded networks for fast and efficient face recognition using the idea of a candidate box plus classifier, with NMS (Non-Maximum Suppression) as its sub-algorithm. The three cascaded networks are P-Net for fast candidate window generation, R-Net for high-precision candidate window filtering, and O-Net for generating the final bounding box together with the key points of the face. The model also uses techniques such as the image pyramid, bounding-box regression, and non-maximum suppression. The technology roadmap of this study is shown in Fig. 1: photos with faces pass through normalization and image-pyramid construction, then through the P-Net, R-Net, and O-Net stages of the improved MTCNN, each followed by Soft-NMS, to output photos with faces and their locations.
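As a rough illustration of the image pyramid mentioned above, the sketch below computes the scale factors at which an input image might be resized before it is passed to P-Net. The function name `pyramid_scales`, the 12-pixel P-Net window, the minimum face size of 20 pixels, and the scale step of 0.709 are assumptions borrowed from common MTCNN implementations; they are not values reported in this paper.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_input=12):
    """Return the resize scales of an image pyramid (largest scale first).

    All default values are illustrative assumptions, not settings from the paper.
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    scales = []
    # First scale maps a face of min_face_size pixels onto the network window.
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    # Keep shrinking until the shorter image side no longer fits the window.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


if __name__ == "__main__":
    for s in pyramid_scales(480, 640):
        print(f"scale={s:.3f} -> {int(480 * s)}x{int(640 * s)}")
```

Each scale yields one resized copy of the image, so small faces are caught at the large scales and large faces at the small ones. A smaller `factor` gives fewer pyramid levels and faster inference at the cost of coarser size coverage.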

Fig. 1 Technology flow chart

The full name of P-Net is Proposal Network, and its basic construction is a fully convolutional network (FCN). For the image pyramid constructed in the previous step (all images in MTCNN are first normalized through image pyramid operations), the FCN performs preliminary feature extraction and border calibration, bounding-box regression adjusts the windows, and NMS filters out most of them. P-Net is a region proposal network for face regions: after three convolutional layers, a face classifier determines whether each region is a face, while bounding-box regression and a facial key point locator produce the initial face region proposals. This stage outputs many regions that may contain faces and feeds them into R-Net for further processing. The loss function $L_i^{det}$ of P-Net is shown in Formula (1).

$$L_i^{det} = -\left( y_i^{det} \log(p_i) + \left(1 - y_i^{det}\right) \log(1 - p_i) \right) \tag{1}$$

where $p_i$ is the probability, predicted by the network, that region $i$ is a face, and $y_i^{det} \in \{0, 1\}$ is the true label of the region.

The full name of R-Net is Refine Network. Its basic construction is a convolutional neural network; compared with P-Net, a fully connected layer is added, so the screening of input data is stricter. After the image passes through P-Net, all prediction windows are fed into R-Net, which filters out a large number of poor candidate boxes and then performs bounding-box regression and NMS on the remaining candidates to further refine the predictions. Because the output of P-Net is only a set of possible face regions with some confidence, R-Net refines the selection, discards most of the wrong inputs, and again applies bounding-box regression and facial key point localization to the face regions, finally outputting more credible face regions for O-Net. Compared with the 1×1×32 features output by P-Net using full convolution, R-Net uses a 128-unit fully connected layer after the last convolutional layer, which retains more image features and gives better accuracy than P-Net. The loss function $L_i^{box}$ of R-Net is shown in Formula (2).

$$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2 \tag{2}$$

where $\hat{y}_i^{box}$ is the border coordinates obtained by network prediction, and $y_i^{box}$ is the actual border coordinates (a 4-tuple (X_left, Y_left, Width, Height) representing a rectangular region).

O-Net is known as the Output Network. Its basic structure is a more complex convolutional neural network with one more convolutional layer than R-Net. The difference from R-Net is that this stage identifies the face region with more supervision and regresses the facial feature points, finally outputting five facial feature points. O-Net has more input features, and the network ends with a larger 256-unit fully connected layer, which retains more image features; it simultaneously performs face recognition, face region bounding-box regression, and facial feature localization. Finally, the upper-left and lower-right coordinates of the face region and the five facial feature points are output. O-Net has more input features and a more
complex network structure, and also has better performance. The output of this stage is used as the final output of the network model. The loss function $L_i^{landmark}$ of O-Net is shown in Formula (3).

$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2 \tag{3}$$

where $\hat{y}_i^{landmark}$ is the prediction result, and $y_i^{landmark}$ is the actual key point location. Since a total of five facial key points need to be predicted, with two coordinate values for each point, $y_i^{landmark}$ is a 10-tuple.

1.2 Improved MTCNN Algorithm

In the MTCNN algorithm, the sub-algorithm NMS (Non-Maximum Suppression) is used to address overlapping bounding boxes in tasks such as object detection and bounding-box regression. Specifically, in object detection tasks, the model generates multiple candidate boxes to represent regions where objects may exist. These candidate boxes may overlap, leading to duplicate or multiple detections of the same object. The role of NMS is to select the most representative candidate box from all candidates and filter out the other candidate boxes that overlap heavily with it. This keeps the final detection results accurate, reduces redundant detections, and improves detection efficiency and precision. In MTCNN, NMS is applied to the candidate boxes output by each network in the cascade. The candidate boxes generated by the model come with their respective confidence scores, and NMS filters the candidate boxes based on these scores, selecting the most suitable bounding boxes as the detection results while eliminating redundant ones. The role of the traditional NMS algorithm is shown in Fig. 2.

Fig. 2 Role of traditional NMS algorithm (panels "Original" and "NMS"; box scores shown: 0.8, 0.75, 0.9, 0.9)

The NMS algorithm starts from a set B of recognition frames generated in the detected image and their corresponding scores S. The recognition frame M with the largest score is removed from set B and placed in the final result set D. At the same time, any recognition frames in set B whose overlap with M is greater than the overlap threshold N_t are also removed. The procedure repeats until B is empty. The biggest problem with the NMS algorithm is that it forces the scores of all neighboring recognition frames to zero. If a real object is present in the overlapping region, it will go undetected, which reduces the average recognition rate of the algorithm.

To address this issue, the Soft NMS algorithm was introduced into the MTCNN network in this study. Although it is also a greedy algorithm, it handles candidate boxes more gently than NMS. The pseudo-code of the NMS algorithm and the Soft NMS algorithm is shown in Fig. 3.
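The greedy procedure described above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `(x1, y1, x2, y2)` box layout, the default threshold `nt=0.3`, the Gaussian parameter `sigma=0.5`, and the pruning cutoff `score_thresh` are all assumptions, and the linear and Gaussian decay rules follow the commonly published Soft-NMS formulation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, nt=0.3, method="gaussian", sigma=0.5, score_thresh=0.001):
    """Greedy NMS / Soft-NMS; returns kept (box, score) pairs in selection order.

    method="hard" reproduces traditional NMS; "linear" and "gaussian" decay
    the scores of overlapping boxes instead of discarding them outright.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select M, the remaining box with the highest score, and move it to D.
        m = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(m)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            if method == "hard":
                if overlap >= nt:
                    score = 0.0  # traditional NMS: suppress completely
            elif method == "linear":
                if overlap >= nt:
                    score *= 1.0 - overlap
            elif method == "gaussian":
                score *= math.exp(-(overlap ** 2) / sigma)
            else:
                raise ValueError(f"unknown method: {method}")
            # Drop boxes whose score has decayed below the (assumed) cutoff.
            if score > score_thresh:
                rescored.append((box, score))
        remaining = rescored
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.75]
    for name in ("hard", "linear", "gaussian"):
        print(name, soft_nms(demo_boxes, demo_scores, method=name))
```

With two heavily overlapping boxes, the "hard" setting discards the lower-scoring one entirely, whereas the "linear" and "gaussian" settings keep it with a reduced score, so a genuinely distinct but nearby face can still survive the final score threshold.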

    Input: B = {b1, ..., bN}, S = {s1, ..., sN}, Nt
        B is the list of initial detection boxes
        S contains corresponding detection scores
        Nt is the NMS threshold
    begin
        D ← {}
        while B ≠ empty do
            m ← argmax S
            M ← bm
            D ← D ∪ M; B ← B − M
            for bi in B do
                if iou(M, bi) ≥ Nt then            (NMS)
                    B ← B − bi; S ← S − si
                end
                si ← si f(iou(M, bi))              (Soft-NMS)
            end
        end
        return D, S
    end

Fig. 3 Pseudo-code of NMS algorithm and Soft NMS algorithm

The NMS algorithm differs from Soft NMS in how it treats both prediction frames and confidence scores. NMS takes the box with the largest score, computes its IoU with the other boxes of the same category in the current region, deletes any box whose IoU exceeds the threshold, and retains the rest. Soft NMS does not simply delete or retain boxes by comparing the IoU with the threshold; instead, it filters and retains boxes by confidence score. For the confidence score, NMS directly sets the score to zero for boxes whose overlap exceeds the threshold and otherwise leaves it unchanged. Soft NMS scales and weights the scores, then applies an appropriate score threshold, keeping the candidate boxes whose scores are above the threshold and deleting those below it. The weighting of the scores for NMS is shown in Formula (4), and the linear weighting and Gaussian weighting for Soft NMS are shown in Formula (5) and Formula (6).

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ 0, & \mathrm{iou}(M, b_i) \ge N_t \end{cases} \tag{4}$$

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ s_i \left(1 - \mathrm{iou}(M, b_i)\right), & \mathrm{iou}(M, b_i) \ge N_t \end{cases} \tag{5}$$

$$s_i = s_i \, e^{-\frac{\mathrm{iou}(M, b_i)^2}{\sigma}}, \quad \forall b_i \notin D \tag{6}$$

where $s_i$ is the confidence score of the box currently being processed, $M$ is the current highest-scoring box, $b_i$ is the box being processed, $\mathrm{iou}(M, b_i)$ is the intersection over union of $M$ and $b_i$, $N_t$ is the overlap threshold, and $\sigma$ is the parameter of the Gaussian weighting.

2 Results and Discussion

2.1 Dataset

WIDER FACE was selected in this study as the face recognition dataset. WIDER FACE [15] is a widely used open-source face benchmark dataset in face recognition research, which involves 61 event categories. For each event category, training data accounts for 40%, validation data accounts for 10%, and test data accounts f
