Journal of Applied Sciences ›› 2012, Vol. 30 ›› Issue (3): 299-305. doi: 10.3969/j.issn.0255-8297.2012.03.014

• Articles •

Dynamical Coarse-Grained Partially Reconfigurable Face Detection

XIAO Jian1,3, LIU Bo1, MEI Chen1, ZHU Min2, YANG Jun1, LIU Lei-bo2, WEI Shao-jun2

  1. National ASIC System Engineering Research Center, Southeast University, Nanjing 210096, China
  2. School of Information Science and Technology, Tsinghua University, Beijing 100084, China
  3. College of Electronic Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
  • Received: 2011-06-08 Revised: 2011-08-02 Online: 2012-05-30 Published: 2012-05-30
  • About the authors: XIAO Jian, Ph.D. candidate; research interest: reconfigurable computing; E-mail: xiaoj@seu.edu.cn. YANG Jun, research professor and doctoral supervisor; research interests: SoC design, GPS baseband, and reconfigurable processors; E-mail: dragon@seu.edu.cn. WEI Shao-jun, professor and doctoral supervisor; research interests: VLSI design methodology and communication ASIC design; E-mail: wsj@public3.bta.net.cn
  • Supported by:

    National High Technology Research and Development Program of China ("863" Program) (No. 2009AA011700); the "Qing Lan Project" of Jiangsu Province higher-education institutions

Abstract:

A face detection system deployed in an embedded environment must satisfy several constraints, and the algorithm's high computational and control intensity makes real-time implementation difficult. This paper presents a face detection system built on "REMUS-II", a coarse-grained, dynamically and partially reconfigurable architecture. The cascaded AdaBoost detection algorithm is divided into several non-consecutive sub-tasks, and mailbox-based communication scheduling, together with configuration-flow and data-flow optimizations, raises instruction-level and task-level parallelism. Experimental results show an average detection speed of about 17 frames/s on 640×480 images at a 200 MHz clock, with the frontal-face detection rate remaining above 95%. In TSMC's 65 nm CMOS process at 200 MHz, REMUS-II occupies about 24 mm² and consumes about 194 mW.

Key words: coarse-grained reconfigurable, dynamical, face detection, AdaBoost
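
To put the reported figures in perspective, the short Python sketch below derives the per-frame cycle and energy budget they imply. It uses only the numbers quoted in the abstract; the function name `frame_budget` is hypothetical. It also assumes that the 194 mW figure applies while detection is running, which the abstract does not state explicitly. The results are averages over the whole frame, not a measure of the work actually done per pixel, since a cascade rejects most windows early and the array processes data in parallel.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 200e6          # 200 MHz operating frequency
FRAMES_PER_SECOND = 17    # average detection speed
WIDTH, HEIGHT = 640, 480  # image resolution
POWER_W = 0.194           # about 194 mW (assumed to hold during detection)


def frame_budget(clock_hz, fps, width, height, power_w):
    """Return (cycles per frame, cycles per pixel, energy per frame in mJ)."""
    cycles_per_frame = clock_hz / fps
    cycles_per_pixel = cycles_per_frame / (width * height)
    energy_per_frame_mj = power_w / fps * 1e3
    return cycles_per_frame, cycles_per_pixel, energy_per_frame_mj


cycles, per_pixel, energy_mj = frame_budget(
    CLOCK_HZ, FRAMES_PER_SECOND, WIDTH, HEIGHT, POWER_W
)
print(f"{cycles / 1e6:.2f} M cycles per frame")  # 11.76 M cycles per frame
print(f"{per_pixel:.1f} cycles per pixel")       # 38.3 cycles per pixel
print(f"{energy_mj:.1f} mJ per frame")           # 11.4 mJ per frame
```

Under these assumptions the system has roughly 38 clock cycles per input pixel on average, which suggests why the parallelism improvements described in the abstract matter for real-time operation.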

CLC number: