
SIFT Algorithm FPGA Implementation


  • Development language: Others
  • Instance size: 0.72M
  • Downloads: 14
  • Views: 135
  • Published: 2020-06-05
  • Instance category: General programming
  • Publisher: robot666
  • File format: .pdf
  • Points required: 2
 

Instance Introduction

[Instance Summary]
FPGA implementation of the SIFT algorithm, in English; the description is very detailed and it makes a good reference document.
[...] and 320x240 pixels, respectively. The test has shown that there is only moderate performance degradation for such a simplification, but with a significant reduction of hardware resource.

It can be seen from the original SIFT algorithm that the scale space construction is an inherently sequential process: firstly, all the scale images of the first octave are generated from the same input image directly; secondly, all the scale images of the following octaves are generated from the last scale image of the previous octave. Therefore, it is not straightforward to use parallel computing to speed up the scale space construction in the original SIFT algorithm. In order to address this, another simplification of the scale space construction is proposed in our optimised SIFT algorithm, in which all the scale images of all the octaves are generated from the same input image directly. In our case, an initial Gaussian kernel size of σ = 1.1 is chosen for the first scale of both octaves, and σ values of 1.3, 1.6 and 2.0 are used for the other three scales of both octaves, respectively. One more point that needs to be mentioned is that the Gaussian kernel has been scaled up by 1000 times for the purpose of high precision in the Gaussian pyramid construction. In our optimised SIFT algorithm, the Gaussian kernels have been represented as fixed-point values rather than floating point for the sake of hardware implementation efficiency.

B. DoG Space Construction and Feature Point Identification

With the Gaussian pyramid constructed above, the DoG (Difference-of-Gaussian) space can be built up by applying a pixel-by-pixel subtraction to two adjacent scale images, as shown in (3) and Fig. 1. For the purpose of simplification and the improvement of robustness explained later, the accuracy of the DoG is reduced in the proposed optimised SIFT algorithm by scaling it down 1024 times, with only integer values preserved. The key reason for this is that our investigation found a scale-down of 1024 times to be a good compromise between memory consumption and detection performance. Another obvious reason is that it can be easily implemented in hardware with a shift operation.

$$D(x,y,\sigma)=\frac{(G(x,y,k\sigma)-G(x,y,\sigma))\ast I(x,y)}{1024}=\frac{L(x,y,k\sigma)-L(x,y,\sigma)}{1024} \quad (3)$$

Fig. 1. The Construction of DoG

The feature points are chosen from the local maxima or minima in the DoG space. Each pixel in D(x, y, σ) is compared with its 26 neighbour pixels, of which 8 are located in the current scale image and the others are located in the scales above and below. As shown in Fig. 2, the candidate pixel is considered to be a feature point, and its coordinate and scale are recorded, if its value is larger than all those 26 pixel values or smaller than all of them.

Fig. 2. The Detection of the Local Maxima and Minima

As a consequence of the fixed-point simplification, the inaccuracy introduced allows a number of unstable feature points to be extracted, which leads to an unnecessary computation burden and also degrades the matching performance in the end. To tackle this problem, a refining process is introduced in the original SIFT algorithm to eliminate the feature points located on an edge or with low contrast. Although the same technique could be adopted in our optimised SIFT algorithm, we instead eliminate these unstable feature points by reducing the accuracy of the DoG space as shown in equation (3), which further reduces the computational complexity. More specifically, feature points with low contrast have values similar to the non-feature points around the local region, and feature points located on an edge have values similar to the non-feature points along the edge, so reducing the accuracy is equivalent to eliminating the feature points with low contrast or a high edge response by truncation. We have tested this optimisation, and the results support the argument above.
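To make the scheme above concrete, here is a minimal Python sketch of the reduced-accuracy DoG construction and the 26-neighbour extrema test. It is our own illustration rather than code from the paper: the σ values are taken from the text, NumPy/SciPy are assumed, and the fixed-point behaviour of the hardware is only coarsely modelled with integer arithmetic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Sigma values taken from the text; the 1000x kernel scaling and the 1024x
# DoG scale-down are only approximated here with integer arithmetic.
SIGMAS = [1.1, 1.3, 1.6, 2.0]

def build_dog_stack(image):
    """Blur the input at each scale, then subtract adjacent scales (eq. 3)."""
    img = image.astype(np.int64) * 1000
    scales = [gaussian_filter(img, s) for s in SIGMAS]
    return [(scales[i + 1] - scales[i]) // 1024 for i in range(len(scales) - 1)]

def is_feature_point(dog, s, y, x):
    """26-neighbour test: the pixel at (y, x) in the middle DoG image dog[s]
    must be strictly larger or strictly smaller than every pixel of the
    surrounding 3x3x3 block (8 in its own image, 9 above and 9 below)."""
    block = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[s - 1:s + 2]])
    centre = dog[s][y, x]
    others = np.delete(block.ravel(), 13)   # drop the centre itself
    return bool((centre > others).all() or (centre < others).all())
```

In the hardware the division by 1024 is simply a 10-bit right shift, as noted above.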
C. Gradient-Orientation Histogram Generation

The gradient-orientation histogram [4] is adopted to describe a feature point; it is computed from the gradient magnitude and orientation of the neighbouring pixels around the candidate feature point. For a given pixel (x, y), the gradient magnitude and orientation are computed from equation (4):

$$m(x,y)=\sqrt{(L(x+1,y)-L(x-1,y))^2+(L(x,y+1)-L(x,y-1))^2}$$
$$\theta(x,y)=\tan^{-1}\!\left(\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}\right) \quad (4)$$

Fig. 3(a) shows the gradient-orientation histogram of a circular local region: the direction of a particular arrow represents the orientation value, and the length of the arrow represents the accumulation of the gradient magnitudes of all the pixels with the same orientation. The histogram is usually represented in 2D coordinates, as shown in Fig. 3(c).

Fig. 3. Gradient-Orientation Histogram

The square-root operation in (4) has been replaced with an absolute-value operation to reduce the computational complexity, so we have (5):

$$m(x,y)=\frac{|L(x+1,y)-L(x-1,y)|+|L(x,y+1)-L(x,y-1)|}{1024} \quad (5)$$

In our optimised algorithm, all gradient magnitudes are divided by 1024, whereas the orientation value in the range 0 to 360 degrees is normalised to an integer from 0 to 35. The gradient magnitude and orientation of each pixel are computed only once, in advance of generating the feature descriptors, and they can be computed in parallel with the DoG construction because these processes have no data dependency on each other.
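The following NumPy sketch (ours, not the paper's code; array indexing is row/column rather than the paper's (x, y) convention, and np.arctan2 stands in for the simplified hardware look-up table) evaluates equation (5) and the 36-level orientation quantisation over a whole scale image.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    """Per-pixel gradient magnitude (eq. 5: |dx| + |dy| instead of a square
    root, then divided by 1024) and orientation quantised to 36 bins."""
    dx = L[1:-1, 2:].astype(np.int64) - L[1:-1, :-2]   # L(x+1,y) - L(x-1,y)
    dy = L[2:, 1:-1].astype(np.int64) - L[:-2, 1:-1]   # L(x,y+1) - L(x,y-1)
    magnitude = (np.abs(dx) + np.abs(dy)) // 1024
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0     # 0 .. 360 degrees
    orientation_bin = (theta // 10).astype(np.int32)   # 36 bins of 10 degrees
    return magnitude, orientation_bin
```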
D. Feature Descriptor Generation

In the feature descriptor generation stage, the pixels within the local region of a feature point are transformed into a descriptor. The principal orientation needs to be identified first. The principal orientation is the gradient orientation corresponding to the maximum gradient magnitude in the gradient-orientation histogram of the entire local region of the feature point. With the principal orientation, the raw pixels or the spatial arrangement are rotated to achieve rotation invariance.

It can be seen from [4] that the local region of the feature point is segmented into a number of square sub-regions, such as 2x2, 4x4, etc. The gradient-orientation histograms of each sub-region are computed independently and are finally linked together to generate a descriptor for the feature point. However, the arrangement of the segmentation affects the performance of the descriptor, so we take advantage of Winder's polar sampled spatial arrangement [9], which improves the robustness to rotation with relatively lower computational complexity and a lower descriptor dimension. The improvement on rotation is very attractive: due to its isotropy, it enables us to rotate the arrangement with only a small number of arithmetic operations and without much loss of accuracy. As a result, the final feature descriptor can be built up merely by re-ordering both the sub-regions and the bins of each sub-region, rather than rotating all pixels in the entire local region of the feature point as required by the original SIFT algorithm. Therefore, a great reduction in computational complexity has been achieved.

Fig. 4 shows the process of generating a feature descriptor with our approach. Firstly, with the polar arrangement, a total of 9 circular sub-regions of a feature point, as shown in Fig. 4(a), are identified, in which the cross in the central sub-region indicates the location of the feature point. Each circular sub-region has a diameter of 20 pixels, and the overlapping of the sub-regions is designed for robustness. Secondly, a gradient-orientation histogram of all 9 circular sub-regions is built up, from which the principal orientation is identified. Assume that the principal orientation is in the up-left direction pointed to by the broken-line arrow in Fig. 4(a). Each of the 9 circular sub-regions is then assigned a number from 1 to 9, starting from the circular sub-region pointed to by the principal orientation, continuing with the others in a clockwise fashion and ending with the sub-region in the centre. Thirdly, the gradient-orientation histogram of each circular sub-region is generated and then re-ordered, with the bin pointed to by the principal orientation placed first and the rest following in clockwise order. Finally, all re-ordered histograms are packed together in the order of the circular sub-region numbers, as shown in Fig. 4(b).

Fig. 4. The Process of Generating a Feature Descriptor: (a) computation of the histogram of a circular sub-region; (b) packing the histograms of the circular sub-regions into the descriptor

In our implementation, the gradient-orientation histogram of each circular sub-region has 8 bins covering the 360 degrees of orientation, and there are 9 sub-regions in total, so the dimension of a feature descriptor is 72, which is significantly less than the 128 of the original SIFT algorithm. To further improve the robustness, the gradient magnitude of a pixel is weighted with a Gaussian function before being accumulated into the corresponding bin. In other words, the farther a pixel is from the centre of the circular sub-region, the lower its contribution to the histogram.

E. Image Matching

The matching of descriptors is based on the nearest neighbour, which is defined as the minimum Euclidean distance between feature vectors. If the Euclidean distance between the two feature vectors is less than a predefined threshold, we assume the two feature points are matched.
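The attraction of the polar arrangement is that rotation invariance reduces to index re-ordering. The sketch below is our own illustration with hypothetical function names; it assumes the 8 outer sub-regions share the angular spacing of the 8 histogram bins, so that both re-orderings become simple rolls. It packs nine 8-bin histograms into a 72-D descriptor and then matches two descriptor sets by thresholded nearest-neighbour Euclidean distance, in the spirit of Sections D and E.

```python
import numpy as np

def pack_descriptor(ring_hists, centre_hist, principal_bin):
    """Build a 72-D descriptor from 8 outer sub-region histograms (stored in a
    fixed clockwise order), the central sub-region histogram, and the index
    (0..7) of the principal-orientation bin. Rotation invariance is obtained
    purely by re-ordering: the outer sub-regions are renumbered so the one
    pointed to by the principal orientation comes first, and every histogram
    is rolled so the principal-orientation bin comes first."""
    ring = np.roll(np.asarray(ring_hists, dtype=np.float64), -principal_bin, axis=0)
    hists = np.vstack([ring, np.asarray(centre_hist, dtype=np.float64)[None, :]])
    hists = np.roll(hists, -principal_bin, axis=1)   # re-order the bins as well
    return hists.ravel()                             # 9 x 8 = 72 values

def match_descriptors(desc_a, desc_b, threshold):
    """Nearest-neighbour matching on Euclidean distance with a fixed threshold."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [(i, j) for i, j in enumerate(nearest) if dists[i, j] < threshold]
```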
III. THE FPGA IMPLEMENTATION OF THE OPTIMISED SIFT FEATURE DETECTION FOR AN IMAGE MATCHER

It can be seen from Section II that the optimised SIFT algorithm for an image matcher consists of five stages: 1) Gaussian pyramid construction; 2) DoG space construction and feature identification; 3) gradient and orientation histogram generation; 4) feature descriptor generation; 5) image matching. Considering the nature of a Xilinx FPGA embedded system, a top-level system partition similar to [6] has been adopted for the FPGA implementation. More specifically, the first three stages are implemented as a hardware core named the SIFT feature detection module, whereas the last two stages are to be implemented as a software module, named the SIFT feature generation and image matching module, using the Xilinx MicroBlaze soft processor [10]. In this section we present the optimised SIFT feature detection module, because the software module for SIFT feature generation and image matching is outside the scope of this paper.

As shown in Fig. 5, the SIFT feature detection module is composed of six components, of which the Gaussian Smooth, Feature Detect and GradOrien components correspond to the first three stages described in Section II, respectively.

Fig. 5. A Block Diagram of the SIFT Feature Detection Module in Terms of Data Flow

Considering the clock frequency of 100 MHz, and in order to achieve real-time processing (taken to be 30 frames per second, i.e. 33 milliseconds per frame) for SIFT feature detection on target images of 640x480 pixels, we have developed an image partition method which enables parallel and pipelined processing and hence maximises the throughput of the SIFT feature detection. The parallel and pipelined processing is shown in Fig. 6. There is a three-stage pipeline, as follows: in stage one, the Read Data component reads the first partition of the image; in the subsequent stages, the Gaussian Smooth component and then the Feature Detect and GradOrien components process that partition while the following partitions are read in. The Feature Detect and GradOrien components operate in parallel because there is no data dependency between them: the former works out the DoG space and identifies the feature points, whereas the latter works out the gradients and orientations for all the pixels. These three key modules are described in the following.

A. Read Data Component

We present the image partition method here. Parallel processing can easily be achieved in the internal architectures of the Gaussian Smooth, Feature Detect and GradOrien components to raise the processing throughput to the required level, so the Read Data component is the bottleneck of the entire SIFT feature detection. In other words, the overall processing speed depends on how fast the Read Data component can feed the image data to the Gaussian Smooth component, partition by partition. We need to choose the partition size carefully to meet the throughput requirement, which is shown in (6):

$$\frac{ImageSize}{PartitionSize}\times TimeConsumedPerPartition \le 33\ \mathrm{ms} \quad (6)$$

where ImageSize is the total valid size of the input images involved, which includes the original image of 640x480 pixels and the downsampled image of 320x240 pixels:

$$ImageSize = (640 - F)\times(480 - F) + (320 - F)\times(240 - F) \quad (7)$$

F is the height and width of the boundary region containing invalid data, caused by the nature of the two-dimensional Gaussian filters. In our case, F is equal to 14 because the kernel is of size 15x15. PartitionSize is the size of the image partition that is read into the SIFT feature detection component in a single pipeline stage, as shown in Fig. 6:

$$PartitionSize = x \times y \quad (8)$$

where x is the height of the partition and y is its width. TimeConsumedPerPartition is the number of clock cycles consumed to input a single image partition, which can be calculated as follows:

$$TimeConsumedPerPartition = (x + F + C_1)\times(y + F + C_2) + C_3 \quad (9)$$

C1, C2 and C3, which are the numbers of clock cycles required by the relevant control signals, are 2, 5 and 2, respectively.

Fig. 6. The Pipeline and Parallel Architecture
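Equations (6) to (9) can be checked directly. The short Python sketch below (ours) plugs in the constants given in the text and assumes a 10 ns cycle at the stated 100 MHz clock; for the 7x12 partition it yields roughly 30.7 ms, consistent with Table I and within the 33 ms budget.

```python
# Throughput budget from equations (6)-(9); constant names follow the text.
F, C1, C2, C3 = 14, 2, 5, 2
CLOCK_PERIOD_NS = 10          # 100 MHz clock

def time_per_partition_cycles(x, y):
    """Equation (9): clock cycles needed to read one x-by-y partition."""
    return (x + F + C1) * (y + F + C2) + C3

def total_frame_time_ms(x, y):
    """Equations (6)-(8): total read time for the 640x480 image plus the
    320x240 downsampled image, excluding the invalid boundary of width F."""
    image_size = (640 - F) * (480 - F) + (320 - F) * (240 - F)   # eq. (7)
    partitions = image_size / (x * y)                            # eq. (8)
    cycles = partitions * time_per_partition_cycles(x, y)
    return cycles * CLOCK_PERIOD_NS / 1e6

print(total_frame_time_ms(7, 12))   # about 30.7 ms
```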
The overall time requirement for the different partition sizes can be calculated from equation (6), as shown in TABLE I.

TABLE I. Time Consumption for Different Partition Sizes (the extracted table is only partially recoverable; the listed times range from about 32.43 ms down to 29.15 ms)

The partition size of 7x12 pixels is chosen, since some extra time budget is required for data interfacing in our implementation.

B. Gaussian Smooth Component

Considering the symmetrical nature of the two-dimensional Gaussian filter, a parallel architecture has been adopted in the Gaussian Smooth component, as shown in Fig. 7. Basically, an image partition of 7x12 pixels arrives from the Read Data component and is then routed into seven Gaussian Smooth units via an Image Buffer under the control of the Address Control unit. Therefore, seven rows of the image partition can be processed in parallel.

Fig. 7. The Block Diagram of the Gaussian Smooth Component

It can be seen from Fig. 8 that the Gaussian Smooth unit mainly consists of filters and accumulators. Four scales are calculated in parallel to improve the system performance. Finally, the scaled results are output to four FIFO buffers of 128x18 bits each, which is large enough for one image partition of 7x12 pixels with a pixel value resolution of 18 bits. Moreover, with the re-use of arithmetic units and the conversion from floating-point to fixed-point calculation, the hardware resource usage is reduced significantly.

Fig. 8. Architecture of the Gaussian Smooth Unit

C. Feature Detect and GradOrien Components

As shown in Fig. 2, the DoG scale images can be produced by simple pixel-by-pixel subtraction, and a feature point can then be detected by comparing it with its 26 neighbours using a highly parallel structure, which is able to complete 28 pairs of comparisons within one clock cycle. The coordinates of the detected feature points are output to four FIFOs, which store the x and y coordinates from octaves 0 and 1, respectively.

Fig. 9. Feature Detect and Gradient-Orientation Calculation

The gradient and orientation for each pixel in the Scale 1 image can be calculated in the GradOrien component (Fig. 10) in parallel with the feature detection, because there is no data dependency between them. A simplified look-up table is implemented for the inverse tangent calculation. An external memory control unit is built to store the gradients and orientations in an external SRAM of 256Kx32 bits.

Fig. 10. Architecture of the GradOrien Component
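The text only states that a simplified look-up table is used for the inverse tangent, so the sketch below is our own guess at one plausible organisation (the table depth of 64 entries is an assumption): the |dy|/|dx| ratio is quantised, the first-octant angle is looked up, and the quadrant is folded in to give one of the 36 orientation bins.

```python
import math

# A small inverse-tangent look-up table covering the first octant only;
# RATIO_STEPS is an assumed table depth, not a figure from the paper.
RATIO_STEPS = 64
ATAN_LUT = [round(math.degrees(math.atan(i / RATIO_STEPS)))
            for i in range(RATIO_STEPS + 1)]        # angles for ratios 0..1

def orientation_bin(dx, dy):
    """Return the 10-degree orientation bin (0..35) of the gradient (dx, dy)."""
    ax, ay = abs(dx), abs(dy)
    if ax == 0 and ay == 0:
        return 0
    if ay <= ax:                                     # angle within 0..45 degrees
        angle = ATAN_LUT[(ay * RATIO_STEPS) // ax]
    else:                                            # angle within 45..90 degrees
        angle = 90 - ATAN_LUT[(ax * RATIO_STEPS) // ay]
    if dx < 0:
        angle = 180 - angle                          # second / third quadrant
    if dy < 0 and angle != 0:
        angle = 360 - angle                          # below the x axis
    return (angle // 10) % 36
```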
IV. TEST AND VERIFICATION

Test and verification have been conducted on both the software model of the optimised SIFT algorithm for an image matcher and the FPGA implementation of the optimised SIFT feature detection for an image matcher.

A. Software Model of the Image Matcher Based on the Optimised SIFT Algorithm

A software model of the entire image matcher based on the optimised SIFT algorithm has been built with Microsoft Visual C++ 2005 Express Edition. The standard evaluation [3] has been adopted to assess the matching performance, which is presented as a curve of recall versus 1-precision. Given two images representing the same scene, the recall, as shown in (10), is the ratio of the number of correctly matched points to the number of corresponding points:

$$recall=\frac{\#correct\_matches}{\#correspondences} \quad (10)$$

The number of corresponding points is determined by the overlapping of the points in the different images. The 1-precision, as shown in (11), is the ratio of the number of falsely matched points to the sum of the numbers of correctly and falsely matched points:

$$1\text{-}precision=\frac{\#false\_matches}{\#correct\_matches+\#false\_matches} \quad (11)$$

Two sets of images under test are shown in Fig. 11(a) and (b), respectively. With these images, the image matching behaviour shown in Fig. 12 has been obtained from the software model of the optimised image matcher. In Fig. 12, each line between the two images indicates a pair of corresponding feature points.

Fig. 11. Images Under Test

Fig. 12. Image Matching Behaviour

For the purpose of comparison, the corresponding software models of the original SIFT [4], SURF [5] and the polar sampling descriptor [9] have also been implemented. The matching performance of each of these models has been obtained and is presented as curves of recall versus 1-precision, shown in Fig. 13, by varying the matching threshold. It can be seen that the performance based on our optimised SIFT is more or less the same as that of SURF [5], even though it is lower than that of the original SIFT [4].

Fig. 13. Image Matching Performance Comparison (recall versus 1-precision curves for the original SIFT, the polar descriptor, SURF and our optimised SIFT)
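For reference, the recall and 1-precision measures of equations (10) and (11) amount to the small helper below; the numbers in the example are illustrative only, not results from the paper.

```python
def recall_and_one_minus_precision(correct_matches, false_matches, correspondences):
    """Evaluation metrics of equations (10) and (11)."""
    recall = correct_matches / correspondences
    one_minus_precision = false_matches / (correct_matches + false_matches)
    return recall, one_minus_precision

# Example with made-up counts: 120 correct and 30 false matches out of
# 200 ground-truth correspondences.
print(recall_and_one_minus_precision(120, 30, 200))   # (0.6, 0.2)
```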
B. FPGA Implementation of the Optimised SIFT Feature Detection for an Image Matcher

The feature detection module of the optimised SIFT algorithm has been implemented on a Xilinx ML507 development board. Testing this module with the same images as in Fig. 11 has shown that the matching performance based on the hardware module is identical to that of the corresponding software model, which confirms the correctness of the FPGA implementation. Furthermore, the test has also shown that the feature detection module requires only 31 ms to process a typical image of 640x480 pixels. By comparison, 60 ms is required to detect features from an image of the same size in [11], and 33 ms is required for an image of only 320x240 pixels in [6]. Therefore, our feature detection module offers the fastest processing speed.

Regarding FPGA resource usage, it is not easy to compare our feature detection module with the module from [6] because a different FPGA technology is used. However, our FPGA implementation shows that our hardware module consumes 35,889 LUTs, 19,529 registers, 3,240 Kbits of block RAM and 97 DSP48Es, whereas the module from [6] uses 43,366 LUTs, 19,100 registers, 1,350 Kbits of block RAM and 64 DSP blocks. Therefore, our feature detection module uses a similar level of hardware resources to that in [6], except that it requires significantly more block RAM, because we need to buffer a four times larger image than in [6].

V. CONCLUSIONS

An entire optimised SIFT algorithm has been proposed in this paper for FPGA-based real-time image matching applications. Not only has the corresponding software model been built, but the key part of the optimised SIFT algorithm, the SIFT feature detection module, has also been implemented on a Xilinx Virtex-5 FPGA. It has been confirmed by the standard test that our SIFT feature detection module is able to extract features from a typical image of 640x480 pixels within 31 ms, which is faster than any of the existing systems. It can also be seen from the matching performance comparison that our optimised SIFT algorithm achieves a level similar to SURF, which is one of the well-recognised image matching algorithms.

VI. FURTHER WORK

The key further work will be: 1) the software module named the SIFT feature generation and image matching module will be implemented on the Xilinx MicroBlaze soft processor with a custom parallel processor, to achieve a processing speed similar to that offered by the feature detection module, so that the overall real-time processing speed can be achieved; 2) further optimisation will be conducted to bring the matching performance closer to that of the original SIFT algorithm without significantly increasing the hardware complexity.

ACKNOWLEDGMENT

This work was supported by the partnership grant from Innovation China UK.

REFERENCES

[1] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. IJCAI '81, 1981, pp. 674-679.
[2] E. De Castro and C. Morandi, "Registration of translated and rotated images using finite Fourier transforms," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9(5), pp. 700-703, 1987.
[3] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1615-1630, 2005.
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60(2), pp. 91-110, 2004.
[5] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features."
[6] V. Bonato, E. Marques, and G. A. Constantinides, "A parallel hardware architecture for scale and rotation invariant feature detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 12, Dec. 2008.
[7] D. Kim, K. Kim, J.-Y. Kim, S. Lee, and H.-J. Yoo, "An 81.6 GOPS object recognition processor based on NoC and visual image processing memory," in Proc. IEEE Custom Integrated Circuits Conf., San Jose, CA, 2007, pp. 443-446.
[8] A. J. Laub, Matrix Analysis for Scientists and Engineers.
[9] S. Winder and M. Brown, "Learning local image descriptors," in Proc. CVPR '07, 2007, pp. 1-8.
[10] Xilinx, "Virtex-5 FXT PowerPC 440 and MicroBlaze Embedded Kit Reference Systems," Jan. 26, 2009.
[11] S. H. K. Ng, P. Jasiobedzki, and T. J. Moyung, "Vision based modelling and localization for planetary exploration rovers," in Proc. IAC, Oct. 2004.