Example Introduction
SURF is an image recognition algorithm that is robust to deformations such as translation, scaling and rotation.
$$\mathcal{H}(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}$$

Here $L_{xx}(\mathbf{x}, \sigma)$ refers to the convolution of the second order Gaussian derivative $\partial^2 g(\sigma) / \partial x^2$ with the image at point $\mathbf{x} = (x, y)$, and similarly for $L_{yy}$ and $L_{xy}$. These derivatives are known as Laplacian of Gaussians. Working from this we can calculate the determinant of the Hessian for each pixel in the image and use the value to find interest points. This variation of the Hessian detector is similar to that proposed by Beaudet [2].

Lowe [9] found a performance increase in approximating the Laplacian of Gaussians by a difference of Gaussians. In a similar manner, Bay [1] proposed an approximation to the Laplacian of Gaussians using box filter representations of the respective kernels. Figure 2 illustrates the similarity between the discretised and cropped kernels and their box filter counterparts. A considerable performance increase is found when these filters are used in conjunction with the integral image described in Section 1.1. To quantify the difference we can consider the number of array accesses and operations required in the convolution. For a 9 × 9 filter we would require 81 array accesses and operations for the original real valued filter and only 8 for the box filter representation. As the filter is increased in size, the computation cost increases significantly for the original Laplacian, while the cost for the box filters is independent of size.

Figure 2: Laplacian of Gaussian Approximation. Top Row: The discretised and cropped second order Gaussian derivatives in the x, y and xy-directions. We refer to these as $L_{xx}$, $L_{yy}$, $L_{xy}$. Bottom Row: Weighted box filter approximations in the x, y and xy-directions. We refer to these as $D_{xx}$, $D_{yy}$, $D_{xy}$.

In Figure 2 the weights applied to each of the filter sections are kept simple. The black regions are weighted with a value of 1, the white regions with a value of -1 and the remaining areas are not weighted at all.
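The claim that box filter cost is independent of filter size rests on the integral image referred to above (Section 1.1). The following is a minimal sketch, not the library's implementation: the function names `integral_image` and `box_sum` and the small test image are hypothetical, chosen only for illustration. It shows that the sum over any rectangular region needs four table lookups, whatever the region's size.

```python
def integral_image(img):
    """Return a summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, row, col, rows, cols):
    """Sum of the rows x cols box whose top-left pixel is (row, col).

    Uses exactly four lookups, regardless of the box size.
    """
    return (ii[row + rows][col + cols] - ii[row][col + cols]
            - ii[row + rows][col] + ii[row][col])


# Hypothetical 12 x 12 test image with arbitrary integer intensities.
image = [[(r * 7 + c * 3) % 11 for c in range(12)] for r in range(12)]
ii = integral_image(image)

# A 9 x 9 region summed directly (81 accesses) and via the table (4 accesses).
direct = sum(image[r][c] for r in range(1, 10) for c in range(2, 11))
fast = box_sum(ii, 1, 2, 9, 9)
```

A weighted box filter is then a small number of such box sums combined with the region weights, which is why enlarging the filter does not add work.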
Simple weighting allows for rapid calculation of areas, but in using these weights we need to address the difference in response values between the original and approximated kernels. Bay [1] proposes the following formula as an accurate approximation for the Hessian determinant using the approximated Gaussians:

$$\det(\mathcal{H}_{approx}) = D_{xx} D_{yy} - (0.9 D_{xy})^2$$

In [1] the two filters are compared in detail, and the results conclude that the box representation's negligible loss in accuracy is far outweighed by the considerable increase in efficiency and speed. The determinant here is referred to as the blob response at location $\mathbf{x} = (x, y, \sigma)$. The search for local maxima of this function over both space and scale yields the interest points for an image. The exact method for extracting interest points is explained in the following section.

1.2.2 Constructing the Scale-Space

In order to detect interest points using the determinant of Hessian it is first necessary to introduce the notion of a scale-space. A scale-space is a continuous function which can be used to find extrema across all possible scales [20]. In computer vision the scale-space is typically implemented as an image pyramid, where the input image is iteratively convolved with a Gaussian kernel and repeatedly sub-sampled (reduced in size). This method is used to great effect in SIFT [9], but since each layer relies on the previous one, and images need to be resized, it is not computationally efficient. As the processing time of the kernels used in SURF is size invariant, the scale-space can be created by applying kernels of increasing size to the original image. This allows multiple layers of the scale-space pyramid to be processed simultaneously and removes the need to subsample the image, hence providing a performance increase. Figure 3 illustrates the difference between the traditional scale-space structure and the SURF counterpart.

Figure 3: Filter Pyramid. The traditional approach to constructing a scale-space (left): the image size is varied and the Gaussian filter is repeatedly applied to smooth subsequent layers. The SURF approach (right) leaves the original image unchanged and varies only the filter size.

The scale-space is divided into a number of octaves, where an octave refers to a series of response maps covering a doubling of scale. In SURF the lowest level of the scale-space is obtained from the output of the 9 × 9 filters shown in Figure 2. These filters correspond to a real valued Gaussian with $\sigma = 1.2$. Subsequent layers are obtained by upscaling the filters whilst maintaining the same filter layout ratio. As the filter size increases, so too does the value of the associated Gaussian scale, and since the ratios of the layout remain constant we can calculate this scale by the formula:

$$\sigma_{approx} = \text{Current Filter Size} \times \frac{\text{Base Filter Scale}}{\text{Base Filter Size}} = \text{Current Filter Size} \times \frac{1.2}{9}$$

When constructing larger filters, there are a number of factors which must be taken into consideration. The increase in size is restricted by the length of the positive and negative lobes of the underlying second order Gaussian derivatives. In the approximated filters the lobe size is set at one third of the side length of the filter and refers to the shorter side length of the weighted black and white regions. Since we require the presence of a central pixel, the dimensions must be increased equally around this location, and hence the lobe size can increase by a minimum of 2. Since there are three lobes in each filter which must be the same size, the smallest step size between consecutive filters is 6. For the $D_{xx}$ and $D_{yy}$ filters the longer side length of the weighted regions increases by 2 on each side to preserve structure. Figure 4 illustrates the structure of the filters as they increase in size.

Figure 4: Filter Structure. Subsequent filter sizes must differ by a minimum of 6 to preserve filter structure.

1.2.3 Accurate Interest Point Localisation

The task of localising the scale and rotation invariant interest points in the image can be divided into three steps. First, the responses are thresholded such that all values below the predetermined threshold are removed. Increasing the threshold lowers the number of detected interest points, leaving only the strongest, while decreasing it allows many more to be detected. The threshold can therefore be adapted to tailor the detection to the application.

After thresholding, a non-maximal suppression is performed to find a set of candidate points. To do this, each pixel in the scale-space is compared to its 26 neighbours, comprised of the 8 points in the native scale and the 9 in each of the scales above and below. Figure 5 illustrates the non-maximal suppression step. At this stage we have a set of interest points with minimum strength determined by the threshold value and which are also local maxima/minima in the scale-space.

Figure 5: Non-Maximal Suppression. The pixel marked X is selected as a maximum if it is greater than the surrounding pixels on its interval and on the intervals above and below.

The final step in localising the points involves interpolating the nearby data to find the location in both space and scale to sub-pixel accuracy. This is done by fitting a 3D quadratic as proposed by Brown [3]. In order to do this we express the determinant of the Hessian function, $H(x, y, \sigma)$, as a Taylor expansion up to quadratic terms centred at the detected location. This is expressed as:

$$H(\mathbf{x}) = H + \frac{\partial H}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^2 H}{\partial \mathbf{x}^2} \mathbf{x}$$

The interpolated location of the extremum, $\hat{\mathbf{x}} = (x, y, \sigma)$, is found by taking the derivative of this function and setting it to zero such that:

$$\hat{\mathbf{x}} = -\frac{\partial^2 H}{\partial \mathbf{x}^2}^{-1} \frac{\partial H}{\partial \mathbf{x}}$$

The derivatives here are approximated by finite differences of neighbouring pixels. If $\hat{\mathbf{x}}$ is greater than 0.5 in the x, y or $\sigma$ directions we adjust the location and perform the interpolation again.
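A single interpolation step from Section 1.2.3 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the library's code: the response volume is indexed as `resp[scale][row][col]`, neighbouring samples are treated as one unit apart in all three directions, central differences stand in for the derivatives, and the names `solve3` and `interpolate_offset` are hypothetical.

```python
def solve3(a, b):
    """Solve the 3 x 3 linear system a * x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        return None  # singular second-derivative matrix: no usable extremum
    solution = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]  # replace column k with the right-hand side
        solution.append(det(m) / d)
    return solution


def interpolate_offset(resp, s, r, c):
    """Offset (dx, dy, dsigma) of the fitted extremum from sample resp[s][r][c]."""
    v = resp[s][r][c]
    # First derivatives by central differences.
    gx = (resp[s][r][c + 1] - resp[s][r][c - 1]) / 2.0
    gy = (resp[s][r + 1][c] - resp[s][r - 1][c]) / 2.0
    gs = (resp[s + 1][r][c] - resp[s - 1][r][c]) / 2.0
    # Second derivatives by finite differences.
    hxx = resp[s][r][c + 1] - 2.0 * v + resp[s][r][c - 1]
    hyy = resp[s][r + 1][c] - 2.0 * v + resp[s][r - 1][c]
    hss = resp[s + 1][r][c] - 2.0 * v + resp[s - 1][r][c]
    hxy = (resp[s][r + 1][c + 1] - resp[s][r + 1][c - 1]
           - resp[s][r - 1][c + 1] + resp[s][r - 1][c - 1]) / 4.0
    hxs = (resp[s + 1][r][c + 1] - resp[s + 1][r][c - 1]
           - resp[s - 1][r][c + 1] + resp[s - 1][r][c - 1]) / 4.0
    hys = (resp[s + 1][r + 1][c] - resp[s + 1][r - 1][c]
           - resp[s - 1][r + 1][c] + resp[s - 1][r - 1][c]) / 4.0
    second = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    # x_hat = -(d2H/dx2)^-1 * dH/dx, i.e. solve second * x_hat = -gradient.
    return solve3(second, [-gx, -gy, -gs])


# Synthetic quadratic response whose true peak sits at column 1.2, row 0.9, scale 1.3.
grid = [[[10.0 - (c - 1.2) ** 2 - (r - 0.9) ** 2 - (s - 1.3) ** 2
          for c in range(3)] for r in range(3)] for s in range(3)]
offset = interpolate_offset(grid, 1, 1, 1)  # close to (0.2, -0.1, 0.3)
```

Because every component of this offset is below 0.5, the sample at the centre of the grid would be accepted without relocating; a larger component would trigger the adjust-and-repeat step described in the text.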
This procedure is repeated until the offset $\hat{\mathbf{x}}$ is less than 0.5 in all directions or the number of predetermined interpolation steps has been exceeded. Those points which do not converge are dropped from the set of interest points, leaving only the most stable and repeatable.

1.3 Interest Point Descriptor

The SURF descriptor describes how the pixel intensities are distributed within a scale dependent neighbourhood of each interest point detected by the Fast-Hessian. This approach is similar to that of SIFT [9], but integral images used in conjunction with filters known as Haar wavelets are used in order to increase robustness and decrease computation time. Haar wavelets are simple filters which can be used to find gradients in the x and y directions.

Figure 6: Haar Wavelets. The left filter computes the response in the x-direction and the right the y-direction. Weights are 1 for black regions and -1 for the white. When used with integral images each wavelet requires just six operations to compute.

Extraction of the descriptor can be divided into two distinct tasks. First, each interest point is assigned a reproducible orientation, before a scale dependent window is constructed in which a 64-dimensional vector is extracted. It is important that all calculations for the descriptor are based on measurements relative to the detected scale in order to achieve scale invariant results. The procedure for extracting the descriptor is explained further in the following sections.

1.3.1 Orientation Assignment

In order to achieve invariance to image rotation, each detected interest point is assigned a reproducible orientation. Extraction of the descriptor components is performed relative to this direction, so it is important that this direction is found to be repeatable under varying conditions. To determine the orientation, Haar wavelet responses of size $4\sigma$ are calculated for a set of pixels within a radius of $6\sigma$ of the detected point, where $\sigma$ refers to the scale at which the point was detected.
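The Haar wavelet responses used here can be sketched with an integral image as below. This is a hedged illustration, not the library's code: the helper names are hypothetical, the wavelet is addressed by its top-left corner rather than its centre, and the sign convention (right half minus left half, bottom half minus top half) is an assumption, since the text does not say which half of each wavelet is black.

```python
def integral_image(img):
    """Summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, row, col, rows, cols):
    """Sum of the rows x cols box whose top-left pixel is (row, col)."""
    return (ii[row + rows][col + cols] - ii[row][col + cols]
            - ii[row + rows][col] + ii[row][col])


def haar_x(ii, row, col, size):
    """x-response of a size x size wavelet: right half minus left half (assumed sign)."""
    half = size // 2
    return (box_sum(ii, row, col + half, size, half)
            - box_sum(ii, row, col, size, half))


def haar_y(ii, row, col, size):
    """y-response of a size x size wavelet: bottom half minus top half (assumed sign)."""
    half = size // 2
    return (box_sum(ii, row + half, col, half, size)
            - box_sum(ii, row, col, half, size))


# Horizontal ramp: intensity grows by 1 per column and is constant down each column.
ramp = [[c for c in range(8)] for _ in range(8)]
ii = integral_image(ramp)
dx = haar_x(ii, 2, 2, 4)  # 16: the ramp has a gradient along x
dy = haar_y(ii, 2, 2, 4)  # 0: no change along y
```

The two box sums in each wavelet share two corners, so six distinct table values suffice per response; this sketch does not exploit that and simply calls `box_sum` twice.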
The specific set of pixels is determined by sampling those from within the circle using a step size of $\sigma$. The responses are weighted with a Gaussian centred at the interest point. In keeping with the rest of the calculations, the Gaussian is dependent on the scale of the point and chosen to have standard deviation $2.5\sigma$. Once weighted, the responses are represented as points in vector space, with the x-responses along the abscissa and the y-responses along the ordinate. The dominant orientation is selected by rotating a circle segment covering an angle of $\pi/3$ around the origin. At each position, the x and y-responses within the segment are summed and used to form a new vector. The longest vector lends its orientation to the interest point. This process is illustrated in Figure 7.

Figure 7: Orientation Assignment. As the window slides around the origin, the components of the responses are summed to yield the vectors shown here in blue. The largest such vector determines the dominant orientation.

In some applications rotation invariance is not required, so this step can be omitted, hence providing a further performance increase. In [1] this version of the descriptor is referred to as Upright SURF (or U-SURF) and has been shown to maintain robustness for image rotations of up to ±15°.

1.3.2 Descriptor Components

The first step in extracting the SURF descriptor is to construct a square window around the interest point. This window contains the pixels which will form entries in the descriptor vector and is of size $20\sigma$, where again $\sigma$ refers to the detected scale. Furthermore, the window is oriented along the direction found in Section 1.3.1 such that all subsequent calculations are relative to this direction.

Figure 8: Descriptor Windows. The window size is 20 times the scale of the detected point and is oriented along the dominant direction shown in green.

The descriptor window is divided into 4 × 4 regular subregions. Within each of these subregions, Haar wavelets of size $2\sigma$ are calculated for 25 regularly distributed sample points. If we refer to the x and y wavelet responses by $dx$ and $dy$ respectively, then for these 25 sample points (i.e. each subregion) we collect:

$$v_{subregion} = \left[ \sum dx, \; \sum dy, \; \sum |dx|, \; \sum |dy| \right]$$

Therefore each subregion contributes four values to the descriptor vector, leading to an overall vector of length 4 × 4 × 4 = 64. The resulting SURF descriptor is invariant to rotation, scale, brightness and, after reduction to unit length, contrast.

Figure 9: Descriptor Components. The green square bounds one of the 16 subregions and the blue circles represent the sample points at which we compute the wavelet responses. As illustrated, the x and y responses are calculated relative to the dominant orientation.

1.4 Design

This section outlines the design choices which have been made to implement the SURF point correspondence library.

1.4.1 Language and Environment

C++ has been chosen as the programming language to develop the SURF library for the following reasons:

1. Speed: Low level image processing needs to be fast, and C++ will facilitate the implementation of a highly efficient library of functions.
2. Usability: In my research, almost all image processing appears to be carried out in C++, C and Matlab. Once complete, the library is to be fully documented and freely available, so C++ seems the obvious choice in making a useful contribution to the field.
3. Portability: While C++ may not be entirely portable across platforms, it is possible, by following strict standards, to write code which is portable across many platforms and compilers.
4. Image Processing Libraries: OpenCV¹ is a library of C++ functions which lends itself well to real time computer vision. It provides functionality for reading data from image files and video files, as well as live video feeds direct from a webcam or other vision device. The library is well supported and works on both Linux and Windows.

The chosen development environment for the implementation of the library is Microsoft Visual C++ Express 2008². VC++ is a powerful IDE (Integrated Development Environment) allowing for easy code creation and visual project organisation. OpenCV also integrates well with the compiler and is fully supported. As both OpenCV and Visual C++ are free, this will allow the finished SURF library to be distributed without licensing restrictions.

1.4.2 Architecture Design

The architecture design provides a top-down decomposition of the library into modules and classes.

¹ Open Source Computer Vision Library. Provides a simple API for working with images and videos in C++. Available from http://opencvlibrary.sourceforge.net/
² Free C++ IDE for Windows. Available from www.microsoft.com/express/
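As a closing illustration of the descriptor layout from Section 1.3.2, the sketch below assembles the 64-element vector from precomputed wavelet responses. It is a simplified sketch, not the library's implementation: it assumes the responses are already sampled on a 20 × 20 grid aligned with the dominant orientation (4 × 4 subregions of 5 × 5 samples), it omits the rotation of the window itself, and the name `build_descriptor` and the synthetic inputs are hypothetical.

```python
import math


def build_descriptor(dx, dy):
    """Build a unit-length 64-vector from 20 x 20 grids of Haar responses."""
    vector = []
    for sub_r in range(4):
        for sub_c in range(4):
            sum_dx = sum_dy = sum_abs_dx = sum_abs_dy = 0.0
            # 5 x 5 = 25 sample points per subregion.
            for r in range(sub_r * 5, sub_r * 5 + 5):
                for c in range(sub_c * 5, sub_c * 5 + 5):
                    sum_dx += dx[r][c]
                    sum_dy += dy[r][c]
                    sum_abs_dx += abs(dx[r][c])
                    sum_abs_dy += abs(dy[r][c])
            vector.extend([sum_dx, sum_dy, sum_abs_dx, sum_abs_dy])
    # Reduce to unit length so that a change in contrast cancels out.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


# Synthetic responses, for illustration only.
dx = [[1.0 if (r + c) % 2 == 0 else -1.0 for c in range(20)] for r in range(20)]
dy = [[0.5 for _ in range(20)] for _ in range(20)]
descriptor = build_descriptor(dx, dy)

# Scaling every response by the same factor (a contrast change) leaves it unchanged.
scaled = build_descriptor([[3.0 * v for v in row] for row in dx],
                          [[3.0 * v for v in row] for row in dy])
```

Keeping both the signed sums and the sums of absolute values lets a subregion with alternating gradients (large absolute sum, small signed sum) be told apart from a flat one.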