An Analytical Review of Stereovision Techniques to Reconstruct 3D Coordinates

Authors: Raheel Ahmed, Muhammad Naeem Ahmed Khan

Journal: International Journal of Information Technology and Computer Science (IJITCS)

Issue: No. 7, Vol. 5, 2013.

Free access

Stereovision-based 3D environment reconstruction provides a true picture of real-world situations for detecting the locations of objects. The approach is particularly useful in scenarios such as identifying traffic jams, locating curves and bends on roads, and finding obstacles on construction sites. This paper describes different methods used in stereovision to detect objects in images: trinocular stereovision, correlation between left and right contours to improve accuracy, use of prior information with intrinsic and extrinsic parameters, detection of side lanes and of the 3D points of guardrails and fences, and use of dense stereovision information, especially in urban environments. The paper also discusses a forward collision detection method that uses an elevation map with dense stereovision, tracking of multiple objects using a two-level approach, and building an enhanced grid that involves obstacle cells. A hybrid dense stereo engine used in urban detection scenarios is discussed, along with a solution for lane estimation in different situations using particle filtering. Pattern matching using 3D images for pedestrian detection and lane estimation based on particle filtering with greyscale images are also explored. The use of a rectangular digital elevation map for transforming stereo-based information and the methodology used to enhance sub-pixel accuracy are also covered.


Stereovision, 3D Imaging, Lane Detection, Particle Filtering, Obstacle Detection, Elevation Maps

Short URL: https://sciup.org/15011928

IDR: 15011928

Full text of the article: An Analytical Review of Stereovision Techniques to Reconstruct 3D Coordinates

Published Online June 2013 in MECS

Stereovision is a branch of image processing that analyzes in depth the 3D information gained from two or more digital cameras. The focus of the algorithms developed in the past was to detect the main features, e.g., ellipses, line segments, low-level points and correlation features with an edge point, to match in

II. Literature Review

Stereovision, also known as stereoscopic vision or stereopsis, relates to visual perception in three dimensions. Stereovision technology uses a passive sensor to measure accurately the position of a 3D point. Since a solid object consists of numerous 3D points, a stereovision algorithm must process thousands of 3D points to establish the shape and size of an object in an image. In the trinocular stereovision system, the horizontal points are collected from the vertical camera pair, the vertical points are collected from the horizontal camera pair, and edge points are used to correlate different features of the image. The bottom camera is used for correlation if there is a horizontal edge point on the surface. In the case of an oblique edge point, the left camera is used for calculating the correlation. The third camera is ordinarily a spare that is used for validation purposes.
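As a minimal sketch of how a matched pair of image points yields a 3D position, the following assumes a rectified binocular pair (parallel optical axes, matching points on the same image row), for which depth follows from the standard relation Z = f·B/d. The function name and all parameters (`focal_px`, `baseline_m`, `cx`, `cy`) are illustrative assumptions, not values or interfaces taken from the reviewed systems.

```python
def triangulate(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in metres, in the left-camera frame, for a rectified pair.

    u_left, u_right: column of the matched point in the left/right image (pixels)
    v:               row of the matched point (same in both images after rectification)
    focal_px:        focal length in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres in metres
    cx, cy:          principal point of the left camera in pixels
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px        # back-project the column
    y = (v - cy) * z / focal_px             # back-project the row
    return x, y, z
```

Because depth is inversely proportional to disparity, a fixed matching error of a fraction of a pixel produces a depth error that grows with the square of the distance, which is why sub-pixel accuracy matters for far objects.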

Fig. 1: Edge point detection (source: [28])

Detecting edge points with sub-pixel accuracy, with the help of contours, can improve the accuracy of lane detection. This is achieved through a specific sequence of tasks: image filtering, calculating derivatives, tracking the contour, extraction and, finally, closing. Stereovision-based 3D information can also be used to detect guardrails and fences. The algorithm for this purpose searches for dark and light patterns, which help determine the presence or absence of a lane. The weights assigned to different patterns lie in the interval from 0.1 to 1; however, the discrete 3D points obtained after processing all the points can also be used. Information from the left and right sides can be used to build two separate histograms that give an idea of the discrete 3D space.

Guardrails are detected using the same method as lanes on the road, with one difference: instead of points on the road surface, the points above the road surface are supplied as input. The same detection system can be used for multiple objects by particle filtering with the help of elevation maps. Particle weighting by measurement can be performed on the elevation map using binary values, where 0 represents an obstacle and 255 represents free area [7]. Three points each can be considered outside the perimeter; the grey and black points represent obstacles, whereas the white points denote inner points [7]. Occupancy grids can be integrated in dense stereovision with three types of cells: road, traffic isles and obstacles [8]. The elevation map distinguishes two kinds of obstacle cells, static and dynamic [8]. Static obstacle cells have a longer lifetime than dynamic obstacle cells. The accuracy of the cues in static obstacle cells depends on the obstacle size and speed. Other technologies, such as radar or laser-based setups, are also used for detecting obstacles.

These systems target urban scenarios and detect only those 3D points that lie above the road surface. Any algorithm is easier to apply if it receives points with constant density. A compressed space can be created in which cells correspond to trapezoidal tiles, in the form of a bi-dimensional histogram [9]. A cell is detected only when it contains a sufficient number of points; cells with few points are rejected. However, the algorithm does not confidently detect the empty corners of cuboids. If there is free space at a corner, the obstacle is divided into several sub-obstacles [9].

Driving assistance systems can be helpful, especially in urban scenarios. Stereovision techniques are commonly targeted at highway traffic, and few systems have been developed specifically for urban situations. Special hardware boards have been developed to reconstruct 3D scenes [10]. In an urban environment, the minimum detection distance is 0.5 m in front of the car, which is about 2.5 m from the camera, and the maximum range can be up to 50 m [10]. Urban images are quite complex, so an efficient and reliable lane-extraction algorithm is required. Lane tracking is of great importance in driving assistance, and difficult road scenarios can lead to errors in lane detection. Lane particles use a coordinate system whose origin lies in front of the car, approximately at the centre of its width [11]. The x-axis is positive towards the right of the car, the y-axis is positive towards the ground, and the z-axis points forward along the road [11]. Another important concern in road scenarios is the presence of pedestrians, who must be detected to avoid collisions, based on the information provided by the stereovision-based system.
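The idea of weighting a particle against a binary elevation map (0 for obstacle, 255 for free area) can be illustrated with a small sketch. The scoring rule below is an assumption made for illustration only and is not the measurement model of [7]: a rectangular particle scores highly when the cells it covers are obstacle cells and the ring of cells just outside it is free. The function name and the rectangle encoding are hypothetical.

```python
OBSTACLE, FREE = 0, 255  # binary map values as described in the text


def box_particle_weight(grid, top, left, bottom, right):
    """Score a rectangular particle (inclusive cell bounds) on a binary map.

    Returns a value in [0, 1]: the fraction of obstacle cells inside the
    rectangle multiplied by the fraction of free cells in the one-cell
    ring just outside it (ring cells outside the map are ignored).
    """
    rows, cols = len(grid), len(grid[0])
    inside = [grid[r][c]
              for r in range(top, bottom + 1)
              for c in range(left, right + 1)]
    ring = []
    for r in range(top - 1, bottom + 2):
        for c in range(left - 1, right + 2):
            on_ring = r in (top - 1, bottom + 1) or c in (left - 1, right + 1)
            if on_ring and 0 <= r < rows and 0 <= c < cols:
                ring.append(grid[r][c])
    occupied_inside = sum(1 for v in inside if v == OBSTACLE) / len(inside)
    free_outside = sum(1 for v in ring if v == FREE) / len(ring) if ring else 1.0
    return occupied_inside * free_outside
```

In a particle filter, such weights would be normalised over all particles before resampling, so that rectangles that fit an obstacle tightly survive and loose or misplaced ones are discarded.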

These objects are detected using complete 3D information, with the pedestrian hypothesis based on the disparity map [12]. Such a collision-avoidance system is required to detect obstacles from a distance of 20 m [12]. Pattern matching is used for this purpose: a set of human shapes is compared with the detected points in the form of a 2D image to reduce the calculation time [12].

Fig. 3: Pedestrian Detection (source: [29])

Furthermore, refined techniques in the field of stereovision can improve the results. Particle filtering with grey-scale images can be more useful for lane detection in complex situations on highways or in city areas [13]. The most common difficult situations in lane detection are the appearance and disappearance of a lane, lane joining, sharp changes in lane direction, and sensor failure caused by internal or external conditions. As with lane tracking, another advanced approach has been developed to achieve high accuracy in the detection of traffic isles. The stereo information is transformed into a rectangular Digital Elevation Map [14] and classified in two ways: by density and by road surface. In the next step, fusion and error filtering are conducted. Further accuracy can be achieved by designing a function for sub-pixel accuracy. Generally, two methodologies are used in stereovision-based algorithms: the first uses a histogram to represent the environment, and the second uses synthetic images of the given environment [15]. Stereo vision sensors are used for vehicle distance estimation: the stereo images are obtained and the region of interest of the license plate is estimated using color features [16]. A simulated environment with specialized software shows successful extraction not only of approaching vehicles but also of obstacles and pedestrians [17]. A data-driven approach is used to model and predict the maneuvers of surrounding drivers. By observing the surroundings and accruing vehicle trajectories over time, it is possible to learn typical highway activities using simple detectors and trackers that operate in real time, based on experience rather than explicit modeling [18].

Epipolar geometry [19], together with the pinhole camera model [19], is the theoretical basis of stereovision. The correctness of 3D reconstruction depends mainly on matching the left and right features along the same line, which can be refined by fitting a parabola to a neighborhood around the minimum of the correlation function [20]. A number of standard methods are available in the literature for finding the intrinsic parameters [21-24]. Calibration can be performed by observing calibration objects whose geometry in 3D space is known. A plane undergoing a precisely known translation can also be used. In self-calibration techniques, by contrast, calibration objects are normally not used. In a static scene, moving the camera generally yields two constraints on the internal parameters from a single camera displacement, owing to the rigidity of the scene and using image information alone. Hence, if images are taken with the same camera with fixed internal parameters, correspondences between three images are enough to recover both the internal and external parameters and to reconstruct a 3D structure close to the real situation. Although this approach is well known and flexible, it is not yet very mature. Many other parameters can be used to create 3D coordinates, which help to obtain reliable results. The present methodology requires only that a camera observe a planar pattern shown at different orientations. One procedure is to print the pattern with a laser printer and attach it to a planar surface; either the camera or the planar pattern can then be moved by hand. The presented methodology lies between photogrammetric calibration and self-calibration, because it uses 2D metric information rather than 3D or purely implicit information. 
Two procedures can be used to test the results of the technique: computer simulation and real data. The proposed technique is more flexible than the classical technique and considerably more robust than self-calibration. According to the analysis of this technique, it advances 3D computer vision one step from laboratory environments towards real-world use.
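The parabola fit around the minimum of the correlation function mentioned in this section [20] can be sketched as follows. This is a generic three-point parabolic interpolation under the assumption that the matching cost is sampled at integer disparities; the function name and the input layout are illustrative and are not taken from the cited report.

```python
def subpixel_minimum(costs):
    """Return the sub-pixel position of the minimum of a sampled cost curve.

    costs: sequence of matching costs, where costs[d] is the cost at
           integer disparity d (lower is better).
    A parabola is fitted through the best sample and its two neighbours,
    and the abscissa of its vertex is returned.
    """
    d = min(range(len(costs)), key=costs.__getitem__)
    if d == 0 or d == len(costs) - 1:
        return float(d)              # no neighbour on one side, keep integer result
    a, b, c = costs[d - 1], costs[d], costs[d + 1]
    curvature = a - 2 * b + c        # proportional to the second derivative
    if curvature <= 0:
        return float(d)              # flat or degenerate neighbourhood
    return d + (a - c) / (2 * curvature)
```

For a cost curve that is exactly quadratic the vertex is recovered exactly; for real correlation functions the result is an approximation whose bias depends on the shape of the function, which is the motivation for the dedicated interpolation functions studied in [15].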

A physical camera is described by two types of parameters, extrinsic and intrinsic. Extrinsic parameters transform the coordinates of an object into a camera-centered coordinate frame. If multiple cameras are used, their extrinsic parameters also describe the relationship between the cameras. The pinhole camera model relies on the collinearity principle: every point in object space is projected along a straight line through the projection center onto the image plane. The origin of the camera coordinate system is the projection center, located at (X0, Y0, Z0) in the object coordinate system, and the z-axis of the camera frame is perpendicular to the image plane. The rotation is represented by the angles ω, φ and κ, which describe a sequence of rotations around the x-, y- and z-axes. These rotations are normally clockwise and are performed in sequence: first around the x-axis, then around the y-axis and finally around the z-axis.
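The extrinsic transformation and pinhole projection described above can be sketched as below. The sketch assumes the common right-handed (counter-clockwise positive) rotation matrices composed as R = Rz(κ)·Ry(φ)·Rx(ω); sign conventions differ between sources, so the angles would need to be negated to reproduce a clockwise convention. The function names and the single `focal` parameter are assumptions for illustration, and lens distortion is ignored.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(omega, phi, kappa):
    """Rotation about x (omega), then y (phi), then z (kappa), in radians."""
    cw, sw = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    rx = [[1, 0, 0], [0, cw, -sw], [0, sw, cw]]
    ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]
    rz = [[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]]
    return _matmul(rz, _matmul(ry, rx))  # rx is applied to the point first


def project(point, center, angles, focal):
    """Project an object-space point onto the image plane of a pinhole camera.

    point:  (X, Y, Z) in the object coordinate system
    center: projection center (X0, Y0, Z0) in the object coordinate system
    angles: (omega, phi, kappa) in radians
    focal:  focal length, in the same units as the returned image coordinates
    """
    r = rotation_matrix(*angles)
    d = [p - c for p, c in zip(point, center)]
    x, y, z = (sum(r[i][k] * d[k] for k in range(3)) for i in range(3))
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return focal * x / z, focal * y / z
```

With two cameras, applying `project` with each camera's own center and angles gives the pair of image points whose difference in position carries the depth information.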

The development of stereovision-based systems focuses on ever closer interaction between two elements: the vehicle and the driving environment. First, the whole driving environment is detected and interpreted using several sensor systems. The ADAS application itself determines the level of abstraction of the environment that is necessary. In ACC, a longitudinal control task, distance and speed measurements of the target are required. In more complex situations, where potentially dangerous obstacles are present, warning and safety functions also become more complex to compute. Reliable detection demands better results when the vehicle is in an urban area, for example in the traffic jams that are characteristic of cities. One approach performs a full 3D reconstruction of the visible scene, with the single limitation that the reconstructed coordinates lie on vertical edges. The coordinates are grouped solely on the basis of density and vicinity. Using this technique, the system detects all objects or obstacles and presents them as a number of cuboids of different sizes and positions. A multiple-object tracking algorithm is then used to refine the grouping and positioning and to detect speed. An important property of stereovision imaging is that the number of 3D coordinates decreases as distance increases. To avoid this problem, the top view ("satellite view") of the space can be compressed as a function of distance. In the compressed space, the region where an object is located has the same density of points regardless of the object's distance.

In this way, a stereovision-based obstacle detection system can be developed that reconstructs 3D points effectively in situations such as complex traffic scenarios with real-time constraints. Such a system can be integrated into a driving assistance application and is well suited to vehicle environment perception. In future, the functionality of such systems can be extended to improve performance. There is a need for a system able to disambiguate, rather than reject, repetitive patterns and to reconstruct points from horizontal edges. Because a stereovision-based technique can reconstruct the features in sight, it can also reconstruct road features; for example, 3D lane detection algorithms can work with this system. Moreover, different types of objects can be detected, so the algorithm can serve as a base for different kinds of object detection, e.g., detection of vehicles, pedestrians or traffic signs [25].
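The compressed top view with cell rejection can be illustrated with a small sketch. The compression used here is an assumption chosen for simplicity, not the mapping of [9]: the lateral index is taken from x/z, so that a cell widens with distance and is trapezoidal in Cartesian space, and the depth index grows logarithmically, so that far cells are longer. The default range of 2.5 m to 50 m follows the figures quoted earlier from [10]; all other names and defaults are hypothetical.

```python
import math
from collections import Counter


def compressed_histogram(points, z_min=2.5, z_max=50.0,
                         depth_ratio=1.1, lateral_step=0.02, min_points=3):
    """Build a bi-dimensional histogram of 3D points in a compressed top view.

    points:       iterable of (x, y, z) with z pointing forward, in metres
    depth_ratio:  each row is this factor deeper than the previous one
    lateral_step: cell width expressed as a step in x / z
    min_points:   cells with fewer points are rejected as unreliable
    Returns a dict mapping (row, col) to the number of points in that cell.
    """
    counts = Counter()
    for x, _y, z in points:
        if not (z_min <= z < z_max):
            continue                                   # outside the working range
        row = int(math.log(z / z_min) / math.log(depth_ratio))
        col = math.floor((x / z) / lateral_step)
        counts[(row, col)] += 1
    return {cell: n for cell, n in counts.items() if n >= min_points}
```

Cells that survive the threshold can then be grouped with their occupied neighbours to form obstacle candidates, in the spirit of the grouping by density and vicinity described above.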

III. Critical Evaluation

This section summarizes a critical review of the different issues, methodologies and techniques of stereovision (Table I) observed during the literature review of this study. The aspects of stereovision accounted for in this analysis include: 3D environment reconstruction using different techniques depending on the requirements; use of different hardware to detect lanes using particle filtering; detection of guardrails, fences, obstacles and pedestrians for avoiding forward collisions using an elevation map with dense stereovision information; and techniques used to enhance pixel accuracy.

IV. Future Work

Future work for this research could explore correlation prediction, validation and the reconstruction of more coordinates, especially for moving objects. By using similar coordinates, we can increase accuracy by increasing the number of cues in our analysis. More features of stereovision can be combined into a package for a generic driving assistance system, which will produce results based on the information provided by more efficient sensors and other devices. Using different techniques together can reduce the chance of error in the algorithms and thereby the chance of collision.

Future driver-assistance and safety systems aim to assist the driver in increasingly complex driving situations, with the goal of safe and stress-free driving. Fast and reliable knowledge of moving objects, as well as of their movement patterns relative to the ego-vehicle, will be an essential basis for such systems. For future work, we are considering training on stereo images with the help of more detailed intensity information. Using this technique, we will be able to compute a better estimate of the parameters of the depth likelihood. In this way, it will be possible to extract information and test the system with additional features from depth images. To disambiguate within the same image region, we can make use of an output consisting of the probability densities of many detectors, with more detailed reasoning. This kind of detailed depth information can be very beneficial for reasoning about conclusions.

Table I: Critical Analysis of Stereovision Techniques and Methodologies
Ref #	Research Theme	Key Feature	Specific Considerations
[1]	Reconstruction of images using trinocular stereovision.	Three cameras are configured to gain the appropriate stereo pair.	Frequent errors can occur due to the complex comparison of pixels with the same intensity.
[2]	Achieving sub-pixel accuracy by correlating left and right contours.	Filtering the image, calculating first and second partial derivatives, tracking the contours, extracting and closing.	If the 3D information is far away (60 to 90 m), there is a chance of error using this methodology.
[3]	Use of internal and extrinsic parameters to gain quality in reconstruction for stereovision systems used in vehicle applications.	Capturing fixed pairs of images, extracting the lane and inputting the points with reconstruction algorithms.	-
[4]	Detection of side lanes and guardrails using stereovision information.	5000 reliable 3D coordinates per frame are delivered by the stereovision engine.	High image quality is a precondition for the success of this algorithm.
[5]	Accurate detection of dense stereo information in urban scenarios.	A special hardware board is used for reconstructing 3D coordinates. It detects all the 3D points which are above the road or below the height of the car.	Traffic issues are more complex in urban areas; therefore, the technique is effective if accurate images are available.
[6]	Forward Collision Warning (FCW) system for urban driving scenarios based on an elevation map from dense stereovision.	Description of tracked objects and car parameters is important for assistance, together with detected object delimiters using the elevation maps.	Good FCW results depend on the quality of the 3D information input to the system.
[7]	An approach used to track multiple objects with the help of particle filtering and elevation maps.	3D dense stereo data is represented by the digital elevation map.	3D box reconstruction is not required, as direct working is possible with simple measurement.
[8]	Temporal integration at multiple levels with three cell types: road, traffic isles and obstacles.	Obstacle detection in two ways, static and dynamic.	Obstacle cells might overlap depending on the size of an obstacle and can generate errors.
[9]	Dense stereovision reconstruction of 3D points, even for images with little texture, used in an Urban ACC (Automatic Cruise Control) system.	The method uses a Cartesian view space with regular density on the X-axis and the same regular density on the Z-axis. The approach builds a compressed matrix, like trapezoidal tiles of Cartesian space, that resembles a bi-dimensional histogram.	A caveat of the technique is that errors can occur while rebuilding the 3D points.
[10]	Hybrid dense reconstruction engine used to detect clothoid and non-clothoid lanes, pedestrians and drivable areas in urban areas.	The geometric model is fitted using the least squares method. Detects obstacles with a suitable height of at least 1.5 meters.	The technique does not divide a real obstacle into fine-grained pieces and does not merge distinct obstacles into a single one.
[11]	Tracking lanes in complex situations using stereovision.	Improves the lane detection using particle filtering using KF solution.	Lane disappearance, forking lanes and sharp lanes can be difficult to handle.
[12]	Pedestrian detection using stereovision-based cues and avoiding pedestrian collisions.	Classification by pattern matching, 3D boxes, 2D processing to reduce processing time, and validation based on motion, i.e. walking pedestrians.	-
[13]	Lane tracking system based on particle filtering using grayscale image-based cues.	Improvement in lane tracking, as the Kalman filter solution has problems in difficult scenarios.	The system shows remarkable results only when the surrounding conditions are suitable.
[14]	Detection of road, isles and obstacles using dense stereo information with elevation maps.	3D information is transformed into a rectangular digital elevation map; the road surface is modeled as a quadratic equation.	Under poor road conditions, it can receive wrong textures and produce false results.
[15]	Design of interpolation functions for achieving sub-pixel accuracy.	Presents the correlation between stereo-based algorithms and sub-pixel interpolation.	-

V. Conclusion

In this paper, we have discussed stereovision techniques used for driving assistance in different urban and highway scenarios. Across the range of traffic situations studied, many results show good outcomes. Stereovision-based systems can detect massive objects such as vehicles, and also less massive ones such as pedestrians. We have discussed different parameters and the many issues in reconstructing the 3D environment that arise from real-world conditions. Using cameras and sensors, we can enhance the reliability of the assistance system to avoid collisions with obstacles or pedestrians.

References

[1] S. Nedevschi, S. Bota, T. Marita, F. Oniga, C. Pocol. “Real-Time 3D Environment Reconstruction Using High Precision Trinocular Stereovision.” Automation, Quality and Testing, Robotics, 2006 IEEE International Conference on, 25-28 May 2006.
[2] S. Nedevschi, F. Oniga, R. Danescu. “Increased Accuracy Stereo Approach for 3D Lane Detection.” Intelligent Vehicles Symposium 2006, June 13-15, 2006, Tokyo, Japan.
[3] S. Nedevschi, C. Vancea, T. Marita, T. Graf. “On-Line Calibration Method for Stereovision Systems Used in Vehicle Applications.” Intelligent Transportation Systems Conference, 2006. ITSC '06. IEEE, 17-20 Sept. 2006.
[4] R. Danescu, S. Sobol, S. Nedevschi, T. Graf. “Stereovision-Based Side Lane and Guardrail Detection.” Intelligent Transportation Systems Conference, 2006. ITSC '06. IEEE, 17-20 Sept. 2006.
[5] S. Nedevschi, R. Danescu, C. Pocol, M.M. Meinecke. “Stereo Image Processing for ADAS and Pre-Crash Systems.” 5th International Workshop on Intelligent Transportation (WIT 2008), 19 March 2008.
[6] S. Nedevschi, A. Vatavu, F. Oniga. “Forward Collision Detection based on Elevation Map from Dense Stereo.” Intelligent Computer Communication and Processing, 2008. ICCP 2008. 4th International Conference on, 28-30 Aug. 2008.
[7] R. Danescu, F. Oniga, S. Nedevschi, M.M. Meinecke. “Tracking Multiple Objects Using Particle Filters and Digital Elevation Maps.” Intelligent Vehicles Symposium, 2009 IEEE, 3-5 June 2009. DOI: 10.1109/IVS.2009.5164258.
[8] F. Oniga, S. Nedevschi, M.M. Meinecke. “Temporal Integration of Occupancy Grids Detected from Dense Stereo Using an Elevation Map Representation.” 6th International Workshop on Intelligent Transportation (WIT 2009), Hamburg, Germany, March 24-25, 2009, pp. 133-138.
[9] C. Pocol, S. Nedevschi, M.M. Meinecke. “Obstacle Detection Based on Dense Stereovision for Urban ACC Systems.” WIT 2008: 5th International Workshop on Intelligent Transportation, 18-19 March 2008, Hamburg, Germany.
[10] S. Nedevschi, R. Danescu, T. Marita, F. Oniga, C. Pocol, S. Sobol, C. Tomiuc, C. Vancea, M.M. Meinecke, T. Graf, T. Binh To, M.A. Obojski. “A Sensor for Urban Driving Assistance Systems Based on Dense Stereovision.” Intelligent Vehicles Symposium, 2007 IEEE, 13-15 June 2007.
[11] R. Danescu, S. Nedevschi. “Probabilistic Lane Tracking in Difficult Road Scenarios Using Stereovision.” IEEE Transactions on Intelligent Transportation Systems, Vol. 10, No. 2, June 2009.
[12] S. Nedevschi, S. Bota, C. Tomiuc. “Stereo-Based Pedestrian Detection for Collision-Avoidance Applications.” IEEE Transactions on Intelligent Transportation Systems, Vol. 10, No. 3, September 2009.
[13] R. Danescu, S. Nedevschi, M.M. Meinecke, T. Binh To. “A Stereovision-Based Probabilistic Lane Tracker for Difficult Road Scenarios.” Intelligent Vehicles Symposium, 2008 IEEE, 4-6 June 2008.
[14] F. Oniga, S. Nedevschi. “Processing Dense Stereo Data Using Elevation Maps: Road Surface, Traffic Isle and Obstacle Detection.” Vehicular Technology, IEEE Transactions on, March 2010.
[15] I. Haller, S. Nedevschi. “Design of Interpolation Functions for Sub-Pixel Accuracy Stereo-Vision Systems.” IEEE Transactions on Image Processing, 2011 Jul 29. DOI: 10.1109/TIP.2011.2163163.
[16] Zhibin Zhang, Shuangshuang Liu, Gang Xu, Juangjuang Wang. “A Vehicle Distance Measurement Based On Binocular Stereo Vision.” Journal of Theoretical and Applied Information Technology, 31st October 2012, Vol. 44, No. 2.
[17] T. Surgailis, A. Valinevicius, V. Markevicius, D. Navikas, D. Andriukaitis. “Avoiding Forward Car Collision using Stereo Vision System.” Elektronika ir elektrotechnika, ISSN 1392-1215, Vol. 18, No. 8, 2012.
[18] S. Sivaraman, B. Morris, M. Trivedi. “Learning Multi-Lane Trajectories using Vehicle-Based Vision.” 2011 IEEE International Conference on Computer Vision Workshops, 978-1-4673-0063-6/11.
[19] E. Trucco, A. Verri. “Introductory Techniques for 3D Computer Vision.” Prentice Hall, 1998.
[20] T. A. Williamson. “A High-Performance Stereo Vision System for Obstacle Detection.” Technical report CMU-RI-TR-98-24, Robotics Institute, Carnegie Mellon University, September 1998.
[21] R. Y. Tsai. “A versatile camera calibration technique for high accuracy 3D machine vision metrology using off the shelf TV cameras and lenses.” IEEE Journal of Robotics and Automation, RA-3(4), pp. 323-344, 1987.
[22] Z. Zhang. “Flexible Camera Calibration by Viewing a Plane From Unknown Orientations.” International Conference on Computer Vision (ICCV’99), Corfu, Greece, September 1999, pp. 666-673.
[23] J. Heikkila, O. Silven. “A four-step camera calibration procedure with implicit image correction.” Proc. IEEE Computer Society Conf., 1997, pp. 1106-1112.
[24] J. Y. Bouguet. Camera Calibration Toolbox for Matlab, http://www.vision.caltech.edu/bouguetj/calib_doc
[25] S. Nedevschi, R. Danescu, D. Frentiu, T. Marita, F. Oniga, C. Pocol, R. Schmidt, T. Graf. “High Accuracy Stereo Vision System for Far Distance Obstacle Detection.” IEEE Intelligent Vehicles Symposium (IV 2004), Parma, Italy, pp. 292-297, 2004.
[26] S. Nedevschi, R. Schmidt, T. Graf, R. Danescu, D. Frentiu, T. Marita, F. Oniga, C. Pocol. “3D Lane Detection System Based on Stereovision.” IEEE Intelligent Transportation Systems Conference (ITSC), Washington, USA, pp. 161-166, 2004.
[27] J. Canny. “A Computational Approach to Edge Detection.” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, June 1986, pp. 679-698.
[28] Image obtained from the website: http://users.fmrib.ox.ac.uk/~steve/susan/susan/img90.gif
[29] Image obtained from the website: http://ecx.images-amazon.com/images/I/51NslhcP0PL. _SL500_AA300_.jpg


Research article