Intuitive Estimation of Speed using Motion and Monocular Depth Information
Abstract
Advances in deep learning make monocular vision approaches attractive for the autonomous driving domain. This work investigates a method for estimating the speed of the ego-vehicle using state-of-the-art deep-neural-network-based optical flow and single-view depth prediction models. Adopting a straightforward intuitive approach that approximates a single scale factor, several application schemes of the deep networks are evaluated and meaningful conclusions are formulated: combining depth information with optical flow improves speed estimation accuracy over using optical flow alone; the quality of the deep network outputs influences speed estimation performance; and using depth and optical flow data from smaller crops of wide images degrades performance. With these observations in mind, an RMSE of less than 1 m/s for ego-speed estimation was achieved on the KITTI benchmark using only monocular images as input. Limitations and possible future directions are discussed as well.
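To make the stated intuition concrete, the sketch below shows one plausible way a single scale factor could map flow-and-depth statistics to ego-speed. It is a minimal illustration under assumptions not spelled out in the abstract: the function names (estimate_speed, fit_scale) and the median-based statistic are hypothetical, not the paper's exact formulation.

import numpy as np

def estimate_speed(flow, depth, dt, scale):
    """Estimate ego-speed (m/s) from dense optical flow and a monocular depth map.

    flow  : (H, W, 2) array, per-pixel displacement in pixels between two frames
    depth : (H, W) array, (relative) depth predicted by a single-view network
    dt    : inter-frame time in seconds
    scale : single global factor, fitted against ground-truth speeds
    """
    flow_mag = np.linalg.norm(flow, axis=-1)
    # For a mostly translating camera, pixel motion scales with speed / depth,
    # so the product |flow| * depth is roughly proportional to ego-speed.
    raw = np.median(flow_mag * depth) / dt  # robust scene-wide statistic, per second
    return scale * raw

def fit_scale(raw_stats, gt_speeds):
    """Least-squares fit of the single scale factor s minimizing ||s*x - y||^2."""
    x = np.asarray(raw_stats, dtype=float)  # per-frame raw statistics (training set)
    y = np.asarray(gt_speeds, dtype=float)  # logged ground-truth speeds (m/s)
    return float(x @ y / (x @ x))

In such a scheme, the fitted scale would absorb the unknown focal length and the unknown metric scale of the depth network's output, which is why a single factor can suffice once flow and depth are combined.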

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.