Intuitive Estimation of Speed using Motion and Monocular Depth Information

  • R.A. Rill, Faculty of Informatics, Eötvös Loránd University, H-1117 Budapest, Pázmány P. stny 1/C, Hungary, and Faculty of Mathematics and Computer Science, Babeș-Bolyai University, No. 1 Mihail Kogalniceanu St., RO-400084 Cluj-Napoca, Romania.


Advances in deep learning make monocular vision approaches attractive for the autonomous driving domain. This work investigates a method for estimating the speed of the ego-vehicle using state-of-the-art deep-neural-network-based optical flow and single-view depth prediction models. Adopting a straightforward, intuitive approach and approximating a single scale factor, several application schemes of the deep networks are evaluated, and the following conclusions are drawn: combining depth information with optical flow improves speed estimation accuracy compared with using optical flow alone; the quality of the deep neural network results influences speed estimation performance; and using depth and optical flow data from smaller crops of wide images degrades performance. With these observations in mind, an RMSE of less than 1 m/s for ego-speed estimation was achieved on the KITTI benchmark using monocular images as input. Limitations and possible future directions are also discussed.
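
The idea of fitting a single scale factor to a flow-and-depth feature can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`speed_feature`, `fit_scale`, `rmse`), the choice of the mean of flow magnitude times depth as the per-frame feature, the least-squares fit, and the synthetic data are all assumptions made for illustration only. The sketch relies on the simplified intuition that, for a translating camera, the image displacement of a static point grows with speed and shrinks with depth.

```python
import numpy as np

def speed_feature(flow, depth=None):
    """Per-frame scalar feature (hypothetical): mean optical-flow magnitude,
    optionally weighted pixel-wise by predicted depth."""
    mag = np.linalg.norm(flow, axis=-1)  # (H, W) displacement in pixels
    if depth is not None:
        mag = mag * depth                # compensate for distance to the scene
    return float(mag.mean())

def fit_scale(features, speeds):
    """Single scale factor s minimizing ||s * features - speeds||^2."""
    f = np.asarray(features, dtype=float)
    v = np.asarray(speeds, dtype=float)
    return float(f @ v / (f @ f))

def rmse(pred, target):
    """Root-mean-square error between two sequences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic frames (assumption): flow is proportional to speed / depth.
rng = np.random.default_rng(0)
true_speeds = np.array([2.0, 5.0, 8.0, 12.0, 15.0])  # m/s, made-up values
feats_flow, feats_both = [], []
for v in true_speeds:
    depth = rng.uniform(5.0, 50.0, size=(4, 6))       # metres, made-up values
    flow = np.zeros((4, 6, 2))
    flow[..., 0] = 30.0 * v / depth                   # horizontal flow only
    feats_flow.append(speed_feature(flow))
    feats_both.append(speed_feature(flow, depth))

scale_flow = fit_scale(feats_flow, true_speeds)
scale_both = fit_scale(feats_both, true_speeds)
rmse_flow = rmse(scale_flow * np.array(feats_flow), true_speeds)
rmse_both = rmse(scale_both * np.array(feats_both), true_speeds)
print(f"flow only: RMSE = {rmse_flow:.3f} m/s")
print(f"flow x depth: RMSE = {rmse_both:.3e} m/s")
```

On this toy data the depth-weighted feature is exactly proportional to speed, so one scale factor recovers it, while the flow-only feature also varies with the scene's depth distribution. Real optical flow and depth predictions are noisy, so the gap would be far less clean in practice.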


How to Cite
RILL, R.A. Intuitive Estimation of Speed using Motion and Monocular Depth Information. Studia Universitatis Babeș-Bolyai Informatica, [S.l.], v. 65, n. 1, p. 33-45, apr. 2020. ISSN 2065-9601. Date accessed: 04 dec. 2020.