#Speaker Introduction
The slides covered in the video are detailed below:
01
A Brief Introduction to Perception
▍A Typical Autonomous System
(Picture from: Chen S, Liu B, Feng C, et al. 3d point cloud processing and learning for autonomous driving[J]. arXiv preprint arXiv:2003.00601, 2020.)
▍Perception
1) Perceiving the surrounding environment
2) Extracting information relevant to navigation
▍DeepRoute_Sense Ⅱ
▍Framework
02
Challenges in Perception
▍A comparison between image and point cloud data
(Picture from: Cui Y, Chen R, Chu W, et al. Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review[J]. arXiv preprint arXiv:2004.05224, 2020.)
1) Scenes are highly diverse and complex
2) Large intra-class variation
3) High inter-class similarity
4) Undefined categories
(Picture from: http://www.robosense.cn/news/1574090713599)
5) Adverse weather conditions
(Picture from: Kenk M A, Hassaballah M. DAWN: Vehicle Detection in Adverse Weather Nature Dataset[J]. arXiv preprint arXiv:2008.05402, 2020.)
6) Noisy data
7) Real-time constraints (on-board computing resources are very limited)
▪ Many points to process: >130,000 per frame; the Hesai Pandar128, for instance, produces about 3.4 million pts/s
▪ Multi-view images
▪ Long range
✔ Summary
1) Scenes are highly diverse and complex
2) Large intra-class variation
3) High inter-class similarity
4) Undefined categories
5) Adverse weather conditions
6) Noisy data
7) Real-time constraints
03
DL-based Perception
▍3D Object Detection in Point Cloud
1) Target
▪ Detect and localize object instances
▪ Return their 3D location, orientation, and semantic instance label
2) Characteristics
▪ Records the range from the LiDAR to the detected object's surface, directly providing a precise 3D representation of the scene
▪ Sparse, irregular, orderless and continuous
▪ Lack of color information
3) Challenges
▪ Varying point density and reflection intensity
(Picture from: http://www.robosense.cn/news/1574090713599)
▪ Incompleteness
4) Methods
▪ DL-based and data-driven
▪ Insensitive to noise; segmentation results are more accurate
▪ Mostly limited to well-defined objects; hard to interpret
▪ Physical or rule-based
▪ Applicable to arbitrary objects; interpretable
▪ Prone to over-segmentation or under-segmentation
▪ Sensitive to noise
(Picture from: Yang G, Mentasti S, Bersani M, et al. LiDAR point-cloud processing based on projection methods: a comparison[J]. arXiv preprint arXiv:2008.00706, 2020.)
5) Deep Learning
▪ Hierarchical and deep
▪ DL for point cloud:
▪ How to model relationships between points to build hierarchical receptive fields
▪ In continuous space or discrete space?
▪ Neighborhood (see the query sketch below)
▪ K-nearest neighbor
▪ Tree
▪ Voxel index
(Picture from: Qi C R, Yi L, Su H, et al. Pointnet++: Deep hierarchical feature learning on point sets in a metric space[J]. Advances in neural information processing systems, 2017, 30: 5099-5108.)
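To make the neighborhood construction above concrete, here is a minimal sketch (not from the talk) of kNN and ball-query grouping over a KD-tree index, in the spirit of PointNet++'s set abstraction; the point cloud, k, and radius are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative point cloud: N points with (x, y, z) coordinates.
points = np.random.rand(1000, 3).astype(np.float32) * 50.0

# Build a KD-tree index over the points.
tree = cKDTree(points)

# k-nearest-neighbor grouping: indices of the 16 closest points
# around each of a few sampled centroids.
centroids = points[:4]
_, knn_idx = tree.query(centroids, k=16)          # shape: (4, 16)

# Ball query (as used by PointNet++): all neighbors within a fixed
# radius, which preserves the metric scale of the neighborhood.
ball_idx = tree.query_ball_point(centroids, r=2.0)

print(knn_idx.shape, [len(n) for n in ball_idx])
```

The kNN variant always returns a fixed group size, while the ball query keeps a fixed metric radius; PointNet++ argues the latter generalizes better across regions of varying density.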
6) Point Cloud in Autonomous Driving: Interesting!
7) Voxelization
▪ Transforms the continuous, irregular data structure into a discrete, regular one (sketched below)
▪ Makes it possible to apply standard 3D or 2D discrete convolutions and leverage existing network structures to process point clouds
▪ Loses some spatial resolution, which may discard fine-grained 3D structure information
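A minimal numpy sketch of this voxelization step, under the assumption of a fixed axis-aligned voxel size; production pipelines typically also cap the number of points per voxel and compute per-voxel features.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.2)):
    """Map continuous (x, y, z) points to discrete voxel indices.

    Returns the integer grid coordinates of each occupied voxel and
    the inverse mapping from points to their voxel.
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float32)
    # Shift to a non-negative frame, then quantize by flooring.
    origin = points.min(axis=0)
    grid = np.floor((points - origin) / voxel_size).astype(np.int32)
    # Deduplicate: one entry per occupied voxel.
    voxels, inverse = np.unique(grid, axis=0, return_inverse=True)
    return voxels, inverse

points = np.random.rand(1000, 3).astype(np.float32) * 50.0
voxels, inverse = voxelize(points)
print(f"{len(points)} points fell into {len(voxels)} occupied voxels")
```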
8) Flexible
▪ Easy to extend to multiple scales
▪ Features are easy to aggregate
9) Survey
(Picture from: Guo Y, Wang H, Hu Q, et al. Deep learning for 3d point clouds: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2020.)
10) OpenPCDet
▪ Flexible and clear model structure to easily support various 3D detection models
(Picture from: https://github.com/open-mmlab/OpenPCDet.)
11) Apollo::lidar::cnnseg
▪ Bird's-eye-view hand-crafted point cloud representation + 2D CNN + anchor-free detector or instance segmentation (a BEV rasterization sketch follows below)
hand-crafted features → learned features
(Picture from: https://github.com/ApolloAuto/apollo/blob/master/docs/specs/3d_obstacle_perception_cn.md)
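To illustrate the kind of bird's-eye-view hand-crafted representation such a pipeline starts from (a simplified stand-in, not Apollo's exact feature set), the sketch below rasterizes a point cloud into occupancy and max-height channels that a 2D CNN can consume.

```python
import numpy as np

def bev_features(points, x_range=(0, 70), y_range=(-40, 40), cell=0.1):
    """Rasterize a point cloud into BEV occupancy and max-height maps."""
    w = int((x_range[1] - x_range[0]) / cell)
    h = int((y_range[1] - y_range[0]) / cell)
    occupancy = np.zeros((h, w), dtype=np.float32)
    max_height = np.full((h, w), -np.inf, dtype=np.float32)

    # Keep only points inside the BEV region of interest.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]
    col = ((pts[:, 0] - x_range[0]) / cell).astype(np.int32)
    row = ((pts[:, 1] - y_range[0]) / cell).astype(np.int32)

    occupancy[row, col] = 1.0
    # Per-cell maximum z, accumulated with an unbuffered scatter-max.
    np.maximum.at(max_height, (row, col), pts[:, 2])
    max_height[max_height == -np.inf] = 0.0
    return np.stack([occupancy, max_height])  # (2, H, W) CNN input

points = np.random.rand(5000, 3) * [70, 80, 3] + [0, -40, -1]
print(bev_features(points).shape)
```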
12) PointPillar
(Lang A H, Vora S, Caesar H, et al. Pointpillars: Fast encoders for object detection from point clouds[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 12697-12705.)
13) HVNet
(Ye M, Xu S, Cao T. HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1631-1640.)
14) PV-RCNN
(Shi S, Guo C, Jiang L, et al. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10529-10538.)
15) AFDet
(Ge R, Ding Z, Hu Y, et al. Afdet: Anchor free one stage 3d object detection[J]. arXiv preprint arXiv:2006.12671, 2020.)
16) Range conditioned dilated block (RCD)
▪ The range image is compact and does not suffer from sparsity-related issues
(Bewley A, Sun P, Mensink T, et al. Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection[J]. arXiv preprint arXiv:2005.09927, 2020.)
✔ Summary
1) Point-wise, pixel-wise, and voxel-wise features can interact flexibly
2) Feature representations can be aggregated across multiple structures, and building interactions between them works better
3) Feature representations can be aggregated across multiple scales
4) Feature representations can be aggregated across multiple frames
▍Image and Point Cloud Fusion
1) Shape alone is not enough; color and texture are necessary
2) Point Cloud in Autonomous Driving: Interesting!
▪ The point-cloud-to-image transformation is relatively simple, but it is not one-to-one and requires interpolation (see the projection sketch below)
3) Interpolation
(Wang G, Tian B, Zhang Y, et al. Multi-View Adaptive Fusion Network for 3D Object Detection[J]. arXiv preprint arXiv:2011.00652, 2020.)
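A sketch of the LiDAR-to-camera projection mentioned above; the intrinsics K and the LiDAR-to-camera extrinsic T are made-up placeholders, whereas a real system would load them from calibration files.

```python
import numpy as np

# Hypothetical calibration: camera intrinsics K and the rigid
# transform T that maps LiDAR coordinates into the camera frame.
K = np.array([[720.0,   0.0, 640.0],
              [  0.0, 720.0, 360.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)
T[:3, 3] = [0.0, -0.1, -0.3]  # illustrative LiDAR-to-camera offset

def project_to_image(points_lidar, K, T):
    """Project (N, 3) LiDAR points into pixel coordinates."""
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    cam = (T @ pts_h.T)[:3]                 # (3, N)
    front = cam[2] > 0.1                    # keep points in front of the camera
    uvw = K @ cam[:, front]
    uv = uvw[:2] / uvw[2]                   # perspective divide
    return uv.T, front                      # pixel coords + validity mask

points = np.random.rand(100, 3) * [40, 10, 3] - [0, 5, 1]
uv, mask = project_to_image(points, K, T)
print(uv.shape)
```

Many pixels receive no point at all (LiDAR is far sparser than the image), which is exactly why the interpolation step above is needed for dense correspondence.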
4) Methods
▪ Result-level fusion
▪ Information aggregation occurs at the result level
▪ Good for sensor redundancy
▪ Cannot leverage the complementarity among different sensors
▪ Feature-level fusion
▪ Jointly reason over multi-view inputs, the intermediate features of which are deeply fused
▪ How to keep the useful information and discard the misleading parts
5) Survey
(Cui Y, Chen R, Chu W, et al. Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review[J]. arXiv preprint arXiv:2004.05224, 2020.)
▪ Feature-level:
▪ ROI proposal
▪ Voxel-wise or Pixel-wise
▪ Point-wise manner
6) MV3D
(Chen X, Ma H, Wan J, et al. Multi-view 3d object detection network for autonomous driving[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1907-1915.)
7) Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation
(Meyer G P, Charland J, Hegde D, et al. Sensor fusion for joint 3d object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019: 0-0.)
8) PointPainting
(Vora S, Lang A H, Helou B, et al. Pointpainting: Sequential fusion for 3d object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 4604-4612.)
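PointPainting's sequential fusion amounts to appending per-pixel segmentation scores to each projected point; here is a minimal sketch of that decoration step, assuming a projection like the one sketched earlier and a precomputed score map.

```python
import numpy as np

def paint_points(points, uv, valid, seg_scores):
    """Append image-space class scores to each LiDAR point.

    points     : (N, 3) LiDAR points
    uv         : (M, 2) pixel coordinates of the valid points
    valid      : (N,) mask of points that project into the image
    seg_scores : (H, W, C) per-pixel class scores from a segmentation net
    """
    n, c = len(points), seg_scores.shape[2]
    scores = np.zeros((n, c), dtype=np.float32)
    # Nearest-pixel lookup; a real system might interpolate instead.
    cols = np.clip(uv[:, 0].astype(int), 0, seg_scores.shape[1] - 1)
    rows = np.clip(uv[:, 1].astype(int), 0, seg_scores.shape[0] - 1)
    scores[valid] = seg_scores[rows, cols]
    # "Painted" points: original geometry plus semantic channels.
    return np.hstack([points, scores])  # (N, 3 + C)
```

The painted points can then be fed to any LiDAR detector unchanged, which is what makes this fusion scheme "sequential".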
9) SemanticVoxels
(Fei J, Chen W, Heidenreich P, et al. SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation[J]. arXiv preprint arXiv:2009.12276, 2020.)
10) EPNet
(Huang T, Liu Z, Chen X, et al. Epnet: Enhancing point features with image semantics for 3d object detection[C]//European Conference on Computer Vision. Springer, Cham, 2020: 35-52.)
11) MVAF-Net
(Wang G, Tian B, Zhang Y, et al. Multi-View Adaptive Fusion Network for 3D Object Detection[J]. arXiv preprint arXiv:2011.00652, 2020.)
✔ Summary
1) An emerging research theme in autonomous driving
2) Open problem:
▪ The improvement is still not obvious
▪ How will the model behave when one sensor modality is noisy or invalid?
▪ More effective fusion methods are still needed
04
Physical Perception
▍Methods for 3D object detection
1) DL-based and data-driven
▪ Insensitive to noise; segmentation results are more accurate
▪ Mostly limited to well-defined objects; hard to interpret
2) Physical or rule-based
▪ Applicable to arbitrary objects; interpretable
▪ Prone to over-segmentation or under-segmentation
▪ Sensitive to noise
▍Long-tail Cases: Undefined Categories
▍Motivation
1) Undefined categories are not suitable for deep-learning-based methods
2) A bounding box is not suitable for describing some objects
3) Recall some long-tail cases
(Picture from: http://www.robosense.cn/news/1574090713599)
▍Methods
1) What you see is what you get
2) Cluster points by distance (see the clustering sketch below)
3) More stable and faster
(Arya Senna Abdul Rachman A. 3D-LIDAR Multi Object Tracking for Autonomous Driving: Multi-target Detection and Tracking under Urban Road Uncertainties[J]. 2017.)
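A minimal sketch of distance-based clustering, using DBSCAN as a common stand-in for Euclidean clustering (ground removal is assumed to have been done already); eps and min_samples are illustrative values, and they are exactly the kind of hyper-parameters whose sensitivity is discussed below.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative non-ground points (ground removal assumed done).
points = np.random.rand(2000, 3).astype(np.float32) * [50, 50, 2]

# Group points whose neighbors lie within `eps` meters; points with
# too few neighbors are labeled -1 (treated as noise).
labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(points)

clusters = [points[labels == k] for k in set(labels) if k != -1]
print(f"{len(clusters)} clusters, {np.sum(labels == -1)} noise points")
```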
▍Problems
1) Need to distinguish static from moving objects, which impacts prediction
2) But clusters are unstable across frames, which makes this difficult
3) Some objects may not affect driving at all
▍De-noise
▪ Point-based perception
▍3D Panoptic Segmentation
✔ Summary
1) Physical perception is necessary for handling undefinable obstacles
2) How to overcome noise and make the method insensitive to hyper-parameters
3) Use multiple frames to make the results more stable
4) In some cases, deep learning methods are still needed
05
Thoughts
and Open Problems
▍The Gap between Academia and Industry
1) Benchmark ≠ Road test performance
▪ Some bad cases don't affect autonomous driving
▪ Some cases that do affect autonomous driving are not reflected in benchmark improvements
2) Benchmarks with inadequate metrics
▪ Object tracking benchmarks only report IDS, MOTA, etc., but velocity estimation, which they leave unevaluated, is also very important for autonomous driving
3) Latency
▪ In deployment, on-board computing resources are very limited
▍Data and model generalization
1) Big data: scenes are diverse and cannot be exhaustively covered
2) Inefficiency: hungry for ever more data, yet data remain limited
3) Lack of interpretability: hard to trace exceptions and fix failure cases
4) Long-tail cases
▍Lack of Interpretability
▍Long-tail Cases
1) Ghost objects
2) Noisy reflections
▍Prediction: The Training Target Is Indeterminate
✔ Summary
1) Autonomous driving is interesting!
2) Perception is still challenging, welcome aboard!
#Closing Remarks
Thanks for watching this episode of DEEPROUTE LAB.
For the PDF version of this article, reply 【1218】 on our WeChat account to get the file.
Give this knowledge a 【在看】 (like)!

