Localization and Mapping in Autonomous Driving
Knowledge Tree for SDC Localization and Mapping
Autonomous Driving is a brand-new area for me to explore. It is a technology that combines many cutting-edge technologies, from 5G and cloud to CV and AI. I think it will be the most ground-breaking product of our time. It can be viewed as a very primitive step toward the intelligent machine. Looking from human history toward the future, maybe the ultimate task of our generation is to deliver this technology. So it seemed reasonable and necessary for me to spend a weekend understanding how self-driving cars work. Here is what I learned.
Self-driving Basics
- Levels of SDC: I used to misunderstand this concept. I thought the level was determined by the technology used, but it is actually defined by how much human involvement the driving process requires. The industry is currently targeting L4 SDC.
- SDC is also classified into 3 different use cases:
- Passenger car: Safety first, large-scale operation, traffic rules and complex road scenarios.
- Delivery car: Fewer safety and comfort concerns. Lower accuracy requirements for LM.
- Cleaning car: Aim to cover the ground of a whole area
- Besides the car itself, SDC basically adds 5 new technologies:
- Perception: Perceive dynamic objects on the road in real time, such as pedestrians or other cars
- Localization and Mapping: Sensor fusion, SLAM, map building, HD map.
- Planning and Control: Route planning, control and navigation, safety and comfort.
- Simulation: Physically based simulation environment. Test environment for all components of SDC
- In-car user experience: Next-generation in-car experience. Safety monitoring and entertainment. This is very important: once we achieve L5, we need to find something for humans to do in the car
- ADAS: Advanced Driver Assistance Systems, the traditional smart driving technology. ADASIS: the interface for ADAS. SDC technology is an add-on to ADAS.
Localization and Mapping (LM)
- LM is a broad concept in robotics. It applies to any kind of robotic system. Here are some basic differences between AR glasses SLAM and SDC SLAM:
- AR: the map is relatively small and unknown; uses VO/VIO; the frontend renders based on the real-time head pose; the backend optimizes the map; FPS is the top requirement
- SDC LM: the map is large scale and uses a prior cloud HD Map; life-long SLAM; uses LiDAR; safety is the top requirement; the frontend localizes the car in the HD Map. Usually there is no need to optimize/update the map.
- SDC LM sensors:
- GNSS and RTK: GPS is not guaranteed to be stable at all times
- IMU, wheel speed and rotation: Dead Reckoning, but with accumulated drift error
- Camera: Vision for detecting semantic loop closure and road segmentation, using deep learning for semantic segmentation
- LiDAR: 360-degree point cloud; works poorly in fog or rain
- Ultrasonic and Radar: work well in fog or rain
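To make the "accumulated drift" of Dead Reckoning concrete, here is a minimal sketch. It assumes a simple 2D model in which the wheel speed and the yaw rate are integrated step by step; the function names and the bias value are made up for illustration and do not come from any particular SDC stack.

```python
import math

def dead_reckon(pose, speed, yaw_rate, dt):
    """Advance a 2D pose (x, y, heading) by one time step."""
    x, y, theta = pose
    x += speed * math.cos(theta) * dt
    y += speed * math.sin(theta) * dt
    theta += yaw_rate * dt
    return (x, y, theta)

def integrate(pose, samples, dt):
    """Integrate a sequence of (speed, yaw_rate) samples."""
    for speed, yaw_rate in samples:
        pose = dead_reckon(pose, speed, yaw_rate, dt)
    return pose

# Drive straight at 10 m/s for 10 s, sampled at 100 Hz.
samples = [(10.0, 0.0)] * 1000
true_pose = integrate((0.0, 0.0, 0.0), samples, 0.01)

# Same drive, but the gyro has a small constant bias of 0.01 rad/s (assumed value).
biased = [(v, w + 0.01) for v, w in samples]
drift_pose = integrate((0.0, 0.0, 0.0), biased, 0.01)

print("true pose :", true_pose)
print("with bias :", drift_pose)
```

Even this tiny bias pushes the estimate several meters sideways after 100 m of driving, which is why DR is normally corrected by GNSS, LiDAR or camera measurements.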
Mapping
- HD Map: High Definition Map. Resolution < 10 cm. It contains different levels of detail:
- Road Topology Map: Used for basic navigation
- Lane Map: High-accuracy lane geographical model. Used for path planning
- Landmark Map: Used for storing signs, traffic lights, and other road info
- Localization Feature Map: LiDAR point cloud or features for localization, like corners of lane dividers
- Dynamic Map: Stores info like weather and construction sites
- Maps need annotation, such as signs on the road, road bumps, and any other road information that cannot be obtained directly from the point cloud. Here are some ways of labeling:
- Manual labeling
- Satellite Yaw Scan mapping
- AI annotation
- HD Map Annotation content:
- Road boundary estimation
- Lane Detection
- Crosswalk estimation
- Lane/road topology
- HD Maps are stored following the OpenDRIVE spec
- Map compression: the map can be very big and cannot be downloaded on the fly, so we can compress some redundant info
- OctoMap: Divides the point cloud space into a tree-structured grid, like a BVH.
- Point Cloud Compression (Voxel Filtering): within a single grid cell, we can compress all points into 1 point, which reduces the size of the feature map
- Occupancy Grid Map: each cell is Occupied, Free, or Unknown, stored as the probability that the cell is occupied
- Download on the fly or at the user's home
- Map Engine: takes care of map assembly
- Download using 5G on the fly
- If downloading at home, only download the HD Map for the route
- 3 ways to build the map:
- Use a professional LiDAR-based collection car. The data is more trustworthy.
- Use any car that utilizes the HD Map. Camera based.
- Mixed use of both
- Map alignment: sometimes, due to drift, 2 map slices have a certain offset. We might need to add more constraints when optimizing the map to remove the offset, or have a human align it manually.
- Map Building: Offline, cloud, heavy computing.
- Offline SfM Pose graph optimization: optimize both car pose and the features.
- Partial/global BA update
- If the feature matching result is bad, use consistent data.
- The raw LiDAR point cloud needs pruning, using DL-based methods:
- Rain/fog removal
- Dynamic Object removal: car, people
- Point clouds for certain objects, like road signs, need to be rectified
- Maps always need verification through simulation before publishing
- Testing
- Fleet on actual road
- Structured tests on a fake road
- Simulation
- Drive log playback
- Design test cases
- When starting service in a new city, we always need to collect and test the map for 30 days
- Map update strategy:
- Need to use consistent data across different cars in the fleet and across different times (loop closure), so that we have an unsupervised map update mechanism
- When downloading slices on the fly, we might need some overlapping area between slices so that the car can do loop closure across the gap
- We should know the transform between adjacent slices
- Each slice will have its own local coordinate system
- The HD Map will be segmented into slices, and slices are downloaded on request
- How about indoor maps for parking structures?
- For LiDAR feature points, there is no descriptor.
- We can use a filter to remove standalone points in the raw map
- HD Map updates need to be delivered OTA
- Other constraints when BA optimizes the pose graph:
- Height: the car should always be on the ground
- Use 3D geometry constraints to make things obey the rules
- Loop Closure
- Multilevel Loop Closure: see the same feature at a far distance
- Global constraints
- Manually labeled constraints
- Should the map only store the key frame data?
- When collecting data, the scanner car needs to ensure loop closure
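The voxel filtering idea from the map compression notes above can be sketched in a few lines. This is only an illustration under my own assumptions: points are plain `(x, y, z)` tuples, the space is cut into cubic cells of a made-up `voxel_size`, and all points in a cell are replaced by their centroid. Production pipelines would normally use a point cloud library rather than this hand-written version.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids

# Four points that fall into two 10 cm voxels (illustrative values).
cloud = [
    (0.01, 0.02, 0.00),
    (0.03, 0.04, 0.02),
    (0.52, 0.01, 0.00),
    (0.55, 0.03, 0.04),
]
reduced = voxel_downsample(cloud, voxel_size=0.1)
print(len(cloud), "->", len(reduced), "points")
```

A larger `voxel_size` gives a smaller map but throws away more geometric detail, so the cell size has to be chosen against the accuracy the localization step needs.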
Localization
- Use the real-time LiDAR data to match the features in the ground truth HD Map, and from that infer the car’s pose
- Horizontal localization: using the lane dividers
- Vertical localization: using the road signs
- Map building is mostly pose graph optimization over poses and features; localization uses ICP and a Particle Filter to estimate the car’s pose.
- ICP, Iterative Closest Point, matches 3D features to 3D features:
- A good thing about ICP is that, for a given set of point matches, the SVD solution (linear solution) gives us the global minimum of the loss. So there is no real need to optimize over the pose graph, unless it is for map building, which needs to refine the feature points and other states
- For the SVD solution of ICP, see slambook p173
- Other methods for localization:
- NDT: Normal Distributions Transform
- Particle Filter
- Grid search?
- LiDAR Odometry: fuse LiDAR with RTK and IMU
- GPS data might be bad at runtime. We can remove the GPS outliers.
- During localization, if the LiDAR data is bad due to rain or fog, or the LiDAR is blocked by nearby objects, DR (Dead Reckoning) takes more weight in the output prediction
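To illustrate the ICP notes above, here is a minimal 2D sketch. It is a simplification under my own assumptions: in 2D the closed-form alignment reduces to a single angle computed with `atan2`, which plays the role that the SVD of the cross-covariance matrix plays in 3D. The function names, the toy map and the pose values are made up for the example.

```python
import math

def align_2d(src, dst):
    """Closed-form rigid transform (theta, tx, ty) that minimizes
    sum |R * src_i + t - dst_i|^2 for already-paired 2D points."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy   # centered source point
        bx, by = qx - dx, qy - dy   # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def apply_2d(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(scan, map_points, iterations=10):
    """Alternate nearest-neighbor matching and closed-form alignment."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        moved = apply_2d(scan, theta, tx, ty)
        # Brute-force nearest neighbor; a real system would use a k-d tree.
        matches = [min(map_points,
                       key=lambda m: (m[0] - p[0]) ** 2 + (m[1] - p[1]) ** 2)
                   for p in moved]
        theta, tx, ty = align_2d(scan, matches)
    return theta, tx, ty

# Toy "HD Map" features and a scan taken from an assumed true pose.
map_points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0)]
true_theta, true_tx, true_ty = 0.05, 0.1, -0.1
scan = apply_2d([(x - true_tx, y - true_ty) for x, y in map_points],
                -true_theta, 0.0, 0.0)

estimated = icp_2d(scan, map_points)
print("estimated pose:", estimated)
```

The alignment step is only globally optimal for the matches it is given. The matching step can still pick wrong neighbors when the initial pose is far off, which is one reason ICP is usually seeded with a pose from GNSS or Dead Reckoning.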
Some Thoughts of SDC
- Since we still have a long way to go before L5, I can see some controversial concepts in SDC. They are reasonable at the current stage. However, they may ultimately block the evolution of truly self-driving cars.
- I see cars adding in-car cameras to monitor the driver’s attention on the road. But if this is a self-driving car, why does the driver need to pay attention to the road?
- Also, most companies rely on HD Maps to guide the car. If we want the car to move like a human driver, it should learn the road by itself.
- Also, when the user arrives in an unmapped area, the HD Map will not work. Someone needs to drive the scanner car there and scan it. This makes SDC difficult to extend to a larger scale.
- Assume we have L5 driving technology and have had it for decades. In some special scenarios the user still needs to take over the driving, which raises the question: if users haven’t driven for a long time, will they still be capable of driving?
- HD Maps require a lot of effort to label. It feels like humans are working for the machine instead.