Deep Learning Research Papers for Robot Perception

A collection of deep learning research papers covering perception and associated robotic tasks. Within each research area outlined below, the course staff has identified a core set and an extended set of papers. The core set forms the basis of our seminar-style lectures starting in week 8; the extended set points to further notable work in each area.


Table of contents

  1. RGB-D Architectures
  2. Point Cloud Processing
  3. Object Pose, Geometry, SDF, Implicit Surfaces
  4. Dense Descriptors, Category-level Representations
  5. Recurrent Networks and Object Tracking
  6. Visual Odometry and Localization
  7. Semantic Scene Graphs and Explicit Representations
  8. Neural Radiance Fields and Implicit Representations
  9. Datasets
  10. Self-Supervised Learning
  11. Grasp Pose Detection
  12. Tactile Perception for Grasping and Manipulation
  13. Pre-training for Robot Manipulation
  14. Perception Beyond Vision
  15. More Frontiers
    1. Interpreting Deep Learning Models
    2. Fairness and Ethics
    3. Certifiable Perception
    4. Articulated Objects
    5. Deformable Objects
    6. Transparent Objects
    7. Dynamic Scenes
    8. Beyond 2D Convolutions
    9. Reinforcement Learning
    10. Generative Modeling

RGB-D Architectures

Scheduled Week 8, Lec 14

Core List

  1. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes, Xiang et al., 2018

  2. A Unified Framework for Multi-View Multi-Class Object Pose Estimation, Li et al., 2018

  3. PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation, He et al., 2020

  4. Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation, Xiang et al., 2021
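
The papers above differ in architecture, but most share the same first step: back-projecting the depth channel into a camera-frame point map that can be fused with per-pixel RGB features. Below is a minimal sketch of that step, assuming a standard pinhole camera model; the intrinsics in the example are made-up values for illustration.

    # Back-project an HxW depth image (meters) into an HxWx3 camera-frame point
    # map using pinhole intrinsics. Intrinsics below are illustrative values.
    import numpy as np

    def backproject_depth(depth, fx, fy, cx, cy):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))   # per-pixel coordinates
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1)

    if __name__ == "__main__":
        depth = np.full((480, 640), 1.5)                 # synthetic 1.5 m plane
        points = backproject_depth(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
        print(points.shape)                              # (480, 640, 3)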

Extended List

Point Cloud Processing

Scheduled Week 8, Lec 15

Core List

  1. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi et al., 2017

  2. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Qi et al., 2017

  3. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation, Xu et al., 2018

  4. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion, Wang et al., 2019
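
A minimal sketch of the permutation-invariant set encoding at the heart of PointNet: a shared per-point MLP followed by a symmetric max-pool. The layer widths here are illustrative, not the ones used in the paper.

    # Shared per-point MLP + order-invariant max-pool (PointNet-style sketch).
    import torch
    import torch.nn as nn

    class TinyPointNet(nn.Module):
        def __init__(self, out_dim=256):
            super().__init__()
            self.mlp = nn.Sequential(              # applied independently to each point
                nn.Linear(3, 64), nn.ReLU(),
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, out_dim),
            )

        def forward(self, points):                 # points: (batch, num_points, 3)
            per_point = self.mlp(points)           # (batch, num_points, out_dim)
            global_feat, _ = per_point.max(dim=1)  # symmetric pooling over points
            return global_feat                     # invariant to point ordering

    if __name__ == "__main__":
        net = TinyPointNet()
        pts = torch.randn(2, 1024, 3)
        shuffled = pts[:, torch.randperm(1024)]
        print(torch.allclose(net(pts), net(shuffled)))   # True: order does not matter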

Extended List

Object Pose, Geometry, SDF, Implicit Surfaces

Scheduled Week 9, Lec 16

Core List

  1. SUM: Sequential scene understanding and manipulation, Sui et al., 2017

  2. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation, Park et al., 2019

  3. Implicit surface representations as layers in neural networks, Michalkiewicz et al., 2019

  4. iSDF: Real-Time Neural Signed Distance Fields for Robot Perception, Ortiz et al., 2022
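
A minimal sketch of the DeepSDF-style decoder interface: an MLP maps a latent shape code plus a 3D query point to a signed distance, so the object surface is the zero level set. The layer sizes, latent dimension, and the sphere-tracing loop are illustrative stand-ins, not the architectures from the papers above.

    # DeepSDF-style decoder sketch: (latent code, xyz) -> signed distance.
    import torch
    import torch.nn as nn

    LATENT_DIM = 64                                 # illustrative latent size

    class SDFDecoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(LATENT_DIM + 3, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, latent, xyz):             # latent: (B, LATENT_DIM), xyz: (B, 3)
            return self.net(torch.cat([latent, xyz], dim=-1)).squeeze(-1)

    def sphere_trace(decoder, latent, origin, direction, steps=32):
        # With a trained decoder, stepping by the predicted distance marches the
        # ray toward the zero level set (the surface).
        point = origin.clone()
        for _ in range(steps):
            dist = decoder(latent, point)
            point = point + dist.unsqueeze(-1) * direction
        return point

    if __name__ == "__main__":
        dec = SDFDecoder()                          # untrained, for shape-checking only
        hit = sphere_trace(dec, torch.zeros(1, LATENT_DIM),
                           origin=torch.tensor([[0., 0., -2.]]),
                           direction=torch.tensor([[0., 0., 1.]]))
        print(hit.shape)                            # (1, 3)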

Extended List

Dense Descriptors, Category-level Representations

Scheduled Week 9, Lec 17

Core List

  1. Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation, Florence et al., 2018

  2. Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation, Wang et al., 2019

  3. kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation, Manuelli et al., 2019

  4. Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image, Lin et al., 2022
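
Dense Object Nets trains per-pixel descriptors with a pixelwise contrastive objective: descriptors of matching pixels across two views are pulled together, while non-matches are pushed at least a margin apart. A minimal sketch of such a loss is below; the descriptor dimension and margin are illustrative, and the index tensors stand in for the pixel correspondences the papers obtain from known geometry.

    # Pixelwise contrastive loss sketch over flattened (H*W, D) descriptor maps.
    import torch
    import torch.nn.functional as F

    def pixelwise_contrastive_loss(desc_a, desc_b, matches_a, matches_b,
                                   nonmatches_a, nonmatches_b, margin=0.5):
        d_match = F.pairwise_distance(desc_a[matches_a], desc_b[matches_b])
        d_nonmatch = F.pairwise_distance(desc_a[nonmatches_a], desc_b[nonmatches_b])
        match_loss = (d_match ** 2).mean()                       # pull matches together
        nonmatch_loss = (torch.clamp(margin - d_nonmatch, min=0) ** 2).mean()  # push apart
        return match_loss + nonmatch_loss

    if __name__ == "__main__":
        desc_a, desc_b = torch.randn(480 * 640, 16), torch.randn(480 * 640, 16)
        rand_idx = lambda n: torch.randint(0, 480 * 640, (n,))
        print(pixelwise_contrastive_loss(desc_a, desc_b, rand_idx(100), rand_idx(100),
                                         rand_idx(100), rand_idx(100)).item())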

Extended List

Recurrent Networks and Object Tracking

Scheduled Week 10, Lec 18

Core List

  1. DeepIM: Deep Iterative Matching for 6D Pose Estimation, Li et al., 2018

  2. PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, Deng et al., 2019

  3. 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, Wang et al., 2020

  4. XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model, Cheng and Schwing, 2022

Extended List

Visual Odometry and Localization

Scheduled Week 10, Lec 19

Core List

  1. Backprop KF: Learning Discriminative Deterministic State Estimators, Haarnoja et al., 2016

  2. Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors, Jonschkowski et al., 2018

  3. Multimodal Sensor Fusion with Differentiable Filters, Lee et al., 2020

  4. Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation, Karkus et al., 2021
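
The differentiable filters above share one building block: a predict-reweight step written entirely in differentiable operations so that motion and observation models can be learned end to end. A minimal sketch of one such step on a toy 2-D state is below; the Gaussian motion noise and observation likelihood are stand-ins for learned networks, and resampling is omitted for brevity.

    # One differentiable particle-filter step: propagate, reweight, read out mean.
    import math
    import torch

    def pf_step(particles, log_weights, control, observation, motion_noise=0.05):
        # Predict: apply the control and inject reparameterized Gaussian noise
        # (stand-in for a learned motion model).
        particles = particles + control + motion_noise * torch.randn_like(particles)

        # Update: stand-in observation model, a Gaussian likelihood of the
        # observation given each particle's state.
        log_lik = -0.5 * ((particles - observation) ** 2).sum(dim=-1)
        log_weights = torch.log_softmax(log_weights + log_lik, dim=0)

        estimate = (log_weights.exp().unsqueeze(-1) * particles).sum(dim=0)
        return particles, log_weights, estimate

    if __name__ == "__main__":
        n = 256
        particles = torch.randn(n, 2)                     # toy 2-D state
        log_weights = torch.full((n,), -math.log(n))      # uniform initial weights
        _, _, est = pf_step(particles, log_weights,
                            control=torch.tensor([0.1, 0.0]),
                            observation=torch.tensor([0.2, 0.1]))
        print(est)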

Extended List

Semantic Scene Graphs and Explicit Representations

Scheduled Week 11, Lec 20

Core List

  1. Image Retrieval using Scene Graphs, Johnson et al., 2015

  2. Semantic Robot Programming for Goal-Directed Manipulation in Cluttered Scenes, Zeng et al., 2018

  3. Semantic Linking Maps for Active Visual Object Search, Zeng et al., 2020

  4. Hydra: A Real-time Spatial Perception System for 3D Scene Graph Construction and Optimization, Hughes et al., 2022
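
What these works have in common is an explicit world model: a graph whose nodes are object (or place) instances and whose edges are semantic or spatial relations. A minimal sketch of such a structure is below; the node fields and relation vocabulary are illustrative, not taken from any one of the papers.

    # Minimal explicit scene graph: nodes are object instances, edges are
    # directed (subject, relation, object) triples.
    from dataclasses import dataclass, field

    @dataclass
    class ObjectNode:
        name: str
        category: str
        position: tuple                              # (x, y, z) in the world frame

    @dataclass
    class SceneGraph:
        nodes: dict = field(default_factory=dict)    # name -> ObjectNode
        edges: list = field(default_factory=list)    # (subject, relation, object)

        def add(self, node):
            self.nodes[node.name] = node

        def relate(self, subject, relation, obj):
            self.edges.append((subject, relation, obj))

        def query(self, relation):
            return [(s, o) for s, r, o in self.edges if r == relation]

    if __name__ == "__main__":
        g = SceneGraph()
        g.add(ObjectNode("mug_0", "mug", (0.4, 0.1, 0.8)))
        g.add(ObjectNode("table_0", "table", (0.5, 0.0, 0.7)))
        g.relate("mug_0", "on", "table_0")
        print(g.query("on"))                         # [('mug_0', 'table_0')]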

Extended List

Neural Radiance Fields and Implicit Representations

Scheduled Week 11, Lec 21

Core List

  1. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall et al., 2020

  2. iMAP: Implicit Mapping and Positioning in Real-Time, Sucar et al., 2021

  3. NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields, Rosinol et al., 2022

  4. NARF22: Neural Articulated Radiance Fields for Configuration-Aware Rendering, Lewis et al., 2022

  5. Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, Shen et al., 2023
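
All of these methods build on the volume-rendering quadrature introduced by NeRF: a ray's color is a transmittance-weighted sum of per-sample colors, with opacity derived from density and sample spacing. A minimal sketch of that compositing step is below; the densities and colors are random stand-ins for what the trained MLP would predict along a ray.

    # NeRF-style compositing: alpha_i = 1 - exp(-sigma_i * delta_i), weighted by
    # accumulated transmittance along the ray.
    import torch

    def composite(sigmas, colors, deltas):
        # sigmas: (N,) densities, colors: (N, 3), deltas: (N,) sample spacings.
        alphas = 1.0 - torch.exp(-sigmas * deltas)
        trans = torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alphas + 1e-10]), dim=0)[:-1]
        weights = trans * alphas
        rgb = (weights.unsqueeze(-1) * colors).sum(dim=0)        # rendered color
        depth = (weights * torch.cumsum(deltas, dim=0)).sum()    # expected ray depth
        return rgb, depth

    if __name__ == "__main__":
        n = 64
        rgb, depth = composite(torch.rand(n), torch.rand(n, 3), torch.full((n,), 0.05))
        print(rgb, depth)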

Extended List

Datasets

Scheduled Week 12, Lec 22

Core List

  1. Deep Learning for Robots: Learning from Large-Scale Interaction, Levine et al., 2016

  2. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning, Makoviychuk et al., 2021

  3. Grounding Predicates through Actions, Migimatsu and Bohg, 2022

  4. All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators, Thananjeyan et al., 2022

Extended List

Collecting Data with Robots

RGB-D Datasets

Semantic Datasets

Object Model Datasets

Simulators

Self-Supervised Learning

Scheduled Week 12, Lec 23

Core List

  1. Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee et al., 2019

  2. VICRegL: Self-Supervised Learning of Local Visual Features, Bardes et al., 2022

  3. Fully Self-Supervised Class Awareness in Dense Object Descriptors, Hadjivelichkov and Kanoulas, 2022

  4. Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild, Zhang et al., 2022
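
Several of these objectives combine an invariance term between two views with regularizers that prevent representational collapse. A minimal sketch of a VICReg-style loss is below (the global objective; VICRegL additionally applies it to local features); the term weights are illustrative.

    # VICReg-style loss sketch: invariance (MSE) + variance hinge + covariance penalty.
    import torch
    import torch.nn.functional as F

    def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
        n, d = z_a.shape
        inv = F.mse_loss(z_a, z_b)                           # invariance between views

        def variance(z):                                     # keep each dim spread out
            std = torch.sqrt(z.var(dim=0) + eps)
            return torch.relu(1.0 - std).mean()

        def covariance(z):                                   # decorrelate dimensions
            z = z - z.mean(dim=0)
            cov = (z.T @ z) / (n - 1)
            off_diag = cov - torch.diag(torch.diag(cov))
            return (off_diag ** 2).sum() / d

        return (inv_w * inv
                + var_w * (variance(z_a) + variance(z_b))
                + cov_w * (covariance(z_a) + covariance(z_b)))

    if __name__ == "__main__":
        z_a, z_b = torch.randn(128, 64), torch.randn(128, 64)
        print(vicreg_loss(z_a, z_b).item())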

Extended List

Grasp Pose Detection

Scheduled Week 13, Lec 24

Core List

  1. Real-Time Grasp Detection Using Convolutional Neural Networks, Redmon and Angelova, 2015

  2. Using Geometry to Detect Grasps in 3D Point Clouds, ten Pas and Platt, 2015

  3. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, Mahler et al., 2017

  4. Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes, Sundermeyer et al., 2021

  5. Sample Efficient Grasp Learning Using Equivariant Models, Zhu et al., 2022
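
A recurring primitive behind several of these grasp detectors is the antipodal condition: a two-finger grasp is promising when the line between the contacts lies inside both friction cones. A minimal geometric check is below; the friction coefficient and the example contacts are illustrative.

    # Antipodal check for a two-finger grasp candidate.
    import numpy as np

    def is_antipodal(p1, n1, p2, n2, mu=0.5):
        # p1, p2: contact points; n1, n2: outward unit surface normals; mu: friction.
        axis = p2 - p1
        axis = axis / np.linalg.norm(axis)
        half_angle = np.arctan(mu)                   # friction-cone half-angle
        # Each finger pushes along the inward normal (-n); that push must stay
        # within half_angle of the line connecting the contacts.
        ang1 = np.arccos(np.clip(np.dot(-n1, axis), -1.0, 1.0))
        ang2 = np.arccos(np.clip(np.dot(-n2, -axis), -1.0, 1.0))
        return ang1 <= half_angle and ang2 <= half_angle

    if __name__ == "__main__":
        # Two contacts on opposite faces of a 4 cm box.
        print(is_antipodal(np.array([0.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]),
                           np.array([0.04, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])))  # True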

Extended List

Tactile Perception for Grasping and Manipulation

Scheduled Week 13, Lec 25

Core List

  1. More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch, Calandra et al., 2018

  2. Tactile Object Pose Estimation from the First Touch with Geometric Contact Rendering, Bauza et al., 2020

  3. Visuotactile Affordances for Cloth Manipulation with Local Control, Sunil et al., 2022

  4. ShapeMap 3-D: Efficient shape mapping through dense touch and vision, Suresh et al., 2022

Extended List

Pre-training for Robot Manipulation

Scheduled Week 14, Lec 26

Core List

  1. SORNet: Spatial Object-Centric Representations for Sequential Manipulation, Yuan et al., 2021

  2. CLIPort: What and Where Pathways for Robotic Manipulation, Shridhar et al., 2021

  3. Real-World Robot Learning with Masked Visual Pre-training, Radosavovic et al., 2022

  4. R3M: A Universal Visual Representation for Robot Manipulation, Nair et al., 2022

  5. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, Ahn et al., 2022

  6. RT-1: Robotics Transformer for Real-World Control at Scale, Brohan et al., 2022
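
A usage pattern shared by several of these works is to freeze a visual encoder pre-trained on large image or video corpora and train only a small policy head on robot data. The sketch below illustrates that pattern with behavior cloning; the torchvision ResNet-18 is a stand-in for representations such as R3M or MVP, and the action dimension and layer sizes are made up for illustration.

    # Frozen pre-trained encoder + small trainable policy head (behavior cloning sketch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import resnet18

    encoder = resnet18(weights="IMAGENET1K_V1")      # stand-in pre-trained encoder
    encoder.fc = nn.Identity()                       # expose 512-d features
    encoder.requires_grad_(False).eval()             # keep the encoder frozen

    policy_head = nn.Sequential(                     # the only trainable part
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 7),                           # e.g. a 7-DoF end-effector action
    )
    optimizer = torch.optim.Adam(policy_head.parameters(), lr=1e-4)

    def bc_step(images, expert_actions):
        # One behavior-cloning step on a batch of (image, action) demonstrations.
        with torch.no_grad():
            features = encoder(images)               # (B, 512)
        loss = F.mse_loss(policy_head(features), expert_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    if __name__ == "__main__":
        print(bc_step(torch.randn(4, 3, 224, 224), torch.randn(4, 7)))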

Extended List

Perception Beyond Vision

Specialized Sensors

More Frontiers

Scheduled Week 14, Lec 27

Interpreting Deep Learning Models

Fairness and Ethics

Certifiable Perception

Articulated Objects

Deformable Objects

Transparent Objects

Dynamic Scenes

Beyond 2D Convolutions

Reinforcement Learning

Generative Modeling