Deep Learning Research Papers for Robot Perception: Archive

This page archives historical and extended papers that were previously covered but have since been superseded by newer methods.

Table of contents

RGB-D Architectures
Point Cloud Processing
Object Pose, Geometry, SDF, Implicit surfaces
Dense Descriptors, Category-level Representations
Recurrent Networks and Object Tracking
Visual Odometry and Localization
Semantic Scene Graphs and Explicit Representations
Neural Radiance Fields and Implicit Representations
Datasets
Self-Supervised Learning
Grasp Pose Detection
Tactile Perception for Grasping and Manipulation
Pre-training for Robot Manipulation
Perception Beyond Vision (And More Frontiers)

RGB-D Architectures

A Unified Framework for Multi-View Multi-Class Object Pose Estimation, Li et al., 2018
PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation, He et al., 2020
Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation, Xiang et al., 2021
3D ShapeNets: A Deep Representation for Volumetric Shapes, Wu et al., 2015
Multi-view Convolutional Neural Networks for 3D Shape Recognition, Su et al., 2015
Volumetric and Multi-View CNNs for Object Classification on 3D Data, Qi et al., 2016
Robust 6D Object Pose Estimation with Stochastic Congruent Sets, Mitash et al., 2018
What’s Behind the Couch? Directed Ray Distance Functions (DRDF) for 3D Scene Reconstruction, Kulkarni et al., 2022

Point Cloud Processing

PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation, Xu et al., 2018
Just Go with the Flow: Self-Supervised Scene Flow Estimation, Mittal et al., 2019
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows, Yang et al., 2019
3D Object Detection with Pointformer, Pan et al., 2021
Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories, Harley et al., 2022

Object Pose, Geometry, SDF, Implicit surfaces

SUM: Sequential scene understanding and manipulation, Sui et al., 2017
Implicit surface representations as layers in neural networks, Michalkiewicz et al., 2019
Local Deep Implicit Functions for 3D Shape, Genova et al., 2020
Implicit geometric regularization for learning shapes, Gropp et al., 2020
TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation, Pan et al., 2022
Improving Object Pose Estimation by Fusion With a Multimodal Prior – Utilizing Uncertainty-Based CNN Pipelines for Robotics, Richter-Klug et al., 2022

Dense Descriptors, Category-level Representations

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image, Lin et al., 2022
Visual Descriptor Learning from Monocular Video, Deekshith et al., 2020

Recurrent Networks and Object Tracking

DeepIM: Deep Iterative Matching for 6D Pose Estimation, Li et al., 2018
PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, Deng et al., 2019
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, Wang et al., 2020
The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015
RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization, Xu et al., 2022

Visual Odometry and Localization

Backprop KF: Learning Discriminative Deterministic State Estimators, Haarnoja et al., 2016
Multimodal Sensor Fusion with Differentiable Filters, Lee et al., 2020
Particle Filter Recurrent Neural Networks, Ma et al., 2019
Differentiable Algorithm Networks for Composable Robot Learning, Karkus et al., 2019
Chasing Ghosts: Instruction Following as Bayesian State Tracking, Anderson et al., 2019
Differentiable Factor Graph Optimization for Learning Smoothers, Yi et al., 2021
How to train your differentiable filter, Kloss et al., 2021
Differentiable Nonparametric Belief Propagation, Opipari et al., 2021
A Robot Web for Distributed Many-Device Localisation, Murai et al., 2022

Semantic Scene Graphs and Explicit Representations

Image Retrieval using Scene Graphs, Johnson et al., 2015
Semantic Robot Programming for Goal-Directed Manipulation in Cluttered Scenes, Zeng et al., 2018
Semantic Linking Maps for Active Visual Object Search, Zeng et al., 2020
RoboSherlock: Unstructured information processing for robot perception, Beetz et al., 2015
Image Generation from Scene Graphs, Johnson et al., 2018
Differentiable Scene Graphs, Raboh et al., 2020

Neural Radiance Fields and Implicit Representations

iMAP: Implicit Mapping and Positioning in Real-Time, Sucar et al., 2021
NARF22: Neural Articulated Radiance Fields for Configuration-Aware Rendering, Lewis et al., 2022
NeRF Explosion 2020, Dellaert, 2020
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, Sitzmann et al., 2019
Local Implicit Grid Representations for 3D Scenes, Jiang et al., 2020
Convolutional occupancy networks, Peng et al., 2020
Object-Centric Neural Scene Rendering, Guo et al., 2020
iNeRF: Inverting Neural Radiance Fields for Pose Estimation, Yen-Chen et al., 2021
iLabel: Interactive Neural Scene Labelling, Zhi et al., 2021
Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation, Simeonov et al., 2021
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering, Xiangli et al., 2021
Block-NeRF: Scalable Large Scene Neural View Synthesis, Tancik et al., 2022
NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields, Yen-Chen et al., 2022

Datasets

Grounding Predicates through Actions, Migimatsu and Bohg, 2022
All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators, Thananjeyan et al., 2022
TossingBot: Learning to Throw Arbitrary Objects with Residual Physics, Zeng et al., 2019
(NYU Depth v2) Indoor Segmentation and Support Inference from RGBD Images, Silberman et al., 2012
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, Song et al., 2015
YCB-Video Dataset, Xiang et al., 2018
BOP: Benchmark for 6D Object Pose Estimation, Hodaň et al., 2019
ProgressLabeller: Visual Data Stream Annotation for Training Object-Centric 3D Perception, Chen et al., 2022
TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes, Xu et al., 2022
Understanding Human Hands in Contact at Internet Scale, Shan et al., 2020
Habitat-Matterport 3D Semantics Dataset, Yadav et al., 2022
PartNet-Mobility Dataset
PyBullet, a Python module for physics simulation for games, robotics and machine learning, Coumans et al., 2015
NVIDIA Isaac Sim
SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation, Lin et al., 2020

Self-Supervised Learning

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee et al., 2019
VICRegL: Self-Supervised Learning of Local Visual Features, Bardes et al., 2022
Fully Self-Supervised Class Awareness in Dense Object Descriptors, Hadjivelichkov and Kanoulas, 2022
Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild, Zhang et al., 2022

Grasp Pose Detection

Real-Time Grasp Detection Using Convolutional Neural Networks, Redmon and Angelova, 2015
Using Geometry to Detect Grasps in 3D Point Clouds, ten Pas and Platt, 2015
Sample Efficient Grasp Learning Using Equivariant Models, Zhu et al., 2022
Deep Learning for Detecting Robotic Grasps, Lenz et al., 2013
High precision grasp pose detection in dense clutter, Gualtieri et al., 2016
GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter, Zhou et al., 2019
MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis, Chen et al., 2021
Grasp Learning: Models, Methods, and Performance, Platt, 2022

Tactile Perception for Grasping and Manipulation

Tactile Object Pose Estimation from the First Touch with Geometric Contact Rendering, Bauza et al., 2020
Visuotactile Affordances for Cloth Manipulation with Local Control, Sunil et al., 2022
ShapeMap 3-D: Efficient shape mapping through dense touch and vision, Suresh et al., 2022
The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?, Calandra et al., 2017
Soft-bubble: A highly compliant dense geometry tactile sensor for robot manipulation, Alspach et al., 2019
A Review of Tactile Information: Perception and Action Through Touch, Li et al., 2020
TACTO: A Fast, Flexible, and Open-source Simulator for High-Resolution Vision-based Tactile Sensors, Wang et al., 2020
Active Extrinsic Contact Sensing: Application to General Peg-in-Hole Insertion, Kim et al., 2021
Active Visuo-Haptic Object Shape Completion, Rustler et al., 2022
Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces, Kerr and Huang et al., 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation, Li et al., 2022
Learning to Grasp the Ungraspable with Emergent Extrinsic Dexterity, Zhou and Held, 2022

Pre-training for Robot Manipulation

SORNet: Spatial Object-Centric Representations for Sequential Manipulation, Yuan et al., 2021
Real-World Robot Learning with Masked Visual Pre-training, Radosavovic et al., 2022
R3M: A Universal Visual Representation for Robot Manipulation, Nair et al., 2022
Attention and Augmented Recurrent Neural Networks, Olah and Carter, 2016
Feature-wise transformations, Dumoulin et al., 2018
Masked Autoencoders Are Scalable Vision Learners, He et al., 2021
Interactive Language: Talking to Robots in Real Time, Lynch et al., 2022
Transformers are Adaptable Task Planners, Jain et al., 2022

Perception Beyond Vision (And More Frontiers)

Pigeons (Columba livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images, Levenson et al., 2015
Automatic color correction for 3D reconstruction of underwater scenes, Skinner et al., 2017
Classification of Household Materials via Spectroscopy, Erickson et al., 2018
Through-Wall Human Pose Estimation Using Radio Signals, Zhao et al., 2018
A bio-hybrid odor-guided autonomous palm-sized air vehicle, Anderson et al., 2020
Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization, Bryner et al., 2019
SoundSpaces: Audio-Visual Navigation in 3D Environments, Chen et al., 2019
Neural Implicit Surface Reconstruction using Imaging Sonar, Qadri et al., 2022
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al., 2013
The Building Blocks of Interpretability, Olah et al., 2018
Multimodal Neurons in Artificial Neural Networks, Goh et al., 2021
Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing, Raji et al., 2020
Autonomous Tool Construction Using Part Shape and Attachment Prediction, Nair et al., 2019
Parts-Based Articulated Object Localization in Clutter Using Belief Propagation, Pavlasek et al., 2020
DensePose: Dense Human Pose Estimation In The Wild, Güler et al., 2018
FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy, Weng et al., 2021
LIT: Light-field Inference of Transparency for Refractive Object Localization, Zhou et al., 2019
Multi-modal Transfer Learning for Grasping Transparent and Specular Objects, Weng et al., 2020
D-NeRF: Neural Radiance Fields for Dynamic Scenes, Pumarola et al., 2020
3D Neural Scene Representations for Visuomotor Control, Li et al., 2021
HexPlane: A Fast Representation for Dynamic Scenes, Cao and Johnson, 2023
Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks, Tolstaya et al., 2019
A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., 2021
Understanding RL Vision, Hilton et al., 2020
WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images, Li et al., 2017
Differentiable Particle Filters through Conditional Normalizing Flow, Chen et al., 2021