Deep Learning Research Papers for Robot Perception: Archive
This page archives historical and supplementary papers that were covered previously but have since been superseded by newer methods.
Table of contents
- RGB-D Architectures
- Point Cloud Processing
- Object Pose, Geometry, SDF, Implicit surfaces
- Dense Descriptors, Category-level Representations
- Recurrent Networks and Object Tracking
- Visual Odometry and Localization
- Semantic Scene Graphs and Explicit Representations
- Neural Radiance Fields and Implicit Representations
- Datasets
- Self-Supervised Learning
- Grasp Pose Detection
- Tactile Perception for Grasping and Manipulation
- Pre-training for Robot Manipulation
- Perception Beyond Vision (And More Frontiers)
RGB-D Architectures
- A Unified Framework for Multi-View Multi-Class Object Pose Estimation, Li et al., 2018
- PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation, He et al., 2020
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation, Xiang et al., 2021
- 3D ShapeNets: A Deep Representation for Volumetric Shapes, Wu et al., 2015
- Multi-view Convolutional Neural Networks for 3D Shape Recognition, Su et al., 2015
- Volumetric and Multi-View CNNs for Object Classification on 3D Data, Qi et al., 2016
- Robust 6D Object Pose Estimation with Stochastic Congruent Sets, Mitash et al., 2018
- What’s Behind the Couch? Directed Ray Distance Functions (DRDF) for 3D Scene Reconstruction, Kulkarni et al., 2022
Point Cloud Processing
- PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation, Xu et al., 2018
- Just Go with the Flow: Self-Supervised Scene Flow Estimation, Mittal et al., 2019
- PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows, Yang et al., 2019
- 3D Object Detection with Pointformer, Pan et al., 2021
- Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories, Harley et al., 2022
Object Pose, Geometry, SDF, Implicit surfaces
- SUM: Sequential scene understanding and manipulation, Sui et al., 2017
- Implicit surface representations as layers in neural networks, Michalkiewicz et al., 2019
- Local Deep Implicit Functions for 3D Shape, Genova et al., 2020
- Implicit geometric regularization for learning shapes, Gropp et al., 2020
- TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation, Pan et al., 2022
- Improving Object Pose Estimation by Fusion With a Multimodal Prior – Utilizing Uncertainty-Based CNN Pipelines for Robotics, Richter-Klug et al., 2022
Dense Descriptors, Category-level Representations
- Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image, Lin et al., 2022
- Visual Descriptor Learning from Monocular Video, Deekshith et al., 2020
Recurrent Networks and Object Tracking
- DeepIM: Deep Iterative Matching for 6D Pose Estimation, Li et al., 2018
- PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, Deng et al., 2019
- 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, Wang et al., 2020
- The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization, Xu et al., 2022
Visual Odometry and Localization
- Backprop KF: Learning Discriminative Deterministic State Estimators, Haarnoja et al., 2016
- Multimodal Sensor Fusion with Differentiable Filters, Lee et al., 2020
- Particle Filter Recurrent Neural Networks, Ma et al., 2019
- Differentiable Algorithm Networks for Composable Robot Learning, Karkus et al., 2019
- Chasing Ghosts: Instruction Following as Bayesian State Tracking, Anderson et al., 2019
- Differentiable Factor Graph Optimization for Learning Smoothers, Yi et al., 2021
- How to train your differentiable filter, Kloss et al., 2021
- Differentiable Nonparametric Belief Propagation, Opipari et al., 2021
- A Robot Web for Distributed Many-Device Localisation, Murai et al., 2022
Semantic Scene Graphs and Explicit Representations
- Image Retrieval using Scene Graphs, Johnson et al., 2015
- Semantic Robot Programming for Goal-Directed Manipulation in Cluttered Scenes, Zeng et al., 2018
- Semantic Linking Maps for Active Visual Object Search, Zeng et al., 2020
- RoboSherlock: Unstructured information processing for robot perception, Beetz et al., 2015
- Image Generation from Scene Graphs, Johnson et al., 2018
- Differentiable Scene Graphs, Raboh et al., 2020
Neural Radiance Fields and Implicit Representations
- iMAP: Implicit Mapping and Positioning in Real-Time, Sucar et al., 2021
- NARF22: Neural Articulated Radiance Fields for Configuration-Aware Rendering, Lewis et al., 2022
- NeRF Explosion 2020, Dellaert, 2020
- Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, Sitzmann et al., 2019
- Local Implicit Grid Representations for 3D Scenes, Jiang et al., 2020
- Convolutional occupancy networks, Peng et al., 2020
- Object-Centric Neural Scene Rendering, Guo et al., 2020
- iNeRF: Inverting Neural Radiance Fields for Pose Estimation, Yen-Chen et al., 2021
- iLabel: Interactive Neural Scene Labelling, Zhi et al., 2021
- Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation, Simeonov et al., 2021
- BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering, Xiangli et al., 2021
- Block-NeRF: Scalable Large Scene Neural View Synthesis, Tancik et al., 2022
- NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields, Yen-Chen et al., 2022
Datasets
- Grounding Predicates through Actions, Migimatsu and Bohg, 2022
- All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators, Thananjeyan et al., 2022
- TossingBot: Learning to Throw Arbitrary Objects, Zeng et al., 2019
- (NYU Depth v2) Indoor Segmentation and Support Inference from RGBD Images, Silberman et al., 2012
- SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, Song et al., 2015
- YCB-Video Dataset, Xiang et al., 2018
- BOP: Benchmark for 6D Object Pose Estimation, Hodaň et al., 2019
- ProgressLabeller: Visual Data Stream Annotation for Training Object-Centric 3D Perception, Chen et al., 2022
- TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes, Xu et al., 2022
- Understanding Human Hands in Contact at Internet Scale, Shan et al., 2020
- Habitat-Matterport 3D Semantics Dataset, Yadav et al., 2022
- PartNet-Mobility Dataset
- PyBullet, a Python module for physics simulation for games, robotics and machine learning, Coumans et al., 2015
- NVIDIA Isaac Sim
- SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation, Lin et al., 2020
Self-Supervised Learning
- Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee et al., 2019
- VICRegL: Self-Supervised Learning of Local Visual Features, Bardes et al., 2022
- Fully Self-Supervised Class Awareness in Dense Object Descriptors, Hadjivelichkov and Kanoulas, 2022
- Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild, Zhang et al., 2022
Grasp Pose Detection
- Real-Time Grasp Detection Using Convolutional Neural Networks, Redmon and Angelova, 2015
- Using Geometry to Detect Grasps in 3D Point Clouds, ten Pas and Platt, 2015
- Sample Efficient Grasp Learning Using Equivariant Models, Zhu et al., 2022
- Deep Learning for Detecting Robotic Grasps, Lenz et al., 2013
- High precision grasp pose detection in dense clutter, Gualtieri et al., 2016
- GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter, Zhou et al., 2019
- MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis, Chen et al., 2021
- Grasp Learning: Models, Methods, and Performance, Platt, 2022
Tactile Perception for Grasping and Manipulation
- Tactile Object Pose Estimation from the First Touch with Geometric Contact Rendering, Bauza et al., 2020
- Visuotactile Affordances for Cloth Manipulation with Local Control, Sunil et al., 2022
- ShapeMap 3-D: Efficient shape mapping through dense touch and vision, Suresh et al., 2022
- The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?, Calandra et al., 2017
- Soft-bubble: A highly compliant dense geometry tactile sensor for robot manipulation, Alspach et al., 2019
- A Review of Tactile Information: Perception and Action Through Touch, Li et al., 2020
- TACTO: A Fast, Flexible, and Open-source Simulator for High-Resolution Vision-based Tactile Sensors, Wang et al., 2020
- Active Extrinsic Contact Sensing: Application to General Peg-in-Hole Insertion, Kim et al., 2021
- Active Visuo-Haptic Object Shape Completion, Rustler et al., 2022
- Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces, Kerr and Huang et al., 2022
- See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation, Li et al., 2022
- Learning to Grasp the Ungraspable with Emergent Extrinsic Dexterity, Zhou and Held, 2022
Pre-training for Robot Manipulation
- SORNet: Spatial Object-Centric Representations for Sequential Manipulation, Yuan et al., 2021
- Real-World Robot Learning with Masked Visual Pre-training, Radosavovic et al., 2022
- R3M: A Universal Visual Representation for Robot Manipulation, Nair et al., 2022
- Attention and Augmented Recurrent Neural Networks, Olah and Carter, 2016
- Feature-wise transformations, Dumoulin et al., 2018
- Masked Autoencoders Are Scalable Vision Learners, He et al., 2021
- Interactive Language: Talking to Robots in Real Time, Lynch et al., 2022
- Transformers are Adaptable Task Planners, Jain et al., 2022
Perception Beyond Vision (And More Frontiers)
- Pigeons (Columba livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images, Levenson et al., 2015
- Automatic color correction for 3D reconstruction of underwater scenes, Skinner et al., 2017
- Classification of Household Materials via Spectroscopy, Erickson et al., 2018
- Through-Wall Human Pose Estimation Using Radio Signals, Zhao et al., 2018
- A bio-hybrid odor-guided autonomous palm-sized air vehicle, Anderson et al., 2020
- Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization, Bryner et al., 2019
- SoundSpaces: Audio-Visual Navigation in 3D Environments, Chen et al., 2019
- Neural Implicit Surface Reconstruction using Imaging Sonar, Qadri et al., 2022
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al., 2013
- The Building Blocks of Interpretability, Olah et al., 2018
- Multimodal Neurons in Artificial Neural Networks, Goh et al., 2021
- Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing, Raji et al., 2020
- Autonomous Tool Construction Using Part Shape and Attachment Prediction, Nair et al., 2019
- Parts-Based Articulated Object Localization in Clutter Using Belief Propagation, Pavlasek et al., 2020
- DensePose: Dense Human Pose Estimation In The Wild, Güler et al., 2018
- FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy, Weng et al., 2021
- LIT: Light-field Inference of Transparency for Refractive Object Localization, Zhou et al., 2019
- Multi-modal Transfer Learning for Grasping Transparent and Specular Objects, Weng et al., 2020
- D-NeRF: Neural Radiance Fields for Dynamic Scenes, Pumarola et al., 2020
- 3D Neural Scene Representations for Visuomotor Control, Li et al., 2021
- HexPlane: A Fast Representation for Dynamic Scenes, Cao and Johnson, 2023
- Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks, Tolstaya et al., 2019
- A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., 2021
- Understanding RL Vision, Hilton et al., 2020
- WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images, Li et al., 2017
- Differentiable Particle Filters through Conditional Normalizing Flow, Chen et al., 2021