PointNet: a deep neural network that consumes point clouds

Today I read the paper “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation” by Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas.

What is PointNet?

PointNet is a neural network that directly consumes a point cloud, that is, an unordered set of points. Although the architecture is simple, it gives a single approach to object classification, part segmentation, and semantic segmentation, and it performs well on all three.

Novelty of the study

A typical CNN requires regular input such as a voxel grid or a collection of images, and converting a point cloud into one of these representations loses detail in the sampling process. PointNet therefore takes the point cloud directly as input. The input is the (x, y, z) coordinates of N points, given in no particular order.

Applications of PointNet

PointNet performs well on the three tasks below.

  • 3D Object Classification
  • 3D Object Part Segmentation
  • Semantic Segmentation in Scene
(Figure from “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”, page 1)



Each input point is fed to the same MLP (multilayer perceptron), which extracts a feature vector for that point.

(Figure from “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”, page 3)
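
To make the "same MLP for every point" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the random weights, and the helper name shared_mlp are my own assumptions for illustration, and batch normalization is left out.

```python
import numpy as np

def shared_mlp(points, weights, biases):
    """Apply the same stack of linear + ReLU layers to every point.

    points: array of shape (N, C_in), one row per point.
    weights / biases: one (C_in, C_out) matrix and (C_out,) vector per layer.
    """
    features = points
    for w, b in zip(weights, biases):
        # Every row (point) is multiplied by the same weight matrix.
        features = np.maximum(features @ w + b, 0.0)
    return features

rng = np.random.default_rng(0)
layer_sizes = [3, 64, 64]  # illustrative sizes, chosen for this sketch
weights = [rng.normal(scale=0.1, size=(c_in, c_out))
           for c_in, c_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(c_out) for c_out in layer_sizes[1:]]

points = rng.uniform(-1.0, 1.0, size=(1024, 3))  # N = 1024 random (x, y, z) points
point_features = shared_mlp(points, weights, biases)
print(point_features.shape)  # (1024, 64)
```

Because the weights are shared, the feature of a point depends only on that point's coordinates and not on its position in the input array.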

Symmetric function

To make the model invariant to permutations of the input, a symmetric function is used. A symmetric function is a function whose value is the same regardless of the order of its n arguments, here the n points. In this model, the symmetric function is built from an MLP applied to each point, followed by max pooling, which aggregates the point features.
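
The permutation invariance can be checked numerically. The sketch below is only an illustration under my own assumptions: a single random linear layer with ReLU stands in for the per-point MLP, and the names h and global_feature are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 16))  # hypothetical weights of a one-layer per-point network

def h(points):
    # Per-point feature extraction: the same weights are used for every point.
    return np.maximum(points @ w, 0.0)

def global_feature(points):
    # Max pooling over the point axis; max does not care about the order of its arguments.
    return h(points).max(axis=0)

points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(len(points))]  # same points, different order

same = np.allclose(global_feature(points), global_feature(shuffled))
print(same)
```

Any other order-independent reduction (sum, mean) would also be symmetric; the paper uses max pooling.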

Local and global information aggregation

Point segmentation requires both local and global information. PointNet concatenates the global feature with each point's feature and extracts new per-point features from the combined vector.
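
A small sketch of the concatenation step, assuming 64-dimensional point features and a 1024-dimensional global feature. The arrays are random placeholders rather than real network outputs, and concat_local_global is a hypothetical helper name.

```python
import numpy as np

def concat_local_global(point_features, global_feature):
    """Append the same global feature vector to every per-point feature."""
    n = point_features.shape[0]
    # Repeat the global vector once per point (a broadcast view, no copy needed).
    tiled = np.broadcast_to(global_feature, (n, global_feature.shape[0]))
    return np.concatenate([point_features, tiled], axis=1)

rng = np.random.default_rng(0)
point_features = rng.normal(size=(1024, 64))  # placeholder per-point (local) features
global_feature = rng.normal(size=1024)        # placeholder for the max-pooled global feature

combined = concat_local_global(point_features, global_feature)
print(combined.shape)  # (1024, 1088)
```

Each row of the result now holds a point's own feature followed by the summary of the whole shape, so later per-point layers can use both.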

Invariance to the transformation

The semantic labeling needs to be invariant to certain geometric transformations of the shape, such as rigid transformations. PointNet predicts an affine transformation matrix with a mini-network (T-Net) and applies it directly to the coordinates of the input points. The mini-network is composed of basic modules for feature extraction, max pooling, and fully connected layers.
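
The following is a very reduced sketch of the T-Net idea, not the network from the paper. It assumes one shared layer, one fully connected layer, random untrained weights, and a 3x3 output expressed as an offset from the identity matrix; the name tiny_tnet and all sizes are hypothetical.

```python
import numpy as np

def tiny_tnet(points, w_point, w_fc):
    """Predict a 3x3 transformation matrix from a point cloud (toy version)."""
    per_point = np.maximum(points @ w_point, 0.0)  # shared per-point feature extraction
    pooled = per_point.max(axis=0)                 # max pooling over all points
    delta = (pooled @ w_fc).reshape(3, 3)          # fully connected layer -> 9 values
    # Assumption of this sketch: predict an offset so that zero weights give the identity.
    return np.eye(3) + delta

rng = np.random.default_rng(0)
w_point = rng.normal(scale=0.1, size=(3, 32))   # hypothetical, untrained weights
w_fc = rng.normal(scale=0.01, size=(32, 9))

points = rng.uniform(-1.0, 1.0, size=(256, 3))
transform = tiny_tnet(points, w_point, w_fc)
aligned = points @ transform                     # apply the predicted matrix to every point
print(transform.shape, aligned.shape)  # (3, 3) (256, 3)
```

In a trained model the weights would be learned together with the rest of the network, so that the predicted matrix brings differently posed inputs into a common alignment before feature extraction.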


The paper also visualizes critical point sets and upper-bound shapes. A critical point set summarizes an input point cloud by a sparse set of key points.
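
One way to see where such a sparse set comes from: with max pooling, only the points that attain the maximum in at least one feature channel influence the global feature. The sketch below illustrates this with a random one-layer feature extractor, which is my own stand-in and not the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 32))  # hypothetical weights, 32 feature channels

def point_features(points):
    return np.maximum(points @ w, 0.0)

points = rng.normal(size=(500, 3))
features = point_features(points)

# For each channel, find the point that produces the maximum value.
critical_idx = np.unique(features.argmax(axis=0))
critical_points = points[critical_idx]

# The global feature computed from the critical points alone matches the full one.
full = features.max(axis=0)
reduced = point_features(critical_points).max(axis=0)
print(len(points), len(critical_points), np.allclose(full, reduced))
```

The number of critical points can never exceed the number of feature channels, which is why the summary is sparse.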