I've been trying to figure out the same thing. Surprisingly there is no clear cut instruction anywhere. Anyhow, I wanted to convert 2D pixel coordinates (u,v) to X,Y,Z from a point cloud that I got from kinect. I wrote a function to do it, I pass in the point cloud message, u and v coordinates (for a feature in 2D image) and pass a reference to a geometry_msgs point which will get the X,Y,Z values. These X,Y,Z values are in the camera's frame, (X is seen as going from left to right in the image plane, Y is top to bottom and Z pointing into the world).
#include <ros/ros.h>
#include <geometry_msgs/Point.h>
#include <sensor_msgs/PointCloud2.h>
/**
Function to convert 2D pixel point to 3D point by extracting point
from PointCloud2 corresponding to input pixel coordinate. This function
can be used to get the X,Y,Z coordinates of a feature using an
RGBD camera, e.g., Kinect.
*/
void pixelTo3DPoint(const sensor_msgs::PointCloud2 pCloud, const int u, const int v, geometry_msgs::Point &p)
{
// get width and height of 2D point cloud data
int width = pCloud.width;
int height = pCloud.height;
// Convert from u (column / width), v (row/height) to position in array
// where X,Y,Z data starts
int arrayPosition = v*pCloud.row_step + u*pCloud.point_step;
// compute position in array where x,y,z data start
int arrayPosX = arrayPosition + pCloud.fields[0].offset; // X has an offset of 0
int arrayPosY = arrayPosition + pCloud.fields[1].offset; // Y has an offset of 4
int arrayPosZ = arrayPosition + pCloud.fields[2].offset; // Z has an offset of 8
float X = 0.0;
float Y = 0.0;
float Z = 0.0;
memcpy(&X, &pCloud.data[arrayPosX], sizeof(float));
memcpy(&Y, &pCloud.data[arrayPosY], sizeof(float));
memcpy(&Z, &pCloud.data[arrayPosZ], sizeof(float));
// put data into the point p
p.x = X;
p.y = Y;
p.z = Z;
}