This has come up several times. A partial shared-memory message-passing transport was written as an experiment.
It turned out that the main overhead is in serializing and deserializing the data, not in copying the bytes. A shared-memory transport still incurs the serialization cost, so the speed-up was disappointing.
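To make the serialization point concrete, here is a minimal sketch of what flattening a message involves. `ScanMsg`, `serialize` and `deserialize` are hypothetical names for illustration, not the roscpp API, and the length-prefixed layout is only loosely modeled on how ROS lays out strings and arrays. The underlying reason is that fields like `std::string` and `std::vector` own heap memory through pointers that mean nothing in another process, so even a shared-memory transport has to flatten them field by field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message type with variable-length fields.
struct ScanMsg {
    std::string frame_id;
    std::vector<float> ranges;
};

// Append a 32-bit length prefix in native byte order.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    const std::size_t pos = out.size();
    out.resize(pos + sizeof v);
    std::memcpy(out.data() + pos, &v, sizeof v);
}

// Read a 32-bit length prefix and advance the cursor.
static std::uint32_t get_u32(const std::uint8_t*& p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    p += sizeof v;
    return v;
}

// Flatten the message into one contiguous buffer, field by field.
std::vector<std::uint8_t> serialize(const ScanMsg& m) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(m.frame_id.size()));
    out.insert(out.end(), m.frame_id.begin(), m.frame_id.end());
    put_u32(out, static_cast<std::uint32_t>(m.ranges.size()));
    if (!m.ranges.empty()) {
        const std::size_t pos = out.size();
        const std::size_t bytes = m.ranges.size() * sizeof(float);
        out.resize(pos + bytes);
        std::memcpy(out.data() + pos, m.ranges.data(), bytes);
    }
    return out;
}

// Rebuild the message on the receiving side. No bounds checking:
// this sketch assumes the buffer came from serialize() above.
ScanMsg deserialize(const std::vector<std::uint8_t>& buf) {
    const std::uint8_t* p = buf.data();
    ScanMsg m;
    const std::uint32_t len = get_u32(p);
    m.frame_id.assign(reinterpret_cast<const char*>(p), len);
    p += len;
    const std::uint32_t n = get_u32(p);
    m.ranges.resize(n);
    if (n > 0) std::memcpy(m.ranges.data(), p, n * sizeof(float));
    return m;
}

// Small self-check: a round trip reproduces the original fields.
void self_check() {
    ScanMsg m;
    m.frame_id = "base";
    m.ranges = {1.0f, 2.5f, 4.0f};
    const ScanMsg r = deserialize(serialize(m));
    assert(r.frame_id == m.frame_id && r.ranges == m.ranges);
}
```

Both steps walk every field and allocate on the receiving side, whichever way the resulting buffer then travels between processes.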
A better solution for high-throughput applications is to use nodelets, which allow zero-copy transport within a single address space while still providing standard ROS message semantics to nodes in other processes or on other machines.
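The zero-copy idea can be sketched without any ROS dependency. The code below is not the nodelet API; `ImageMsg` and `IntraProcessTopic` are hypothetical stand-ins that only illustrate the principle: when publisher and subscriber share an address space, the publisher can hand over a shared pointer to a const message, so the payload is neither serialized nor copied.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical message type, standing in for a large ROS message.
struct ImageMsg {
    std::vector<unsigned char> data;
};

// Hypothetical in-process "topic": every subscriber receives the same
// shared_ptr, so only a pointer (plus a reference count) changes hands.
class IntraProcessTopic {
public:
    using ConstPtr = std::shared_ptr<const ImageMsg>;
    using Callback = std::function<void(const ConstPtr&)>;

    void subscribe(Callback cb) { callbacks_.push_back(std::move(cb)); }

    void publish(const ConstPtr& msg) const {
        for (const auto& cb : callbacks_) cb(msg);
    }

private:
    std::vector<Callback> callbacks_;
};

// Returns true if the subscriber saw the very buffer the publisher filled.
bool subscriber_sees_same_buffer() {
    IntraProcessTopic topic;
    const unsigned char* seen = nullptr;
    topic.subscribe([&seen](const IntraProcessTopic::ConstPtr& m) {
        seen = m->data.data();
    });

    auto msg = std::make_shared<ImageMsg>();
    msg->data.assign(1024 * 1024, 0x7f);  // 1 MiB payload
    const unsigned char* original = msg->data.data();

    topic.publish(msg);  // converts to shared_ptr<const ImageMsg>, no payload copy
    return seen == original;
}
```

Handing out a pointer to const matters here: once the message is published, several subscribers may hold the same buffer, so the publisher should not modify it afterwards.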