I managed to compile and run it successfully, but I discourage this approach, so I will not close the question until I find a better way to solve the problem. A normal system update will undo these changes, and I'm not sure whether they will have consequences later when I need to use the Boost library. Anyway, here's my approach:
As I said, first you need to add this line at the beginning of your code, before the ROS include:
#define __CUDACC__
#include <ros/ros.h>
A normal catkin_make then gives two sets of errors. For the first one, edit this file:
sudo nano /usr/include/boost/type_traits/is_floating_point.hpp
Replace this line:
#if defined(BOOST_HAS_FLOAT128)
with:
#if defined(BOOST_HAS_FLOAT128) && !defined(__PGI)
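To show what that extra condition does, here is a minimal sketch that does not touch Boost at all. Every `DEMO_*` and `demo_*` name is a hypothetical stand-in (for `BOOST_HAS_FLOAT128`, the `__PGI` macro that the PGI compiler defines, and `__float128`), and I am assuming the guarded block in the Boost header is a trait specialization of this general shape:

```cpp
#include <type_traits>

// Hypothetical stand-ins: DEMO_HAS_FLOAT128 plays the role of
// BOOST_HAS_FLOAT128, DEMO_PGI the role of the compiler-defined __PGI.
#define DEMO_HAS_FLOAT128 1
#define DEMO_PGI 1

// Hypothetical stand-in for the __float128 type.
struct demo_float128 {};

// Primary trait: defers to the standard library for ordinary types.
template <class T>
struct demo_is_floating_point : std::is_floating_point<T> {};

// The specialization is only compiled when the 128-bit type is
// advertised AND the compiler is not PGI, mirroring the edited guard.
#if defined(DEMO_HAS_FLOAT128) && !defined(DEMO_PGI)
template <>
struct demo_is_floating_point<demo_float128> : std::true_type {};
#endif
```

With `DEMO_PGI` defined the specialization is skipped, so the compiler never sees the 128-bit type in this trait. Removing the `#define DEMO_PGI 1` line turns it back on.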
For the remaining errors, edit this file:
sudo nano /usr/include/boost/core/swap.hpp
Comment out every line containing:
BOOST_GPU_ENABLED
There should be 3 lines.
With this I compiled and ran the code. I added a pi estimation example to compare the speed on CPU and GPU, in case someone wants to test it:
#define __CUDACC__
#include <ros/ros.h>
#include <cstdio>
#include <iostream>
#include "std_msgs/String.h"
#include <sstream>

#define N 2000000000  // number of integration steps
#define vl 1024       // OpenACC vector length

int main(int argc, char **argv)
{
  ros::init(argc, argv, "pgi_test_node");
  ros::NodeHandle n;
  ros::Publisher chatter_pub = n.advertise<std_msgs::String>("chatter", 1000);
  ros::Rate loop_rate(10);
  int count = 0;
  while (ros::ok())
  {
    std_msgs::String msg;
    std::stringstream ss;
    ss << "hello world " << count;
    msg.data = ss.str();
    ROS_INFO("%s", msg.data.c_str());

    // Midpoint-rule estimate of pi: integral of 4/(1+t^2) over [0,1].
    double pi = 0.0;
    long long i;
    #pragma acc parallel vector_length(vl)
    #pragma acc loop reduction(+:pi)
    for (i = 0; i < N; i++) {
      double t = (double)((i + 0.5) / N);
      pi += 4.0 / (1.0 + t * t);
    }
    printf("pi=%11.10f\n", pi / N);

    chatter_pub.publish(msg);
    ros::spinOnce();
    loop_rate.sleep();
    ++count;
  }
  return 0;
}
Timers are not even necessary: if you just comment out the pragmas, the loop runs on the CPU and you can clearly see the difference.
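If you would rather have a number than an impression, a rough sketch with `std::chrono` could look like the following. The names `PiRun` and `timed_pi` and the `steps` parameter are mine, not part of the node above, and the sketch leaves the OpenACC pragmas out so it builds with any C++11 compiler. To time the GPU path, put the two `#pragma acc` lines from the node directly above the loop, with `sum` as the reduction variable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Result of one timed run: the pi estimate and the elapsed wall-clock time.
struct PiRun {
    double pi;
    double seconds;
};

// Midpoint-rule estimate of pi, the same sum as in the node above.
// `steps` is a hypothetical parameter that replaces the fixed N so that
// short runs are possible while experimenting.
PiRun timed_pi(std::int64_t steps) {
    assert(steps > 0);
    const auto start = std::chrono::steady_clock::now();

    double sum = 0.0;
    // For the GPU variant, the OpenACC pragmas would go here.
    for (std::int64_t i = 0; i < steps; ++i) {
        const double t = (i + 0.5) / steps;
        sum += 4.0 / (1.0 + t * t);
    }

    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return {sum / steps, seconds};
}
```

Calling `timed_pi(N)` once with and once without the pragmas and printing the `seconds` field gives the two timings to compare. `steady_clock` is used because it is not affected by system clock adjustments.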