Very slow frame rate in C++ N-body simulation using Barnes-Hutt and RK4












3














I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.



My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.



Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.



Here is some of my code:



Tree class:



class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;


tree* nw, *ne, *sw, *se; // the children nodes of the tree

tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}

tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}

~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}

void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{

divide(); // splits node into its childrem

nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);

body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}

void divide()
{
double l = R.length / 2;

nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);

parent = true;
}

vect COM()
{

return moment / total_mass;

}
};


Accelerator:



constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant

void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle

else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}


RK4:



void move(cloud& a)  // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;

vect dr1 = i->v * h;
vect dv1 = i->a * h;

i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}

delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration

for (obj* i : a.chest)
{
accelerate(*i, *t2);

vect dr2 = i->v * h;
vect dv2 = i->a * h;

i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}

delete t2;
tree* t3 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t3);

vect dr3 = i->v * h;
vect dv3 = i->a * h;

i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}

delete t3;
tree* t4 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t4);

vect dr4 = i->v * h;
vect dv4 = i->a * h;

i->tpos += dr4 / 6;
i->tv += dv4 / 6;

i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;

}

delete t4;
}


Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?










share|improve this question









New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 2




    There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
    – Hulk
    yesterday










  • (Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
    – greybeard
    yesterday










  • Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
    – bruglesco
    yesterday






  • 1




    Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
    – Cris Luengo
    yesterday










  • The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
    – Callum
    yesterday
















3














I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.



My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.



Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.



Here is some of my code:



Tree class:



class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;


tree* nw, *ne, *sw, *se; // the children nodes of the tree

tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}

tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}

~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}

void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{

divide(); // splits node into its childrem

nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);

body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}

void divide()
{
double l = R.length / 2;

nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);

parent = true;
}

vect COM()
{

return moment / total_mass;

}
};


Accelerator:



constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant

void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle

else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}


RK4:



void move(cloud& a)  // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;

vect dr1 = i->v * h;
vect dv1 = i->a * h;

i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}

delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration

for (obj* i : a.chest)
{
accelerate(*i, *t2);

vect dr2 = i->v * h;
vect dv2 = i->a * h;

i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}

delete t2;
tree* t3 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t3);

vect dr3 = i->v * h;
vect dv3 = i->a * h;

i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}

delete t3;
tree* t4 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t4);

vect dr4 = i->v * h;
vect dv4 = i->a * h;

i->tpos += dr4 / 6;
i->tv += dv4 / 6;

i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;

}

delete t4;
}


Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?










share|improve this question









New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 2




    There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
    – Hulk
    yesterday










  • (Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
    – greybeard
    yesterday










  • Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
    – bruglesco
    yesterday






  • 1




    Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
    – Cris Luengo
    yesterday










  • The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
    – Callum
    yesterday














3












3








3







I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.



My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.



Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.



Here is some of my code:



Tree class:



class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;


tree* nw, *ne, *sw, *se; // the children nodes of the tree

tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}

tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}

~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}

void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{

divide(); // splits node into its childrem

nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);

body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}

void divide()
{
double l = R.length / 2;

nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);

parent = true;
}

vect COM()
{

return moment / total_mass;

}
};


Accelerator:



constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant

void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle

else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}


RK4:



void move(cloud& a)  // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;

vect dr1 = i->v * h;
vect dv1 = i->a * h;

i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}

delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration

for (obj* i : a.chest)
{
accelerate(*i, *t2);

vect dr2 = i->v * h;
vect dv2 = i->a * h;

i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}

delete t2;
tree* t3 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t3);

vect dr3 = i->v * h;
vect dv3 = i->a * h;

i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}

delete t3;
tree* t4 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t4);

vect dr4 = i->v * h;
vect dv4 = i->a * h;

i->tpos += dr4 / 6;
i->tv += dv4 / 6;

i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;

}

delete t4;
}


Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?










share|improve this question









New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.



My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.



Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.



Here is some of my code:



Tree class:



class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;


tree* nw, *ne, *sw, *se; // the children nodes of the tree

tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}

tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}

~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}

void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{

divide(); // splits node into its childrem

nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);

body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}

void divide()
{
double l = R.length / 2;

nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);

parent = true;
}

vect COM()
{

return moment / total_mass;

}
};


Accelerator:



constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant

void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle

else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}


RK4:



void move(cloud& a)  // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;

vect dr1 = i->v * h;
vect dv1 = i->a * h;

i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}

delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration

for (obj* i : a.chest)
{
accelerate(*i, *t2);

vect dr2 = i->v * h;
vect dv2 = i->a * h;

i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}

delete t2;
tree* t3 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t3);

vect dr3 = i->v * h;
vect dv3 = i->a * h;

i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}

delete t3;
tree* t4 = new tree(a);

for (obj* i : a.chest)
{
accelerate(*i, *t4);

vect dr4 = i->v * h;
vect dv4 = i->a * h;

i->tpos += dr4 / 6;
i->tv += dv4 / 6;

i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;

}

delete t4;
}


Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?







c++ algorithm tree simulation physics






share|improve this question









New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited yesterday









200_success

128k15152413




128k15152413






New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









Callum

183




183




New contributor




Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Callum is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 2




    There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
    – Hulk
    yesterday










  • (Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
    – greybeard
    yesterday










  • Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
    – bruglesco
    yesterday






  • 1




    Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
    – Cris Luengo
    yesterday










  • The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
    – Callum
    yesterday














  • 2




    There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
    – Hulk
    yesterday










  • (Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
    – greybeard
    yesterday










  • Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
    – bruglesco
    yesterday






  • 1




    Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
    – Cris Luengo
    yesterday










  • The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
    – Callum
    yesterday








2




2




There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday




There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday












(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday




(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday












Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
yesterday




Would half a minute be normal when simulating this many particles? No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
yesterday




1




1




Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
yesterday




Please include the cloud class and a main with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
yesterday












The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday




The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday










1 Answer
1






active

oldest

votes


















3














This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.



Performance



It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.



That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.



Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.



But if you want to get real performance gains I recommend one or more of the following:




  1. Write a SIMD version of the code that operates on more than one particle at a time.

  2. Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

  3. Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.


Data Hiding



All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.



Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.



Naming



You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?



Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?



The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?



Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().



Data Structures



It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:



    i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;


You could do something like this:



struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;


So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.






share|improve this answer





















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    Callum is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210936%2fvery-slow-frame-rate-in-c-n-body-simulation-using-barnes-hutt-and-rk4%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.



    Performance



    It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.



    That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.



    Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.



    But if you want to get real performance gains I recommend one or more of the following:




    1. Write a SIMD version of the code that operates on more than one particle at a time.

    2. Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

    3. Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.


    Data Hiding



    All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.



    Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.



    Naming



    You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?



    Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?



    The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?



    Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().



    Data Structures



    It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:



        i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
    i->tv = i->pv = i->v;
    i->ta = i->pa = i->a;


    You could do something like this:



    struct movement {
    float3 pos;
    float3 velocity;
    float3 acceleration;
    };
    ...
    new_movement = movement;
    prev_movement = movement;


    So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.






    share|improve this answer


























      3














      This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.



      Performance



      It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.



      That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.



      Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.



      But if you want to get real performance gains I recommend one or more of the following:




      1. Write a SIMD version of the code that operates on more than one particle at a time.

      2. Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

      3. Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.


      Data Hiding



      All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.



      Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.



      Naming



      You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?



      Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?



      The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?



      Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().



      Data Structures



      It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:



          i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
      i->tv = i->pv = i->v;
      i->ta = i->pa = i->a;


      You could do something like this:



      struct movement {
      float3 pos;
      float3 velocity;
      float3 acceleration;
      };
      ...
      new_movement = movement;
      prev_movement = movement;


      So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.






      share|improve this answer
























        3












        3








        3






        This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.



        Performance



        It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.



        That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.



        Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.



        But if you want to get real performance gains I recommend one or more of the following:




        1. Write a SIMD version of the code that operates on more than one particle at a time.

        2. Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

        3. Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.


        Data Hiding



        All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.



        Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.



        Naming



        You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?



        Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?



        The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?



        Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().



        Data Structures



        It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:



            i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
        i->tv = i->pv = i->v;
        i->ta = i->pa = i->a;


        You could do something like this:



        struct movement {
        float3 pos;
        float3 velocity;
        float3 acceleration;
        };
        ...
        new_movement = movement;
        prev_movement = movement;


        So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.






        share|improve this answer












        This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.



        Performance



        It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.



        That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move() into 4 different functions, you can allocate t1-t4 on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.



        Your accelerate() function is recursive. You might be able to increase the speed by making it loop instead of recursing.



        But if you want to get real performance gains I recommend one or more of the following:




        1. Write a SIMD version of the code that operates on more than one particle at a time.

        2. Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)

        3. Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.


        Data Hiding



        All the members of your tree class are public. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud and obj classes suffer from the same problem.



        Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.



        Naming



        You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment and total_mass, so it's a little odd that arguments to your functions have names like a and i. Why do that?



        Your class names are better than 1 character variable names, but could be expanded. For example, tree is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree) to distinguish it from a typical binary search tree or a red-black tree?



        The class name obj is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj is entirely free from meaning. Why not call it particle? Or body?



        Most of your function names are pretty good with the exception of COM(). It's only called in one place, so there's no excuse for not just writing out center_of_mass().



        Data Structures



        It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct or class unto itself. And instead of manually keeping all three in sync by doing this:



            i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
        i->tv = i->pv = i->v;
        i->ta = i->pa = i->a;


        You could do something like this:



        struct movement {
        float3 pos;
        float3 velocity;
        float3 acceleration;
        };
        ...
        new_movement = movement;
        prev_movement = movement;


        So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered 16 hours ago









        user1118321

        10.8k11145




        10.8k11145






















            Callum is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            Callum is a new contributor. Be nice, and check out our Code of Conduct.













            Callum is a new contributor. Be nice, and check out our Code of Conduct.












            Callum is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210936%2fvery-slow-frame-rate-in-c-n-body-simulation-using-barnes-hutt-and-rk4%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to make a Squid Proxy server?

            Is this a new Fibonacci Identity?

            19世紀