Very slow frame rate in C++ N-body simulation using Barnes-Hutt and RK4
I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.
My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.
Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.
Here is some of my code:
Tree class:
class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;
tree* nw, *ne, *sw, *se; // the children nodes of the tree
tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}
tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}
~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}
void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{
divide(); // splits node into its childrem
nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);
body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}
void divide()
{
double l = R.length / 2;
nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);
parent = true;
}
vect COM()
{
return moment / total_mass;
}
};
Accelerator:
constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant
void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle
else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}
RK4:
void move(cloud& a) // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
vect dr1 = i->v * h;
vect dv1 = i->a * h;
i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}
delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration
for (obj* i : a.chest)
{
accelerate(*i, *t2);
vect dr2 = i->v * h;
vect dv2 = i->a * h;
i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}
delete t2;
tree* t3 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t3);
vect dr3 = i->v * h;
vect dv3 = i->a * h;
i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}
delete t3;
tree* t4 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t4);
vect dr4 = i->v * h;
vect dv4 = i->a * h;
i->tpos += dr4 / 6;
i->tv += dv4 / 6;
i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;
}
delete t4;
}
Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?
c++ algorithm tree simulation physics
New contributor
|
show 4 more comments
I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.
My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.
Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.
Here is some of my code:
Tree class:
class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;
tree* nw, *ne, *sw, *se; // the children nodes of the tree
tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}
tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}
~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}
void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{
divide(); // splits node into its childrem
nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);
body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}
void divide()
{
double l = R.length / 2;
nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);
parent = true;
}
vect COM()
{
return moment / total_mass;
}
};
Accelerator:
constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant
void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle
else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}
RK4:
void move(cloud& a) // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
vect dr1 = i->v * h;
vect dv1 = i->a * h;
i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}
delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration
for (obj* i : a.chest)
{
accelerate(*i, *t2);
vect dr2 = i->v * h;
vect dv2 = i->a * h;
i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}
delete t2;
tree* t3 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t3);
vect dr3 = i->v * h;
vect dv3 = i->a * h;
i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}
delete t3;
tree* t4 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t4);
vect dr4 = i->v * h;
vect dv4 = i->a * h;
i->tpos += dr4 / 6;
i->tv += dv4 / 6;
i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;
}
delete t4;
}
Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?
c++ algorithm tree simulation physics
New contributor
2
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to askWould half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
yesterday
1
Please include thecloud
class and amain
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday
|
show 4 more comments
I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.
My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.
Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.
Here is some of my code:
Tree class:
class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;
tree* nw, *ne, *sw, *se; // the children nodes of the tree
tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}
tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}
~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}
void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{
divide(); // splits node into its childrem
nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);
body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}
void divide()
{
double l = R.length / 2;
nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);
parent = true;
}
vect COM()
{
return moment / total_mass;
}
};
Accelerator:
constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant
void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle
else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}
RK4:
void move(cloud& a) // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
vect dr1 = i->v * h;
vect dv1 = i->a * h;
i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}
delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration
for (obj* i : a.chest)
{
accelerate(*i, *t2);
vect dr2 = i->v * h;
vect dv2 = i->a * h;
i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}
delete t2;
tree* t3 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t3);
vect dr3 = i->v * h;
vect dv3 = i->a * h;
i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}
delete t3;
tree* t4 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t4);
vect dr4 = i->v * h;
vect dv4 = i->a * h;
i->tpos += dr4 / 6;
i->tv += dv4 / 6;
i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;
}
delete t4;
}
Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?
c++ algorithm tree simulation physics
New contributor
I am writing the code for my EPQ project and am aiming to produce a graphic showing the collapse of a particle cloud with around 10^5/6 particles.
My current code will spawn a specified number of particles in a uniform distribution across a specified area. It will then calculate the acceleration of the particles using a Barnes-Hutt tree and integrate this to find the new position of each particle. I currently have no graphics but the program will print to the console upon each movement.
Unfortunately, each iteration takes around half a minute (running with 35000 particles) which is way too slow for a graphic. So i am looking for a way to improve my algorithm(s) to make it faster.
Here is some of my code:
Tree class:
class tree
{
public:
obj* body; // obj class stores the mass, velocity, position ans acceleration of an object
bool parent = false;
double total_mass;
region R; // region class defines boundaaries and can check if an object is in them
vect moment;
tree* nw, *ne, *sw, *se; // the children nodes of the tree
tree(vect corner, double length) : R(length, corner), total_mass(0), moment(vect(0, 0)), body(nullptr) {}
tree(cloud& a) : R(a.size + 1, vect(0,0)), total_mass(0), moment(vect(0, 0)), body(nullptr)
{
for (int i = 0; i < a.chest.size(); i++)
{
insert(a.chest[i]); // this constructer id called for the root node only
}
}
~tree()
{
delete nw;
delete sw;
delete ne;
delete se;
}
void insert(obj* i)
{
if (!R.in(i->pos)) // cant insert into a region its not in
{
return;
}
else
{
if (body == nullptr) // no other bodies in this region
{
if (!parent)
{
body = i; // region is not divides so you can insert particle here
}
else
{
nw->insert(i); //region is divided so try to insert into children nodes
ne->insert(i);
sw->insert(i);
se->insert(i);
}
}
else // trying to have more than one particle in a node
{
divide(); // splits node into its childrem
nw->insert(i);
ne->insert(i);
sw->insert(i);
se->insert(i);
nw->insert(body); // insert current object and the object that was previouly in the parent node into the children nodes
ne->insert(body);
sw->insert(body);
se->insert(body);
body = nullptr; // no longer bodies in the node
}
total_mass += i->mass;
moment += (i->pos) * (i->mass);
}
}
void divide()
{
double l = R.length / 2;
nw = new tree(R.point + vect(0, l), l);
ne = new tree(R.point + vect(l, l), l);
sw = new tree(R.point + vect(0, 0), l);
se = new tree(R.point + vect(l, 0), l);
parent = true;
}
vect COM()
{
return moment / total_mass;
}
};
Accelerator:
constexpr double theta = 0.5; //theta criterion
double G = 1 * pow(10,-11); // gravitational constant
void accelerate(obj& i, tree& t)
{
vect r = t.COM() - i.pos; // vector between the position of the particle and the center of mass of the node
if (!t.parent) //checks if node is undivided
{
i.a += (t.body == nullptr || t.R.in(i.pos)) ? vect(0, 0) : r.norm() * G * t.total_mass / r.mag2();
}//if there are also no bodys or the object being accelerated is in the node then there is no effect on the particle
else
{
if (t.R.in(i.pos) || t.R.length / r.mag() > theta)
{
accelerate(i, *t.nw); //object is in the node or the node does not meet the theta criterion so try the nodes children
accelerate(i, *t.ne);
accelerate(i, *t.sw);
accelerate(i, *t.se);
}
else
{
i.a += r.norm() * G * t.total_mass / r.mag2();
}
}
}
RK4:
void move(cloud& a) // cloud class stores an array of pointers to objects
{
tree* t1 = new tree(a); //adds objects in cloud to a new tree
for (obj* i : a.chest) //chest is the array of pointer to objects
{
accelerate(*i, *t1); //uses tree to refresh the accelration of the particle
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
vect dr1 = i->v * h;
vect dv1 = i->a * h;
i->pos = i->ppos + dr1 / 2;
i->v = i->pv + dv1 / 2;
i->tpos += dr1 / 6;
i->tv += dv1 / 6;
}
delete t1;
tree* t2 = new tree(a); // deletes previous tree and creates a new one to culculate the new acceleration
for (obj* i : a.chest)
{
accelerate(*i, *t2);
vect dr2 = i->v * h;
vect dv2 = i->a * h;
i->pos = i->ppos + dr2 / 2;
i->v = i->pv + dv2 / 2;
i->tpos += dr2 / 3;
i->tv += dv2 / 3;
}
delete t2;
tree* t3 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t3);
vect dr3 = i->v * h;
vect dv3 = i->a * h;
i->pos = i->ppos + dr3;
i->v = i->pv + dv3;
i->tpos += dr3 / 3;
i->tv += dv3 / 3;
}
delete t3;
tree* t4 = new tree(a);
for (obj* i : a.chest)
{
accelerate(*i, *t4);
vect dr4 = i->v * h;
vect dv4 = i->a * h;
i->tpos += dr4 / 6;
i->tv += dv4 / 6;
i->pos = i->tpos;
i->v = i->tv;
i->a = i->pa;
}
delete t4;
}
Would half a minute be normal when simulating this many particles? If not how could I improve this code to make it run faster?
c++ algorithm tree simulation physics
c++ algorithm tree simulation physics
New contributor
New contributor
edited yesterday
200_success
128k15152413
128k15152413
New contributor
asked yesterday
Callum
183
183
New contributor
New contributor
2
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to askWould half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
yesterday
1
Please include thecloud
class and amain
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday
|
show 4 more comments
2
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to askWould half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.
– bruglesco
yesterday
1
Please include thecloud
class and amain
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.
– Cris Luengo
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday
2
2
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask
Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask
Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.– bruglesco
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.– bruglesco
yesterday
1
1
Please include the
cloud
class and a main
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.– Cris Luengo
yesterday
Please include the
cloud
class and a main
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.– Cris Luengo
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday
|
show 4 more comments
1 Answer
1
active
oldest
votes
This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.
Performance
It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.
That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move()
into 4 different functions, you can allocate t1
-t4
on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.
Your accelerate()
function is recursive. You might be able to increase the speed by making it loop instead of recursing.
But if you want to get real performance gains I recommend one or more of the following:
- Write a SIMD version of the code that operates on more than one particle at a time.
- Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)
- Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.
Data Hiding
All the members of your tree
class are public
. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud
and obj
classes suffer from the same problem.
Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.
Naming
You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment
and total_mass
, so it's a little odd that arguments to your functions have names like a
and i
. Why do that?
Your class names are better than 1 character variable names, but could be expanded. For example, tree
is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree
) to distinguish it from a typical binary search tree or a red-black tree?
The class name obj
is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj
is entirely free from meaning. Why not call it particle
? Or body
?
Most of your function names are pretty good with the exception of COM()
. It's only called in one place, so there's no excuse for not just writing out center_of_mass()
.
Data Structures
It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct
or class
unto itself. And instead of manually keeping all three in sync by doing this:
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
You could do something like this:
struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;
So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Callum is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210936%2fvery-slow-frame-rate-in-c-n-body-simulation-using-barnes-hutt-and-rk4%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.
Performance
It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.
That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move()
into 4 different functions, you can allocate t1
-t4
on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.
Your accelerate()
function is recursive. You might be able to increase the speed by making it loop instead of recursing.
But if you want to get real performance gains I recommend one or more of the following:
- Write a SIMD version of the code that operates on more than one particle at a time.
- Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)
- Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.
Data Hiding
All the members of your tree
class are public
. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud
and obj
classes suffer from the same problem.
Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.
Naming
You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment
and total_mass
, so it's a little odd that arguments to your functions have names like a
and i
. Why do that?
Your class names are better than 1 character variable names, but could be expanded. For example, tree
is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree
) to distinguish it from a typical binary search tree or a red-black tree?
The class name obj
is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj
is entirely free from meaning. Why not call it particle
? Or body
?
Most of your function names are pretty good with the exception of COM()
. It's only called in one place, so there's no excuse for not just writing out center_of_mass()
.
Data Structures
It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct
or class
unto itself. And instead of manually keeping all three in sync by doing this:
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
You could do something like this:
struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;
So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.
add a comment |
This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.
Performance
It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.
That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move()
into 4 different functions, you can allocate t1
-t4
on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.
Your accelerate()
function is recursive. You might be able to increase the speed by making it loop instead of recursing.
But if you want to get real performance gains I recommend one or more of the following:
- Write a SIMD version of the code that operates on more than one particle at a time.
- Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)
- Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.
Data Hiding
All the members of your tree
class are public
. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud
and obj
classes suffer from the same problem.
Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.
Naming
You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment
and total_mass
, so it's a little odd that arguments to your functions have names like a
and i
. Why do that?
Your class names are better than 1 character variable names, but could be expanded. For example, tree
is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree
) to distinguish it from a typical binary search tree or a red-black tree?
The class name obj
is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj
is entirely free from meaning. Why not call it particle
? Or body
?
Most of your function names are pretty good with the exception of COM()
. It's only called in one place, so there's no excuse for not just writing out center_of_mass()
.
Data Structures
It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct
or class
unto itself. And instead of manually keeping all three in sync by doing this:
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
You could do something like this:
struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;
So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.
add a comment |
This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.
Performance
It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.
That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move()
into 4 different functions, you can allocate t1
-t4
on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.
Your accelerate()
function is recursive. You might be able to increase the speed by making it loop instead of recursing.
But if you want to get real performance gains I recommend one or more of the following:
- Write a SIMD version of the code that operates on more than one particle at a time.
- Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)
- Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.
Data Hiding
All the members of your tree
class are public
. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud
and obj
classes suffer from the same problem.
Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.
Naming
You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment
and total_mass
, so it's a little odd that arguments to your functions have names like a
and i
. Why do that?
Your class names are better than 1 character variable names, but could be expanded. For example, tree
is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree
) to distinguish it from a typical binary search tree or a red-black tree?
The class name obj
is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj
is entirely free from meaning. Why not call it particle
? Or body
?
Most of your function names are pretty good with the exception of COM()
. It's only called in one place, so there's no excuse for not just writing out center_of_mass()
.
Data Structures
It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct
or class
unto itself. And instead of manually keeping all three in sync by doing this:
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
You could do something like this:
struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;
So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.
This looks like a really fascinating project. I've only played with toy particle systems – never anything as useful as this! But I do have some thoughts that might be helpful.
Performance
It's hard to say for sure where the slowdown is because it's not possible for me to profile the code based just on what's here. But that's something you should do. Run it in a profiler and see where the slowdowns are rather than guessing.
That said, I see some things that I frequently find to be performance issues. As mentioned in the comments, the number of allocations that the code does might be an issue. Stack allocations may be faster in some cases than heap allocations. So if you break up the code in move()
into 4 different functions, you can allocate t1
-t4
on the stack rather than the heap. It might improve the speed since a stack allocation is generally just a pointer addition.
Your accelerate()
function is recursive. You might be able to increase the speed by making it loop instead of recursing.
But if you want to get real performance gains I recommend one or more of the following:
- Write a SIMD version of the code that operates on more than one particle at a time.
- Write a multi-threaded version of the code that does the calculations for multiple particles on different threads. (You can do this along with #1 to get even more speed.)
- Run the simulation on the GPU. This is likely to yield even better performance than combining 1 & 2 as GPUs have thousands of cores instead of just dozens.
Data Hiding
All the members of your tree
class are public
. That's usually a bad idea because it means that any other code in your application can reach in and change values inside the object, potentially leaving it in an invalid state. It also make debugging harder because you can't narrow down where a member variable was changed. Traditionally, it makes sense to make your member variables private and have accessors to retrieve or change them. The accessors can be written in the header so they get inlined and are just as fast as if they were public. But by having accessors you can, for example, put a breakpoint in a single place and figure out where a variable is changed. It would appear that your cloud
and obj
classes suffer from the same problem.
Also, if you make the objects immutable (so they can be read but not changed or written to), it makes moving to multiple threads much easier. But there are other implications to doing that, such as needing to create new values rather than updating existing ones, which may or may not be a performance issue.
Naming
You really need to expand on your variable names. 1 and 2 letter names are too hard to read. You get it exactly right with names like moment
and total_mass
, so it's a little odd that arguments to your functions have names like a
and i
. Why do that?
Your class names are better than 1 character variable names, but could be expanded. For example, tree
is a Barnes-Hutt tree. Why not call it that (or at least something like BHTree
) to distinguish it from a typical binary search tree or a red-black tree?
The class name obj
is exceedingly unhelpful. Every piece of data in your program is either a POD or an object, so as a class name obj
is entirely free from meaning. Why not call it particle
? Or body
?
Most of your function names are pretty good with the exception of COM()
. It's only called in one place, so there's no excuse for not just writing out center_of_mass()
.
Data Structures
It looks like there are several bits of data within a particle that need to be updated in sync and that go together. For example, the position, velocity and acceleration of the particle. That should be a struct
or class
unto itself. And instead of manually keeping all three in sync by doing this:
i->tpos = i->ppos = i->pos; // tpos/v/a stores the value of the new pos/v/a, ppos stores the value from the previous itteration
i->tv = i->pv = i->v;
i->ta = i->pa = i->a;
You could do something like this:
struct movement {
float3 pos;
float3 velocity;
float3 acceleration;
};
...
new_movement = movement;
prev_movement = movement;
So now you only have 2 assignments instead of 6. It's easier to read and it makes clear that position, velocity, and acceleration are all properties of the same object and that you have a next, current, previous relationship between them.
answered 16 hours ago
user1118321
10.8k11145
10.8k11145
add a comment |
add a comment |
Callum is a new contributor. Be nice, and check out our Code of Conduct.
Callum is a new contributor. Be nice, and check out our Code of Conduct.
Callum is a new contributor. Be nice, and check out our Code of Conduct.
Callum is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210936%2fvery-slow-frame-rate-in-c-n-body-simulation-using-barnes-hutt-and-rk4%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
There's a lot of dynamic memory allocation going on here - might be worth profiling how much time that takes. If it's significant, you could try to avoid that (by reusing objects via some kind of pool and/or a custom allocator)
– Hulk
yesterday
(Welcome to Code Review!) While your question is on topic as far as you are ready for open-ended feedback, there are bound to be more promising places to ask
Would half a minute be normal when simulating [10^5…6] particles?
– greybeard
yesterday
Would half a minute be normal when simulating this many particles?
No. The target framers for most games is 60 FPS. I'm not sure how most particles systems achieve that though I suspect an object pool could be the answer. Be sure to head over to Game Development and see if they've seen this before. I bet they have.– bruglesco
yesterday
1
Please include the
cloud
class and amain
with some test data so we can run the code. Incomplete code snippets are hard to review, and make it impossible for us to help you improve the code.– Cris Luengo
yesterday
The rest of the code required for it to run is pages and pages so I wont dump it all on here. I have looked into object pools though which is promising and i will post a similar question on Game Development at some point. Thanks for all your advice :)
– Callum
yesterday