Where is the algorithm implemented for BRepMesh_IncrementalMesh?

blobfish

CAD community veteran
Tracing around OCC can be a real challenge with all the virtual dispatch. I ran the following code and created a dynamic call graph. Maybe that will help you track down what you are interested in.
C++:
#include <cassert>

#include <BRepMesh_IncrementalMesh.hxx>
#include <BRepTools.hxx>
#include <valgrind/callgrind.h>

#include <ocsall.h> // project-specific helper providing ocs::readBRep


// run with:
// valgrind --tool=callgrind --instr-atstart=no ./occtCallGraph
int main(int, char**)
{
  auto fileShape = ocs::readBRep("~/cadseer/media/mm2021.brep");
  assert(!fileShape.IsNull());
  BRepTools::Clean(fileShape); // drop any existing triangulation so the mesher does real work
  CALLGRIND_START_INSTRUMENTATION; //skipping code before this
  // linear deflection 0.01, absolute deflection, angular deflection 0.5, run in parallel
  BRepMesh_IncrementalMesh mesher(fileShape, 0.01, Standard_False, 0.5, Standard_True);
  CALLGRIND_STOP_INSTRUMENTATION; //skipping code after this
  CALLGRIND_DUMP_STATS;
  return 0;
}
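
For reference, here is the same call written against the parameter struct, which spells out what those positional arguments mean. A sketch only; the IMeshTools_Parameters field names are from memory, so double-check them against your OCCT version.
C++:
#include <BRepMesh_IncrementalMesh.hxx>
#include <IMeshTools_Parameters.hxx>
#include <TopoDS_Shape.hxx>

// Equivalent meshing call using the parameter struct instead of the
// positional constructor; the values mirror the snippet above.
void meshWithParameters(const TopoDS_Shape& shape)
{
  IMeshTools_Parameters params;
  params.Deflection = 0.01;           // linear deflection
  params.Relative   = Standard_False; // absolute, not relative to edge size
  params.Angle      = 0.5;            // angular deflection
  params.InParallel = Standard_True;  // multi-threaded meshing
  BRepMesh_IncrementalMesh mesher(shape, params);
}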
 

Attachments

  • meshDynamicCallGraph.zip (7.8 KB)

easternbun

CAD practitioner
Tracing around OCC can be a real challenge with all the virtual dispatch. I ran the following code and created a dynamic call graph. Maybe that will help you track down what you are interested in.
Exactly what I need. I do see delaun, so I guess this may be it.
 

Quaoar

Administrator
Staff member
@blobfish Interesting approach, indeed. I sometimes do the same with VTune Amplifier on Windows; it has a nice "flame chart".
 

blobfish

CAD community veteran
Exactly what I need. I do see delaun, so I guess this may be it.
That was my guess also, as I very briefly scanned the graphviz file.

@blobfish Interesting approach, indeed. I sometimes do the same with VTune Amplifier on Windows; it has a nice "flame chart".
Yes! A flame graph is a much nicer visualization than graphviz. If I ever work on my tool again, I will look into generating flame graphs. I used graphviz because it is 'baked into' boost graph.

I have never tried VTune, but I did try AMD's CodeXL years ago. I don't remember why I gave up on CodeXL, but it probably suffered from the same problem I describe next. I then got into using linux perf + gprof2dot.py, and later 'hotspot', which also uses linux perf. IMHO these tools are profilers at heart that happen to have call graph support. Linux perf, and I believe the others as well, works by sampling the call stack at a set frequency, so the faster a subgraph of calls executes, the more likely it is to be skipped. That behavior is great for performance profiling, but not so much when generating a call graph for code comprehension. I found with linux perf that by the time I raised the sample frequency high enough to get a complete call graph, the files were so big they were unmanageable. I believe valgrind/callgrind is different: it instruments the binary as it executes instead of relying on sampling. I am no expert in this area, so don't take anything I have said as gospel.

@Quaoar When you run VTune on something big, like opencascade, are you able to find what you want quickly? For me, the biggest problem is finding the signal through the noise. I have done some things like isolating callgrind's scope, as shown in the sample code. After generating the callgrind file, I run a custom program that parses the result and offers filtering options, built internally on boost graph and boost filtered graph. I can push my way to something useful, but it is not easy.
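
For anyone curious, here is a minimal sketch of that filtering idea: a boost::filtered_graph over an adjacency_list, keeping only vertices whose name matches a substring and writing the result as graphviz. The graph contents are purely illustrative; this is not the actual tool.
C++:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/filtered_graph.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>
#include <string>

// Vertex bundle: the function name parsed from the callgrind output.
struct CallNode { std::string name; };

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, CallNode>;

// Vertex predicate: keep only functions whose name contains a substring.
struct NameFilter
{
  const Graph* graph = nullptr;
  std::string needle;
  bool operator()(boost::graph_traits<Graph>::vertex_descriptor v) const
  { return (*graph)[v].name.find(needle) != std::string::npos; }
};

int main()
{
  Graph g;
  auto a = boost::add_vertex(CallNode{"main"}, g);
  auto b = boost::add_vertex(CallNode{"BRepMesh_IncrementalMesh::Perform"}, g);
  auto c = boost::add_vertex(CallNode{"BRepMesh_Delaun::perform"}, g);
  boost::add_edge(a, b, g);
  boost::add_edge(b, c, g);

  NameFilter keep{&g, "BRepMesh"};
  boost::filtered_graph<Graph, boost::keep_all, NameFilter> filtered(g, boost::keep_all(), keep);

  // Dump the filtered call graph in graphviz format, labeled with function names.
  boost::write_graphviz(std::cout, filtered,
                        boost::make_label_writer(boost::get(&CallNode::name, g)));
  return 0;
}
In the real tool the graph is populated from the callgrind file instead of being hard-coded, of course.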
 

Quaoar

Administrator
Staff member
If I ever work on my tool again...
Hey, what, have you given up with cadseer? Or have you rather finalized it more or less?

Your experience with profilers sounds way more epic than mine. Back in my OCC days, Roman Lygin (who worked at Intel back then) used to come to our office and promote VTune Amplifier, which was the go-to tool for the entire modeling team. But my way of working with it was (and still is) just a matter of clicking a few buttons here and there. Usually I consulted call trees and (sometimes) thread snapshots, but now that they have introduced this flame diagram, it has quickly become my favourite view.

Normally, I attach VTune to the process, and it stays idle while I'm preparing for testing. It is often just one function that I run from Tcl or C++ (as a unit test). Once prepared (i.e., the CAD model is loaded and I'm ready to fire the algorithm), I click "Run" and let it do its magic. In most situations, the output is clean and concise, so it does not require any further work except for your own thinking and figuring out what's going on in the code.

From time to time, I run Amplifier preventively on the entire test grid for a specific algorithm just to discover its implicit bottlenecks, which are sometimes not obvious at all (especially after several years of maintenance). So many times I was staring at BRepBndLib eating 90% of execution time :) It really gives a good perspective on what you ended up with and what the contribution of each algorithm's stage is.
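
To make it concrete, the "isolated piece of functionality" is usually nothing fancier than a tiny driver like the one below, with the profiler attached to the process before it runs. The box shape here is only a placeholder for the real test model.
C++:
#include <BRepPrimAPI_MakeBox.hxx>
#include <BRepBndLib.hxx>
#include <Bnd_Box.hxx>
#include <TopoDS_Shape.hxx>

// Tiny isolated driver: attach the profiler, then exercise just the
// suspected hotspot (here BRepBndLib, with triangulation-based bounds off).
int main()
{
  TopoDS_Shape shape = BRepPrimAPI_MakeBox(10.0, 20.0, 30.0).Shape();

  Bnd_Box box;
  BRepBndLib::Add(shape, box, /*useTriangulation*/ Standard_False);
  return 0;
}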


So, coming back to your question, yes, I normally find issues quickly with VTune, but it also might be that I'm running it on very isolated pieces of functionality. For example, I can guess that running it over some heavy modeling procedures in Salome would be much less fun.
 

blobfish

CAD community veteran
Hey, what, have you given up with cadseer? Or have you rather finalized it more or less?
No, I am still working on cadseer. I was referring to the tool that I used to generate the graphviz file from the callgrind output. Sorry for the confusion. I do find myself spending more time on opencascade than cadseer these days. Working on cadseer feels like I am trying to shingle the roof before the foundation cement has cured. :)


Normally, I attach VTune to the process... From time to time, I run Amplifier preventively on the entire test grid for a specific algorithm just to discover its implicit bottlenecks...
So your usage of VTune has been primarily for 'hotspot' detection? Next time I want a call graph I will check it out.
 

Quaoar

Administrator
Staff member
I do find myself spending more time on opencascade than cadseer these days.
What is your current focus on OpenCascade? Just curious.

So your usage of VTune has been primarily for 'hotspot' detection
Yep. I also tried some of its extra tools (Adviser or whatever it was called), but the effect was almost nothing. For hunting down memory leaks, I still have nothing better than diagnostic macros spread over the code base. It feels pretty much like I'm in the stone age with profiling memory usage, but, to be fair, I've never taken the time to sit down and read about best practices for solving such problems.
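
Roughly this kind of thing, just to give the idea; it is an illustration, not the actual macros from the code base: an instance counter bumped in constructors and destructors, dumped at shutdown to spot objects that were never released.
C++:
#include <atomic>
#include <cstdio>

// Illustrative diagnostic macro: counts live instances of a class so that a
// non-zero count at shutdown points to a leak.
#define TRACK_INSTANCES()                                         \
  public:                                                         \
    static std::atomic<long>& instanceCounter()                   \
    { static std::atomic<long> counter{0}; return counter; }      \
  private:                                                        \
    struct InstanceTracker                                        \
    {                                                             \
      InstanceTracker()  { ++instanceCounter(); }                 \
      ~InstanceTracker() { --instanceCounter(); }                 \
    } m_instanceTracker;

class Mesh
{
  TRACK_INSTANCES()
};

int main()
{
  Mesh* leaked = new Mesh; // never deleted
  (void)leaked;
  std::printf("Mesh instances still alive: %ld\n",
              Mesh::instanceCounter().load()); // prints 1 -> leak
  return 0;
}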
 

blobfish

CAD community veteran
What is your current focus on OpenCascade? Just curious.
I have been spinning my wheels in the offset API. I will make a new post so as not to further hijack this thread.

For hunting down memory leaks, I still have nothing better than diagnostic macros spread over the code base. It feels pretty much like I'm in the stone age with profiling memory usage, but, to be fair, I've never taken the time to sit down and read about best practices for solving such problems.
Welcome to my world. :) A while back I found a heisenbug in OCC. I could never trigger a segfault inside GDB with a debug build, but one out of every four runs with a release build would segfault... like clockwork! :confused: I had previously read about AddressSanitizer, so I created a specific build of OCC with it enabled. On the very first run of my test, AddressSanitizer pinpointed the problem. I was blown away. I now always keep an AddressSanitizer build of OCC ready. I don't know if the Windows experience will be as good, but it is worth a try.
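
For anyone who has not tried it: the class of bug it catches looks like the snippet below (purely illustrative, not the actual OCC issue). Build with -fsanitize=address -g and the report pinpoints both the bad access and the matching free.
C++:
#include <iostream>

// Compile with: g++ -fsanitize=address -g use_after_free.cpp
// AddressSanitizer aborts at the read below and prints the allocation,
// deallocation, and use stacks.
int main()
{
  int* data = new int[4]{1, 2, 3, 4};
  delete[] data;
  std::cout << data[0] << '\n'; // heap-use-after-free
  return 0;
}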
 