Profiling tools such as gprof and ssrun are used to analyze the run-time performance of a scientific application. Profiling is carried out in both serial and parallel mode, with MPI as the communication interface. The application is a quantum chemistry program based on Hartree-Fock theory and Pulay's DIIS method. An extensive set of test cases is considered in order to reach generally valid conclusions. A known problem of reduced parallel scalability can thus be traced to a single subroutine responsible for the loss of speedup. The critical module is analyzed, and a typical pitfall with triple matrix multiplications is identified. After the critical subroutine is overhauled, re-examination of the run-time behavior shows significantly improved performance and markedly improved parallel scalability. The lessons learned here may be of interest to others working on similar problems in related fields.