This study presents the results of parallelizing a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for supersonic turbulent flow around body and wing-body combinations. As a serial code, the TLNS solver is very time consuming and memory intensive because of its lengthy iterative computations. Complicated geometries also require a very large number of grid points, which further increases the serial computation time. Parallelizing the code can therefore substantially reduce computation time and memory requirements. A cluster of 16 computational nodes with 2.4 and 2.8 GHz Pentium 4 CPUs was used, and the MPI library was used to communicate data among processors. The domain is partitioned one-dimensionally in the longitudinal, radial, and circumferential directions, and the results are compared with those of the serial computations. Several methods are available for communicating data among processors, such as blocking and non-blocking sends. The performance of each method is evaluated, and the best method for the problem at hand is determined. The results are compared in terms of run time, speed-up, and efficiency for the parallel code running on 1, 2, 3, 4, 8, 12, and 16 processors. The parallel results are also compared with the serial results, and the correctness of the parallel code is verified for each case. The effect of the partitioning direction and its interaction with the turbulence modeling is studied, and the best choice is identified. The limitations of using the Baldwin-Lomax turbulence model in a parallel program are discussed, and a remedy is presented.
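For readers unfamiliar with the metrics named above, the following is a minimal sketch of how speed-up and efficiency are conventionally computed from measured run times. It assumes the standard definitions (speed-up is the reference run time divided by the run time on p processors, and efficiency is speed-up divided by p); the abstract does not state which reference run time the study uses. The function name and the timing values are hypothetical placeholders, not results from this study.

```python
def speedup_and_efficiency(reference_time, parallel_time, num_procs):
    """Return (speed-up, efficiency) under the conventional definitions.

    reference_time: run time of the reference (e.g. serial) run, in seconds.
    parallel_time:  run time on num_procs processors, in seconds.
    Assumption: speed-up = reference_time / parallel_time and
    efficiency = speed-up / num_procs.
    """
    if reference_time <= 0 or parallel_time <= 0:
        raise ValueError("run times must be positive")
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    speedup = reference_time / parallel_time
    efficiency = speedup / num_procs
    return speedup, efficiency


if __name__ == "__main__":
    # Hypothetical timings for illustration only; not measured data.
    reference = 100.0
    hypothetical_runs = {2: 50.0, 4: 25.0, 8: 20.0}
    for procs, elapsed in sorted(hypothetical_runs.items()):
        s, e = speedup_and_efficiency(reference, elapsed, procs)
        print(f"p={procs:2d}  speed-up={s:.2f}  efficiency={e:.2f}")
```

An efficiency close to 1 indicates that communication and load-imbalance overheads are small relative to the computation assigned to each processor.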