Name
HPL_pdrpanrlT Right-looking recursive panel factorization.
Synopsis
#include "hpl.h"
void HPL_pdrpanrlT(
   HPL_T_panel *    PANEL,
   const int        M,
   const int        N,
   const int        ICOFF,
   double *         WORK
);
Description
HPL_pdrpanrlT recursively factorizes a panel of columns using the
recursive right-looking variant of the one-dimensional algorithm. The
lower triangular N0-by-N0 upper block of the panel is stored in
transposed form.
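
To illustrate what storage in transposed form means for indexing, here is
a minimal sketch. It assumes the N0-by-N0 block is held in a column-major
array with leading dimension ld. That layout is an assumption made for
this example, and the helper names lt_get and lt_set are hypothetical;
they are not HPL routines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers, not part of HPL.
 * Assumption: s holds S = L^T in column-major order with leading
 * dimension ld, so the entry L(i,j) with i >= j lives at S(j,i).
 */
static double lt_get(const double *s, int ld, int i, int j)
{
    assert(j >= 0 && i >= j && i < ld);
    return s[(size_t)j + (size_t)i * (size_t)ld];
}

static void lt_set(double *s, int ld, int i, int j, double v)
{
    assert(j >= 0 && i >= j && i < ld);
    s[(size_t)j + (size_t)i * (size_t)ld] = v;
}
```

With this layout, walking along a row of L touches consecutive memory
locations, whereas in non-transposed column-major storage it is a column
of L that is contiguous.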
 
Bi-directional exchange is used to perform the swap and broadcast
operations at once for one column in the panel. This results in
fewer, slightly larger messages than usual. On P processes, and
assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):
 
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
 
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to
achieve nearly Level 3 BLAS performance in the panel factorization.
On a large number of modern machines, however, this operation is
latency bound, meaning that its cost can be estimated by the latency
portion alone, N0 * log_2(P) * lat. Mono-directional links will
double this communication cost.
Arguments
PANEL   (local input/output)          HPL_T_panel *
        On entry,  PANEL  points to the data structure containing the
        panel information.
M       (local input)                 const int
        On entry,  M specifies the local number of rows of sub(A).
N       (local input)                 const int
        On entry,  N specifies the local number of columns of sub(A).
ICOFF   (global input)                const int
        On entry, ICOFF specifies the row and column offset of sub(A)
        in A.
WORK    (local workspace)             double *
        On entry, WORK  is a work array of size at least 2*(4+2*N0).
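
The minimum workspace size stated for WORK can be computed and allocated
as in the sketch below. The helper names are hypothetical and not part of
HPL, and the value of N0 is taken as supplied by the caller. HPL may
manage this workspace through its own panel data structure, so this is an
illustration of the size formula only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers, not part of HPL. */

/* Minimum number of doubles in WORK: 2*(4+2*N0). */
size_t panel_work_len(int n0)
{
    assert(n0 >= 0);
    return 2 * (4 + 2 * (size_t)n0);
}

/* Allocates the minimum workspace; returns NULL on failure.
 * The caller releases it with free(). */
double *panel_work_alloc(int n0)
{
    return malloc(panel_work_len(n0) * sizeof(double));
}
```

For example, N0 = 64 gives a minimum of 2*(4+128) = 264 doubles.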
See Also
HPL_dlocmax,
HPL_dlocswpN,
HPL_dlocswpT,
HPL_pdmxswp,
HPL_pdpancrN,
HPL_pdpancrT,
HPL_pdpanllN,
HPL_pdpanllT,
HPL_pdpanrlN,
HPL_pdpanrlT,
HPL_pdrpancrN,
HPL_pdrpancrT,
HPL_pdrpanllN,
HPL_pdrpanllT,
HPL_pdrpanrlN,
HPL_pdfact.