DEMOINTLAB_RESIDUALS Fast and accurate matrix residuals

Contents

Two new algorithms prodK and spProdK

From INTLAB Version 14 new and versatile algorithms to compute approximations and/or verified bounds for matrix residuals are included. They supersede previous algorithms such as AccDot, Dot_, Dot2, Dot2_, DotK, and others, see the demo file 'daccsumdot' (Accurate summation and dot products). The old routines are left in INTLAB for upward compatibility.

Two new algorithms are added, namely prodK for full input and spProdK for sparse input. The algorithms are based on

M. Lange and S.M. Rump: Accurate floating-point matrix products, to appear

and the superb implementation of both algorithm is due to Marko Lange. The algorithms are very fast although entirely implemented in Matlab/Octave - as all algorithms in INTLAB. My dearest thanks to Marko for these outstanding algorithms.

Residuals

The accurate approximation and inclusion of residuals is mandotory and ubiquitous to compute verified bounds successfully and accurately. A typical example are verified bounds for the solution of a linear system. Consider

setround(0)
n = 1000
A = randmat(n,1e12);
cnd = cond(A)
b = A*ones(n,1);
n =
        1000
cnd =
   1.0000e+12

A random matrix with anticipated condition number cnd=1e12 is generated together with a right hand side such that the solution of the linear system Ax=b is close to all 1's. We first compute an inclusion of A^-1 b by our standard algorithm verifylss:

X = verifylss(A,b);

The inclusion X is close to maximally accurate:

rel = relerr(X);
relX = [median(rel) max(rel)]
relX =
   1.1102e-16   2.2204e-16

The approximation by Matlab's backslash

xs = A\b;
rel = relerr(xs,X);
relxs = [median(rel) max(rel)]
relxs =
   5.4985e-06   2.1903e-04

is correct to 5 digits in the median and 3-4 digits in the worst case. The classical residual iteration should improve the accuracy of xs:

for k=1:2
  xs = xs - A\(A*xs-b);
  rel = relerr(xs,X);
  relxs = [median(rel) max(rel)]
end
relxs =
   7.6928e-07   2.5873e-05
relxs =
   6.2114e-07   1.8975e-05

Accurate residuals

As is well known Skeel proved that one residual iteration in working precision ensures backward stability, however, the accuracy of the approximation is only improved when calculating the residual in some higher precision. That is done by prodK:

for k=1:3
  xs = xs - A\prodK(A,xs,-1,b,2);
  rel = relerr(xs,X);
  relxs = [median(rel) max(rel)]
end
relxs =
   4.6737e-12   1.7652e-10
relxs =
   2.2204e-16   2.9977e-15
relxs =
   1.1102e-16   2.2204e-16

As can be seen three residual iterations suffice for an approximation with close to maximal accuracy. The syntax of algorithm prodK is

r = prodK(a1,b1,a2,b2,...,am,bm,k)

with m>=1. The input pairs may be real or complex.

The pair products ai*bi must all have equal dimension pxq for i=1..m. An exeption is that the first factor ai may be scalar in which case bi is of dimension pxq.

The final input parameter k specifies that the computation is 'as if' computed in (k/2+1)-fold precision. That means for k=2 as above the result is as computed in extended precision, for k=4 as if in 3-fold precision, and so forth.

A typical application is the matrix residual with respect to an approximate inverse:

R = inv(A);
C = prodK(R,A,-1,eye(n),2);
maxC = max(abs(C(:)))
maxC =
   1.1891e-04

The residual is small, however, that is an approximation, not inclusion of the residual.

Accurate inclusion of residuals

An inclusion uses the same syntax as before except that there is a second output parameter for the error:

[r,e] = prodK(a1,b1,a2,b2,...,am,bm,k)

It follows that the true product P := sum(i=1..m,ai*bi) satisfies

r - e  <= P <= r + e .

An application to the matrix residual is

[C,E] = prodK(R,A,-1,eye(n),2);
setround(1)
normRes = norm(abs(C)+E,inf)
setround(0)
normRes =
   4.3839e-03

It follows that the inf-norm of the residual RA-I is strictly less than 1 so that RA-I is convergent and both R and A are nonsingular.

Performance of prodK

Because of its importance we offered previously several dot product routines, for example Dot2_ with the syntax

P = Dot2_(A1,B1,A2,B2,...)

On return, P is an inclusion of A1*B1 + A2+B2 + ... with accuracy as if computed in 2-fold precision. The input maybe real or complex. The computational precision is the same as of prodK for k=2, so we can compare the performance directly, first for matrix-vector multiplication:

N = [10 30 100 300 1000 3000 10000];
tDot2_ = zeros(1,length(N));
tprodK = zeros(1,length(N));
for i=1:length(N)
  n = N(i);
  A1 = randn(n);
  B1 = randn(n,1);
  A2 = randn(n);
  B2 = randn(n,1);
  tic
  C = Dot2_(A1,B1,A2,B2);
  tDot2_(i) = toc;
  tic
  [C,E] = prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('matrix*vector   n %6d%6d%6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(Dot2_)/t(prodK) %6.1f%6.1f%6.1f%6.1f%6.1f%6.1f%6.1f',tDot2_./tprodK))
matrix*vector   n     10    30   100   300  1000  3000 10000
------------------------------------------------------------
t(Dot2_)/t(prodK)    2.1   4.2   6.3   5.1   2.1   1.2   0.7

Here both Dot2_ and prodK compute inclusions of the true result.

We display the time ratio t(Dot2_)/t(prodK), i.e., a ratio 2 means that the new prodK is twice as fast as Dot2_. For medium size dimensions the new prodK is faster than Dot2_, for very small and very large vector length Dot2_ is faster. That changes for matrix multiplications:

N = [10 30 100 300 1000];
tDot2_ = zeros(1,length(N));
tprodK = zeros(1,length(N));
for i=1:length(N)
  n = N(i);
  A1 = randn(n);
  B1 = randn(n);
  A2 = randn(n);
  B2 = randn(n);
  tic
  C = Dot2_(A1,B1,A2,B2);
  tDot2_(i) = toc;
  tic
  [C,E] = prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('matrix*matrix   n %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,50))
disp(sprintf('t(Dot2_)/t(prodK) %6.1f%6.1f%6.1f%6.1f%6.1f',tDot2_./tprodK))
matrix*matrix   n     10    30   100   300  1000
--------------------------------------------------
t(Dot2_)/t(prodK)    5.9   3.0  16.7  31.6 122.8

With growing dimension the new prodK becomes increasingly faster than Dot2_. That is similar for complex data:

N = [10 30 100 300 1000];
tDot2_ = zeros(1,length(N));
tprodK = zeros(1,length(N));
for i=1:length(N)
  n = N(i);
  A1 = randomc(n);
  B1 = randomc(n);
  A2 = randomc(n);
  B2 = randomc(n);
  tic
  C = Dot2_(A1,B1,A2,B2);
  tDot2_(i) = toc;
  tic
  [C,E] = prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('complex matrix  n %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,50))
disp(sprintf('t(Dot2_)/t(prodK) %6.1f%6.1f%6.1f%6.1f%6.1f',tDot2_./tprodK))
complex matrix  n     10    30   100   300  1000
--------------------------------------------------
t(Dot2_)/t(prodK)    1.1   2.2  16.0  27.2  82.3

Performance of prodK versus mp of Advanpix

Another possibility to calculate accurate approximations of residuals is using the multiple precision package by Advanpix. Here the computing precision can be freely defined. A particular fast implemention uses mp.Digits(34) corresponding to extended precision according to the IEEE754 arithmetic standard. The operations respect the rounding mode.

However, Advanpix is written in highly optimized C-code whereas INTLAB is entirely written in Matlab/Octave suffering from interpretation overhead. Nevertheless we compare the performance of Advanpix and prodK, first for matrix-vector multiplication:

N = [10 30 100 300 1000 3000 10000];
tmp = zeros(1,length(N));
tprodK = zeros(1,length(N));
mp.Digits(34);
for i=1:length(N)
  n = N(i);
  A1 = randn(n);
  B1 = randn(n,1);
  A2 = randn(n);
  B2 = randn(n,1);
  tic
  double(mp(A1)*B1+mp(A2)*B2);
  tmp(i) = toc;
  tic
  prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('matrix*vector  n %6d%6d%6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(mp)/t(prodK)   %6.1f%6.1f%6.1f%6.1f%6.1f%6.1f%6.1f',tmp./tprodK))
matrix*vector  n     10    30   100   300  1000  3000 10000
------------------------------------------------------------
t(mp)/t(prodK)      1.0   1.2   1.5   4.0   1.9   2.6   1.7

Both mp in mp.Digits(34) as prodK with k=2 compute approximations as if computed in extended precision.

Despite the interpretation overhead the new prodK is often twice as fast as mp by Advanpix. Following is a similar test for matrix-matrix multiplication.

N = [10 30 100 300 1000];
tmp = zeros(1,length(N));
tprodK = zeros(1,length(N));
mp.Digits(34);
for i=1:length(N)
  n = N(i);
  A1 = randn(n);
  B1 = randn(n);
  A2 = randn(n);
  B2 = randn(n);
  tic
  double(mp(A1)*B1+mp(A2)*B2);
  tmp(i) = toc;
  tic
  prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('matrix*matrix  n %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(mp)/t(prodK)   %6.1f%6.1f%6.1f%6.1f%6.1f',tmp./tprodK))
matrix*matrix  n     10    30   100   300  1000
------------------------------------------------------------
t(mp)/t(prodK)      0.1   2.2   9.9  30.1  60.6

Now for larger dimensions the new prodK becomes significantly faster than Andvanpix' mp.

Since Advanpix in mp.Digits(34) corresponding to extended precision respects the rounding mode, it is also possible to compute verified inclusions. Following is an example of matrix-matrix products:

N = [10 30 100 300 1000];
tmp = zeros(1,length(N));
tprodK = zeros(1,length(N));
mp.Digits(34);
for i=1:length(N)
  n = N(i);
  A1 = randn(n);
  B1 = randn(n);
  A2 = randn(n);
  B2 = randn(n);
  tic
  setround(-1)
  Cdown = double(mp(A1)*B1+mp(A2)*B2);
  setround(1)
  Cup = double(mp(A1)*B1+mp(A2)*B2);
  setround(0)
  tmp(i) = toc;
  tic
  [C,E] = prodK(A1,B1,A2,B2,2);
  tprodK(i) = toc;
end
disp(sprintf('matrix*matrix  n %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(mp)/t(prodK)   %6.1f%6.1f%6.1f%6.1f%6.1f',tmp./tprodK))
matrix*matrix  n     10    30   100   300  1000
------------------------------------------------------------
t(mp)/t(prodK)      0.4   4.3  16.3  64.0 108.0

As can be seen our prodK is significantly faster than mp despite interpretation overhead.

Performance improvement using Karatsuba's method

For matrix with larger dimension prodK uses Karatruba's method. The threshold for k=2 is n=333. The effect can be seen as follows:

for n=333:334
  A = randn(n);
  tic
  prodK(A,A,2);
  toc
end
Elapsed time is 0.013243 seconds.
Elapsed time is 0.012353 seconds.

For n=334 prodK switches to Karatsuba's method with a significant gain in computing time.

Results represented by an unevaluated sum

A higher precision may be simulated by an unevaluated sum. Consider the algorithm InvIllco from

S.M. Rump: Inversion of extremely ill-conditioned matrices in floating-point,
  Japan J. Indust. Appl. Math. (JJIAM), 26:249-277, 2009.

It shows how to invert extremely ill-conditioned matrices in working precision, i.e., double precision (binary64). We first generate a very ill-conditioned random matrix.

n = 50;
A = randmat(n,1e25);    % random matrix with cond(A)~1e25
R = inv(A);             % no digit is correct
mp.Digits(34);
Rmp = double(inv(mp(A)));
v = 1:5;
Rv = R(v,v)
Rmpv = Rmp(v,v)
Rv =
  -7.8221e+13   2.0673e+14  -2.2503e+14  -7.4331e+13  -2.8509e+14
   2.0673e+14  -5.0601e+14   5.9458e+14   2.3239e+14   6.9468e+14
  -2.2503e+14   5.9458e+14  -6.4740e+14  -2.1399e+14  -8.1993e+14
  -7.4331e+13   2.3239e+14  -2.1399e+14  -3.8636e+13  -3.2325e+14
  -2.8509e+14   6.9468e+14  -8.1993e+14  -3.2325e+14  -9.5344e+14
Rmpv =
   3.5911e+17  -1.6119e+18   1.0358e+18  -2.4891e+17   2.2741e+18
  -1.6119e+18   7.2354e+18  -4.6496e+18   1.1173e+18  -1.0208e+19
   1.0358e+18  -4.6496e+18   2.9879e+18  -7.1798e+17   6.5597e+18
  -2.4891e+17   1.1173e+18  -7.1798e+17   1.7258e+17  -1.5764e+18
   2.2741e+18  -1.0208e+19   6.5597e+18  -1.5764e+18   1.4402e+19

As can be seen the approximate "inverse" R computed by Matlab's inv is off by 5 orders of magnitude. Nevertheless it has been shown in the paper above that R contains some useful information:

k = 2;
P = prodK(R,A,k);       % accurate product R*A
condP = cond(P)
condP =
   7.9365e+09

In the second line we use R as a preconditioner. Although it is off by orders of magnitude from the true inverse, the condition number of P drops by about the relative rounding error unit u~1e-16. Hence an approximate inverse Xof P should be correct to about 7 digits.

X = inv(P);             % Matlab's approximate inverse

% The call
%
%   r = prodK(a1,b1,a2,b2,...,am,bm,'OutputTerms',ell,k)
%
% specifies that the result is stored in a cell array r with ell terms.
% Similarly,
%
%   [r,e] = prodK(a1,b1,a2,b2,...,am,bm,'OutputTerms',ell,k)
%
% stores an approximation to the result Res := sum(i=1..m,ai*bi) in the cell
% array r together with an error term e such that
%
%   | sum(i=1..ell,r{i}) - Res | <= e .
%
% Using X from above we compute an approximate inverse of the very
% ill-conditioned matrix A as follows:

R2 = prodK(X,R,k,'OutputTerms',k)  % new approximate inverse stored in 2 terms

% We multiplied X by R and store the result as an unevaluated sum R{1}+R{2}
% of 2 matrices. The product is calculated with k=2, i.e., (k/2+1) = 2-fold
% precision, and the number of output terms is k=2, i.e., R2 is a cell
% array with 2 elements as well.
%
% In theory XR = (RA)^-1 R = A^-1 R^-1 R = A^-1. And indeed
% using R{1}+R{2} as a preconditioner works well, extracting the hidden
% information Matlab's inverse:

P2 = prodK(R2,A,2);
rel = relerr(P2,eye(n));
format shorte
[median(rel(:)) max(rel(:))]
format short
R2 =
  1×2 cell array
    {50×50 double}    {50×50 double}
ans =
   8.3128e-10   3.7532e-07

Note that in the first line the first parameter is R2, i.e., the unevaluated sum R{1}+R{2}. We can use R2 directly in prodK as parameter. The resule P2 is R{1}*A + R{2}*A computed in 2-fold precision.

Together with prodK the approximate inverse R{1}+R{2} can be used to solve a linear system as well.

It is shown in the paper above that this method can be iterated into an unevaluated sum of more than 2 summands. To that end we construct an extremely ill-conditioned matrix:

n = 50;
A = randmat(n,1e250);   % random matrix with cond(A)~1e250

We used

S.M. Rump: A Class of Arbitrarily Ill-conditioned Floating-Point Matrices,
  SIAM Journal on Matrix Analysis and Applications (SIMAX, 12(4):645-653, 1991.

to construct such an extremely ill-conditioned matrix.

The condition number cond(A)~1e250 implies that changing the input matrix A in the 250-th digit, the entries of its inverse change in the first place. As indicated by the mp-toolbox the true condition is even larger. Again we use the mp-toolbox with high precision to judge Matlab's approximate inverse R:

R = inv(A);                 % off by 260 orders of magnitude from A^-1
mp.Digits(300);             % using 300 digits precision
condAmp = double(cond(mp(A)))
Rmp = double(inv(mp(A)));   % inverse using 300 digits precision
v = 1:5;
Rv = R(v,v)
Rmpv = Rmp(v,v)
condAmp =
  2.1844e+283
Rv =
   1.0e+06 *
   -5.3010   -0.5669    0.7419   -1.9851    2.9909
    6.4349    0.6807   -0.9041    2.4010   -3.6278
    7.0049    0.7379   -0.9856    2.6101   -3.9479
    9.3923    0.9773   -1.3269    3.4858   -5.2888
   -4.4860   -0.4851    0.6254   -1.6861    2.5331
Rmpv =
  1.0e+269 *
    0.1727    0.0706   -0.0005    0.1249   -0.1175
    1.6141    0.6593   -0.0045    1.1673   -1.0977
    2.5121    1.0261   -0.0070    1.8166   -1.7084
    6.2744    2.5628   -0.0176    4.5374   -4.2671
    1.4568    0.5950   -0.0041    1.0535   -0.9907

Matlab's warning about a condition number is a little bit underestimated.

Matlab's approximate inverse R is off by more then 260 orders of magnitude. Nevertheless the ratio between its entries is important and contains useful information. This is unveiled by the following code:

k = 1;
wng = warning;
warning off
tic
while norm(prodK(R,A,-1,eye(n),2*k-1),inf) > 1e-12
  k = k+1;
  P = prodK(R,A,2*k-1);
  X = inv(P);
  R = prodK(X,R,2*k-1,'OutputTerms',k);
end
time = toc
warning(wng)
R
norm(prodK(R,A,-1,eye(n),2*k-1),inf)
time =
    0.3508
R =
  1×19 cell array
  Columns 1 through 4
    {50×50 double}    {50×50 double}    {50×50 double}    {50×50 double}
  Columns 5 through 8
    {50×50 double}    {50×50 double}    {50×50 double}    {50×50 double}
  Columns 9 through 12
    {50×50 double}    {50×50 double}    {50×50 double}    {50×50 double}
  Columns 13 through 16
    {50×50 double}    {50×50 double}    {50×50 double}    {50×50 double}
  Columns 17 through 19
    {50×50 double}    {50×50 double}    {50×50 double}
ans =
   3.7475e-14

The iteration produces an approximate inverse R stored in 20 unevaluated terms. As a preconditioner it produces RA equal to the identity matrix with an error close to u~1e-16.

The implemention is particularly clear because the unevaluated sum can be used directly as parameter of prodK.

Moreover the computing time is less than half a second. We mention that the previous implementation of InvIllco in Version 13 of INTLAB used ProdKL and needed more than 2 minutes.

Residuals with sparse data

In principle prodK can be used for sparse matrices as well, however, the time penalty can be very large. To that end Marko Lange implemented a special version spProdK for sparse input. We offer both routines separately rather than automizing because, depending on the sparsity, there are cases where it is better to use prodK.

The syntax is as of prodK:

r = spProdK(a1,b1,a2,b2,...,am,bm,k)

for an approximation r of the true product P := sum(i=1..m,ai*bi), and

[r,e] = spProdK(a1,b1,a2,b2,...,am,bm,k)

for an inclusion r-e <= P <= r+e.

We generate matrices of dimension 5000 with 0.1 percent density.

n = 5000;
A = sprand(n,n,0.001);
close
spy(A)

The timing for prodK and spProdK is as follows:

tic
prodK(A,A,2);
tprodK = toc
tic
spProdK(A,A,2);
tspProdK = toc
tprodK =
    8.6533
tspProdK =
    0.0942

As can be seen the sparse version spProdK is faster by 2 orders of magnitude than prodK.

Performance of spProdK versus mp of Advanpix for sparse data

We also compare to the mp-toolbox using extended precision:

N = [1000 3000 10000 30000];
tmp = zeros(1,length(N));
tspProdK = zeros(1,length(N));
mp.Digits(34);
for i=1:length(N)
  n = N(i);
  A = sprand(n,n,0.001);
  tic
  double(mp(A)*A-A);
  tmp(i) = toc;
  tic
  spProdK(A,A,-1,A,2);
  tspProdK(i) = toc;
end
disp(sprintf('matrix*matrix  n   %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(mp)/t(spProdK)   %6.1f%6.1f%6.1f%6.1f%6.1f',tmp./tspProdK))
matrix*matrix  n     1000  3000 10000 30000
------------------------------------------------------------
t(mp)/t(spProdK)      0.1   0.8   1.8   1.6

As can be seen the highly optimized C-code in the Advanpix toolbox is for small dimensions faster than Marko Lange's Matlab/Octave code spProdK, for larger dimension spProdK outperforms the mp-toolbox.

The mp-toolbox with mp.Digits(34) corresponding to extended precision respects the rounding mode, however, two multiplications are necessary to compute the bounds.

N = [1000 3000 10000 30000];
tmp = zeros(1,length(N));
tprodK = zeros(1,length(N));
mp.Digits(34);
for i=1:length(N)
  n = N(i);
  A = sprand(n,n,0.001);
  tic
  setround(-1)
  Cdown = double(mp(A)*A-A);
  setround(1)
  Cup = double(mp(A)*A-A);
  setround(0)
  tmp(i) = toc;
  tic
  [C,E] = spProdK(A,A,-1,A,2);
  tspProdK(i) = toc;
end
disp(sprintf('matrix*matrix  n   %6d%6d%6d%6d%6d',N))
disp(repmat('-',1,60))
disp(sprintf('t(mp)/t(spProdK)   %6.1f%6.1f%6.1f%6.1f%6.1f',tmp./tspProdK))
matrix*matrix  n     1000  3000 10000 30000
------------------------------------------------------------
t(mp)/t(spProdK)      0.1   1.0   1.5   1.4

As can be seen the situation is very similar as before, for reasonable dimensions spProdK is faster than Advanpix' mp toolbox.

Enjoy INTLAB

INTLAB was designed and written by S.M. Rump, head of the Institute for Reliable Computing, Hamburg University of Technology. Suggestions are always welcome to rump (at) tuhh.de