Divand

(Difference between revisions)
Jump to: navigation, search
 Revision as of 11:34, 5 August 2014 (view source)Alex (Talk | contribs) (→Octave)← Older edit Revision as of 15:12, 9 May 2016 (view source)Alex (Talk | contribs) Newer edit → Line 14: Line 14: Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: [http://www.geosci-model-dev.net/7/225/2014/gmd-7-225-2014.pdf divand-1.0: n-dimensional variational data analysis for ocean observations], Geosci. Model Dev., 7, 225-241, doi:[http://dx.doi.org/10.5194/gmd-7-225-2014 10.5194/gmd-7-225-2014], 2014. Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: [http://www.geosci-model-dev.net/7/225/2014/gmd-7-225-2014.pdf divand-1.0: n-dimensional variational data analysis for ocean observations], Geosci. Model Dev., 7, 225-241, doi:[http://dx.doi.org/10.5194/gmd-7-225-2014 10.5194/gmd-7-225-2014], 2014. - - The submitted manuscript is also available at Geoscientific Model Development 'Discussion', but please cite the final version (in Geoscientific Model Development) rather than the discussion paper. = Requirements = = Requirements =

Revision as of 15:12, 9 May 2016

divand performs an n-dimensional variational analysis of arbitrarily located observations. Observation will be interpolated on a curvilinear grid in 2, 3 or more dimensions.

Please cite this paper as follows if you use divand in a publication:

Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: divand-1.0: n-dimensional variational data analysis for ocean observations, Geosci. Model Dev., 7, 225-241, doi:10.5194/gmd-7-225-2014, 2014.

Requirements

Your need GNU Octave (version 3.4 or later) or MATLAB to run divand.

Installing

Octave

Inside octave, you can download and install the package by issuing:

pkg install -verbose -forge -auto divand


See "help pkg" and the documentation on installing packages for more options.

Matlab

The following instructions work also for Octave if you do not want to use the package manager.

addpath('/path/to/divand/inst');


Testing

A test script test_divand is included to verify the correct functioning of the toolbox. The script should be run in a Octave or MATLAB session. All tests should pass without warning or error.

run test_interp_1d: (max difference=2.22045e-16)   OK
run test_interp_2d: (max difference=2.22045e-16)   OK
run test_interp_regular: (max difference=2.22045e-16)   OK
run test_sparse_diff: (max difference=0)   OK
run test_1dvar:   OK
run test_2dvar: (max difference=4.28659e-16)   OK
run test_2dvar_check: (max rms=2.39788e-16)   OK
run test_2dvar_adv:   OK
run test_2dvar_rellen:   OK
run test_2dvar_lenxy:   OK
run test_2dvar_check_correrr: (max rms=7.45461e-16)   OK
run test_2dvar_constrain: (max rms=9.41909e-10)   OK
run test_2dvar_cyclic: (max rms=2.71352e-14)   OK
run test_2dvar_eof_var:   OK
run test_2dvar_eof_check:   OK
run test_3dvar: (max difference=0.0182573)   OK
run test_3dvar_large_stacked: (max difference=2.40796e-15)   OK
run test_4dvar: (max difference=0.0111807)   OK


Documentation

The main routine of this toolbox is called divand which performs an n-dimensional variational analysis of arbitrarily located observations. It can be called with a different number of input and output arguments:

 [fi,err,s] = divand(mask,pmn,xi,x,f,len,lambda,...);
[fi,err] = divand(mask,pmn,xi,x,f,len,lambda,...);
[fi,s] = divand(mask,pmn,xi,x,f,len,lambda,...);


Input arguments

mask
binary mask delimiting the domain. 1 is inside and 0 outside. This parameters has the same number of dimensions and grid points as the domain. For oceanographic application, this is the land-sea mask.
pmn
scale factor of the grid. pmn is a cell array with n arrays (each array has the same size as the mask). Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.
xi
cell array with n arrays (each array has the same size as the mask). Every element represents a coordinate of the final grid on which the observations are interpolated
x
cell array with n vectors (m-by-1 vector). Every vector represents a coordinate of the observations
f
vector with the value of the observations *minus* the background estimate (m-by-1 array). (see note)
len
correlation length (units must be consistent with scale factors)
• if len is scalar, then the correlation length is the same in all directions and everywhere in the domain
• if len is cell array of n scalars, then the correlation length can depend on the direction but for a given direction the correlation length is the same everywhere in the domain. The option is for example used when analysing 3D domain (longitude, latitude and depth) since the vertical scale is much smaller than the horizontal scale.
• if len is cell array of n arrays (each array has the same size as the mask), then the correlation length can depend on the direction and on the location. The individual arrays should have the same size as the domain.
lambda
signal-to-noise ratio of observations (if lambda is a scalar). lambda is defined as the ratio of background error variance over the observation error variance. The larger this value is, the closer is the interpolated field fi to the observation.
• If lambda is a scalar, then R = 1/lambda I, where R is the observation error covariance matrix (scaled by the background error variance),
• If lambda is a vector, then R = diag(lambda)
• If lambda is a matrix (or matrix-like object such CovarParam), then R = lambda

Optional input arguments

Optional input arguments specified as pairs of keyword and values:

'velocity', vel
cell array with n arrays (each array has the same size as the mask) representing the velocity of advection constraint. The default is no advection constraint
'alpha'
alpha is vector of coefficients multiplying various terms in the cost function.

The first element multiplies the norm. The other i-th element of alpha multiplies the (i+1)-th derivative. Per default, the highest derivative is m = ceil(1+n/2) where n is the dimension of the problem. The values of alpha are per default the (m+1)th row of the Pascal triangle:

           m=0         1
m=1       1   1
m=1     1   2   1     (n=1,2)
m=2   1   3   3   1   (n=3,4)
...


'diagnostics'
0 or 1 turns diagnostic and debugging information on (1) or off (0, default). If on, they will be returned as the last output argument
'constraint'
a structure with user specified constraint
'moddim'
modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions. Halo points should not be included for cyclic dimensions. For example if the first dimension is cyclic, then the grid point corresponding to mask(1,j) should be between mask(end,1) (left neighbord) and mask(2,j) (right neighbord)
'fracdim'
fractional indices (n-by-m array). If this array is specified, then x and xi are not used.

Note: 'velocity' and 'constraint' may appear multiple times

Output

fi
the analysed field
err
error variance of the analysis field relative to the error variance of the background
s
structure with some diagnostics mainly for debugging purposes.

Note

If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling divand and then add the first guess back in.

Advanced usage

Additional constraint

An arbitrary number of additional constraint can be included to the cost function which should have the following form:

$J_c(\vec x) = \sum_i \left(\mathbf C_i \vec x - \vec z_i \right)^T \mathbf Q_i^{-1} \left(\mathbf C_i \vec x - \vec z_i \right)$

For every constrain a structure with the following fields is passed to divand:

yo
the vector $\vec z_i$
H
the matrix $\mathbf C_i$
R
the matrix $\mathbf Q_i$ (symmetric and positive defined)

Internally the observations are also implemented as an additional constraint.

Example

Simple analytical example

see divand_simple_example.m

% A simple example of divand is 2 dimensions
% with observations from an analytical function

% observations
x = rand(75,1);
y = rand(75,1);
f = sin(x*6) .* cos(y*6);

% final grid
[xi,yi] = ndgrid(linspace(0,1,30));

% reference field
fref = sin(xi*6) .* cos(yi*6);

% all points are valid points
mask = ones(size(xi));

% this problem has a simple cartesian metric
% pm is the inverse of the resolution along the 1st dimension
% pn is the inverse of the resolution along the 2nd dimension

pm = ones(size(xi)) / (xi(2,1)-xi(1,1));
pn = ones(size(xi)) / (yi(1,2)-yi(1,1));

% correlation length
len = 0.1;

% signal-to-noise ratio
lambda = 20;

% fi is the interpolated field
fi = divand(mask,{pm,pn},{xi,yi},{x,y},f,len,lambda);

% plotting of results
subplot(1,2,1);
pcolor(xi,yi,fref);
shading flat,colorbar
ca = caxis;
set(findobj(gcf,'FontSize',10),'FontSize',16)
title('Reference field and observation loc.');
hold on
plot(x,y,'k.');

subplot(1,2,2);
hold on
pcolor(xi,yi,fi);
shading flat,colorbar
caxis(ca);
set(findobj(gcf,'FontSize',10),'FontSize',16)
title('Interpolated field');

set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 25 12 ]);
print -dpng divand_simple_example.png


Simple realistic example

A realistic example of divand in 2 dimensions with Salinity observations in the Mediterranean Sea at 30m is provided. Observations can be downloaded from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/data.dat and the bathymetry is available at http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/diva_bath.nc.


% load observations

load data.dat

% extract columns of data
lonobs = data(:,1);
latobs = data(:,2);
S = data(:,3);

% load bathymetry
% bath is negative in water and positive in air
bat = ncread('diva_bath.nc','bat');
lon = ncread('diva_bath.nc','lon');
lat = ncread('diva_bath.nc','lat');

% compute grid metric
% pm and pn are in meters^-1
[lon,lat] = ndgrid(lon,lat);
[pm,pn] = divand_metric(lon,lat);

% compute mean and anomalies
Smean = mean(S);
Sanom = S - mean(S);

% correlation length (in meters)
len = 200e3;

% signal-to-noise ratio
lambda = 0.5;

% land-sea mask
% mask everything below 30 m
mask = bat < -30;

% mask for the analysis
maska = mask;
% remove Atlantic
maska(1:60,130:end) = 0;
% remove Black Sea
maska(323:end,100:end) = 0;

% make analysis
Sa =  divand(maska,{pm,pn},{lon,lat},{lonobs,latobs},Sanom,len,lambda);

% add mean back
Sa2 = Sa + Smean;

% create plots

% color axis
ca = [36.2520   39.4480];

% aspect ratio
ar = [1  cos(mean(lat(:)) * pi/180) 1];

subplot(2,1,1);
scatter(lonobs,latobs,10,S)
caxis(ca);
colorbar
hold on, contour(lon,lat,mask,0.5,'k')
xlim(lon([1 end]))
ylim(lat([1 end]))
title('Observations');
set(gca,'DataAspectRatio',ar,'Layer','top')

subplot(2,1,2);
pcolor(lon,lat,Sa2), shading flat,colorbar
hold on
plot(lonobs,latobs,'k.','MarkerSize',1)
caxis(ca)
contour(lon,lat,mask,0.5,'k')
xlim(lon([1 end]))
ylim(lat([1 end]))
title('Analysis');
set(gca,'DataAspectRatio',ar,'Layer','top')

set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 15 15 ]);
print -dpng divand_realistic_example.png

This example gives the following output:

Determining the parameters

The parameter lambda and parameter len are crucial for the analysis. lambda corresponds to the signal-to-noise ratio (variance of background error over variance of observation error). Therefore, its value depends on how accurate and how representative the observations are. The value of len can sometimes be determined by physical arguments. One statistical way to determine the parameter(s) is to do a cross-validation.

1. choose, at random, a relatively small subset of observations (about 5%). This is the validation data set.
2. make the analysis without your validation data set
3. compare the analysis to your validation data set and compute the RMS difference
4. repeat steps 2 and 3 with different values of the parameters and try to minimize the RMS difference.

The minimization of the RMS difference can be automated using e.g. the octave/matlab function fminsearch. You can repeat all steps with a different validation data set to ensure that the optimal parameters values are robust.

Fun

A educational web application has been developed to reconstruct a field based on point "observations". The user must choose in an optimal way the location of 10 observations such that the analysed field obtained by divand based on these observation is as close as possible to the original field.

Troubleshooting

Cholesky factorization fails with the following error:

??? Error using ==> chol
CHOLMOD internal error.

Error in ==> CovarIS.factorize at 5
[self.RP, self.q, self.PP] = chol (self.IS);


This might happen if:

• memory is insufficient,
• matrix contains a NaN introduced through metric scale factor equal to zero or a user constraint.

ichol failed

You get the error:

Error using ichol
Encountered nonpositive pivot.


You might try diagonal compensation, e.g.

icholparam = struct('michol','on','diagcomp',0.0036);


or the incomplete Cholesky decomposition with threshold dropping (ict) and diagonal compensation:

icholparam = struct('type','ict','droptol',1e-6,'diagcomp',0.0001);


To pass the ichol parameters you must call divand like this:

[fi,s] = divand(mask,{pm,pn,pk},{xi,yi,zi},{x,y,z},f,len,lambda,...
'diagnostics',1,'primal',1,'inversion','pcg',...
'tol',1e-6,'maxit',50000,...
'compPC',@(iB,H,R) divand_pc_michol(iB,H,R,icholparam));


At bit of experimentation is necessary. Try to use lowest value of diagcomp (without triggering the error in ichol) and the lowest value of troptol for the ict decomposition (without running out of memory).