# Divand

(Difference between revisions)
 Revision as of 09:40, 7 June 2013 (view source)Jmb (Talk | contribs) (→Troubleshooting)← Older edit Latest revision as of 16:05, 26 March 2019 (view source)Gher (Talk | contribs) (→NEW) (49 intermediate revisions not shown) Line 1: Line 1: - = divand: n-dimensional variational data analysis = + + - divand performs an n-dimensional variational analysis of arbitrarily located observations. Observation will be interpolated on a curvilinear grid in 2, 3 or more dimensions. + + + + divand performs an n-dimensional variational analysis of arbitrarily located observations. Observation will be interpolated on a curvilinear grid in 2, 3 or more dimensions. + + Please cite this paper as follows if you use divand in a publication: + + Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: [http://www.geosci-model-dev.net/7/225/2014/gmd-7-225-2014.pdf divand-1.0: n-dimensional variational data analysis for ocean observations], Geosci. Model Dev., 7, 225-241, doi:[http://dx.doi.org/10.5194/gmd-7-225-2014 10.5194/gmd-7-225-2014], 2014. = Requirements = = Requirements = - Your need [http://www.octave.org GNU Octave] (version 3.4 or later) or MATLAB to run divand. + Your need [http://www.octave.org GNU Octave] (version 3.4 or later) or MATLAB to run divand. + + + = NEW = + + If you are not strongly relying on MATLAB or Octave, you should consider using the newest JULIA version: + + https://github.com/gher-ulg/divand.jl + + This will be the only DIVA version that will be supported/developped at some time. = Installing = = Installing = - # Download the source code from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/divand-1.0.tar.gz. + == Octave == - # Extract the package + - # In MATLAB add the directory divand-x.y to the search path. In octave add the command to your .octaverc file (for testing it is sufficient if you move your current working directory to the divand-x.y directory): + Inside octave, you can download and install the package by issuing: + + pkg install -verbose -forge -auto divand + + See "help pkg" and the documentation on [https://www.gnu.org/software/octave/doc/interpreter/Installing-and-Removing-Packages.html installing packages] for more options. + + == Matlab == + + The following instructions work also for Octave if you do not want to use the package manager. + + * Download the source code from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/divand-1.1.2.tar.gz (see also http://octave.sourceforge.net/divand/). + * Extract the package + * In MATLAB add the directory divand/inst to the search path. For testing it is sufficient if you move your current working directory to the divand/inst directory. - addpath('/path/to/divand-x.y.'); + addpath('/path/to/divand/inst'); = Testing = = Testing = Line 41: Line 74: = Documentation = = Documentation = - Issue in MATLAB/Octave the command help divand: + The main routine of this toolbox is called divand which performs an n-dimensional variational analysis of arbitrarily located observations. It can be called with a different number of input and output arguments: [fi,err,s] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,err,s] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,err] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,err] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,s] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,s] = divand(mask,pmn,xi,x,f,len,lambda,...); - - Perform an n-dimentional variational analysis of ariabitrarly located observations. - - == Input == - ;mask: binary mask delimiting the domain. 1 is inside and 0 outside. For oceanographic application, this is the land-sea mask. + == Input arguments == - ;pmn: scale factor of the grid. pmn is a cell array with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension. + - ;xi: cell array with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated + ;mask: binary mask delimiting the domain. 1 is inside and 0 outside. This parameters has the same number of dimensions and grid points as the domain. For oceanographic application, this is the land-sea mask. - ;x: cell array with n elements. Every element represents a coordinate of the observations + ;pmn: scale factor of the grid. pmn is a cell array with n arrays (each array has the same size as the mask). Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension. - ;f: value of the observations *minus* the background estimate (m-by-1 array). (see note) + ;xi: cell array with n arrays (each array has the same size as the mask). Every element represents a coordinate of the final grid on which the observations are interpolated - ;len: correlation length + ;x: cell array with n vectors (m-by-1 vector). Every vector represents a coordinate of the observations + ;f: vector with the value of the observations *minus* the background estimate (m-by-1 array). (see note) + ;len: correlation length (units must be consistent with scale factors) * if len is scalar, then the correlation length is the same in all directions and everywhere in the domain * if len is scalar, then the correlation length is the same in all directions and everywhere in the domain * if len is cell array of n scalars, then the correlation length can depend on the direction but for a given direction the correlation length is the same everywhere in the domain. The option is for example used when analysing 3D domain (longitude, latitude and depth) since the vertical scale is much smaller than the horizontal scale. * if len is cell array of n scalars, then the correlation length can depend on the direction but for a given direction the correlation length is the same everywhere in the domain. The option is for example used when analysing 3D domain (longitude, latitude and depth) since the vertical scale is much smaller than the horizontal scale. - * if len is cell array of n arrays, then the correlation length can depend on the direction and on the location. The individual arrays should have the same size as the domain. + * if len is cell array of n arrays (each array has the same size as the mask), then the correlation length can depend on the direction and on the location. The individual arrays should have the same size as the domain. - ;lambda: signal-to-noise ratio of observations (if lambda is a scalar). The larger this value is, the closer is the field fi to the observation. + ;lambda: signal-to-noise ratio of observations (if lambda is a scalar). lambda is defined as the ratio of background error variance over the observation error variance. The larger this value is, the closer is the interpolated field fi to the observation. - * If lambda is a scalar, then R = 1/lambda I, where R is the observation error covariance matrix, + * If lambda is a scalar, then R = 1/lambda I, where R is the observation error covariance matrix (scaled by the background error variance), * If lambda is a vector, then R = diag(lambda) * If lambda is a vector, then R = diag(lambda) - * If lambda is a matrix (or a CovarParam object), then R = lambda + * If lambda is a matrix (or matrix-like object such CovarParam), then R = lambda == Optional input arguments == == Optional input arguments == Optional input arguments specified as pairs of keyword and values: Optional input arguments specified as pairs of keyword and values: - ;'velocity', vel: velocity of advection constraint. The default is no-advection constraint + ;'velocity', vel: cell array with n arrays (each array has the same size as the mask) representing the velocity of advection constraint. The default is no advection constraint ;'alpha': alpha is vector of coefficients multiplying various terms in the cost function. ;'alpha': alpha is vector of coefficients multiplying various terms in the cost function. Line 98: Line 129: - == Note == + === Note === If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling divand and then add the first guess back in. If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling divand and then add the first guess back in. - + + + == Advanced usage == + + === Additional constraint === + + An arbitrary number of additional constraint can be included to the cost function which should have the following form: + + + $+ J_c(\vec x) = \sum_i \left(\mathbf C_i \vec x - \vec z_i \right)^T \mathbf Q_i^{-1} \left(\mathbf C_i \vec x - \vec z_i \right) +$ + + For every constrain a structure with the following fields is passed to divand: + + ;yo: the vector $\vec z_i$ + ;H: the matrix $\mathbf C_i$ + ;R: the matrix $\mathbf Q_i$ (symmetric and positive defined) + + Internally the observations are also implemented as an additional constraint. + == Example == == Example == + + + === Simple analytical example === see divand_simple_example.m see divand_simple_example.m -

% A simple example of divand is 2 dimensions                                                                                                                                                                                                                                                                                                                                                                                                            % A simple example of divand is 2 dimensions
Line 167:                                                                                                                                                                                                                                                                                                                                                                                                                                                            Line 220:

[[File:Divand simple example.png|800px]]                                                                                                                                                                                                                                                                                                                                                                                                                [[File:Divand simple example.png|800px]]
+
+                                                                                        === Simple realistic example ===
+
+                                                                                        A realistic example of divand in 2 dimensions with Salinity observations in the Mediterranean Sea at 30m is provided.
+
+
+
+
+
+
+
+                                                                                        % extract columns of data
+                                                                                        lonobs = data(:,1);
+                                                                                        latobs = data(:,2);
+                                                                                        S = data(:,3);
+
+                                                                                        % bath is negative in water and positive in air
+
+                                                                                        % compute grid metric
+                                                                                        % pm and pn are in meters^-1
+                                                                                        [lon,lat] = ndgrid(lon,lat);
+                                                                                        [pm,pn] = divand_metric(lon,lat);
+
+                                                                                        % compute mean and anomalies
+                                                                                        Smean = mean(S);
+                                                                                        Sanom = S - mean(S);
+
+                                                                                        % correlation length (in meters)
+                                                                                        len = 200e3;
+
+                                                                                        % signal-to-noise ratio
+                                                                                        lambda = 0.5;
+
+                                                                                        % mask everything below 30 m
+                                                                                        mask = bat < -30;
+
+                                                                                        % mask for the analysis
+                                                                                        % remove Atlantic
+                                                                                        % remove Black Sea
+
+                                                                                        % make analysis
+
+                                                                                        Sa2 = Sa + Smean;
+
+                                                                                        % create plots
+
+                                                                                        % color axis
+                                                                                        ca = [36.2520   39.4480];
+
+                                                                                        % aspect ratio
+                                                                                        ar = [1  cos(mean(lat(:)) * pi/180) 1];
+
+
+                                                                                        subplot(2,1,1);
+                                                                                        scatter(lonobs,latobs,10,S)
+                                                                                        caxis(ca);
+                                                                                        colorbar
+                                                                                        xlim(lon([1 end]))
+                                                                                        ylim(lat([1 end]))
+                                                                                        title('Observations');
+                                                                                        set(gca,'DataAspectRatio',ar,'Layer','top')
+
+                                                                                        subplot(2,1,2);
+                                                                                        hold on
+                                                                                        plot(lonobs,latobs,'k.','MarkerSize',1)
+                                                                                        caxis(ca)
+                                                                                        xlim(lon([1 end]))
+                                                                                        ylim(lat([1 end]))
+                                                                                        title('Analysis');
+                                                                                        set(gca,'DataAspectRatio',ar,'Layer','top')
+
+                                                                                        set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 15 15 ]);
+                                                                                        print -dpng divand_realistic_example.png
+ + + This example gives the following output: + + [[Image:Divand realistic example.png|800px]] + + + === Determining the parameters === + + The parameter lambda and parameter len are crucial for the analysis. lambda corresponds to the signal-to-noise ratio (variance of background error over variance of observation error). Therefore, its value depends on how accurate and how representative the observations are. The value of len can sometimes be determined by physical arguments. + One statistical way to determine the parameter(s) is to do a [https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29 cross-validation]. + + # choose, at random, a relatively small subset of observations (about 5%). This is the validation data set. + # make the analysis without your validation data set + # compare the analysis to your validation data set and compute the RMS difference + # repeat steps 2 and 3 with different values of the parameters and try to minimize the RMS difference. + + The minimization of the RMS difference can be automated using e.g. the octave/matlab function fminsearch. + You can repeat all steps with a different validation data set to ensure that the optimal parameters values are robust. = Fun = = Fun = - A [http://data-assimilation.net/Tools/divand_demo/html/ educational web application] has been developed to reconstruct a field based on point "observations". The user must choose in an optimal way the location of 10 observation such that the analysed field obtained by divand based on these observation is as close as possible to the original field. + A [http://data-assimilation.net/Tools/divand_demo/html/ educational web application] has been developed to reconstruct a field based on point "observations". The user must choose in an optimal way the location of 10 observations such that the analysed field obtained by divand based on these observation is as close as possible to the original field. = Troubleshooting = = Troubleshooting = Line 185: Line 346: * memory is insufficient, * memory is insufficient, * matrix contains a NaN introduced through metric scale factor equal to zero or a user constraint. * matrix contains a NaN introduced through metric scale factor equal to zero or a user constraint. + + + == ichol failed == + + You get the error: + + Error using ichol + Encountered nonpositive pivot. + + You might try diagonal compensation, e.g. + + icholparam = struct('michol','on','diagcomp',0.0036); + + or the incomplete Cholesky decomposition with threshold dropping (ict) and diagonal compensation: + + icholparam = struct('type','ict','droptol',1e-6,'diagcomp',0.0001); + + To pass the ichol parameters you must call divand like this: + + [fi,s] = divand(mask,{pm,pn,pk},{xi,yi,zi},{x,y,z},f,len,lambda,... + 'diagnostics',1,'primal',1,'inversion','pcg',... + 'tol',1e-6,'maxit',50000,... + 'compPC',@(iB,H,R) divand_pc_michol(iB,H,R,icholparam)); + + At bit of experimentation is necessary. Try to use lowest value of diagcomp (without triggering the error in ichol) and the lowest value of troptol for the ict decomposition (without running out of memory).

## Latest revision as of 16:05, 26 March 2019

divand performs an n-dimensional variational analysis of arbitrarily located observations. Observation will be interpolated on a curvilinear grid in 2, 3 or more dimensions.

Please cite this paper as follows if you use divand in a publication:

Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: divand-1.0: n-dimensional variational data analysis for ocean observations, Geosci. Model Dev., 7, 225-241, doi:10.5194/gmd-7-225-2014, 2014.

# Requirements

Your need GNU Octave (version 3.4 or later) or MATLAB to run divand.

# NEW

If you are not strongly relying on MATLAB or Octave, you should consider using the newest JULIA version:

This will be the only DIVA version that will be supported/developped at some time.

# Installing

## Octave

pkg install -verbose -forge -auto divand


See "help pkg" and the documentation on installing packages for more options.

## Matlab

The following instructions work also for Octave if you do not want to use the package manager.

addpath('/path/to/divand/inst');


# Testing

A test script test_divand is included to verify the correct functioning of the toolbox. The script should be run in a Octave or MATLAB session. All tests should pass without warning or error.

run test_interp_1d: (max difference=2.22045e-16)   OK
run test_interp_2d: (max difference=2.22045e-16)   OK
run test_interp_regular: (max difference=2.22045e-16)   OK
run test_sparse_diff: (max difference=0)   OK
run test_1dvar:   OK
run test_2dvar: (max difference=4.28659e-16)   OK
run test_2dvar_check: (max rms=2.39788e-16)   OK
run test_2dvar_rellen:   OK
run test_2dvar_lenxy:   OK
run test_2dvar_check_correrr: (max rms=7.45461e-16)   OK
run test_2dvar_constrain: (max rms=9.41909e-10)   OK
run test_2dvar_cyclic: (max rms=2.71352e-14)   OK
run test_2dvar_eof_var:   OK
run test_2dvar_eof_check:   OK
run test_3dvar: (max difference=0.0182573)   OK
run test_3dvar_large_stacked: (max difference=2.40796e-15)   OK
run test_4dvar: (max difference=0.0111807)   OK


# Documentation

The main routine of this toolbox is called divand which performs an n-dimensional variational analysis of arbitrarily located observations. It can be called with a different number of input and output arguments:

 [fi,err,s] = divand(mask,pmn,xi,x,f,len,lambda,...);


## Input arguments

binary mask delimiting the domain. 1 is inside and 0 outside. This parameters has the same number of dimensions and grid points as the domain. For oceanographic application, this is the land-sea mask.
pmn
scale factor of the grid. pmn is a cell array with n arrays (each array has the same size as the mask). Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.
xi
cell array with n arrays (each array has the same size as the mask). Every element represents a coordinate of the final grid on which the observations are interpolated
x
cell array with n vectors (m-by-1 vector). Every vector represents a coordinate of the observations
f
vector with the value of the observations *minus* the background estimate (m-by-1 array). (see note)
len
correlation length (units must be consistent with scale factors)
• if len is scalar, then the correlation length is the same in all directions and everywhere in the domain
• if len is cell array of n scalars, then the correlation length can depend on the direction but for a given direction the correlation length is the same everywhere in the domain. The option is for example used when analysing 3D domain (longitude, latitude and depth) since the vertical scale is much smaller than the horizontal scale.
• if len is cell array of n arrays (each array has the same size as the mask), then the correlation length can depend on the direction and on the location. The individual arrays should have the same size as the domain.
lambda
signal-to-noise ratio of observations (if lambda is a scalar). lambda is defined as the ratio of background error variance over the observation error variance. The larger this value is, the closer is the interpolated field fi to the observation.
• If lambda is a scalar, then R = 1/lambda I, where R is the observation error covariance matrix (scaled by the background error variance),
• If lambda is a vector, then R = diag(lambda)
• If lambda is a matrix (or matrix-like object such CovarParam), then R = lambda

## Optional input arguments

Optional input arguments specified as pairs of keyword and values:

'velocity', vel
cell array with n arrays (each array has the same size as the mask) representing the velocity of advection constraint. The default is no advection constraint
'alpha'
alpha is vector of coefficients multiplying various terms in the cost function.

The first element multiplies the norm. The other i-th element of alpha multiplies the (i+1)-th derivative. Per default, the highest derivative is m = ceil(1+n/2) where n is the dimension of the problem. The values of alpha are per default the (m+1)th row of the Pascal triangle:

           m=0         1
m=1       1   1
m=1     1   2   1     (n=1,2)
m=2   1   3   3   1   (n=3,4)
...


'diagnostics'
0 or 1 turns diagnostic and debugging information on (1) or off (0, default). If on, they will be returned as the last output argument
'constraint'
a structure with user specified constraint
'moddim'
modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions. Halo points should not be included for cyclic dimensions. For example if the first dimension is cyclic, then the grid point corresponding to mask(1,j) should be between mask(end,1) (left neighbord) and mask(2,j) (right neighbord)
'fracdim'
fractional indices (n-by-m array). If this array is specified, then x and xi are not used.

Note: 'velocity' and 'constraint' may appear multiple times

## Output

fi
the analysed field
err
error variance of the analysis field relative to the error variance of the background
s
structure with some diagnostics mainly for debugging purposes.

### Note

If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling divand and then add the first guess back in.

An arbitrary number of additional constraint can be included to the cost function which should have the following form:

$J_c(\vec x) = \sum_i \left(\mathbf C_i \vec x - \vec z_i \right)^T \mathbf Q_i^{-1} \left(\mathbf C_i \vec x - \vec z_i \right)$

For every constrain a structure with the following fields is passed to divand:

yo
the vector $\vec z_i$
H
the matrix $\mathbf C_i$
R
the matrix $\mathbf Q_i$ (symmetric and positive defined)

Internally the observations are also implemented as an additional constraint.

## Example

### Simple analytical example

see divand_simple_example.m

% A simple example of divand is 2 dimensions
% with observations from an analytical function

% observations
x = rand(75,1);
y = rand(75,1);
f = sin(x*6) .* cos(y*6);

% final grid
[xi,yi] = ndgrid(linspace(0,1,30));

% reference field
fref = sin(xi*6) .* cos(yi*6);

% all points are valid points

% this problem has a simple cartesian metric
% pm is the inverse of the resolution along the 1st dimension
% pn is the inverse of the resolution along the 2nd dimension

pm = ones(size(xi)) / (xi(2,1)-xi(1,1));
pn = ones(size(xi)) / (yi(1,2)-yi(1,1));

% correlation length
len = 0.1;

% signal-to-noise ratio
lambda = 20;

% fi is the interpolated field

% plotting of results
subplot(1,2,1);
pcolor(xi,yi,fref);
ca = caxis;
set(findobj(gcf,'FontSize',10),'FontSize',16)
title('Reference field and observation loc.');
hold on
plot(x,y,'k.');

subplot(1,2,2);
hold on
pcolor(xi,yi,fi);
caxis(ca);
set(findobj(gcf,'FontSize',10),'FontSize',16)
title('Interpolated field');

set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 25 12 ]);
print -dpng divand_simple_example.png


### Simple realistic example

A realistic example of divand in 2 dimensions with Salinity observations in the Mediterranean Sea at 30m is provided. Observations can be downloaded from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/data.dat and the bathymetry is available at http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/diva_bath.nc.



% extract columns of data
lonobs = data(:,1);
latobs = data(:,2);
S = data(:,3);

% bath is negative in water and positive in air

% compute grid metric
% pm and pn are in meters^-1
[lon,lat] = ndgrid(lon,lat);
[pm,pn] = divand_metric(lon,lat);

% compute mean and anomalies
Smean = mean(S);
Sanom = S - mean(S);

% correlation length (in meters)
len = 200e3;

% signal-to-noise ratio
lambda = 0.5;

% mask everything below 30 m

% remove Atlantic
% remove Black Sea

% make analysis

Sa2 = Sa + Smean;

% create plots

% color axis
ca = [36.2520   39.4480];

% aspect ratio
ar = [1  cos(mean(lat(:)) * pi/180) 1];

subplot(2,1,1);
scatter(lonobs,latobs,10,S)
caxis(ca);
colorbar
xlim(lon([1 end]))
ylim(lat([1 end]))
title('Observations');
set(gca,'DataAspectRatio',ar,'Layer','top')

subplot(2,1,2);
hold on
plot(lonobs,latobs,'k.','MarkerSize',1)
caxis(ca)
xlim(lon([1 end]))
ylim(lat([1 end]))
title('Analysis');
set(gca,'DataAspectRatio',ar,'Layer','top')

set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 15 15 ]);
print -dpng divand_realistic_example.png

This example gives the following output:

### Determining the parameters

The parameter lambda and parameter len are crucial for the analysis. lambda corresponds to the signal-to-noise ratio (variance of background error over variance of observation error). Therefore, its value depends on how accurate and how representative the observations are. The value of len can sometimes be determined by physical arguments. One statistical way to determine the parameter(s) is to do a cross-validation.

1. choose, at random, a relatively small subset of observations (about 5%). This is the validation data set.
2. make the analysis without your validation data set
3. compare the analysis to your validation data set and compute the RMS difference
4. repeat steps 2 and 3 with different values of the parameters and try to minimize the RMS difference.

The minimization of the RMS difference can be automated using e.g. the octave/matlab function fminsearch. You can repeat all steps with a different validation data set to ensure that the optimal parameters values are robust.

# Fun

A educational web application has been developed to reconstruct a field based on point "observations". The user must choose in an optimal way the location of 10 observations such that the analysed field obtained by divand based on these observation is as close as possible to the original field.

# Troubleshooting

Cholesky factorization fails with the following error:

??? Error using ==> chol
CHOLMOD internal error.

Error in ==> CovarIS.factorize at 5
[self.RP, self.q, self.PP] = chol (self.IS);


This might happen if:

• memory is insufficient,
• matrix contains a NaN introduced through metric scale factor equal to zero or a user constraint.

## ichol failed

You get the error:

Error using ichol
Encountered nonpositive pivot.


You might try diagonal compensation, e.g.

icholparam = struct('michol','on','diagcomp',0.0036);


or the incomplete Cholesky decomposition with threshold dropping (ict) and diagonal compensation:

icholparam = struct('type','ict','droptol',1e-6,'diagcomp',0.0001);


To pass the ichol parameters you must call divand like this:

[fi,s] = divand(mask,{pm,pn,pk},{xi,yi,zi},{x,y,z},f,len,lambda,...
'diagnostics',1,'primal',1,'inversion','pcg',...
'tol',1e-6,'maxit',50000,...
'compPC',@(iB,H,R) divand_pc_michol(iB,H,R,icholparam));


At bit of experimentation is necessary. Try to use lowest value of diagcomp (without triggering the error in ichol) and the lowest value of troptol for the ict decomposition (without running out of memory).