Extra

The following script runs in Scilab: http://www.scilab.org/
Scilab is free and similar to Matlab.
This script addresses Nick Rowe's hypothetical problem here:

http://worthwhile.typepad.com/worthwhile_canadian_initi/2014/02/principal-agent-and-the-assignment-of-targets-to-instruments.html

Results plotted at the bottom (extra credit problems there too!). Does anybody have a real model of M(.) and F(.) we could try this on (as explained in Nick's post)? Let me know!

Jesse (in Nick's comments) does a great job of explaining in an intuitive way why both m and f need to be solved for and adjusted simultaneously. He makes use of a water control problem with two knobs to fiddle with and two desired outputs (temperature and flow rate). Here's a quote:

"This works well if you know the gains (or elastisities reasonably well) and the same system controls both outputs (c and h) and has responsibility for both measured variables (T and F in this case). You also have to control both synthetic variables simultaneously for it to work."
  
My comments there (beneath Jesse's) and here describe the same thing, except that right from the start I assumed we didn't "know the gains" well and would have to estimate them. What I describe here, and show a working example of, is a kind of blind estimator: pretty much all I've assumed is that the system I'm trying to control can be approximated as linear near a nominal operating point, and is static enough that I can estimate it before it changes significantly. If you're happy with Jesse's explanation, I doubt this will do much for you: this is for the extreme nerd... so proceed with caution! Really the only reason I went to so much trouble here is that I couldn't quite get it right in the comments on Nick's blog, and so I started to wonder what exactly the "right" approach is. I wanted to make sure what I was talking about could be done, and that I really understood how to do it. At this point this probably has little to do with Nick's point, and doesn't add much beyond what Jesse already covered. Still, it does provide a platform to test any of Nick's hypotheses should he do any more conjecturing on this particular set up (though I have a feeling he's finished with this topic).

The basic idea of the simulation I present here (source code below) is to set up an arbitrary (random) problem like Nick describes and solve it iteratively with a blind search algorithm. Most of the problem is randomly created: the user just picks the problem dimensions in the first block of code, under the comment line "control parameters": Nv = length of V, Nm = length of Um, Nf = length of Uf, and Rv = a measurement noise covariance (be sure to pick one that's positive definite!... for example Rv = sqrtRv*sqrtRv' works when sqrtRv = rand(2,2)). Niter is the number of time steps, and Nls_xtra is how many additional time steps' worth of data are collected (above the minimum) before the first estimates of the system matrices are made (the system matrices are A and b, and they are randomly selected before the time iterations begin). A is dimension 2x(2+Nv+Nm+Nf), and b is 2x1.

I put Nick's control variables in x1: x1(1) = m and x1(2) = f, while x2 consists of the vector [V' Um' Uf']'. An initial guess for x1 is made (randomly) and the vector x2 is a random function of time. I also add in measurement noise v with covariance Rv. I assume I can see the "shocks" V, Um, and Uf perfectly. At its core this is a 1st order blind search algorithm, in which I assume a linear (really affine) structure to the underlying functions M(.) and F(.). My output is y = [M F]' (a single quote ' denotes transpose in all that follows, so y is 2x1). The system looks like this:

y = A*x + b

Where x = [x1' x2']'. The objective is to minimize the magnitude of the error vector, which is the difference between the desired output, ystar = [M* F*]', and the actual output, y = [M F]'. ystar is selected randomly before the time iterations begin.

Essentially the 1st row of the above system equation is an affine approximation of the function M(.), and the second row is an affine approximation of F(.), but I have no knowledge of the parameters of either a priori (other than a few elements of A that I know are zero based on Nick's description: essentially M is not a function of Uf, and likewise F is not a function of Um).
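
To make those known zeros concrete, here's a tiny throwaway illustration (not part of the script below) for the default Nv = Nm = Nf = 1 case, where the columns of A line up as [m f V Um Uf]:

rand("normal");
A_demo = rand(2,5);      // 2 outputs [M F]', 5 inputs [m f V Um Uf]'
A_demo(1,5) = 0;         // row 1 (M) does not depend on Uf
A_demo(2,4) = 0;         // row 2 (F) does not depend on Um
disp(A_demo);            // note the known zeros in the Um and Uf columns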

Also, my measurements are corrupted by additive white noise v (I discuss why I added this noise later):

z = y - ystar + v

My strategy is to continuously refine a least squares estimate of A and b, and use those estimates to then select x1 so as to minimize the power of the error vector, z. I also calculate a history of true errors Z to plot:

Z = y - ystar

This all sounds more complicated than it is. Really I can summarize the algorithm in just three simple steps, to be repeated for each time iteration:
  1. Make a noise-corrupted measurement y = [M F]' + v on the true system, where v is the noise
  2. Use y to refine estimates of A and b
  3. Use the A & b estimates & knowledge of V, Um & Uf to select x1 = [m f]' such that we predict it will zero out the error signal y - ystar = [(M-M*) (F-F*)]'  ...(of course it probably won't really zero it out, because our estimates of A and b are not perfect; there's a sketch of this step just below).
I know Nick didn't mention noise, but the problem is too easy without it, plus it ensures that we always have model uncertainty, since we start with very little a priori information and must discover the model parameters adaptively. What fun is it if we don't have model uncertainty? :D
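
To make step 3 concrete, here's a minimal sketch of that solve in isolation, using the same variable names as the script below (Aest and best are the current least squares estimates of A and b, and x2 = [V' Um' Uf']' is the current shock vector). The full script applies incremental corrections (dx1) rather than solving for x1 outright, but the idea is the same:

A1est = Aest(:,1:2);                  // estimated partials of [M F]' with respect to [m f]'
A2est = Aest(:,3:$);                  // estimated partials with respect to [V' Um' Uf']'
x1 = A1est\(ystar - A2est*x2 - best); // pick [m f]' so the predicted y - ystar is zero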

I believe this basic approach will work even if the underlying system is not perfectly static and not perfectly linear, but I'd probably need to update my least squares estimation to fade the old data away and weight the newer data more heavily for it to work acceptably well under those circumstances. That weighting would be a tuning parameter (a fading parameter for old data). If the dynamics are too fast or the non-linearity too strong, it will almost certainly cause the solver to fail, regardless of how any tuning parameters are set. The way it's programmed now, the truth system is perfectly static and linear, and all measurements are given equal weight by the system estimator. I have some "extra credit" questions at the bottom of this page addressing this.
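
For example, one way to fade old data away would be to geometrically down-weight the older rows of the least squares problem before solving. A minimal sketch (lambda is a hypothetical fading factor; Hm and sm are the same variables built in the script below, with the oldest measurement in row 1):

lambda = 0.95;                        // hypothetical fading factor, 0 < lambda <= 1
K = size(Hm,1);                       // number of measurements collected so far
w = sqrt(lambda.^((K-1):-1:0))';      // row weights: oldest row smallest, newest about 1
LS_est1 = (diag(w)*Hm)\(w.*sm(:));    // weighted LS; lambda = 1 recovers the equal-weight estimate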

This is a very general type of solver, and it would be interesting to see if it can be tuned to work on a realistic system of M(.) and F(.) (maybe like Nick imagines), with realistic V, Um, and Uf "shock" vector functions.

Nick describes in his post a procedure for minimizing the error in first M and then F (by twiddling first m only, and then f only, or vice versa), as if these steps could be taken independently. I don't believe this will work very well since M and F are explicitly coupled together, and thus m and f must be selected at the same time: it's a simultaneous system of equations. Maybe I'll try to dream up an example where solving first one then the other never converges.
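
Actually, here's a minimal sketch of such an example, with a hand-picked 2x2 A whose cross-coupling dominates the diagonal (it violates the |a12*a21| < |a11*a22| condition derived further down this page):

A = [1 2; 2 1];                               // strong cross-coupling: |a12*a21| = 4 > 1 = |a11*a22|
ystar = [1; 1];
x = [0; 0];                                   // initial guess for [m f]'
for n = 1:10
    x(1) = (ystar(1) - A(1,2)*x(2))/A(1,1);   // twiddle m alone to hit the first target
    x(2) = (ystar(2) - A(2,1)*x(1))/A(2,2);   // then twiddle f alone to hit the second
    mprintf("iteration %d: error = %f\n", n, norm(A*x - ystar));
end
// the printed error grows by a factor of 4 each pass, while x = A\ystar nails it in one shot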

Have fun!

// script to simulate Nick Rowe's problem and a solver

// control parameters:
rand("normal"); // set distribution to normal
Nv = 1; // dimension of V
Nm = 1; // dimension of Um
Nf = 1; // dimension of Uf
Rv = (0.01)^2*eye(2); // additive measurement noise covariance
Niter = 56; // number of time iterations
Nls_xtra = 5; // number of extra measurements for LS estimate

// start the run

// dimensions of system:
Nx2 = Nv+Nm+Nf;
Nx1 = 2;
Nx = Nx1+Nx2;
// Build our system matrices:
A = rand(2,Nx);
// zero out the parts of A we know are zero (per Nick's description):
A(1,$-Nf+1:$) = 0;              // M(.) does not depend on Uf
A(2,Nx1+Nv+1:Nx1+Nv+Nm) = 0;    // F(.) does not depend on Um (works for any Nv, not just Nv = 1)
b = rand(2,1);
// objective: ystar = [M* F*]'
ystar = rand(2,1); 
// Gv is generator matrix for additive noise:
Gv = chol(Rv)';
// Calculate minimum number of measurements for least squares estimate of A & b:
Nls_min = max(Nm,Nf) + Nv + Nx1 + 1;
// Adjust to overdetermine system:
Nls = Nls_min + Nls_xtra;
// initial guess:
x1_0 = rand(2,1); // [m0 f0]'

// init least squares (LS) estimation variables:
sm = zeros(1,1);
sf = zeros(1,1);
Hm = zeros(1,Nx-Nf+1);
Hf = zeros(1,Nx-Nm+1);

// initialize some loop variables:
k = 0;
p = 1;
x1 = x1_0;
x2_0 = zeros(Nx-2,1);
best0 = zeros(2,1);
best = best0;
db = zeros(2,1);
start = 0;
Z = zeros(2,1); // error vector

// Iterate through the loop: time steps
for n=1:Niter
    V = rand(Nv,1);
    Um = rand(Nm,1);
    Uf = rand(Nf,1);
    x2 = [V;Um;Uf];
    dx2 = x2-x2_0;
    x2_0 = x2;
    // perturb m and f one at a time (for LS estimate of A & b)
    if n==2 then
        x1(1) = x1(1) + 1;
    elseif n==3 then
        x1(1) = x1_0(1);
        x1(2) = x1(2) + 1;
    elseif n==4 then
        x1 = x1_0;
    elseif n > Nls then
        // A and b estimated, correct for V, Um, Uf, and delta b:
        dx1 = -A1est\(A2est*dx2+db);
        x1 = x1 + dx1;
    end
    
    // calculate the true error:
    x = [x1;x2];
    v = Gv*rand(2,1);
    y = A*x + b;
    z = y-ystar; // z is true error
    
    // if start flag set, record true error:
    if start then
        Z(:,p) = z;
        p = p+1;
    end
    
    // add measurement noise to true error:
    z = z + v;
    
    // Add rows to LS estimate vector & matrix:
    sm(k+1) = z(1)+ystar(1);
    sf(k+1) = z(2)+ystar(2);
    Hm(k+1,:) = [x1' V' Um' 1];
    Hf(k+1,:) = [x1' V' Uf' 1];
    k = k + 1;

    // When we have enough data, do LS estimate of A & b:
    if n >= Nls_min
       // LS estimation: each row separately:
       LS_est1 = Hm\sm;
       LS_est2 = Hf\sf;
       // Build the top row and bottom row estimates of A:
       Aest1 = [LS_est1(1:$-1)' zeros(1,Nf)];
       Aest2 = [LS_est2(1:2)' LS_est2(3:3+Nv-1)' zeros(1,Nm) LS_est2($-Nf:$-1)'];
       best0 = best;
       Aest = [Aest1;Aest2];
       // Separate A into A1 and A2 estimates:
       A1est = Aest(1:2,1:2);
       A2est = Aest(:,3:$);
       // Build b estimate
       best = [LS_est1($);LS_est2($)];
       if n==Nls_min then
           // announce that the first LS estimate of A & b is available:
           disp('start')
       end
    end
    
    if n > Nls_min
       // Correct for remaining error, z:
       db = best-best0;
       dx1 = -A1est\z;
       x1 = x1 + dx1;    
       start = 1;      
    end
end

// Plot the RMS error vs time:
err = sqrt(sum(Z.*Z,1));
t = [0:length(err)-1];
figure(1), plot2d("nl",t',err');
 



Extra credit:

1. Clean up the unnecessary and non-optimal parts of the simulation (hint: do I really need to correct for z at the bottom?)
2. Add small dynamics (e.g. linear) and non-linearities (e.g. quadratic) to the truth model. E.g. for dynamics, slowly
  move the true A and b between {A0,b0} and {A1,b1} along a straight path.
3. See if you can adapt the linear blind search model to be robust to 2 (hint: weight the rows of s and H).
4. Add better output (animated perhaps?). Visuals on how well we're estimating A & b perhaps?
5. Anybody have a real model of M(.) and F(.) we could try it on? I'd need V, Um, and Uf too (as functions of time).
6. Add the ability to exploit our knowledge of Rv (if we have it) (hint: it affects the LS problem). What if there's a mismatch?
6.a: We could also add the ability to exploit time (not just spatial) correlations in v (i.e. v is non-white)
  (hint: think off-diagonal elements of an expanded Rv, like you'd need to use in the LS problem anyway).
7. Make this practical by limiting the growth of the size of the least squares problem (the size of the s and H variables); see the sketch after this list.
8. Hobble the model à la Nick Rowe's post, and see if you can still make it work. If not, why not? If not all cases, which cases?
9. Can you reformulate this as a Kalman Filter / optimal control problem? But one in which I need to estimate
  the model? Or are the 'states' I'm estimating really the elements of A and b? What are their dynamics? Does this
  approach help with limiting the size of expanded versions of Rv?
10. Add noise to V, Um, and Uf: perhaps this isn't necessary: perhaps that's already captured by v.
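
For extra credit 7, here's a minimal sketch of a recursive least squares (RLS) update, which keeps a fixed-size estimate instead of the ever-growing Hm and sm. It's a stand-alone toy (the names theta, P, h, and s here are hypothetical, not from the script above):

rand("normal");
Np = 5;                                      // number of unknowns in one row, e.g. [m f V Um 1]
theta_true = rand(Np,1);                     // the row of [A b] we're trying to recover
theta = zeros(Np,1);                         // running estimate
P = 1000*eye(Np,Np);                         // large initial covariance: "know nothing yet"
for n = 1:200
    h = rand(1,Np);                          // regressor row, analogous to a row of Hm
    s = h*theta_true + 0.01*rand(1,1);       // noisy scalar measurement, analogous to sm(n)
    K_gain = P*h'/(1 + h*P*h');              // RLS gain
    theta = theta + K_gain*(s - h*theta);    // update the estimate
    P = P - K_gain*h*P;                      // update covariance (stays Np x Np forever)
end
disp([theta_true theta]);                    // the two columns should nearly match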
 
 
Note: I just tried a version in which I calculate x1 = -A1est\(A2est*x2 + best), and don't do any update with z (other than for the estimation
of A and b). Its error dropped, but not as far, and then it stayed almost flat. Not sure why!
 
Note2: I wrote something very much like this one time to estimate the phase and magnitude of a series of taps in a set of equalizers to 
minimize out-of-band power for a cell base station power amplifier. The principle was the same: blind search... we resorted to it when
all of our fancy methods using information we supposedly could estimate about the various channels in the amplifier seemed to fail.
The blind search method did a better job of keeping the out-of-band power (adjacent channel power) to a minimum.
 
Note3: How many other non-economists out there would love to see some of Nick's other posts coded up? Maybe it's not possible or
advisable, but I'm often stymied by use of nomenclature I really don't understand. I have a frustrating feeling though, that if I did understand
what was being discussed, it may very well be possible to code up an example. If numerical solutions are required, fine. There often seems to
be a lot of speculation about what will happen, given such and such a model. Are there any instances in which we can simply code it up
and find out? I'd love to know. If nothing else, it would give me a much better feeling for the mechanics of what's happening. 
 
----------------------------------------------------------------------------------------------------
 
Now I've written another script to check the assertion I make in a further comment here:

I actually worked the math out first, but then I wrote a Scilab script to verify it. If you're 
interested in the math, let me know. I'm hesitant to say that the condition here on A
(you can think of A as a matrix of partial derivatives... or the "Jacobian" I think it's called) is both
necessary and sufficient, because there are probably some extra smoothness conditions
you'd need to state on both M(.) and F(.), but if they are "sufficiently smooth" then I believe
this is an iff (if and only if) condition on the partials. I just don't know how to specify what
"sufficiently smooth" means. I ran the script a number of times without any failures to predict
convergence (each run tries 1000 different random A, y, and initial x cases). At first I did
have some errors (thus the extra code to plot what happens in one of those cases), but
I soon discovered my mistake: the x(1) part of the initial guess doesn't matter, so I should check
convergence by comparing the error after the first iteration to the error after the last (I do 50 iterations,
though two should actually be sufficient). Sometimes the convergence
or divergence was so slow that the iterations didn't recover from an unrepresentative initial error, and then I'd 
get a few failures over the 1000 trials. But after replacing err0 with err_vec(1) I no longer seem
to fail to predict convergence. 
 
Also note that the converging cases and the non-converging cases will switch if we change the
order of which solution we try to find first: i.e. if we try to find f* first and then m*. 
 
Here's a "proof" requested by JW Mason on Nick's blog:
 
Y = A*X

Y and X are 2x1 and A is 2x2, with elements aij, xi, and yi respectively. We solve one row at a time (we'll start with the first row). I'll add a time subscript to the elements of X:

x1_1 = (y1 - a12*x2_0)/a11 = y1/a11 - x2_0*a12/a11

Then we solve for x2:

x2_1 = (y2 - a21*x1_1)/a22

Now back to x1:

x1_2 = (y1 - a12*x2_1)/a11 = y1/a11 - a12*y2/(a11*a22) + x1_1*a12*a21/(a11*a22) = K + G*x1_1

where

K = y1/a11 - a12*y2/(a11*a22)
G = a12*a21/(a11*a22)

Thus x1_(1+n) = K*G^0 + K*G^1 + K*G^2 + ... + K*G^(n-1) + (G^n)*x1_1

You can see right away this only converges if |G| < 1. A similar statement can be made for x2_(1+n). It's also straightforward to prove that if |G| < 1, then indeed these sequences converge to:

X_infinity = inverse(A)*Y

Write out the equation for x1_infinity... it's the same as for x1_(1+n) but never terminates. Then K + G*x1_infinity = x1_infinity, which implies x1_infinity = K/(1-G). Write out K/(1-G)... I think you'll find it's the same as the top row of inverse(A)*Y. Now do the same for x2_infinity. QED
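
If you want to sanity-check that last step numerically before running the full script, here's a quick one-off (a single random draw):

// check that K/(1-G) equals the first element of inverse(A)*Y for a random draw:
rand("normal");
A = rand(2,2);
Y = rand(2,1);
G = A(1,2)*A(2,1)/(A(1,1)*A(2,2));
K = Y(1)/A(1,1) - A(1,2)*Y(2)/(A(1,1)*A(2,2));
disp([K/(1-G), [1 0]*(A\Y)]);                // the two numbers should match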
 
// script to check a condition on matrix A to ensure that iterative independent 
// solutions to y = A*x for the elements of x will eventually converge: I'm supposing
// that the condition is that |a12*a21| < |a11*a22| (A is 2x2)
rand("normal");

// Control params:
Ntrials = 1000; // number of matrices to check
Niter = 50; // number of iteration to check convergence

// Run:

num_failure = 0;
err_vec = zeros(Niter,1);

for k=1:Ntrials
    A = rand(2,2);
    y = rand(2,1);
    x0 = rand(2,1);
    x = x0;
    true_x = A\y;
    err0 = true_x-x;
    
    for n=1:Niter
        x(1) = (y(1)-A(1,2)*x(2))/A(1,1);
        x(2) = (y(2)-A(2,1)*x(1))/A(2,2);
        err_vec(n) = norm(true_x - x);
    end
    
    err1 = true_x-x;
    
    converge_pred = abs(A(1,1)*A(2,2)) > abs(A(1,2)*A(2,1));
    converge_check = err_vec(1) > err_vec($);
    
    if converge_pred ~= converge_check then
        num_failure = num_failure+1;
        AA = A;
        yy = y;
        xx = x;
        err00 = err0;
        err11 = err1;
        check_meas = abs(A(1,1)*A(2,2)) - abs(A(1,2)*A(2,1));
        err_vec0 = err_vec;
        converge_pred0 = converge_pred;
        converge_check0 = converge_check;
    end
end

// How did we do?
Ntrials
num_failure

// If we had any failures to predict the convergence, investigate why
// on the last failure
if num_failure > 0 then
    check_meas
    norm(err00)
    norm(err11)
    converge_pred0
    converge_check0
    figure(1), plot(err_vec0)
end
 
 
  
