$title Cross Entropy SAM Estimation (CESAM2,SEQ=393) $onText CESAM2 illustrates a cross entropy technique for estimating the cells of a consistent SAM assuming that the initial data are inconsistent and measured with error. The method is applied to estimate the macro SAM used in CESAM.GMS. Cell elements, some macro control totals, and row and column totals are assumed to be measured with error. We assume that the user can specify prior estimates of the values and standard errors of measurement for the cell values, macro control totals, and row and column sums. The original version of this code, CESAM.GMS, assumed that the SAM column coefficients, A(i,j) = SAM(i,j)/SUM(i, SAM(i,j)), are treated as analogous to probabilities and are included directly in the cross-entropy minimand. In this version each SAM element, SAM(i,j), is assumed to be measured with error, and all the errors are treated as probability-weighted sums of error support sets. Only probabilities are included in the cross-entropy minimand, which is consistent with the information-theoretic Bayesian approach to estimate probabilities. The cost of this approach is that in CESAM2 there are many more probabilities to be estimated. However, new solution algorithms are able to solve large problems of this type, so size is no longer a serious constraint. In the estimation procedure, we assume prior information on either: (1) values of cells, SAM(i,j), or (2) coefficients, A(i,j). Errors can be treated as either: (1) additive [e.g., SAM(i,j) = sam0(i,j) + err(i,j)], or (2) multiplicative [e.g., A(i,j) = abar0(i,j)*EXP(err(i,j))] where sam0(i,j) and abar0(i,j) are prior values of the cell value or coefficient, and err(i,j) is the estimated measurement error. In the first case, the prior mean of the errors is assumed to be zero. In the second case, it will be one. In the first case, it is possible for the posterior estimated cell value to change sign from the prior, while in the second case the posterior estimated coefficient value can never change sign. In the code below, we assume a prior on coefficients measured with multiplicative errors for selected SAM accounts defined by the set acoeff(i). Note that it is important to scale the SAM. Ideally, the SAM being estimated should be scaled so that it does not contain values larger than about 1e3. Note also that by default we use the GAMS intrinsic function centropy() in the objective definition. If you define NOCENTROPY (e.g. by running with --NOCENTROPY = 1 on the command line) the cross-entropy function is written explicitly using logs, etc. References: Robinson, S, Cattaneo, A, and El-Said, M, Updating and Estimating a Social Accounting Matrix Using Cross Enthropy Methods. Economic System Research 13, 1 (2001). Golan, G, Judge, G, and Miller, D, Maximum Enthropy Econometrics. John Wiley and Sons, 1996. Judge, George G. and Ron C. Mittelhammer, An Information Theoretic Approach to Econometrics. Cambridge: Cambridge University Press, 2012. Programmed by Sherman Robinson, April 2013 Environment and Production Technology Division and Development Strategy and Governance Division International Food Policy Research Institute (IFPRI) 2033 K Street, N.W. Washington, DC 20006 USA Email: S.Robinson@CGIAR.ORG Earlier version, CESAM, programmed by Sherman Robinson and Moataz El-Said, November 2000. Original version programmed by Sherman Robinson and Andrea Cattaneo. Keywords: nonlinear programming, micro economics, cross entropy, social accounting matrix $offText * by default use centropy(), uncomment the line below to use logs explicitly * $set NOCENTROPY 1 Set i 'SAM accounts' / ACT 'Activities', COM 'Commodities' FAC 'Factors', ENT 'Enterprises' HOU 'Households', GOV 'Govt recurrent expenditures' GIN 'Govt investment', CAP 'Capital account' ROW 'Rest of world', TOTAL 'Row and column totals' / icoeff(i,i) 'SAM elements whose prior is specified as coefficients' ival(i,i) 'SAM elements whose prior is specified as values' NONZERO(i,i) 'SAM elements that can be nonzero and hence estimated' ii(i) 'all accounts in i except total' macro 'macro controls' / gdpfc2, gdp2 / * The set jwt defines the dimension of the support set for the error * distribution and the number of weights that must be estimated for each * error. In this case, we specify an uninformative prior for jwt1, * a normal prior for jwt2, and a general two-parameter distribution for jwt3. jwt 'master set of possible weights' / 1*7 / jwt1(jwt) 'set of weights for errors in column sums' / 1*7 / jwt2(jwt) 'set of weights for errors in macro totals' / 1*5 / jwt3(jwt) 'set of weights for errors in cell elements' / 1*3 /; Alias (i,j), (ii,jj); ii(i) = yes; ii("Total") = no; Parameter stderr1 'standard error of measurement for column sums' / .05 / stderr2 'standard error of measurement for macro totals' / .05 / stderr3 'standard error of measurement for cell elements' / .25 / scalesam 'scale factor for scaling initial SAM' / 1e3 / delta 'small number for CE objective function' / 1e-8 /; * Prior unbalanced proto-SAM * ######################## SAM DATABASE ######################## * The SAM is unbalanced by adding new rows with bad data Table SAM(i,j) 'prior unbalanced social accounting matrix' ACT COM FAC ENT ACT 0.0 14827.4240 0.0 0.0 COM 7917.5040 0.0 0.0 0.0 FAC 9805.4140 0.0 0.0 0.0 ENT 0.0 0.0 3699.7060 0.0 * HOU 0.0 0.0 6031.3080 3417.5060 HOU 6000.0000 3300.0000 GOV 733.6000 357.4000 74.4000 165.2000 GIN 0.0 0.0 0.0 0.0 CAP 0.0 0.0 0.0 150.0000 ROW 0.0 5573.8150 0.0 0.0 Total 18456.5180 20758.639 9805.414 3732.706 + HOU GOV GIN CAP ACT 2101.0490 -0.3270 0.0 0.0 * COM 6753.3320 1764.5000 2118.5000 2197.7980 COM 6953.3320 1564.5000 2518.5000 2597.7980 FAC 0.0 0.0 0.0 0.0 ENT 0.0 33.0000 0.0 0.0 HOU 0.0 29.6000 0.0 0.0 GOV 139.5000 0.0 0.0 0.0 GIN 0.0 0.0 0.0 0.0 CAP 649.1560 -356.6730 -406.2000 0.0 ROW 0.0 0.0 0.0 0.0 Total 9643.037 1470.1 1712.3 2197.798 + ROW Total ACT 1488.1570 18416.303 COM 0.0 20751.634 FAC 0.0 9805.414 ENT 0.0 3732.706 * HOU 209.5010 9687.915 HOU 200.0000 9687.915 GOV 0.0 1470.1 GIN 1712.3000 1712.3 CAP 2163.8570 2200.14 ROW 0.0 5573.815 Total 5573.815; Parameter SAM0(i,j) 'unbalance prior or proto-SAM transactions matrix' SAMBALCHK(i) 'column sums minus row sums in the SAM' Abar0(i,j) 'prior SAM coefficient matrix' ColSum0(i) 'targets for macro SAM column totals' macrov0(macro) 'target values for macro aggregates' vbar1(i,jwt) 'error support set 1 for column sums' vbar2(macro,jwt) 'error support set 2 for macro aggregates' vbar3(i,j,jwt) 'error support set 3 for SAM elements' wbar1(i,jwt) 'weights on error support set 1 for column totals' wbar2(macro,jwt) 'weights on error support set 2 for macro aggregates' wbar3(i,j,jwt) 'weights on error support set 3 for SAM elements' sigmay1(i) 'prior standard error of column sums' sigmay2(macro) 'prior standard error of macro aggregates' sigmay3(i,j) 'prior standard error of SAM elements' * macro control totals gdp0 'base GDP' gdpfc0 'base GDP at factor cost' gdp00 'GDP from final SAM' gdpfc00 'GDP at factor cost from final SAM'; * ################# Initializing Parameters ################# SAM("TOTAL",jj) = sum(ii, SAM(ii,jj)); SAM(ii,"TOTAL") = sum(jj, SAM(ii,jj)); * Divide SAM entries by scalesam for better scaling. * The SAM is scaled to enhance solver efficiency. Nonlinear solvers are * more efficient if variables are scaled to be around 1. SAM(i,j) = SAM(i,j)/scalesam; abar0(ii,jj)$SAM("TOTAL",jj) = SAM(ii,jj)/SAM("TOTAL",jj); SAM0(ii,jj) = SAM(ii,jj); SAM0("TOTAL",jj) = sum(ii, SAM(ii,jj)); SAM0(ii,"TOTAL") = sum(jj, SAM(ii,jj)); SAMBALCHK(jj) = SAM0('TOTAL',jj) - SAM0(jj,'TOTAL'); display abar0, sam0, sambalchk; * ######################## CROSS ENTROPY ############################## Parameter NegSam(i,j) 'negative SAM values' chkset(i,j) 'check coefficient and value sets'; * identify negative SAM entries for information NegSam(i,j)$(SAM0(i,j) < 0) = SAM(i,j); * Define set of elements of SAM that can be nonzero. In this case, only * elements which are nonzero in initial SAM. NONZERO(ii,jj)$(Abar0(ii,jj)) = yes; * SAM cells with priors on coefficients. We will also assume they have * multiplicative errors. Set acoeff(i) 'accounts with prior on column coefficients' / act, fac, ent, hou /; icoeff(ii,acoeff)$NONZERO(ii,acoeff) = yes; ival(ii,jj)$(SAM0(ii,jj) and (not icoeff(ii,jj))) = yes; chkset(ii,jj) = 1$ival(ii,jj) + 1$icoeff(ii,jj) - 1$NONZERO(ii,jj); display icoeff, ival, chkset; * Note that target column sums are being set to average of initial * row and column sums. Initial column sums or other values * could have been used instead, depending on knowledge of data quality * and any other prior information. ColSum0(ii) = (sam(ii,"total") + sam("total",ii))/2; gdpfc0 = sam("fac","act"); gdp0 = sam("fac","act") + sam("gov","act") - sam("act","gov") + sam("gov","com"); macrov0("gdp2") = gdp0; macrov0("gdpfc2") = gdpfc0; display negsam, ColSum0, gdpfc0, gdp0, macrov0; $onText *############### Define prior distributions of errors ##################### Start from assumed prior knowledge of the means and standard errors of measurement of the cell elements, macro aggregates, and column sums. Below, we assume that all column sums and macro aggregates have standard errors set in stderr1, stderr2, and stderr3. These are Bayesian priors, not a maintained hypothesis. The estimated error is weighted sum of elements in an error support set: ERR(ii) = sum(jwt, W(ii,jwt)*VBAR(ii,jwt)) where the W's are estimated in the CE procedure and the support set, VBAR, is specified to span the possible domain of the errors. The prior mean (zero) and variance of these errors is given by: 0 = sum(jwt, WBAR(ii,jwt)*VBAR(ii,jwt)) and (sigmay(ii))**2 = sum(jwt, WBAR(ii,jwt)*(VBAR(ii,jwt))**2) where the WBAR's are the priors on the probability weights. The VBARs are chosen to define a domain for the support set of +/- 3 standard errors. The prior on the weights, WBAR, are then calculated to yield the specified prior on the standard error, sigmay. In Robinson, Cattaneo, and El-Said (2001), we specify prior weights (WBAR) that are "uninformative", given by a uniform distribution, and set the prior standard errors by the choice of support set, VBAR. In that paper, we use a three-weight specification (jwt /1*3/) with uniform prior weights. Our current practice to specify an uniformative prior is to use a seven-element support set with uniform prior weights, WBAR. To illustrate, we specify an uninformative prior for col/row sums below We assume two possible "informative" priors: (1) A general two-parameter distribution with priors on the mean (zero) and variance (sigmay**2). This prior requires a three-element support set. We specify this prior for SAM elements with either additive or multiplicative errors. (2) A normal distribution with priors on the mean (zero), variance (sigmay**2), skewness (zero), and kurtosis (3*sigmay**4). This prior requires a five-element support set. We specify this prior for macro totals. We solve for the prior weights (wbar) given the specification of the prior distribution and choice of the support set values (vbar) For example, for the prior of a normal distribution, we assumes a prior mean of zero, skewness of zero, and a prior value of kurtosis consistent with a prior normal distribution with mean zero and variance of sigmay**2. In this case, kurtosis equals 3*sigmay**4. To specify a four-parameter distribution (mean, variance, skewness, kurtosis) requires a five-weight support set. The prior weights (wbar) are specified so that: sum(jwt, wbar(ii,jwt)*vbar(ii,jwt)**4) = 3*sigmay(ii,jwt)**4 as well as defining the variance as above, a mean of zero, and skewness (third moment) of zero. These equations suffice to determine the values of the prior weights (wbar). The choice of +/- 1.5 standard error for vbar(ii,"2") and vbar(ii,"4") is arbitrary, but it is convenient to have equal spacing of the error support set. These are priors, not maintained hypotheses. The actual moments are estimated as part of the estimation procedure. $offText * Set standard deviation for errors on column/row totals sigmay1(ii) = stderr1*ColSum0(ii); * Set constants for 7-weight error distribution (uninformative uniform prior) vbar1(ii,"1") = -3*sigmay1(ii); vbar1(ii,"2") = -2*sigmay1(ii); vbar1(ii,"3") = -1*sigmay1(ii); vbar1(ii,"4") = 0; vbar1(ii,"5") = 1*sigmay1(ii); vbar1(ii,"6") = 2*sigmay1(ii); vbar1(ii,"7") = 3*sigmay1(ii); wbar1(ii,jwt1) = 1/7; * Set standard deviation for errors on macro aggregates sigmay2(macro) = stderr2*macrov0(macro); * Set constants for 5-weight error distribution (normal prior) vbar2(macro,"1") = -3 *sigmay2(macro); vbar2(macro,"2") = -1.5*sigmay2(macro); vbar2(macro,"3") = 0; vbar2(macro,"4") = 1.5*sigmay2(macro); vbar2(macro,"5") = 3 *sigmay2(macro); wbar2(macro,"1") = 1/162; wbar2(macro,"2") = 16/81; wbar2(macro,"3") = 48/81; wbar2(macro,"4") = 16/81; wbar2(macro,"5") = 1/162; * Set constants for 3-weight error distribution (2-parameter prior) loop((ii,jj)$NONZERO(ii,jj), * Set standard deviation for errors on cell values or coefficients * Additive errors sigmay3(ii,jj)$ival(ii,jj) = stderr3*ABS(sam0(ii,jj)); * Multiplicative errors sigmay3(ii,jj)$icoeff(ii,jj) = stderr3; vbar3(ii,jj,"1") = -3*sigmay3(ii,jj); vbar3(ii,jj,"2") = 0; vbar3(ii,jj,"3") = 3*sigmay3(ii,jj); wbar3(ii,jj,"1") = 1/18; wbar3(ii,jj,"2") = 16/18; wbar3(ii,jj,"3") = 1/18; * end ii,jj loop ); display vbar1, vbar2, vbar3, sigmay1, sigmay2, sigmay3; Variable A(i,j) 'posterior SAM coefficient matrix' TSAM(i,j) 'posterior matrix of SAM transactions' MACROV(macro) 'macro aggregates' Y(i) 'row sum of SAM' ERR1(i) 'error value on column sums' ERR2(macro) 'error value for macro aggregates' ERR3(i,j) 'error value for SAM elements' W1(i,jwt) 'error weights for column sums' W2(macro,jwt) 'error weights for macro aggregates' W3(i,j,jwt) 'error weights for cell elements' DENTROPY 'entropy difference (objective)'; * ########################## INITIALIZE VARIABLES ################## A.L(ii,jj) = Abar0(ii,jj); TSAM.L(ii,jj) = sam0(ii,jj); Y.L(ii) = ColSum0(ii); MACROV.L(macro) = macrov0(macro); ERR1.L(ii) = 0.0; ERR2.L(macro) = 0.0; ERR3.L(ii,jj)$NONZERO(ii,jj) = 0.0; W1.L(ii,jwt) = wbar1(ii,jwt); W2.L(macro,jwt) = wbar2(macro,jwt); W3.L(ii,jj,jwt)$NONZERO(ii,jj) = wbar3(ii,jj,jwt); DENTROPY.L = 0.0; * ############ CORE EQUATIONS ############ Equation ROWSUMEQ(i) 'rowsum with error' ROWSUM(i) 'row sums' COLSUM(j) 'column sums' SAMCOEF(i,j) 'define SAM coefficients' TSAMEQ(i,j) 'SAM elements in values' ASAMEQ(i,j) 'SAM coefficients' GDPFCDEF 'define GDP at factor cost' GDPDEF 'define GDP at market prices' MACROEQ(macro) 'macro aggregates with error' ERROR1EQ(i) 'definition of error term 1' ERROR2EQ(macro) 'definition of error term 2' ERROR3EQ(i,j) 'definition of error term 3' SUMW1(i) 'sum of weights 1' SUMW2(macro) 'sum of weights 2' SUMW3(i,j) 'sum of weights 3' ENTROPY 'entropy difference definition'; * Row and column sums estimation and balance ROWSUMEQ(ii).. Y(ii) =e= ColSum0(ii) + ERR1(ii); ROWSUM(ii).. sum(jj, TSAM(ii,jj)) =e= Y(ii); COLSUM(jj).. sum(ii, TSAM(ii,jj)) =e= Y(jj); * Estimating SAM elements from prior values or coefficients SAMCOEF(ii,jj)$NONZERO(ii,jj).. TSAM(ii,jj) =e= A(ii,jj)*Y(jj); TSAMEQ(ii,jj)$IVAL(ii,jj).. TSAM(ii,jj) =e= sam0(ii,jj) + ERR3(ii,jj); ASAMEQ(ii,jj)$ICOEFF(ii,jj).. A(ii,jj) =e= abar0(ii,jj)*EXP(ERR3(ii,jj)); * Macro aggregates measured with error GDPFCDEF.. MACROV("gdpfc2") =e= TSAM("fac","act"); GDPDEF.. MACROV("gdp2") =e= TSAM("fac","act") + TSAM("gov","act") - TSAM("act","gov") + TSAM("gov","com"); MACROEQ(macro).. MACROV(macro) =e= macrov0(macro) + ERR2(macro); * Definition of errors as probability weighted sums of support sets ERROR1EQ(ii).. ERR1(ii) =e= sum(jwt1, W1(ii,jwt1)*vbar1(ii,jwt1)); ERROR2EQ(macro).. ERR2(macro) =e= sum(jwt2, W2(macro,jwt2)*vbar2(macro,jwt2)); ERROR3EQ(ii,jj)$NONZERO(ii,jj).. ERR3(ii,jj) =e= sum(jwt3, W3(ii,jj,jwt3)*vbar3(ii,jj,jwt3)); * Probabilities must sum to one SUMW1(ii).. sum(jwt1, W1(ii,jwt1)) =e= 1; SUMW2(macro).. sum(jwt2, W2(macro,jwt2)) =e= 1; SUMW3(ii,jj)$NONZERO(ii,jj).. sum(jwt3, W3(ii,jj,jwt3)) =e= 1; $ifThen set NOCENTROPY * Cross-entropy objective function, explicit version ENTROPY.. DENTROPY =e= sum((ii,jj,jwt3)$nonzero(ii,jj), W3(ii,jj,jwt3)*(log(W3(ii,jj,jwt3) + delta) - log(wbar3(ii,jj,jwt3) + delta))) + sum((ii,jwt1), W1(ii,jwt1) * (log(W1(ii,jwt1) + delta) - log(wbar1(ii,jwt1) + delta))) + sum((macro,jwt2), W2(macro,jwt2) * (log(W2(macro,jwt2) + delta) - log(wbar2(macro,jwt2) + delta))); $else * Objective function using GAMS cross-entropy intrinsic function, CENTROPY ENTROPY.. DENTROPY =e= sum[(ii,jj,jwt3)$nonzero(ii,jj), CENTROPY(W3(ii,jj,jwt3),wbar3(ii,jj,jwt3))] + sum[(ii,jwt1), CENTROPY(W1(ii,jwt1),wbar1(ii,jwt1))] + sum[(macro,jwt2), CENTROPY(W2(macro,jwt2),wbar2(macro,jwt2))]; $endIf * Define bounds for cell values and fix variables not * included in the estimation A.fx(ii,jj)$ (not nonzero(ii,jj)) = 0; TSAM.fx(ii,jj)$(not nonzero(ii,jj)) = 0; * Upper and lower bounds on the error weights W1.lo(ii,jwt1) = 0; W1.up(ii,jwt1) = 1; W2.lo(macro,jwt2) = 0; W2.up(macro,jwt2) = 1; W3.lo(ii,jj,jwt3)$NONZERO(ii,jj) = 0; W3.up(ii,jj,jwt3)$NONZERO(ii,jj) = 1; W3.fx(ii,jj,jwt3)$(not nonzero(ii,jj)) = 0; Model SAMENTROP / all /; option limRow = 100, limCol = 0, solPrint = on, domLim = 100; SAMENTROP.holdFixed = 1; solve SAMENTROP using NLP minimizing dentropy; * Parameters for reporting results Parameter Macsam1(i,j) 'assigned new balanced SAM flows from CE' Macsam2(i,j) 'balanced SAM flows in original units' percent1(i,j) 'percent change of new SAM from original SAM' Diffrnce(i,j) 'differnce btw original SAM and final SAM in values'; macsam1(ii,jj) = TSAM.L(ii,jj); macsam1("total",jj) = sum(ii, macsam1(ii,jj)); macsam1(ii,"total") = sum(jj, macsam1(ii,jj)); macsam2(i,j) = macsam1(i,j)*scalesam; percent1(i,j)$(sam0(i,j)) = 100*(macsam1(i,j) - sam0(i,j))/sam0(i,j); Diffrnce(i,j) = macsam1(i,j) - sam0(i,j); SAMBALCHK(jj) = TSAM.L('TOTAL',jj) - TSAM.L(jj,'TOTAL'); display sam0, macsam1, SAMBALCHK, Diffrnce, percent1, macsam2, dentropy.l; gdp00 = macsam1("fac","act") + macsam1("gov","act") - macsam1("act","gov") + macsam1("gov","com"); gdpfc00 = macsam1("fac","act"); display gdp0, gdp00, gdpfc0, gdpfc00, macrov0, macrov.l; Parameter ANEW(i,j); * print some stuff ANEW(ii,jj) = A.l(ii,jj); ANEW("total",jj) = sum(ii, A.L(ii,jj)); ANEW(ii,"total") = sum(jj, A.L(ii,jj)); ABAR0("total",jj) = sum(ii, ABAR0(ii,jj)); ABAR0(ii,"total") = sum(jj, ABAR0(ii,jj)); display ANEW, ABAR0; Scalar meanerr1, meanerr2; meanerr1 = sum(ii, abs(err1.l(ii)))/card(ii); meanerr2 = sum(macro, abs(err2.l(macro)))/card(macro); display meanerr1, meanerr2;