/* racd9p4.gau does panel data using Hausman, Hall and Griliches data This preliminary version does: (1) Fixed effects linear model (2) Fixed effects Poisson model using maxlik with analytical derivatives (though numerical hessian may not be correct) (3) Fixed effects Neg bin model using maxlik with numerical derivatives (4) Random effects Poisson model using maxlik with numerical derivatives (5) Random effects Neg bin model using maxlik with numerical derivatives 1, 2, 4 and 5 give same result as Limdep (see racd9p5.lim). In this comparison all models have no constant and no time-invariant. Same data and variables as chapter nine used. Limdep does not do 3 for this data. For all I am unable to calculate Hessian at estimates. Instead use outerproduct of first derivatives for covariance matrix. Need to include constant and perhaps time invariant in random effects models. TO RUN THIS PROGRAM .... - You need an implementation of Gauss that has Maxlik (a Gauss add-on) - You need my procedures feplli (for gradient etc.) included near the end of this program - You need procedures opendat, rad, readdat included at the end of this program to select data to include model Program 1. Reads in a flat ascii data set. 2. Creates new data from transformation of original data. This sets up the data in appropriate form for panel analysis. 3. Converts the data to a Gauss data set, with names. 4. Selects a subset of the data to be used in regression 5. Calls my procedure feplli etc. to get logl, gradient and hessian 6. Calls the Gauss add-on maxlik to perform maximum likelihood. */ /* The application is to panel data on the patents R& D relationship. This uses data in racd9d1.asc (= original data in PATR7079.DAT) The data are those used in: Hall, Griliches, and Hausman (1986), "Patents and R&D: Is There a Lag?", IER 27: 265-283. For more details on the data see file racd9.1st The files have the same organization, fixed format ~80 column length, 4 records per firm-observation, data separated by blanks, but in fixed fields so the file can be read either by fixed or free format commands. The order of the variables is the following: Time-invariant: CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT, Time-varying X: LOGR70,LOGR71,LOGR72, ....., LOGR77,LOGR78,LOGR79 Time-varying Y: PAT70,PAT71,PAT72, ....., PAT77,PAT78,PAT79 where CUSIP Compustat's identifying number for the firm (Committee on Uniform Security Identification Procedures number). ARDSIC A two-digit code for the applied R&D industrial classification (roughly that in Bound, Cummins, Griliches, Hall, and Jaffe, in the Griliches R&D, Patents, and Productivity volume). SCISECT Dummy equal to one for firms in the scientific sector. LOGK The logarithm of the book value of capital in 1972. SUMPAT The sum of patents applied for between 1972-1979. LOGR70- The logarithm of R&D spending during the year (in 1972 dollars). LOGR79 PAT70- The number of patents applied for during the year that were PAT79 eventually granted. */ /******* Initial setup: libraries used etc. and output file ******** */ New; @ clear workspace @ library maxlik; @ So Gauss knows we use procedures in maxlik @ maxset; @ A maxlik command @ disable; @ Allows missing values in calculation @ outfile = "c:\\racd\\racd9p4.out"; @ all output goes to this file @ output file = ^outfile reset; @ delete earlier contents @ print "File racd9p4.out is output from Gauss program racd9p4.prg"; print "Date is:"; print date; print; /******* Read in the data and create variables ********** Here we first read the data in as a flat ascii file. The data are ordered by firm with 8, 6, 6 and variables in lines 1, 2, 3 and 4. Some of these observations Variables in order are 5 time invariant: CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT, Regressor for 10 years: LOGR70,LOGR71,LOGR72,LOGR73,LOGR74, LOGR75,LOGR76,LOGR77,LOGR78,LOGR79, Dependent for 10 years PAT70,PAT71,PAT72,PAT73,PAT74, PAT75,PAT76,PAT77,PAT78,PAT79 Then we create a Gauss data set with names provided. The data set is for 5 years data to be analyzed. */ /* Read data into matrix */ nrecords = 346; load data1[nrecords,25] = "c:\\racd\\racd9d1.asc"; /* Dependent for 5 years */ patents = data1[.,21:25]; @ PAT75,PAT76,PAT77,PAT78,PAT79 @ /* Time-invariant */ static = data1[.,1:5]; @ CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT @ /* Regressors for 5 years */ xvec75 = data1[.,11]~data1[.,10]~data1[.,9]~data1[.,8]~data1[.,7] ~data1[.,6]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @ xvec76 = data1[.,12]~data1[.,11]~data1[.,10]~data1[.,9]~data1[.,8] ~data1[.,7]; @ LOGR76,LOGR75,LOGR74,LOGR73,LOGR72,LOGR71 @ xvec77 = data1[.,13]~data1[.,12]~data1[.,11]~data1[.,10]~data1[.,9] ~data1[.,8]; @ LOGR77,LOGR76,LOGR75,LOGR74,LOGR73,LOGR72 @ xvec78 = data1[.,14]~data1[.,13]~data1[.,12]~data1[.,11]~data1[.,10] ~data1[.,9]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @ xvec79 = data1[.,15]~data1[.,14]~data1[.,13]~data1[.,12]~data1[.,11] ~data1[.,10]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @ /* Time dummies for 5 years */ one = ones(nrecords,1); @ A column vector of ones @ zero = zeros(nrecords,1); @ A column vactor of zeros @ dyear75 = one~zero~zero~zero~zero; dyear76 = zero~one~zero~zero~zero; dyear77 = zero~zero~one~zero~zero; dyear78 = zero~zero~zero~one~zero; dyear79 = zero~zero~zero~zero~one; /* Create new data matrix */ data2 = patents~one~static~xvec75~dyear75~xvec76~dyear76 ~xvec77~dyear77~xvec78~dyear78~xvec79~dyear79; clear data1,one,static,xvec75,xvec76,xvec77,xvec78,xvec79; /******* Create Gauss data set with names ********* The created Gauss data set will be called hhgpan1.dat and the associated names will be in file hhgpan1.dht The data set will contain the following data, where Y7AD7B is = 1 if A=B and = 0 if A not= B PAT75,PAT76,PAT77,PAT78,PAT79 ONE,CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT XVEC75 = LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 DVEC75 = Y75D75,Y75D76,Y75D77,Y75D78,Y75D79 XVEC76 = LOGR76,LOGR75,LOGR74,LOGR73,LOGR72,LOGR71 DVEC76 = Y76D75,Y76D76,Y76D77,Y76D78,Y76D79 XVEC77 = LOGR77,LOGR76,LOGR75,LOGR74,LOGR73,LOGR72 DVEC77 = Y77D75,Y77D76,Y77D77,Y77D78,Y77D79 XVEC78 = LOGR78,LOGR77,LOGR76,LOGR75,LOGR74,LOGR73 DVEC78 = Y78D75,Y78D76,Y78D77,Y78D78,Y78D79 XVEC79 = LOGR79,LOGR78,LOGR77,LOGR76,LOGR75,LOGR74 DVEC79 = Y79D75,Y79D76,Y79D77,Y79D78,Y79D79 */ dataset = "c:\\racd\\racd9d3"; let vnames = PAT75,PAT76,PAT77,PAT78,PAT79, ONE,CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT, LOGR75,LOGR75L1,LOGR75L2,LOGR75L3,LOGR75L4,LOGR75L5, Y75D75,Y75D76,Y75D77,Y75D78,Y75D79, LOGR76,LOGR76L1,LOGR76L2,LOGR76L3,LOGR76L4,LOGR76L5, Y76D75,Y76D76,Y76D77,Y76D78,Y76D79, LOGR77,LOGR77L1,LOGR77L2,LOGR77L3,LOGR77L4,LOGR77L5, Y77D75,Y77D76,Y77D77,Y77D78,Y77D79, LOGR78,LOGR78L1,LOGR78L2,LOGR78L3,LOGR78L4,LOGR78L5, Y78D75,Y78D76,Y78D77,Y78D78,Y78D79, LOGR79,LOGR79L1,LOGR79L2,LOGR79L3,LOGR79L4,LOGR79L5, Y79D75,Y79D76,Y79D77,Y79D78,Y79D79; call saved(data2,dataset,vnames); /******** Extract the subset of the Gauss data set to be used ********* */ data ="c:\\racd\\racd9d3"; print; call dstat(data,0); print; @ Check data set coorect @ /* The panel estimator requires data to be organized as 1. Years 1 to T data on the dependent variable 2. Year 1 data on all regressors (including time dummies) .... Year 1 data on all regressors (including time dummies) If the number of years used changes you need to change T */ /* The following two lines need to be changed every time. These give the dependent and independent variables. */ let YVAR = PAT75 PAT76 PAT77 PAT78 PAT79; @ let XVAR = LOGR75,LOGR76,LOGR77,LOGR78,LOGR79; @ let XVAR = LOGR75,LOGR75L1,LOGR75L2,LOGR75L3,LOGR75L4,LOGR75L5, Y75D76,Y75D77,Y75D78,Y75D79, LOGR76,LOGR76L1,LOGR76L2,LOGR76L3,LOGR76L4,LOGR76L5, Y76D76,Y76D77,Y76D78,Y76D79, LOGR77,LOGR77L1,LOGR77L2,LOGR77L3,LOGR77L4,LOGR77L5, Y77D76,Y77D77,Y77D78,Y77D79, LOGR78,LOGR78L1,LOGR78L2,LOGR78L3,LOGR78L4,LOGR78L5, Y78D76,Y78D77,Y78D78,Y78D79, LOGR79,LOGR79L1,LOGR79L2,LOGR79L3,LOGR79L4,LOGR79L5, Y79D76,Y79D77,Y79D78,Y79D79; /* Find number of years (Tmax), regressors (p) and variable names */ Tmax = rows(yvar); p = rows(xvar)/Tmax; print; "Panel analysis for number of periods:" Tmax; print; "Dependent variable in first period: " $yvar[1]; print; "Number of regressors in model: " p; print; "Regressor variables in first period: " $xvar[1:p]'; /* Set data size dimensions. m and nobs are needed for procedures opendat and readdat nobs is needed for model estimation procedures */ m = Tmax; @ Number of dependent variables @ nobs = 346; @ Maximum number of rows to read in dataset @ /* Call opendat, rad, readdat in seldata.src This gives x - nobs*k matrix of explanatory variables y - nobs*m matrix of dependent variables indy - Index for the dependent variables indx - Index for the exogenous variables f1 - File handle for the open data set n - Number of observations in the data set nror - number of times data set needs to be read to be complete rest - number of obervations in the time data is read The data are read into memory one time. Nobs must be equal to number of observations in the data set */ {f1,n,indy,indx} = opendat(data,yvar,xvar,&indices,m); {nror, rest } = rad(nobs,n); { y,x } = Readdat(f1,1,nobs,indy,indx); /******** Do linear fixed effects model ********* */ {bfe,sebfe,tbfe,sumres,sigmasq} = linearfe(y,x,Tmax,nobs,p); /* Print results */ print; print " PANEL FIXED EFFECTS LINEAR MODEL using linearfe"; print "Variable Linear FE Standard errors t-statistics"; results=ones(p,4); results[.,1]=xvar[1:p]; results[.,2:4]=bfe~sebfe~tbfe; let resmask[1,4]= 0 1 1 1; @ character numeric numeric numeric @ call printfmt(results,resmask); print; print "sumres sigmasq:" sumres sigmasq; print; /******** Settings for estimation by MAXLIK version 4 *********/ /* The call to malik is of the form: {b,phi0,grd,cov,retcode} = maxlik(data,yvar|xvar,&loglli,start);. where arguments are data = the name of the Gauss data set (or matrix if data in matrix) yvar|xvar = the variables to be selected from the Gauss data set loglli = procedure to compute log-likelihood (in ghmnl.src) start = starting values of the parameters */ /* The following sets various parameters for maxlik. These are generally ones that over-ride the default values in maxlik. For default values and descriptions see the maxlik manual or the Gauss code in maxlik.src */ _mlditer = 40; @ Not sure @ _max_Algorithm = 2; @ 1=STEEP 2=BFGS 3=DFP 4=NR 5=BHHH 6=PRCG @ _max_MaxIters = 100; @ maximum number of iterations @ _mlmtime = 1e+5; @ Not sure 1e+5 = default @ _max_CovPar = 2; @ Method to compute covariance matrix @ @ 0=Estimated Hessian from the secant update @ @ 1=hessian 2=cross-product 1st derivs @ @ 3 = heteroskedastic-consistent estimate @ _max_GradTol = 0.000001; @ Convergence tolerance for gradient of coeffs @ __output = 2; @ Minimal output. Use for Unix without X-windows @ /******* (1) Fixed effects Poisson using maxlik *****************/ /* Give heading for output */ print; print; parnames = xvar[1:p]; @ Variable names: not needed for maxlik @ output on; " PANEL FIXED EFFECTS POISSON MLE using Maxlik and proc feplli"; " Dataset ";;Data; " Nr of observations ";;n; " Nr of times data is read ";;nror; " Nr of obs. read last chew ";;rest; " Dependent variable ";;$yvar[1]; " Parameters (Variables) names ";; $parnames'; output off; /* Give analytical gradient and hessian */ _max_GradProc = &fepgrd; @ either &fepgrd or 0 if numerical @ _max_HessProc = &fephess; @ either &fephess or 0 if numerical @ /* Starting values */ start = {.32,-.087,.078,.0010,-.0046,.0026,-.042,-.040,-.15,-.19}; /* Run maxlik and print the results */ output on; {bfep,phi0,grdfep,covfep,retcode} = maxlik(data,yvar|xvar,&feplli,start); call maxprt(bfep,phi0,grdfep,covfep,retcode); output off; /******* (2) Fixed effects negative binomial using maxlik **************/ /* Give heading for output */ print; print; parnames = xvar[1:p]; @ Variable names: not needed for maxlik @ output on; " PANEL FIXED EFFECTS NEG BIN MLE using Maxlik and proc fenblli"; " Dataset ";;Data; " Nr of observations ";;n; " Nr of times data is read ";;nror; " Nr of obs. read last chew ";;rest; " Dependent variable ";;$yvar[1]; " Parameters (Variables) names ";; $parnames'; output off; /* Give analytical gradient and hessian */ _max_GradProc = 0; @ either &fenbgrd or 0 if numerical @ _max_HessProc = 0; @ either &fenbhess or 0 if numerical @ /* Starting values */ start = {.32,-.087,.078,.0010,-.0046,.0026,-.042,-.040,-.15,-.19}; /* Run maxlik and print the results */ output on; {bfenb,phi0,grdfenb,covfenb,retcode} = maxlik(data,yvar|xvar,&fenblli,start); call maxprt(bfenb,phi0,grdfenb,covfenb,retcode); output off; /******* (3) Random effects Poisson using maxlik *****************/ /* Give heading for output */ print; print; parnames = xvar[1:p]; @ Variable names: not needed for maxlik @ output on; " PANEL RANDOM EFFECTS POISSON MLE using Maxlik and proc replli"; " Dataset ";;Data; " Nr of observations ";;n; " Nr of times data is read ";;nror; " Nr of obs. read last chew ";;rest; " Dependent variable ";;$yvar[1]; " Parameters (Variables) names ";; $parnames';" DELTA"; output off; /* Give analytical gradient and hessian */ _max_GradProc = 0; @ either &repgrd or 0 if numerical @ _max_HessProc = 0; @ either &rephess or 0 if numerical @ /* Starting values */ start = {.32,-.087,.078,.0010,-.0046,.0026,-.042,-.040,-.15,-.19,2.0}; /* Run maxlik and print the results */ output on; {brep,phi0,grdrep,covrep,retcode} = maxlik(data,yvar|xvar,&replli,start); call maxprt(brep,phi0,grdrep,covrep,retcode); output off; /******* (4) Fixed effects negative binomial using maxlik **************/ /* Give heading for output */ print; print; parnames = xvar[1:p]; @ Variable names: not needed for maxlik @ output on; " PANEL RANDOM EFFECTS NEGBIN MLE using Maxlik and proc renblli"; " Dataset ";;Data; " Nr of observations ";;n; " Nr of times data is read ";;nror; " Nr of obs. read last chew ";;rest; " Dependent variable ";;$yvar[1]; " Parameters (Variables) names ";; $parnames';" A B "; output off; /* Give analytical gradient and hessian */ _max_GradProc = 0; @ either &renbgrd or 0 if numerical @ _max_HessProc = 0; @ either &renbhess or 0 if numerical @ /* Starting values */ start = {.32,-.087,.078,.0010,-.0046,.0026,-.042,-.040,-.15,-.19,2.0,5.0}; /* Run maxlik and print the results */ output on; {brenb,phi0,grdrenb,covrenb,retcode} = maxlik(data,yvar|xvar,&renblli,start); call maxprt(brenb,phi0,grdrenb,covrenb,retcode); output off; f1 = close(f1); /* END OF PROGRAM with procedures following below: (1)-(4) Procedures for maxlik: feplli, fenblli, replli, renblli (5) Procedures to read data: opendat, rad and readdat (6) Procedure for ln(gamma): lngamma */ /************** (1) PROCEDURES feplli, fepgrd, fephess ******************* Uses RACD equations (9.16),(9.17) p.283 and (9.22) p.285 These assume Tmax years of data, and p regressors in each year Data give in order the TMax years y then the Tmax years x */ /**** (1a) Log-likelihood function for Fixed effects Poisson ****/ proc(1) = feplli(theta,z); local li,n,Tmax,p,t,y,x,sumy,lamda,sumlamda,term1,term2,term3; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy=zeros(n,1); term2=zeros(n,1); term3=zeros(n,1); /* Create sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of likelihood function */ term1 = lnfact(sumy); t=1; do while t <=Tmax; term2 = term2 + lnfact(y[.,t]); term3 = term3 + y[.,t].*(x[.,p*(t-1)+1:p*t]*theta - ln(sumlamda)); @+.001@ t = t+1; endo; li = term1 - term2 + term3; @ print y~sumlamda~li; @ retp(li); endp; /**** (1b) Gradient function for Fixed effects Poisson ****/ proc(1) = fepgrd(theta,z); local gr,n,Tmax,p,t,y,x,lamda,sumy,sumlamda; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ gr = zeros(n,1); lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy = zeros(n,1); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of gradient */ t=1; do while t <= Tmax; gr = gr + x[.,p*(t-1)+1:p*t].*(y[.,t] - lamda[.,t].*sumy./sumlamda); t = t+1; endo; retp(gr); endp; /**** (1c) Hessian function for Fixed effects Poisson ****/ proc(1) = fephess(theta,z); local H,gr,li,n,Tmax,p,t,y,s,x,lamda,sumy,sumlamda,term1,term2; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy = zeros(n,1); term1 = zeros(p,p); term2 = zeros(p,p); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate p x p Hessian */ t=1; do while t <= Tmax; term1 = term1 + (x[.,p*(t-1)+1:p*t].*lamda[.,t])'(x[.,p*(t-1)+1:p*t].*sumy./sumlamda); s = 1; do while s <= Tmax; term2 = term2 + (x[.,p*(t-1)+1:p*t].*lamda[.,t].*sumy./sumlamda)' (x[.,p*(s-1)+1:p*s].*lamda[.,s]); s = s+1; endo; t = t+1; endo; H = -term1 +term2; retp(H); endp; /************** (2) PROCEDURES fenblli ******************* Uses RACD equations (9.18) p.284 These assume Tmax years of data, and p regressors in each year Data give in order the TMax years y then the Tmax years x */ /*** (2a) Log-likelihood function for fixed effects negative binomial ****/ proc(1) = fenblli(theta,z); local li,n,Tmax,p,t,y,x,lamda,sumlamda,sumy,term1,term2; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy=zeros(n,1); term1=zeros(n,1); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of likelihood function */ /* Note: Use my procedure lngamma(x) at end of program */ t=1; do while t <=Tmax; term1 = term1 + lngamma(lamda[.,t]+y[.,t]) - lngamma(lamda[.,t]) - lnfact(y[.,t]); t = t+1; endo; term2 = lngamma(sumlamda) + lnfact(sumy) - lngamma(sumlamda+sumy); li = term1 + term2; retp(li); endp; /************** (3) PROCEDURES replli, repgrd ******************* Uses RACD equations (9.25),(9.26) p.288 These assume Tmax years of data, and p regressors in each year Data give in order the TMax years y then the Tmax years x */ /**** (3a) Log-likelihood function for Random effects Poisson ****/ proc(1) = replli(theta,z); local li,n,Tmax,p,t,i,j,y,x,lamda,sumlamda,sumy,term1,term2,ratio; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy=zeros(n,1); term1=zeros(n,1); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta[1:p]); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of likelihood function */ /* Note: Use my procedure lngamma(x) at end of program */ t=1; do while t <=Tmax; term1 = term1 + y[.,t].*ln(lamda[.,t]) - lnfact(y[.,t]); t = t+1; endo; term2 = theta[p+1]*(ln(theta[p+1])-ln(sumlamda+theta[p+1])) -sumy.*(ln(sumlamda+theta[p+1])) + lngamma(sumy+theta[p+1]) -lngamma(theta[p+1]); li = term1 + term2; @ print y~denom~li; @ retp(li); endp; /**** (3b) Gradient function for random effects Poisson ****/ /* This is not wotking. Says dimension theta not same as start values */ proc(1) = repgrd(theta,z); local gr,n,Tmax,p,t,y,x,lamda,sumy,sumlamda; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ gr = zeros(n,1); lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy = zeros(n,1); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta[1:p]); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of gradient Note: (ybar+delta/T)/(lamdabar+delta/T) = (sumy+delta)/(sumlamda+delta) */ t=1; do while t <= Tmax; gr = gr + x[.,p*(t-1)+1:p*t] .*(y[.,t] - lamda[.,t].*(sumy+theta[p+1])./(sumlamda+theta[p+1])); t = t+1; endo; retp(gr); endp; /************** (4) PROCEDURES fenblli ******************* Uses RACD equations (9.27) p.288 These assume Tmax years of data, and p regressors in each year Data give in order the TMax years y then the Tmax years x */ /**** (4a) Log-likelihood function for random effects negative bin ****/ proc(1) = renblli(theta,z); local li,n,Tmax,p,t,i,j,y,x,lamda,sumlamda,sumy,term1,term2,ratio; /* Define data dimensions and data */ Tmax = 5; @ number of years @ n = rows(z); @ number of observations @ p = (cols(z) - Tmax)/Tmax; @ number of regressors @ y = z[.,1:Tmax]; @ n x Tmax dependent variable matrix @ x = z[.,Tmax+1:Tmax+Tmax*p]; @ n x Tmax*P regressor matrix @ /* Initialize matrices */ lamda = zeros(n,Tmax); @ n x Tmax matrix of conditional means @ sumlamda = zeros(n,1); sumy=zeros(n,1); term1=zeros(n,1); /* Create lamda and sums of y and lamda */ t=1; do while t <=Tmax; lamda[.,t] = exp(x[.,p*(t-1)+1:p*t]*theta[1:p]); sumlamda = sumlamda + lamda[.,t]; sumy = sumy + y[.,t]; t = t+1; endo; /* Calculate n x 1 vector of components of likelihood function */ /* Note: Use my procedure lngamma(x) at end of program */ t=1; do while t <=Tmax; term1 = term1 + lngamma(lamda[.,t]+y[.,t]) - lngamma(lamda[.,t]) - lnfact(y[.,t]); t = t+1; endo; term2 = lngamma(theta[p+1]+theta[p+2]) + lngamma(theta[p+1]+sumlamda) + lngamma(theta[p+2]+sumy) - lngamma(theta[p+1]) - lngamma(theta[p+2]) - lngamma(theta[p+1]+theta[p+2]+sumlamda+sumy); li = term1 + term2; @ print y~denom~li; @ retp(li); endp; /************** (5) PROCEDURES linearfe ******************* Linear fixed effects model estimates requires data to be organized as: 1. Years 1 to Tmax data on the dependent variable 2. Year 1 data on all regressors (including time dummies) .... Year Tmax data on all regressors (including time dummies) Requires matrices y which is nobs x Tmax x which is nobs x p*Tmax and constants Tmax, nobs and p */ proc(5) = linearfe(y,x,Tmax,nobs,p); local bfe,sebfe,tbfe,sumres,sigmasq,t,ybar,xbar,xx,xy,afe,covbfe,vbfe; /* Create individual averages over time of y and x */ ybar = zeros(nobs,1); @ vector of individual-specific means over time @ xbar = zeros(nobs,p); @ matrix of individual-specific means over time @ t=1; do while t <=Tmax; ybar = ybar + y[.,t]; xbar = xbar + x[.,p*(t-1)+1:p*t]; t = t+1; endo; ybar = ybar/Tmax; xbar = xbar/Tmax; /* Create cross-products xx and xy */ xx = zeros(p,p); xy = zeros(p,1); t=1; do while t <=Tmax; xx = xx + (x[.,p*(t-1)+1:p*t]-xbar)'(x[.,p*(t-1)+1:p*t]-xbar); xy = xy + (x[.,p*(t-1)+1:p*t]-xbar)'(y[.,t]-ybar); t = t+1; endo; /* Create fixed effects estimates */ bfe = invpd(xx)*xy; afe = ybar - xbar*bfe; /* Estimate error variance */ sumres = 0; t=1; do while t <=Tmax; sumres = sumres +(y[.,t]-afe-x[.,p*(t-1)+1:p*t]*bfe)'(y[.,t]-afe-x[.,p*(t-1)+1:p*t]*bfe); t = t+1; endo; sigmasq = sumres/(n*(Tmax-1)-p-4); /* Estimate standard errors */ covbfe = sigmasq*invpd(xx); vbfe = diag(covbfe); sebfe = sqrt(vbfe); tbfe = bfe./sebfe; retp(bfe,sebfe,tbfe,sumres,sigmasq); endp; /************** PROCEDURES opendat, rad, readdat *************** Following procedures written 4/16/97 based on Per's fsa2.src. In turn I think this is based on Frank Windmeijer's code. Basically given a Gauss data set with names Calling opendat, rad, readdat in seldata.src gives x - nobs*k matrix of explanatory variables y - nobs*m matrix of dependent variables indy - Index for the dependent variables indx - Index for the exogenous variables f1 - File handle for the open data set n - Number of observations in the data set nror - number of times data set needs to be read to be complete rest - number of obervations in the time data is read ------- Opendat ------- Open the dataset and get the indices of the explanatory and dependent variables { f1,n,indy,indx } = opendat(data,yvar,xvar,&indices,m); INPUT: data - String data set to be used yvar - Character vecor of names of dependent variables xvar - Charcter vector of names of explantory variables &indices - Proc Indices.src, should be included in the main program m - The Number of dependent variables OUTPUT: f1 - File handle for the open data set n - Number of observations in the data set indy - Index for the dependent variables indx - Index for the exogenous variables ------- rad ------- { nror, rest } = rad(nobs,n); INPUT: n - number of observations in the data set nobs - number of observations to read OUTPUT: nror - number of times data set needs to be read to be complete y - number of obervations in the time data is read ------- Readdat ------- Reads the nobs observations of the data set { x,y } = Readdat(f1,r,nobs,indy,indx); INPUT: f1 - File handle for the open data set r - the row where to start the reading nobs - number of observations to read indy - Index for the dependent variables indx - Index for the exogenous variables program OUTPUT: x - nobs*k matrix of explanatory variables y - nobs*m matrix of dependent variables */ proc (4) = opendat(data,yvar,xvar,&index,m); local f1,n,p,var; local depvar,indy,ind,indx; local index:proc; p = rows(yvar|xvar); { var,ind } = index(data,yvar|xvar); indy = ind[1:m]; indx = ind[m+1:p]; open f1 = ^data; n = rowsf(f1); retp(f1,n,indy,indx); endp; proc (2) = rad(nobs,n); local a,nror,rest; if n > nobs; a = n/nobs; nror = ceil(a); rest = nobs*(a- (nror-1)); else; nror = 1; rest = n; endif; retp(nror,rest); endp; proc (2) = Readdat(f1,r,nobs,indy,indx); local y,z,x; y = seekr(f1,r); z = readr(f1,nobs); x = z[.,indx]; y = z[.,indy]; retp(y,x); endp; /************** PROCEDURE lngamma **************************/ /* Created by Colin Cameron 8/10/98 ** Problem that while lngamma(x) = lnfact(x-1), ** lnfact restricts x > 0 and hence for lngamma restricts X > 1, not > 0 ** Same as lnfact.src except changes on lines 50, 52, 72 ** ** lnfact.src ** (C) Copyright 1988-1994 by Aptech Systems, Inc. ** All Rights Reserved. ** ** Purpose: Computes the natural log of the factorial function ** ** Format: y = LNFACT( x ); ** ** Input: x -- NxK matrix. All elements must be non-negative. ** ** Output: y -- NxK matrix containing the natural log of the factorials of ** each of the elements of x. ** ** Remarks: For integer x in the range from 1 to 172, this is ** (approximately) ln(x!). However, the computation is done ** using using a formula, and the function is defined for ** non-integer x. ** ** In most formulae in which the factorial operator appears it ** is possible to avoid computing the factorial directly, and ** to use LNFACT instead. The advantage of this is that LNFACT ** does not have the overflow problems that ! has. ** ** For x >= 1, this function has at least 6 digit accuracy, ** for x > 4 it has at least 9 digit accuracy, and for x > 10 ** it has at least 12 digit accuracy. For 0 < x < 1, accuracy ** is not known completely but is probably at least 6 digits. ** ** Sometimes log gamma is required instead of log factorial. ** These are related by: ** ** lngamma( x ) = lnfact( x - 1 ); ** ** This function can be converted into lngamma by removing the ** statement x = x + 1 at the top of the proc. Alternatively, ** lngamma can be defined in terms of lnfact using the above ** relationship. ** ** Technical ** Notes: For x > 1, Stirling's formula is used. For 0 < x <= 1, ** ln( gamma( x ) ) is used. ** ** See Also: GAMMA, ! */ proc lngamma(x); /* My change from lnfact(x) */ local x2,x3,ys,yp,mask,brkpnt; brkpnt = 2; /* My change from brkpnt = 1 */ /* for x below this, use direct computation */ /* check for complex input */ if iscplx(x); if hasimag(x); errorlog "ERROR: Not implemented for complex matrices."; end; else; x = real(x); endif; endif; if not (x >= 0); /* some negative x's */ errorlog "ERROR: Arguments must all be non-negative."; end; endif; /* x = x + 1; */ /* if this is removed, the function computes ln gamma */ brkpnt = brkpnt + 1; if (x > brkpnt); /* all x's greater than this -- use stirling's :: approx. */ gosub stirling; retp( ys ); elseif (x <= brkpnt); /* all x's less than this -- use :: ln(gamma) */ gosub lngamma; retp( yp ); else; /* some x's of each type -- combine estimates */ gosub stirling; gosub lngamma; mask = (x .> brkpnt); retp( ys.*mask + yp.*(.not mask) ); endif; /* ----------------- subroutines follow ----------------------- */ stirling: /* stirling's approx */ x2 = x.*x; x3 = x.*x2; if x<0; print "problem"; endif; ys = 0.5 * ln(2*pi) + (x - 0.5) .* ln(x) - x + 1./(12.*x) - 1./(360.*x3) + 1./(1260.*x3.*x2) - 1./(1680.*x3.*x2.*x2) + 1./(1188.*x3.*x2.*x2.*x2); return; lngamma: yp = ln( gamma(x) ); return; endp;