Estimating the mean of a small sample under the two-parameter lognormal distribution

Lognormally distributed variables are found in biological, economic and other systems. Here the sampling distributions of maximum likelihood estimates (MLEs) of parameters are developed when data are lognormally distributed and estimation is carried out either under the correct lognormal model or under a mis-specified normal distribution. This is designed as an aid to experimental design when drawing a small sample under an assumption that the population follows a normal distribution while in fact it follows a lognormal distribution. Distributions are derived analytically as far as possible by using a technique for estimator densities and are confirmed by simulations. For an independently and identically distributed lognormal sample, when a normal distribution is used for estimation, the distribution of the MLE of the mean differs from that of the MLE of the lognormal mean. This distribution is not known in closed form but can be approximated well enough by another lognormal. An analytic method for the distribution of the mis-specified normal variance uses computational convolution for a sample of size 2. The expected value of the mis-specified normal variance is also found, as a way to give information about the effect of the model misspecification on inferences for the mean. The results are demonstrated on an example for a population distribution that is abstracted from a survey.

Peter Hingley, European Patent Office, Munich, Germany, e-mail: phingley@epo.org

Citation: Peter Hingley, Estimating the mean of a small sample under the two parameter lognormal distribution, in R. Anguelov, M. Lachowicz (Editors), Mathematical Methods and Models in Biosciences, Biomath Forum, Sofia, 2018, pp. 100-121, http://dx.doi.org/10.11145/texts.2018.02.027

Copyright: © 2018 Hingley et al.
This article is distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Here some analytic expressions are developed for the distributions of maximum likelihood estimators (MLEs) of parameters of samples from the lognormal distribution. These are described both under a correct lognormal estimation model (EM) and under an incorrect normal EM. The latter situation can occur either because of lack of knowledge of the data generating model (DGM) or because of the simplicity of carrying out statistical inference under the assumption of normality. It may also be that, for a small sample where the statistical assumptions behind the central limit theorem do not apply, asymmetry of the data around the mean is not apparent. Therefore a scientist may be unaware that a variable has a lognormal distribution and so be tempted to measure the arithmetic mean and standard deviation of the sample data in order to use normal inference.
The ideas can be applied in the experimental design phase, when considering the possibility that the DGM differs from the EM. By making presumptions about the likely form of the population distribution, the EM and the sample size can be chosen to give the desired precision of the resulting estimate. At the data analysis stage, other ways to deal with a lack of knowledge of the population distribution include using a robust estimator like the median, or a Student's t test for the mean in the case of a normal distribution with unknown variance.
Lognormally distributed variables are found in biological, economic and other systems. Sometimes it is convenient to calculate statistics directly on the log metric [1] [2] [3] [4]. In this case, straightforward normal theory applies for estimating means and standard errors. It can happen however that the original scale is important. The expression for the MLE of the lognormal mean includes the mean and variance of the associated normal distribution on the log scale. The MLEs are neither unbiased nor efficient in this case [5] and some other estimators are available [6] [10]. But we consider here the situation of straightforward data analysis where the MLEs for mean and variance are used, either under the lognormal EM or under the normal EM. The arithmetic mean, which is the MLE of a normal mean, does not include a variance term.
Exact analytic probability density functions (PDFs) for MLEs under the lognormal EM will be obtained by using a technique for estimator densities (TED) [7] [8] [9]. On the other hand, only approximate forms are developed for the distribution of the arithmetic mean under a lognormal DGM. The analytic PDFs are compared to the empirical PDFs obtained by making simulations with random numbers. Examples are given in the development, firstly for a theoretical illustration and then for a reported distribution of numbers of employees at companies applying for patents.
Section 2 explains TED as an algebraic formula for the PDF of an MLE. Section 3 reviews the exact PDFs of the MLEs of the parameters of the lognormal distribution on lognormally distributed data. Section 4 considers the approximate PDFs of the MLEs of the parameters under the normal distribution on lognormally distributed data. Since this leads to some difficulties even for a sample of size 2, an alternative approach is shown to calculate the expected value of the normal variance estimate. This allows the expected 95 percent range limits for the mean to be found. Section 5 discusses an example involving data on the numbers of employees at companies making patent applications from a survey. Section 6 concludes and suggests avenues for further research. Computations were made with R programs.

The technique for estimator densities (TED)
This is an exact model-based approach to find the density of an MLE, rather than an approximate data-based approach such as density estimation, where the observed data are used to estimate the distribution [11].
In the following, a term g() indicates a PDF. Consider independently and identically distributed (iid) data that are gathered into an (n × 1) vector w. In order to obtain the MLEs of the parameters of g(), the likelihood of the data is ∏_{i=1}^{n} g(w_i). This is maximised by using the logarithm of the likelihood [12].
Say that l(θ, w) is the log likelihood of the data under the EM, with p estimable parameters in a (p × 1) vector θ. Let ′ and ′′ indicate differentiation by θ, once or twice respectively. Consider cases where l(θ, w) is continuous, differentiable and has a single maximum with no other turning point. Then the MLEs θ̂ are given by l′(θ, w)|_{θ = θ̂} = 0. There is also the further requirement that l(θ, w) is differentiable for a second time. It is desired to find g(θ̂). Following [7], consider a (p × 1) vector T(θ̂, θ*, w), where θ* is fixed at an arbitrary value and θ̂ is yet to be specified. Under the regularity conditions that were mentioned above, the exact PDF for θ̂ is given as follows.
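The displayed TED formula (equation (2)) did not survive extraction. A reconstruction consistent with the two terms described in the next paragraph, offered as a reading aid rather than a verbatim restoration, is:

```latex
% Equation (2): TED density of the MLE, as reconstructed from the
% surrounding description of its two factors
g(\hat{\theta}) \;=\;
  E_{w}\!\left[\,\lvert j(\theta,w)\rvert \,\middle|\, \theta=\hat{\theta}\,\right]\;
  g_{[T(\hat{\theta},\theta^{*},w)]}(t)
  \Big\rvert_{\theta^{*}=\hat{\theta},\;\theta=\hat{\theta},\;t=0}
```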
Here j(θ, w) = −l′′(θ, w) is the observed information. The term E_w[|j(θ, w)| | θ = θ̂] describes a conditional expectation, that is conditional on θ = θ̂ and is taken with respect to w over the EM. The second term represents the value of the PDF g_{[T(θ̂, θ*, w)]}(t), for which θ* = θ̂ and θ = θ̂, so that t = 0 by (1).
TED allows for a distinction to be made between the functional forms of the PDFs of the data, g_0(w) on the DGM and g_1(w|θ) on the EM. It can also be used when the functional form of the EM is the same as that of the DGM.
While TED is useful because it gives the exact PDF of the MLE, from a practical point of view it can only be applied to models that are simple enough for the components in equation (2) to be calculated. In order to illustrate how this works, Table 1 shows some previously described examples (from [8]), where a normal EM is used to estimate the mean when the DGM is either normal (with known variance) or negative exponential. In the former case it turns out that g(θ̂) is normal, as is already well known from elementary statistical theory, while in the latter case g(θ̂) has a gamma distribution. The table indicates the terms that combine to give g(θ̂) according to equation (2).
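The negative exponential row of Table 1 is easy to confirm by simulation: the arithmetic mean of n iid exponential draws is a scaled sum of exponentials and so follows a gamma distribution with shape n and scale θ₀/n. The paper's computations were in R; the sketch below uses Python/numpy for checkability, and its variable names are illustrative rather than taken from the paper.

```python
# Simulation check of the Table 1 example: when the DGM is negative
# exponential with mean theta0, the normal-EM MLE of the mean (the
# arithmetic mean) follows Gamma(shape=n, scale=theta0/n),
# which has mean theta0 and variance theta0**2 / n.
import numpy as np

rng = np.random.default_rng(1)
n, theta0, reps = 5, 2.0, 200_000
means = rng.exponential(theta0, size=(reps, n)).mean(axis=1)

print(means.mean())   # close to theta0 = 2.0
print(means.var())    # close to theta0**2 / n = 0.8
```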
TED is not a panacea, in that the problem of calculating the analytic PDF for θ̂ is transformed into the problem of finding the analytic PDF g_{[T(θ̂, θ*, z)]}(t). This can be done for simple PDFs such as those in Table 1. In the cases that are discussed in this paper, the situation is further simplified.

Table 1 Examples of the use of TED by equation (2) (from [8]). [Table body not recovered; the rows give, for each DGM, the log likelihood l(θ, z) with θ = δ (see equation (14), with w set to z) and the terms of equation (2).]

Densities of estimators for the lognormal distribution

In this section, results are described when the data are generated by the lognormal distribution and estimated using the MLEs for the lognormal distribution. Most of the results are already known but are redeveloped here using TED to give an integrated approach.
The two parameters can be gathered into a parameter vector ∆^T (2 × 1) = (µ, σ²). The expected value of w is exp(µ + σ²/2) [3]. The mean is a function of b as well as of a in LN(a, b), unlike the case of N(a, b) where the mean a is not a function of b.
As an illustration, consider the distribution LN(−1.5, 3). The expected value of an observation w from this distribution is exp(−1.5 + 1.5) = 1. This is an asymmetric PDF, as is shown in Fig. 1.
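The illustration can be checked by simulation. The sketch below (Python/numpy, standing in for the paper's R programs) draws from LN(−1.5, 3) and confirms that the mean is 1 while the median, exp(−1.5) ≈ 0.22, sits well below it, which is the asymmetry visible in Fig. 1.

```python
# Draws from LN(-1.5, 3), i.e. a lognormal whose underlying normal has
# mean -1.5 and variance 3, should average to exp(-1.5 + 3/2) = 1.
import numpy as np

rng = np.random.default_rng(0)
w = rng.lognormal(mean=-1.5, sigma=np.sqrt(3.0), size=1_000_000)

print(w.mean())       # close to 1.0
print(np.median(w))   # close to exp(-1.5) = 0.22, far below the mean
```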

The maximum likelihood estimate of the sample mean
Here the MLE of the sample mean of a lognormal distribution is shown, assuming that the variance is known.
Reparameterise from ∆^T to θ^T = (γ, σ²), where γ = exp(µ + σ²/2). This is the mean of the lognormal variable that was given in Section 3.1. We do not bother to reparameterise the lognormal variance explicitly. In terms of the new parameters, the log likelihood is given by equation (6). In order to obtain the MLE for γ, the derivative of the log likelihood is taken with respect to γ.
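Equation (6), the reparameterised log likelihood, is not reproduced above. Substituting µ = log γ − σ²/2 into the lognormal log likelihood gives the form below, a reconstruction consistent with the reparameterisation rather than a verbatim restoration:

```latex
% Lognormal log likelihood in terms of (gamma, sigma^2),
% using mu = log(gamma) - sigma^2/2
l(\theta, w) \;=\; -\frac{n}{2}\log\bigl(2\pi\sigma^{2}\bigr)
  \;-\; \sum_{i=1}^{n}\log w_{i}
  \;-\; \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}
        \Bigl(\log w_{i} - \log\gamma + \tfrac{\sigma^{2}}{2}\Bigr)^{2}
```

Differentiating with respect to γ and setting the derivative to zero, with σ² treated as known, yields the geometric mean with a correction, γ̂ = exp((1/n) Σ log w_i + σ²/2).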
Assuming that σ² is known, the MLE γ̂ is given by solving l′(θ, w)|_{γ = γ̂} = 0.

Fig. 1 The lognormal PDF with mean γ = 1 and lognormal variance term σ² = 3. This is used for the illustrations in Sections 3 and 4.

The PDF of the MLE of the sample mean
Here the PDF of γ̂ is described, assuming that the variance term σ² is known. TED will be used to find g(γ̂ | σ²).
Comparison of expressions (10) and (4) shows that the mean of an iid sample from a lognormal distribution with known σ² has a lognormal distribution, with mean exp(log(γ₀) + σ²/(2n)) and a variance term σ²/n. For the illustration that was introduced in Section 3.1, the middle diagram in Fig. 3 (below) shows a comparison of the PDF specified by equation (10) and a probability histogram derived from simulated data for samples of size n = 2 from LN(−1.5, 3). This distribution is LN(0, 1.5) and has mean exp(0 + 1.5/2) = 2.12. Equivalence of the analytic PDF to the simulations is indicated.
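This sampling distribution can be verified directly. The sketch below (Python/numpy rather than the paper's R) draws many samples of size n = 2 from LN(−1.5, 3), forms γ̂ = exp(mean(log w) + σ²/2) with σ² known, and checks that log γ̂ has mean log(γ₀) = 0 and variance σ²/n = 1.5, so that E[γ̂] = exp(0.75) ≈ 2.12.

```python
# Sampling distribution of the known-variance lognormal mean MLE for n = 2.
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 2, 3.0, 100_000
logw = rng.normal(-1.5, np.sqrt(sigma2), size=(reps, n))   # log scale data
gamma_hat = np.exp(logw.mean(axis=1) + sigma2 / 2)

print(np.log(gamma_hat).mean())   # close to log(gamma0) = 0
print(np.log(gamma_hat).var())    # close to sigma2 / n = 1.5
print(gamma_hat.mean())           # close to exp(0.75) = 2.12
```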

The PDF of the associated sample variance σ̂²
By differentiating the log likelihood (6) with respect to σ², the MLE of σ² is σ̂² = Σ(log(w_i) − µ̂)²/n. The analytic unconditional PDF g(σ̂²) can be found from equation (3). Say that the true value of σ² is σ₀². Standard theory [12] shows that the quantity nσ̂²/σ₀² has a chi-squared distribution with n − 1 degrees of freedom. So, by transformation,

g(σ̂²) = (n/σ₀²) (nσ̂²/σ₀²)^{(n−3)/2} exp(−nσ̂²/(2σ₀²)) / [2^{(n−1)/2} Γ((n−1)/2)],   (11)

where Γ() is the Gamma function. Equation (11) indicates that g(σ̂²) does not have to be written as a conditional PDF, because it is independent of γ₀ and γ. Fig. 2 shows this PDF for the illustration with n = 2, again comparing the PDF specified by equation (11) with a probability histogram derived from simulated sets of samples. Equivalence of the analytic PDF to the simulations is indicated.
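Because σ̂² is computed from the log-scale data, which are exactly normal, the chi-squared result holds exactly for any n. A quick check (Python/numpy sketch, names illustrative) for the illustration with n = 2 and σ₀² = 3, where E[σ̂²] = σ₀²(n − 1)/n = 1.5:

```python
# Check that n * sigma2_hat / sigma0^2 is chi-squared with n - 1 df.
import numpy as np

rng = np.random.default_rng(3)
n, sigma0sq, reps = 2, 3.0, 100_000
logw = rng.normal(-1.5, np.sqrt(sigma0sq), size=(reps, n))
mu_hat = logw.mean(axis=1, keepdims=True)
s2_hat = ((logw - mu_hat) ** 2).mean(axis=1)     # MLE, denominator n

q = n * s2_hat / sigma0sq     # should be chi-squared, 1 df: mean 1, var 2
print(q.mean(), q.var())
print(s2_hat.mean())          # close to sigma0sq * (n - 1) / n = 1.5
```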

The conditional PDF g(γ̂ | σ̂²)
Here the approach in Section 3.3 is extended to obtain the PDF of the sample mean γ̂ when it is conditional on the sample variance σ̂². TED will be used to find g(γ̂ | σ̂²), that is conditional on the estimate σ̂² from the same data set.
Say that the underlying parameters are γ₀ and σ₀². The log likelihood (6) for the estimation model is now written with σ² = σ̂², while for the DGM the same likelihood is written with γ = γ₀ and σ² = σ₀². The conditional MLE is derived in an analogous way to equation (8).
T is now obtained as in equation (9).
Figs. 3a to 3c show three variants of g(γ̂ | σ̂²) for the illustration (taking n = 2 from LN(−1.5, 3)), corresponding to conditional values for σ̂² of 2.25, 3 and 4 respectively. The data have been generated using σ₀² = 3. Agreement of the analytic PDF with the histogram only occurs when σ̂² = σ₀² = 3 in the middle diagram (as was discussed in Section 3.3).

The joint PDF g(γ̂, σ̂²)
Here the results of Sections 3.4 and 3.5 are combined to find the joint PDF of γ̂ and σ̂² from a sample.
The previous Section 3.5 showed that, for data sets from the lognormal distribution, in the PDF g(γ̂ | σ̂²) there is a dependency between γ̂ and σ̂² that needs to be considered. σ̂² will not be the same over several data samples, so the conditional PDF g(γ̂ | σ̂²) may be difficult to interpret. The joint PDF of γ̂ and σ̂² is of interest in order to better understand the consequences of the model. This is given by multiplying the expressions (11) and (12).
Figure 4 shows this bivariate PDF for the illustration, using simulated data sets (left plot) and the analytic formula (right plot). Agreement of the PDFs is indicated.
For n = 2, g(γ̂, σ̂²) descends in both directions with no observable mode. Chi-squared distributions with 1 or 2 degrees of freedom have no mode [3]. This has an effect on the associated PDF g(γ̂, σ̂²) when n = 2. Fig. 5 shows that there is a mode for the bivariate PDF with the same model and parameters when n = 6, again by simulations (left plot) and by the analytic formula (right plot).

Fitting the normal distribution to lognormal data
The consequences will now be described of wrongly using the normal distribution as the EM when the DGM is the lognormal distribution. The DGM will be written LN(µ₀, σ₀²) and the EM will be written N(δ, η²), as in equation (3) but now in terms of g(w) rather than g(z).

The conditional PDF g(δ̂ | η̂²)
Here the PDF of the sample mean will be developed when it is conditional on the sample variance.
For the normal distribution, the log likelihood for δ, conditional on η̂², from equation (3) is given by equation (14). Differentiating by δ, the MLE for a normal EM is the sample mean δ̂ = Σ w_i / n. The exact distribution of δ̂ is not known and unfortunately TED does not help here, because it also requires an expression for the distribution of Σ w_i.
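Equation (14), the normal log likelihood, is the standard form; the reconstruction below is consistent with the MLE δ̂ = Σ w_i / n derived above:

```latex
% Normal log likelihood; setting dl/d(delta) = 0 gives delta_hat = sum(w_i)/n
l(\delta, \eta^{2}; w) \;=\; -\frac{n}{2}\log\bigl(2\pi\eta^{2}\bigr)
  \;-\; \frac{1}{2\eta^{2}}\sum_{i=1}^{n}\bigl(w_{i}-\delta\bigr)^{2}
```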
Several methods are available to approximate the distribution. One way is to approximate g(δ̂ | η̂²) by a transformed version of the lognormal distribution LN(log(γ₀), σ²/n) for g(γ̂ | σ²). Expression (10) for the distribution of the MLE γ̂ under the lognormal EM cannot be used directly, because it is for the geometric mean with a correction as at (8), which does not have the same distribution as the arithmetic mean δ̂. The difference can be seen with simulation results for the illustration using the same DGM, by comparing the distributions shown in Fig. 6 and Fig. 3b for n = 2. The distribution for δ̂ is shifted to the left compared to the one for γ̂. Relating to the lognormal distribution in equation (4), assume that δ̂ estimates exp(µ₀) while γ̂ estimates exp(µ₀)·exp(σ₀²/(2n)). The choice of a lognormal distribution g(δ̂ | σ₀²) ∼ LN(log(γ₀) − σ₀²/n, σ₀²/n) preserves the same variance term σ₀²/n as in g(γ̂ | σ₀²) at (10), and gives the mean exp(log(γ₀) − σ₀²/(2n)). This is consistent with γ̂ estimating exp(µ₀)·exp(σ₀²/(2n)) and δ̂ estimating exp(µ₀). However, the dashed line in Fig. 6 shows that this gives only an approximate fit when n = 2. It was also verified that it gives only an approximate fit to simulations when n = 6 with the same parameter values.
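The left shift of δ̂ relative to γ̂ is easy to reproduce by simulation. The sketch below (Python/numpy, in place of the paper's R) draws samples of size n = 2 from LN(−1.5, 3): the arithmetic mean δ̂ is unbiased for E[w] = 1, while the known-variance MLE γ̂ has expectation γ₀ exp(σ₀²/(2n)) ≈ 2.12.

```python
# Arithmetic mean (normal EM) versus lognormal mean MLE (known sigma^2),
# for n = 2 samples from LN(-1.5, 3).
import numpy as np

rng = np.random.default_rng(4)
n, sigma0sq, reps = 2, 3.0, 200_000
w = rng.lognormal(-1.5, np.sqrt(sigma0sq), size=(reps, n))

delta_hat = w.mean(axis=1)                                 # normal-EM MLE
gamma_hat = np.exp(np.log(w).mean(axis=1) + sigma0sq / 2)  # lognormal MLE

print(delta_hat.mean())   # close to 1.0
print(gamma_hat.mean())   # close to exp(0.75) = 2.12
```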
Empirical investigation suggests that the following distribution works better for n = 2.
This gives the mean 0.733 for the illustration with n = 2. The solid analytic line in Fig. 6 according to equation (15) closely follows the shape of the histogram. Unlike equation (12) for g(γ̂ | σ̂²), equation (15) is not directly dependent on its conditional argument σ̂² and can be written as g(δ̂).

The PDF of the misfitted sample variance
Here some steps are shown towards developing the PDF of the normal EM sample variance η̂². As in Section 3.6, the intention is then to seek to multiply g(η̂²) by g(δ̂ | η̂²) = g(δ̂) from Section 4.1, in order to determine the joint PDF g(δ̂, η̂²).
The log likelihood conditional on δ̂ is written in a similar fashion to equation (14).
Differentiating by η², the MLE is η̂² = (1/n) Σ(w_i − δ̂)² = Σ r_i / n, where r_i = (w_i − δ̂)². Unlike the situation in Section 3.4, the PDF for η̂² depends on γ₀ as well as σ₀². Consider the case n = 1. Since w ∼ LN(µ₀, σ₀²), with µ₀ = log(γ₀) − σ₀²/2, the quantity v_i = (log(w_i) − µ₀)²/σ₀² follows a chi-squared distribution on 1 degree of freedom.

Fig. 9 can be compared with Fig. 4 and Fig. 5 for the correct lognormal EM. But a direct comparison of the spread of σ̂² with that of η̂² needs to take account of the different scales involved for the variance terms.
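Although the full PDF of η̂² is awkward, its expectation follows the general iid result E[η̂²] = (1 − 1/n) Var(w), which holds for any DGM with finite variance. The sketch below (Python/numpy, illustrative names) checks this for n = 2 with the milder parameters LN(0, 0.5) rather than the illustration's LN(−1.5, 3), whose heavy tails make a simulated average converge impractically slowly.

```python
# Check E[eta2_hat] = (1 - 1/n) * Var(w) for lognormal data, n = 2.
import numpy as np

rng = np.random.default_rng(5)
n, q, reps = 2, 0.5, 1_000_000
w = rng.lognormal(0.0, np.sqrt(q), size=(reps, n))
eta2_hat = w.var(axis=1)                  # MLE variance, denominator n

var_w = (np.exp(q) - 1) * np.exp(q)       # lognormal variance with mu = 0
print(eta2_hat.mean())                    # close to (1 - 1/n) * var_w = 0.535
```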

Expected value of the misfitted normal variance
Given the difficulties in constructing the analytical form of the two way plots for g(δ̂, η̂²), in this section another analytical approach is taken for the case n = 2.
Attention will be restricted to the effects of the wrong estimation model on the expected value of the variance term E[η̂²] when n = 2. This is the same as E[r_i] due to independence of the sample members. From equation (16), by making the substitution u = √s + δ, using the positive square root only, this can be written as follows.
Here k is the order of the incomplete moment, Φ is the cumulative distribution function of the standard normal distribution N(0, 1) (from −∞ to the argument), and µ_k is the corresponding complete moment exp[kp + k²q/2]. Equation (17) can be split into three terms of this type.
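The incomplete moment of the lognormal referred to here has the standard closed form E[w^k; w ≤ c] = exp(kp + k²q/2) Φ((log c − p − kq)/√q) for w ∼ LN(p, q). The sketch below checks it against numerical integration; the values of p, q, k and c are arbitrary test inputs, not taken from the paper.

```python
# Numerical check of the lognormal incomplete moment formula.
import numpy as np
from scipy import stats
from scipy.integrate import quad

p, q, k, c = 0.2, 0.5, 2, 3.0
dist = stats.lognorm(s=np.sqrt(q), scale=np.exp(p))   # w ~ LN(p, q)

closed = np.exp(k * p + k**2 * q / 2) * stats.norm.cdf(
    (np.log(c) - p - k * q) / np.sqrt(q))
numeric, _ = quad(lambda x: x**k * dist.pdf(x), 0.0, c)

print(closed, numeric)   # the two values agree
```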
The evaluated expression is found by using (18) and (19).
Fig. 10 The frequency distribution of numbers of employees per applicant for patents from survey data [14]. Note that this representation gives equal weight to the grouped classes and is not arithmetic.

For n = 2, the expected one sided upper range limit for 95 percent of the sample means using the correct lognormal EM, under g(γ̂ | σ₀²) as in equation (10), was obtained numerically as 73019. As was discussed in Section 4.4, for usage in equation (20) δ̂ can be set to exp(−6.√2, which is 3680. This would be higher in case a Student's t distribution based limit was used. See Fig. 13. There are other ways to calculate an expected 95 percent one sided upper confidence limit for the mean. The variance of the lognormal distribution is [exp(σ²) − 1] exp(2µ + σ²) [1]. With µ = 4.595 and σ² = 6.1815, the square root of this variance is 2172, which suggests a one sided upper range limit of only 2633, although this could be made larger by using a Student's t distribution based limit.

Conclusions
The above approaches demonstrate the effect of misspecifying the normal model for estimation on data that were generated by the lognormal distribution. This can be useful at the experimental design stage, where model robustness issues may be of concern. While the context of a data set may sometimes give knowledge about the DGM, in other cases this will not be known. Clearly the distribution of the MLE of the lognormal mean can differ considerably from that of the arithmetic mean, with consequences for the statistical inferences from a sample. Inferences that are made