In this final project, I will analyze the bias between a Poisson distribution and a Binomial distribution through R simulation. The project has three parts: first, what the Poisson and Binomial distributions are, and why bias arises between them; second, the general pattern of the bias seen in the simulation; and last, the pattern for each *p* (I explain what *p* is in the first part). From this simulation analysis, I will try to characterize the regularity of the bias.

First of all, what are the Poisson distribution and the Binomial distribution? Wikipedia says: “In probability theory and statistics, the Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event.” For example, suppose it is known that in a hospital an average of 3 babies are born every hour. How many babies will be born in the next hour? We can use the Poisson distribution to assign a probability to each possible count.
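Sticking with the hospital example, a minimal R sketch (assuming the rate is exactly 3 births per hour) shows how `dpois` assigns a probability to each possible count in the next hour:

```r
# P(k births in the next hour) for a Poisson process with rate 3 per hour
lambda <- 3
k <- 0:6
probs <- dpois(k, lambda)
names(probs) <- k
round(probs, 3)   # probabilities peak at k = 2 and 3, matching the average rate
```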

For the definition of the Binomial distribution, Wikipedia says: “In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments.” The relationship between the two is that when *n* is large and *p* is small, the Poisson distribution with λ = *np* can be used as an approximation of the Binomial distribution. In other words, when *n* is not large and *p* is not small, a bias will likely appear. To figure out what this bias looks like, I did the following research.
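A quick way to see this relationship in R (the helper name `worst.gap` and the specific *n*, *p* values below are just illustrative choices, not part of the project's simulation): with a large *n* and small *p* the Poisson(*np*) pmf tracks the Binomial pmf closely, while with a moderate *n* and a *p* near 0.5 the gap is much larger.

```r
# Worst-case pointwise gap between a Binomial pmf and its Poisson approximation
worst.gap <- function(n, p) max(abs(dbinom(0:n, n, p) - dpois(0:n, n * p)))
worst.gap(1000, 0.003)   # large n, small p: tiny worst-case gap
worst.gap(50, 0.5)       # moderate n, p not small: visibly larger gap
```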

In the research, I use R to make a simulation in which *p* runs from 0.1 to 0.9 and *n* equals 50. The size of each dot shows the bias between the Poisson and Binomial distributions; the y-axis is *p*, and the x-axis is *j*, which ranges over the integers from 0 to *n*. (Figure 1) From this graph, we can see that for each *p*, the biggest bias always appears near the point where *j* = *p* × *n*. For example, when *p* = 0.5, the highest bias appears almost exactly at *j* = 25. I suspect this is because when *j* = *p* × *n*, both the Poisson and the Binomial probabilities are relatively high, so the bias is most visible there.

Figure 1. General Graph

To check my assumption, I looked closely at the bias for each *p*. The assumption appears to hold: whether *p* = 0.9, 0.1, or 0.5, the highest bias is always at the point where *j* = *p* × *n*. (Figures 2 to 4, where the x-axis is *j* and the y-axis is bias.)

Figure 2. *p* = 0.9

Figure 3. *p* = 0.1

Figure 4. *p* = 0.5
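This claim can also be checked numerically. The sketch below (using the same bias definition as the code in the appendix) finds, for each of the three values of *p* shown, the *j* with the largest bias and compares it with *p* × *n*:

```r
n <- 50
j <- 0:n
for (p in c(0.1, 0.5, 0.9)) {
  bias <- abs(dbinom(j, n, p) - dpois(j, n * p))
  j.max <- j[which.max(bias)]     # j with the largest bias for this p
  cat(sprintf("p = %.1f: largest bias at j = %d, p * n = %g\n", p, j.max, p * n))
}
# In each of these cases the largest bias lands exactly at j = p * n
```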

However, the first general graph shows something strange: the bias does not simply grow and then shrink; it has some hollows. For example, see the points I marked in Figure 5 for *p* = 0.5.

Figure 5.
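My reading of these hollows (an interpretation, not something I prove here) is that the signed difference dbinom − dpois changes sign on each side of the peak: the Binomial pmf is taller but narrower than the Poisson pmf with the same mean, so the two curves cross, and near each crossing the absolute bias dips toward zero. A short sketch locates the crossings for *p* = 0.5:

```r
n <- 50; p <- 0.5
j <- 0:n
signed <- dbinom(j, n, p) - dpois(j, n * p)
# A sign change between j and j + 1 means the two pmfs cross there,
# which is where |signed| -- the bias -- dips toward zero
crossings <- j[which(diff(sign(signed)) != 0)]
crossings
```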

To figure out the regularity of these hollows, I picked the lowest-bias point around the highest-bias point for each *p*, connected those points, and plotted them in a new graph whose x-axis is *p* and y-axis is *j*. (Figure 6) The general trend is increasing, but it is not perfectly linear, so I guessed that sometimes the lowest bias lies to the left of the peak and sometimes to the right. I first collected all the lowest points on the right (Figure 7), then did the same for the lowest points on the left (Figure 8). They support my guess: each of these plots looks much more linear. Although I cannot prove why the lowest bias sometimes falls on the left and sometimes on the right, the general graph suggests the two lowest-bias points sit roughly symmetrically about the peak. As a result, I can say that the location of the lowest bias increases roughly linearly as *p* increases.

Figure 6

Figure 7. Plots on the right

Figure 8. Plots on the left

To dig deeper into the regularity, I asked myself one question: will the difference between them change as *p* changes? To answer this, I again used R to create a graph. (Figure 9) The y-axis is the difference between the highest bias and the lowest bias near it, and the x-axis is *p*. Honestly, this is not a linear plot; it looks more like a quadratic function. I am not totally sure, but one thing is clear: as *p* increases, the difference also increases, and the gap between successive differences keeps growing.

Figure 9
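To put the “square function” guess on slightly firmer footing, one could fit a second-degree polynomial to these differences (a rough check, not a proof; the window of ±4 points around the peak mirrors the appendix code):

```r
n <- 50; j <- 0:n
P <- seq(0.05, 0.95, 0.05)
gap <- sapply(P, function(p) {
  bias <- abs(dbinom(j, n, p) - dpois(j, n * p))
  i <- which.max(bias)
  near <- max(1, i - 4):min(length(bias), i + 4)
  max(bias) - min(bias[near])   # highest bias minus lowest nearby bias
})
fit <- lm(gap ~ poly(P, 2))     # quadratic fit of gap against p
summary(fit)$r.squared          # closeness of the quadratic fit
```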

In conclusion, these simulations show three things. First, the highest-bias points always fall at *j* = *p* × *n*; in other words, their location increases linearly with *p*. Second, the lowest bias around each highest bias also increases roughly linearly with *p*, and the two lowest-bias points sit roughly symmetrically around the peak. Last, as *p* increases, the difference between the highest and lowest bias also increases, and the gap between successive differences keeps growing; my guess is that it grows roughly like a quadratic function.

```
# Absolute difference between the Binomial pmf and its Poisson approximation
PoisApprox <- function(n, p, j){
  TrueValue <- dbinom(j, n, p)
  ApprValue <- dpois(j, n*p)
  bias <- abs(TrueValue - ApprValue)
  return(bias)
}
j <- 0:50
p <- seq(0.1, 0.9, 0.1)
bias <- sapply(j, function(x) sapply(p, function(y) PoisApprox(50, y, x)))
colnames(bias) <- 0:50
rownames(bias) <- p
x <- rep(0:50, each=9)
y <- rep(p, 51)
library(ggplot2)
dat <- data.frame(x=x, y=y, bias=c(round(bias, 4)))
ggplot(dat, aes(x, y)) + geom_point(aes(size=bias)) +
  xlab('j') + ylab('p') + ggtitle('B(50, p)', subtitle='Figure 1')
P <- seq(0.05, 0.95, 0.05)
Bias <- sapply(j, function(x) sapply(P, function(y) PoisApprox(50, y, x)))
# One bias plot per value of P, from p = 0.95 down to p = 0.05,
# with the highest-bias point starred in blue
for (i in rev(seq_along(P))) {
  plot(j, Bias[i, ], pch=19, main=paste0('p=', P[i]), xaxt='n', ylab='bias')
  axis(1, j, j, las=2)
  points(j[which.max(Bias[i, ])], max(Bias[i, ]), pch=8, col='blue', cex=2)
}
# Lowest bias within 4 points of the highest-bias point (either side);
# returns the j of that point (index - 1, since j starts at 0) and its value
f <- function(x){
  ind.max <- which.max(x)
  ind.tmp <- (ind.max-4):(ind.max+4)
  ind <- ind.tmp[ind.tmp>=1 & ind.tmp<=length(x)]
  return(c(ind[which.min(x[ind])] - 1, min(x[ind])))
}
val.min <- apply(Bias, 1, f)
plot(1:19, val.min[1, ], xaxt='n', xlab='p', pch=19, ylab='j')
axis(1, 1:19, P)
# Same as f, but searching only to the left of the peak
f.l <- function(x){
  ind.max <- which.max(x)
  ind.tmp <- (ind.max-4):ind.max
  ind <- ind.tmp[ind.tmp>=1 & ind.tmp<=length(x)]
  return(c(ind[which.min(x[ind])] - 1, min(x[ind])))
}
val.min <- apply(Bias, 1, f.l)
plot(1:19, val.min[1, ], xaxt='n', xlab='p', pch=19, ylab='j')
axis(1, 1:19, P)
# Same as f, but searching only to the right of the peak
f.r <- function(x){
  ind.max <- which.max(x)
  ind.tmp <- ind.max:(ind.max+4)
  ind <- ind.tmp[ind.tmp>=1 & ind.tmp<=length(x)]
  return(c(ind[which.min(x[ind])] - 1, min(x[ind])))
}
val.min <- apply(Bias, 1, f.r)
plot(1:19, val.min[1, ], xaxt='n', xlab='p', pch=19, ylab='j')
axis(1, 1:19, P)
g <- function(x) c(which.max(x) - 1, max(x))  # j of the highest bias and its value
val.max <- apply(Bias, 1, g)
plot(1:19, val.max[1, ], xaxt='n', xlab='p', ylab='j', pch=19)
axis(1, 1:19, P)
# val.min here comes from f.r, the right-side minimum
plot(1:19, val.max[2, ] - val.min[2, ], xaxt='n', xlab='p', ylab='difference', pch=19)
axis(1, 1:19, P)
```