GTU Probability and Statistics (P&S) Winter 2021 Paper Solutions
Q1
(a) Define a term random variable and explain different types of random variable
A random variable is not a traditional numerical variable, but rather a function that assigns a numerical value to each outcome in a sample space of a random experiment.
There are two main types of random variables:
- Discrete Random Variables: These are the random variables that can take on a countable number of distinct values. For example, the number of heads in 10 coin tosses is a discrete random variable because it can only take on integer values between 0 and 10.
- Continuous Random Variables: These are random variables that can take any value in an interval, so their possible values are uncountably infinite. Continuous random variables are typically measurements. For example, the height of students in a class is a continuous random variable because it can take on any value within a range (like between 4 feet and 7 feet).
(b) A card is drawn at random from a pack of 52 cards. What is the probability that the card is a spade or a king?
In a standard deck of 52 cards:
- There are 13 spades.
- There are 4 kings.
To find the probability, we use the addition rule for the probability of either of two events (A or B):
P(A or B) = P(A) + P(B) - P(A and B)
Here, A is the event of drawing a spade, so P(A) = 13/52, and B is the event of drawing a king, so P(B) = 4/52.
P(A and B) is the probability of drawing the king of spades, which is 1/52.
Plugging these into the formula:
P(Spade or King) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13
The probability of drawing a card that is either a spade or a king from a standard deck of 52 cards is approximately 0.308, or 30.8%.
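The addition rule can be checked with exact fractions. This is a minimal Python sketch; the variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Counts in a standard 52-card deck
total_cards = 52
spades = 13
kings = 4
king_of_spades = 1  # the overlap, counted in both events

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_spade_or_king = Fraction(spades + kings - king_of_spades, total_cards)

print(p_spade_or_king)         # exact fraction
print(float(p_spade_or_king))  # decimal value
```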
(c) State Bayes' theorem. There are three bags; first containing 1 white, 2 red and 3 green balls; second 2 white, 3 red and 1 green balls and third 3 white, 1 red and 2 green balls. Two balls are drawn from a bag chosen at random. These are found to be 1 white and 1 red. Find the probability that the balls so drawn came from the second bag.
Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. If A1, A2, ..., An are mutually exclusive and exhaustive events and B is any event with P(B) > 0, then
P(Ai/B) = P(Ai).P(B/Ai) / [P(A1).P(B/A1) + P(A2).P(B/A2) + ... + P(An).P(B/An)]
Bag 1 | Bag 2 | Bag 3 |
---|---|---|
1 White | 2 White | 3 White |
2 Red | 3 Red | 1 Red |
3 Green | 1 Green | 2 Green |
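Event A1 | Event A2 | Event A3 |
Let B be the event that the two balls drawn are 1 white and 1 red.
Each bag is equally likely to be chosen, so P(A1) = P(A2) = P(A3) = 1/3.
P(B/A1) = (1 × 2) / 6C2 = 2/15
P(B/A2) = (2 × 3) / 6C2 = 6/15
P(B/A3) = (3 × 1) / 6C2 = 3/15
P(A2/B) = P(A2).P(B/A2) / [P(A1).P(B/A1) + P(A2).P(B/A2) + P(A3).P(B/A3)]
= (1/3)(6/15) / [(1/3)(2/15) + (1/3)(6/15) + (1/3)(3/15)]
= 6/11 ≈ 0.545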
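The Bayes calculation for the three bags can be reproduced as follows. This is a hedged sketch: the dictionary layout and the helper `likelihood` are illustrative names, and the draw is assumed to be without replacement.

```python
from fractions import Fraction
from math import comb

# (white, red, green) counts per bag, as given in the question
bags = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}
prior = Fraction(1, 3)  # each bag is equally likely to be chosen

def likelihood(white, red, green):
    """P(1 white and 1 red | bag) for 2 balls drawn without replacement."""
    return Fraction(white * red, comb(white + red + green, 2))

# Joint probabilities P(Ai) * P(B | Ai)
joint = {bag: prior * likelihood(*counts) for bag, counts in bags.items()}

# Bayes' theorem for the second bag
posterior_bag2 = joint[2] / sum(joint.values())
print(posterior_bag2, float(posterior_bag2))
```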
Q.2
(a) Two judges in a beauty contest rank the 12 contestants as follows:
x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
y | 12 | 9 | 6 | 10 | 3 | 5 | 4 | 7 | 6 | 2 | 11 | 1 |
Calculate rank correlation coefficient.
x | y | d^2 = (x-y)^2 |
---|---|---|
1 | 12 | 121 |
2 | 9 | 49 |
3 | 6 | 9 |
4 | 10 | 36 |
5 | 3 | 4 |
6 | 5 | 1 |
7 | 4 | 9 |
8 | 7 | 1 |
9 | 6 | 9 |
10 | 2 | 64 |
11 | 11 | 0 |
12 | 1 | 121 |
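 | | ∑d^2 = 424 |
r = 1 - 6∑d^2 / (n(n^2 - 1))
= 1 - 6 (424) / (12 (143)) = -0.4825
The negative coefficient indicates that the two judges' rankings tend to run in opposite directions, i.e. their likes and dislikes diverge.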
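A short Python check of the rank correlation, using the ranks exactly as tabulated above. This sketch applies the simple formula without any correction for tied ranks.

```python
# Ranks given by the two judges, as tabulated in the question
x = list(range(1, 13))
y = [12, 9, 6, 10, 3, 5, 4, 7, 6, 2, 11, 1]

n = len(x)
sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Spearman's rank correlation: r = 1 - 6 * sum(d^2) / (n (n^2 - 1))
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r, 4))
```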
(b) A book contains 100 misprints distributed randomly throughout its 100 pages. What is the probability that a page observed at random contains at least 2 misprints.
Given that there are 100 misprints distributed randomly over 100 pages, we can assume that the average rate (λ) of misprints per page is 1. The Poisson probability mass function is given by:
P(X=x)= e^(−λ) * (λ)ˣ / x!
n=100
λ=1
P(X≥2)=1-P(X<2)
= 1-[P(X=0)+P(X=1)]
= 1-[0.3679+0.3679]
= 0.2642
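The Poisson calculation can be verified numerically. The helper name `poisson_pmf` is illustrative; λ is taken as misprints divided by pages, as in the solution.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 100 / 100  # 100 misprints over 100 pages

# P(X >= 2) = 1 - [P(X = 0) + P(X = 1)]
p_at_least_2 = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))
print(round(p_at_least_2, 4))
```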
(c) A die is thrown six times. If getting an odd number is a success, find the
probability of (i) 5 success (ii) at least five success and (iii) at most five success.
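n=6. Each throw is an independent trial with two outcomes (odd or not odd), so the number of successes follows a binomial distribution.
p = 3/6 = 0.5, q = 1 - p = 0.5
P(X=x) = nCx p^x q^(n-x)
i) P(X=5) = 6C5 (0.5)^5 (0.5)^1 = 6/64 = 0.0938
ii) P(X≥5) = P(X=5)+P(X=6) = 6/64 + 1/64 = 7/64 = 0.1094
iii) P(X≤5) = 1 - P(X=6) = 1 - 1/64 = 63/64 = 0.9844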
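The three binomial probabilities can be computed exactly. This is a minimal sketch with an illustrative helper `binom_pmf`, taking n = 6 and p = 1/2 from the question.

```python
from fractions import Fraction
from math import comb

n = 6
p = Fraction(1, 2)  # probability of an odd number on one throw

def binom_pmf(k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_exactly_5 = binom_pmf(5)
p_at_least_5 = binom_pmf(5) + binom_pmf(6)
p_at_most_5 = 1 - binom_pmf(6)

print(p_exactly_5, p_at_least_5, p_at_most_5)
```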
OR
(c) If a random variable x is Gamma distribution with parameter 𝜆 = 3, compute the value of (i) P(x≤1) and (ii) P(1≤x≤2)
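The gamma density with parameters r and λ is
f(x) = λ^r x^(r-1) e^(-λx) / Γ(r), x > 0
= 0, otherwise
Let r = 1 (suppose)
𝜆 = 3 (given)
f(x) = 3e^(-3x), x > 0
i) P(x≤1) = ∫ from 0 to 1 of 3e^(-3x) dx = 1 - e^(-3) = 0.9502
ii) P(1≤x≤2) = ∫ from 1 to 2 of 3e^(-3x) dx = e^(-3) - e^(-6) = 0.0473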
Q3
(a) Calculate the coefficient of variation for the following data:
Class Interval | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 |
---|---|---|---|---|---|
Frequency | 5 | 7 | 2 | 3 | 3 |
Class | xi | fi | fixi | fi(xi-x̄)^2 |
---|---|---|---|---|
0-10 | 5 | 5 | 25 | 1280 |
10-20 | 15 | 7 | 105 | 252 |
20-30 | 25 | 2 | 50 | 32 |
30-40 | 35 | 3 | 105 | 588 |
40-50 | 45 | 3 | 135 | 1728 |
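 | | ∑=20 | ∑=420 | ∑=3880 |
x̄ = ∑fixi / ∑fi = 420/20 = 21
σ = √(∑fi(xi-x̄)^2 / ∑fi) = √(3880/20) = √194 = 13.9284
CV = (σ / x̄) × 100 %
Therefore, CV = (13.9284 / 21) × 100 % = 66.3257%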
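The coefficient of variation for the grouped data can be recomputed from the class midpoints. This sketch uses the population form of the variance (dividing by ∑fi), matching the table above.

```python
from math import sqrt

midpoints = [5, 15, 25, 35, 45]  # xi for classes 0-10, ..., 40-50
freqs = [5, 7, 2, 3, 3]          # fi

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = sqrt(variance)

cv = sd / mean * 100  # coefficient of variation, in percent
print(round(mean, 4), round(sd, 4), round(cv, 4))
```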
(b) Calculate the median for the following data:
Class Interval | 0-30 | 30-60 | 60-90 | 90-120 | 120-150 | 150-180 |
---|---|---|---|---|---|---|
Frequency | 8 | 13 | 22 | 27 | 18 | 7 |
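Class | fi | CF |
---|---|---|
0-30 | 8 | 8 |
30-60 | 13 | 21 |
60-90 | 22 | 43 |
90-120 (median class, L = 90) | 27 | 70 |
120-150 | 18 | 88 |
150-180 | 7 | 95 |
 | ∑=95 | |
N/2 = 95/2 = 47.5, which is first covered by the cumulative frequency 70, so the median class is 90-120 with L = 90, f = 27, cf = 43 and h = 30.
M = L + ((N/2 - cf) / f) × h
= 90 + ((47.5 - 43) / 27) × 30
= 95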
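The grouped-data median can be found by accumulating frequencies until N/2 is reached. This is a minimal sketch assuming equal class widths of 30; the variable names are illustrative.

```python
lower_bounds = [0, 30, 60, 90, 120, 150]
freqs = [8, 13, 22, 27, 18, 7]
width = 30  # class width h

n = sum(freqs)
half = n / 2

cum = 0  # cumulative frequency before the current class
median = None
for lower, f in zip(lower_bounds, freqs):
    if cum + f >= half:
        # Median = L + ((N/2 - cf) / f) * h
        median = lower + (half - cum) / f * width
        break
    cum += f

print(cum, median)
```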
(c) Compute the correlation coefficients between X and Y using following data:
X | 2 | 4 | 5 | 6 | 8 | 11 |
---|---|---|---|---|---|---|
Y | 18 | 12 | 10 | 8 | 7 | 5 |
X | Y | x-x̄ | Y-Ȳ | (x-x̄)(Y-Ȳ) | (x-x̄)^2 | (Y-Ȳ)^2 |
---|---|---|---|---|---|---|
2 | 18 | -4 | 8 | -32 | 16 | 64 |
4 | 12 | -2 | 2 | -4 | 4 | 4 |
5 | 10 | -1 | 0 | 0 | 1 | 0 |
6 | 8 | 0 | -2 | 0 | 0 | 4 |
8 | 7 | 2 | -3 | -6 | 4 | 9 |
11 | 5 | 5 | -5 | -25 | 25 | 25 |
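 | | | | ∑=-67 | ∑=50 | ∑=106 |
x̄ = ∑X/n = 36/6 = 6
Ȳ = ∑Y/n = 60/6 = 10
r = ∑(x-x̄)(Y-Ȳ) / √(∑(x-x̄)^2 × ∑(Y-Ȳ)^2) = -67 / √(50 × 106) = -0.9203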
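Karl Pearson's correlation coefficient for this data can be computed directly from the deviations. The names `sxy`, `sxx` and `syy` in this sketch are illustrative shorthand for the three column sums.

```python
from math import sqrt

xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)
print(x_bar, y_bar, round(r, 4))
```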
OR
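Q.3 (a) Obtain correlation coefficient between x and y if two regression lines are 4x-5y+33=0 and 20x-9y-107=0.
Ans.
Taking 4x-5y+33=0 as the regression line of y on x:
5y=4x+33
y=4/5x + 33/5
byx = 4/5 < 1
Taking 20x-9y-107=0 as the regression line of x on y:
20x=9y+107
x = 9/20y + 107/20
bxy = 9/20 < 1
r = √(byx × bxy) = √(4/5 × 9/20) = √(36/100)
= 3/5 = 0.6
r is positive because both regression coefficients are positive.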
(b) Calculate the mode for the following data:
Class Interval | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 |
---|---|---|---|---|---|
Frequency | 10 | 14 | 19 | 7 | 13 |
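Mode z = l + ((f1 - f0) / (2f1 - f0 - f2)) × h
Class | fi |
---|---|
0-10 | 10 |
10-20 | 14 (f0) |
20-30 (modal class, l = 20) | 19 (highest freq., f1) |
30-40 | 7 (f2) |
40-50 | 13 |
After putting all the values in the formula with h = 10,
z = 20 + ((19 - 14) / (2(19) - 14 - 7)) × 10 = 20 + 50/17
= 22.9412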
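The grouped-data mode formula, l + ((f1 - f0) / (2f1 - f0 - f2)) × h, can be applied programmatically. This sketch assumes the modal class is neither the first nor the last class, which holds for this data.

```python
lower_bounds = [0, 10, 20, 30, 40]
freqs = [10, 14, 19, 7, 13]
width = 10  # class width h

# Modal class is the one with the highest frequency
i = freqs.index(max(freqs))
f1 = freqs[i]      # frequency of the modal class
f0 = freqs[i - 1]  # frequency of the preceding class
f2 = freqs[i + 1]  # frequency of the succeeding class

mode = lower_bounds[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width
print(round(mode, 4))
```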
(c) Obtain the regression line of y on x for the following data:
x | 100 | 98 | 78 | 85 | 110 | 93 | 80 |
---|---|---|---|---|---|---|---|
y | 85 | 90 | 70 | 72 | 95 | 81 | 74 |
Ans.
x | y | x-x̄ | Y-Ȳ | (x-x̄)(Y-Ȳ) | (x-x̄)^2 |
---|---|---|---|---|---|
100 | 85 | 8 | 4 | 32 | 64 |
98 | 90 | 6 | 9 | 54 | 36 |
78 | 70 | -14 | -11 | 154 | 196 |
85 | 72 | -7 | -9 | 63 | 49 |
110 | 95 | 18 | 14 | 252 | 324 |
93 | 81 | 1 | 0 | 0 | 1 |
80 | 74 | -12 | -7 | 84 | 144 |
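 | | | | ∑=639 | ∑=814 |
x̄ = ∑x/n = 644/7 = 92
Ȳ = ∑y/n = 567/7 = 81
byx = ∑(x-x̄)(Y-Ȳ) / ∑(x-x̄)^2
= 639/814
= 0.7850
Reg y on x,
Y-Ȳ=byx(x-x̄)
y - 81 = 0.7850(x - 92)
y = 0.7850x + 8.78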
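The regression line of y on x can be reproduced from the raw data. This is a minimal sketch; `slope` corresponds to byx and `intercept` to the constant term of the fitted line.

```python
xs = [100, 98, 78, 85, 110, 93, 80]
ys = [85, 90, 70, 72, 95, 81, 74]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

slope = sxy / sxx                  # byx
intercept = y_bar - slope * x_bar  # from y - y_bar = byx (x - x_bar)
print(round(slope, 4), round(intercept, 4))
```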
Q.4
(a) Explain the term related to testing of hypothesis:
(i) Null hypothesis
- Null Hypothesis (H0): It means that there is no significant difference or effect observed, essentially stating that any observed variation in data is due to chance or random occurrences.
(ii) Alternative hypothesis
- Alternative Hypothesis (H1 or Ha): It proposes an effect or difference, suggesting that observed data variations are not just due to random chance but due to a specific cause or factor.
(iii) Errors while accepting or rejecting a hypothesis
Errors in Hypothesis Testing:
- Type I Error: Occurs when the null hypothesis is true, but is incorrectly rejected. It's also known as a "false positive."
- Type II Error: Occurs when the null hypothesis is false, but is incorrectly accepted. It's known as a "false negative."
(b) The mean of 35 sample of the thermal conductivity of a certain kind of cement brick is 0.343 with standard deviation of 0.010. Test the hypothesis that the population mean is 0.340 at 5% level of significance
Ans. n=35
x̄=0.343
S=0.010
- H0: μ = 0.340
- H1: μ ≠ 0.340
- α = 0.05
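- z = (x̄ - μ) / (S/√n) = (0.343 - 0.340) / (0.010/√35) = 1.7748
- Z0.05 = 1.96 (two-tailed)
- MOD(z) < MOD(Z0.05)
Therefore, the null hypothesis is accepted: the sample is consistent with a population mean of 0.340 at the 5% level of significance.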
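The large-sample z statistic for the cement brick data can be computed as below. The critical value 1.96 is the standard two-tailed table value at the 5% level and is hard-coded in this sketch rather than computed.

```python
from math import sqrt

n = 35
x_bar = 0.343  # sample mean
s = 0.010      # sample standard deviation
mu0 = 0.340    # hypothesised population mean

z = (x_bar - mu0) / (s / sqrt(n))

z_critical = 1.96  # two-tailed table value at alpha = 0.05
reject_h0 = abs(z) > z_critical
print(round(z, 4), reject_h0)
```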
(c) Fit a binomial distribution for the following data showing the survey of 800 families with 4 children and test the goodness of fit
No. of boys | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
No. of girls | 4 | 3 | 2 | 1 | 0 |
No. of families | 32 | 178 | 290 | 238 | 64 |
Ans.
No. of boys | f0 | fe | (f0-fe)^2/fe |
---|---|---|---|
0 | 32 | 50 | 6.48 |
1 | 178 | 200 | 2.42 |
2 | 290 | 300 | 0.333 |
3 | 238 | 200 | 7.22 |
4 | 64 | 50 | 3.92 |
 | | | ∑=20.3733 |
For expected freq.
n=4, P=0.5, Q=0.5
P(X=0) = 4C0(0.5)^0 (0.5^4)
= 0.0625
No. of families = 800 * 0.0625
= 50
P(X=1) = 0.25
No. of families = 0.25 * 800
= 200
P(X=2) = 0.375
No. of families = 300
P(X=3) = 0.25, family = 200
P(X=4) = 0.0625, family = 50
Total=800
Testing the hypothesis:
- H0: Birth rate of boys and girls are same
- H1: Birth rate of boys and girls are not same
- α=0.05
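v=n-1 = 5-1 = 4, where n is the number of classes
χ² calculated = 20.3733
χ² tabulated at α=0.05 and v=4 is 9.488
Since the calculated value exceeds the tabulated value, H0 is rejected.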
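The expected frequencies and the chi-square statistic can be recomputed as follows. This sketch follows the solution in taking p = 0.5 and scaling by 800 families; the critical value 9.488 is the table value for 4 degrees of freedom at the 5% level and is hard-coded.

```python
from math import comb

observed = [32, 178, 290, 238, 64]  # families with 0..4 boys
n_children = 4
p = 0.5         # assumed probability of a boy, as in the solution
families = 800

# Expected frequency for k boys: 800 * 4Ck * p^k * q^(4-k)
expected = [
    families * comb(n_children, k) * p ** k * (1 - p) ** (n_children - k)
    for k in range(n_children + 1)
]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2_critical = 9.488  # table value for v = 4, alpha = 0.05
reject_h0 = chi2 > chi2_critical
print(expected, round(chi2, 4), reject_h0)
```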
OR
Q.4 (a) A random sample of size 15 from a bivariate normal distribution gave a correlation coefficient r=0.5. Does this indicate the existence of correlation in the population?
Ans. n=15, r=0.5
Testing the hypothesis:
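- H0: There is no correlation in the population, ρ=0
- H1: ρ≠0
- α=0.05
t = r√(n-2) / √(1-r²) = 0.5√13 / √0.75 = 2.0817
v=n-2=15-2=13
t0.05(v=13) = 2.16
MOD(t) < t0.05
Therefore, H0 is accepted: the sample does not indicate correlation in the population at the 5% level of significance.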
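The t statistic for testing the significance of a sample correlation coefficient is t = r√(n-2) / √(1-r²). A minimal sketch, with the table value 2.16 for 13 degrees of freedom hard-coded:

```python
from math import sqrt

n = 15
r = 0.5

t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_critical = 2.16  # two-tailed table value for v = 13, alpha = 0.05
reject_h0 = abs(t) > t_critical
print(round(t, 4), reject_h0)
```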
(b) A tire company is suspicious to claim that the average lifetime of certain tires is at least 28000 km. To check the claim, the company takes the sample of 40 tires and gets a mean life time of 27463 km with standard deviation of 1348 km. Test the hypothesis at 1% level of significance
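It is a large-sample test for a single mean.
μ=28000
n=40
x̄=27463
s=1348
Testing the hypothesis:
- H0: μ=28000 km (the claim of at least 28000 km holds)
- H1: μ<28000 km (left-tailed test)
- α=0.01
z = (x̄ - μ) / (s/√n) = (27463 - 28000) / (1348/√40) = -2.5195
MOD(z) = 2.5195
Z0.01 = -2.33
MOD(Z0.01) = 2.33
MOD(z)>MOD(Z0.01)
Therefore, H0 is rejected: at the 1% level of significance the data do not support the claim that the average lifetime is at least 28000 km.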
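The test statistic for the tire data can be checked numerically. This sketch treats the test as left-tailed (the claim is "at least 28000 km") and hard-codes the one-tailed table value -2.33 at the 1% level; both choices are assumptions of the sketch.

```python
from math import sqrt

n = 40
x_bar = 27463  # sample mean lifetime (km)
s = 1348       # sample standard deviation (km)
mu0 = 28000    # claimed mean lifetime (km)

z = (x_bar - mu0) / (s / sqrt(n))

z_critical = -2.33  # left-tailed table value at alpha = 0.01
reject_h0 = z < z_critical
print(round(z, 4), reject_h0)
```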
(c) Fit a Poisson distribution for the following data and test the goodness of fit.
x | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
f | 112 | 73 | 30 | 4 | 1 |
Ans.
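Mean λ = ∑fixi / ∑f0 = 149/220 = 0.6773
Expected frequency fe = 220 × e^(−λ) * (λ)ˣ / x!
x | f0 | fixi | fe | (f0-fe)^2/fe |
---|---|---|---|---|
0 | 112 | 0 | 111.76 | 0.0005 |
1 | 73 | 73 | 75.69 | 0.0957 |
2 | 30 | 60 | 25.63 | 0.7443 |
3 | 4 | 12 | 5.79 | 0.4611 (classes 3 and 4 pooled) |
4 | 1 | 4 | 0.98 | |
 | ∑=220 | ∑=149 | ∑=219.85 | ∑≈1.30 |
The expected frequency for x=4 is below 5, so classes 3 and 4 are pooled (observed 5, expected 6.77) before computing χ².
Testing the hypothesis:
- H0: The Poisson distribution is a good fit to the data
- H1: The Poisson distribution is not a good fit to the data
- α=0.05
v = 4-1-1 = 2 (4 classes after pooling, one parameter λ estimated from the data)
χ² tabulated at α=0.05 and v=2 is 5.991
Since 1.30 < 5.991, H0 is accepted: the Poisson distribution fits the data well.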
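A Poisson fit estimates λ by the sample mean, scales the Poisson probabilities by the total frequency, and compares observed with expected counts. The sketch below makes two assumptions that are labelled in the comments: the last two classes are pooled because the expected count for x = 4 is below 5, and the expected count for x = 4 uses P(X = 4) rather than P(X ≥ 4).

```python
from math import exp, factorial

xs = [0, 1, 2, 3, 4]
observed = [112, 73, 30, 4, 1]

total = sum(observed)
lam = sum(x * f for x, f in zip(xs, observed)) / total  # sample mean

# Expected frequency for each x: N * e^(-lam) * lam^x / x!
expected = [total * exp(-lam) * lam ** x / factorial(x) for x in xs]

# Assumption: pool classes 3 and 4 so that no expected count is below 5
obs_pooled = observed[:3] + [sum(observed[3:])]
exp_pooled = expected[:3] + [sum(expected[3:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
print(round(lam, 4), [round(e, 2) for e in expected], round(chi2, 4))
```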
Q.5
(a) In 𝑦 = 𝑎 + 𝑏𝑥 if ∑ 𝑥 = 50, ∑ 𝑦 = 80, ∑ 𝑥𝑦 = 1030, ∑ 𝑥^2 = 750 and 𝑛 = 10, then find 𝑎 and 𝑏
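y= a + bx
The normal equations are:
∑y = na + b∑x, i.e. 80 = 10a + 50b
∑xy = a∑x + b∑x^2, i.e. 1030 = 50a + 750b
Solving these simultaneously,
a = 1.7
b = 1.26
y = 1.7 + 1.26x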
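The two normal equations can be solved in closed form from the given sums. This is a minimal sketch using the standard least-squares expressions for a straight line; the variable names are illustrative.

```python
n = 10
sum_x, sum_y = 50, 80
sum_xy, sum_x2 = 1030, 750

# Solve  sum_y  = n*a     + b*sum_x
#        sum_xy = a*sum_x + b*sum_x2
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(a, b)
```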
(b) Fit a curve for the following data:
x | 1 | 2 | 3 | 4 |
---|---|---|---|---|
y | 7 | 11 | 17 | 27 |
Fitting the exponential curve y = ae^(bx) and taking natural logarithms on both sides:
lny = lna + bx
Let lny = Y and lna = A, so that Y = A + bx
n= 4
∑Y = nA + b∑x
∑xY = A∑x + b∑x^2
x | y | Y=lny | xY | x^2 |
---|---|---|---|---|
1 | 7 | 1.9459 | 1.9459 | 1 |
2 | 11 | 2.3979 | 4.7958 | 4 |
3 | 17 | 2.8332 | 8.4996 | 9 |
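4 | 27 | 3.2958 | 13.1832 | 16 |
∑=10 | | ∑=10.4728 | ∑=28.4245 | ∑=30 |
Substituting into the normal equations: 10.4728 = 4A + 10b and 28.4245 = 10A + 30b
A = 1.4969, b = 0.4485
a = e^A = e^1.4969 = 4.4678
y = ae^(bx) = (4.4678)e^(0.4485x)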
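The exponential fit reduces to a straight-line fit of ln y against x. The sketch below assumes the model y = a·e^(bx), as used in the solution, and works at full floating-point precision, so the last digit of a may differ slightly from the hand calculation with rounded logarithms.

```python
from math import exp, log

xs = [1, 2, 3, 4]
ys = [7, 11, 17, 27]

n = len(xs)
Y = [log(y) for y in ys]  # Y = ln y

sum_x = sum(xs)
sum_Y = sum(Y)
sum_xY = sum(x * v for x, v in zip(xs, Y))
sum_x2 = sum(x ** 2 for x in xs)

# Least-squares line Y = A + b x
b = (n * sum_xY - sum_x * sum_Y) / (n * sum_x2 - sum_x ** 2)
A = (sum_Y - b * sum_x) / n
a = exp(A)  # back-transform: a = e^A
print(round(A, 4), round(b, 4), round(a, 4))
```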
(c) State properties of the normal distribution. Suppose the marks of 800 students are normally distributed with mean 66 and standard deviation 5. Find number of students getting marks (i) between 65 and 70 (ii) greater than or equal to 72 (Given that P(0≤z≤0.20)=0.0793, that P(0≤z≤0.80)=0.2881 and that P(0≤z≤1.2)=0.3849)
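Properties of normal distribution:
- The curve is bell-shaped and symmetric about the mean μ
- Its mean, median and mode are equal
- The total area under the curve is 1
- The curve approaches the x-axis asymptotically on both sides and never touches it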
n=800, μ=66, σ = 5
(i) P(65<x<70)
z1 = (65 - 66)/5 = -0.20
z2 = (70 - 66)/5 = 0.80
P(-0.20<Z<0.80) = P(-0.20<Z<0) + P(0<Z<0.80)
= P(0<Z<0.20) + P(0<Z<0.80)
= 0.0793 + 0.2881
= 0.3674
No. of students = 0.3674 * 800 = 293.92 ≈ 294
(ii) P(x≥72)
z = (72 - 66)/5 = 1.2
P(z≥1.2) = P(0≤z<∞) - P(0≤z≤1.2)
= 0.5 - 0.3849
= 0.1151
No. of students = 0.1151 * 800 = 92 (approx)
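The table look-ups can be cross-checked with the standard normal CDF, which is expressible through the error function. In this sketch `phi` is an illustrative helper name; the results agree with the table-based answers after rounding to whole students.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, students = 66, 5, 800

# (i) P(65 < X < 70)
p_between = phi((70 - mu) / sigma) - phi((65 - mu) / sigma)

# (ii) P(X >= 72)
p_at_least_72 = 1 - phi((72 - mu) / sigma)

print(round(p_between * students), round(p_at_least_72 * students))
```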
OR
Q.5 (a) A random variable x has the following probability distribution:
xi | 0 | 1 | 2 | 3 |
---|---|---|---|---|
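pi | 1/8 | 3/8 | 3/8 | 1/8 |
Find the standard deviation of x for the given distribution.
Ans.
Check: ∑pi = 1/8 + 3/8 + 3/8 + 1/8 = 1
E(x) = ∑xi pi = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 12/8 = 1.5
E(x^2) = ∑xi^2 pi = 0(1/8) + 1(3/8) + 4(3/8) + 9(1/8) = 24/8 = 3
Var(x) = E(x^2) - [E(x)]^2 = 3 - 2.25 = 0.75
Standard Deviation = √0.75 = 0.8660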
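The standard deviation of a discrete distribution follows from E(x) and E(x²). This sketch assumes the probabilities are 1/8, 3/8, 3/8, 1/8, so that they sum to 1; that first value is an assumption of the sketch.

```python
from fractions import Fraction
from math import sqrt

xs = [0, 1, 2, 3]
# Assumption: P(x = 0) = 1/8 so that the probabilities sum to 1
ps = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

total_p = sum(ps)
mean = sum(x * p for x, p in zip(xs, ps))           # E(x)
mean_sq = sum(x ** 2 * p for x, p in zip(xs, ps))   # E(x^2)
variance = mean_sq - mean ** 2
sd = sqrt(variance)
print(mean, variance, round(sd, 4))
```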
Q.5 (b) With usual notations, find the value of p for a binomial random variable x when n=6 and 9P(x=4)=P(x=2).
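9 P(X=4) = P(X=2)
P(X=x) = nCx p^x q^(n-x), where q = 1-p
9 × 6C4 p^4 q^2 = 6C2 p^2 q^4
Since 6C4 = 6C2 = 15, 9p^2 = q^2 = (1-p)^2
8p^2 + 2p - 1 = 0
(2p+1)(4p-1)=0
p = -1/2, but a probability cannot be negative, so
p = 1/4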
(c) Fit a parabola 𝑦 = 𝑎𝑥^2 + 𝑏𝑥 + 𝑐 for the following data:
x | -1 | 0 | 1 | 2 |
---|---|---|---|---|
y | -2 | 1 | 2 | 4 |
Ans.
x | y | x^2 | x^3 | x^4 | xy | x^2y |
---|---|---|---|---|---|---|
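-1 | -2 | 1 | -1 | 1 | 2 | -2 |
0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 2 | 1 | 1 | 1 | 2 | 2 |
2 | 4 | 4 | 8 | 16 | 8 | 16 |
∑=2 | ∑=5 | ∑=6 | ∑=8 | ∑=18 | ∑=12 | ∑=16 |
The normal equations with n = 4 are:
∑y = a∑x^2 + b∑x + nc, i.e. 5 = 6a + 2b + 4c
∑xy = a∑x^3 + b∑x^2 + c∑x, i.e. 12 = 8a + 6b + 2c
∑x^2y = a∑x^4 + b∑x^3 + c∑x^2, i.e. 16 = 18a + 8b + 6c
Solving these simultaneously,
a = -0.25
b = 2.15
c = 0.55
y = -0.25x^2 + 2.15x + 0.55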
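The three normal equations for the parabola y = ax² + bx + c can be solved exactly with a small Gauss-Jordan elimination. This sketch uses no pivoting and assumes the pivots are non-zero, which holds for this data; the matrix name `m` is illustrative.

```python
from fractions import Fraction

xs = [-1, 0, 1, 2]
ys = [-2, 1, 2, 4]
n = len(xs)

# Sums needed by the normal equations
sx = sum(xs)
sx2 = sum(x ** 2 for x in xs)
sx3 = sum(x ** 3 for x in xs)
sx4 = sum(x ** 4 for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))

# Augmented matrix for the unknowns (a, b, c)
m = [
    [Fraction(v) for v in row]
    for row in (
        [sx2, sx, n, sy],
        [sx3, sx2, sx, sxy],
        [sx4, sx3, sx2, sx2y],
    )
]

# Gauss-Jordan elimination (no pivoting; pivots are non-zero here)
for i in range(3):
    pivot = m[i][i]
    m[i] = [v / pivot for v in m[i]]
    for j in range(3):
        if j != i:
            factor = m[j][i]
            m[j] = [vj - factor * vi for vj, vi in zip(m[j], m[i])]

a, b, c = (m[i][3] for i in range(3))
print(float(a), float(b), float(c))
```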