GTU Probability and Statistics (P&S) Winter 2021 Paper Solutions

Q.1

(a) Define the term random variable and explain the different types of random variables

A random variable is not a traditional numerical variable, but rather a function that assigns a numerical value to each outcome in a sample space of a random experiment.

There are two main types of random variables:

  1. Discrete Random Variables: These are random variables that can take on a countable number of distinct values. For example, the number of heads in 10 coin tosses is a discrete random variable because it can only take integer values from 0 to 10.
  2. Continuous Random Variables: These are random variables that can take any value in an interval, so the set of possible values is uncountably infinite. They typically arise from measurements. For example, the height of a student in a class is a continuous random variable because it can take any value within a range (say, between 4 feet and 7 feet).

(b) A card is drawn at random from a pack of 52 cards. What is the probability that the card is a spade or a king?

In a standard deck of 52 cards:

  • There are 13 spades.
  • There are 4 kings.

To find the probability, we use the formula for the probability of either of two events (A or B):

P(A or B) = P(A) + P(B) - P(A and B)

Here, A is the event of drawing a spade, and B is the event of drawing a king.

P(A) = Total number of spades / Total number of cards = 13/52

P(B) = Total number of kings / Total number of cards = 4/52

P(A and B) is the probability of drawing the king of spades, which is 1/52.

Plugging these into the formula:

P(Spade or King) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13

The probability of drawing a card that is either a spade or a king from a standard deck of 52 cards is therefore 4/13, approximately 0.308 or 30.8%.
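The result can be cross-checked by brute force. The following is a minimal Python sketch that enumerates a 52-card deck and counts the favourable cards; the rank and suit labels are illustrative choices, not part of the original solution.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# A card is favourable if it is a spade or a king (the king of spades counts once).
favourable = [card for card in deck if card[1] == "spades" or card[0] == "K"]

p_spade_or_king = Fraction(len(favourable), len(deck))
print(p_spade_or_king, float(p_spade_or_king))
```

Counting the cards directly avoids double-counting the king of spades, which is exactly what the subtraction of P(A and B) achieves in the formula.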

(c) State Bayes' theorem. There are three bags: the first contains 1 white, 2 red and 3 green balls; the second 2 white, 3 red and 1 green balls; and the third 3 white, 1 red and 2 green balls. Two balls are drawn from a bag chosen at random. These are found to be 1 white and 1 red. Find the probability that the balls so drawn came from the second bag.

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.

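P(A|B) = P(B|A) * P(A) / P(B)

Bag 1     Bag 2     Bag 3
1 White   2 White   3 White
2 Red     3 Red     1 Red
3 Green   1 Green   2 Green
A1        A2        A3

Let A1, A2 and A3 be the events that bag 1, bag 2 and bag 3 is chosen. A bag is chosen at random, so P(A1) = P(A2) = P(A3) = 1/3.

B = 1 white + 1 red

Each bag holds 6 balls, so two balls can be drawn in 6C2 = 15 ways:

P(B/A1) = (1 * 2)/15 = 2/15, P(B/A2) = (2 * 3)/15 = 6/15, P(B/A3) = (3 * 1)/15 = 3/15

P(A2/B) = P(A2).P(B/A2) / [P(A1).P(B/A1) + P(A2).P(B/A2) + P(A3).P(B/A3)]

= (1/3)(6/15) / [(1/3)(2/15) + (1/3)(6/15) + (1/3)(3/15)]

= 6/11 = 0.545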
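The posterior probability can be verified with exact fractions. This is a minimal Python sketch; the dictionary layout and the helper name `likelihood` are illustrative assumptions, and the equal prior of 1/3 reflects the bag being chosen at random.

```python
from fractions import Fraction
from math import comb

# Contents of each bag as (white, red, green).
bags = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}
prior = Fraction(1, 3)  # each bag is equally likely to be chosen

def likelihood(white, red, green):
    """P(one white and one red | bag) when two balls are drawn without replacement."""
    total = white + red + green
    return Fraction(white * red, comb(total, 2))

# Joint probability P(bag) * P(B | bag) for every bag.
joint = {bag: prior * likelihood(*contents) for bag, contents in bags.items()}

# Bayes' theorem: divide the joint term for bag 2 by the total probability of B.
posterior_bag2 = joint[2] / sum(joint.values())
print(posterior_bag2, float(posterior_bag2))
```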

Q.2

(a) Two judges in a beauty contest rank the 12 contestants as follows:

x    1    2    3    4    5    6    7    8    9    10   11   12
y    12   9    6    10   3    5    4    7    6    2    11   1

Calculate rank correlation coefficient.

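x     y     d^2 = (x-y)^2
1     12    121
2     9     49
3     6     9
4     10    36
5     3     4
6     5     1
7     4     9
8     7     1
9     6     9
10    2     64
11    11    0
12    1     121
            ∑=424

r = 1 - 6∑d^2 / [n(n^2 - 1)] = 1 - 6(424) / [12(143)] = -0.4825

The negative coefficient indicates that the two judges tend to rank the contestants in opposite order; their likes and dislikes diverge to a moderate degree.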
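The rank correlation can be recomputed from the two rankings. This is a minimal Python sketch that uses the ranks exactly as they are listed in the solution table; variable names are illustrative.

```python
# Ranks given by the two judges, as listed in the table.
x = list(range(1, 13))
y = [12, 9, 6, 10, 3, 5, 4, 7, 6, 2, 11, 1]

n = len(x)
# Sum of squared rank differences.
sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))

# Spearman's rank correlation coefficient.
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r, 4))
```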

(b) A book contains 100 misprints distributed randomly throughout its 100 pages. What is the probability that a page observed at random contains at least 2 misprints.

Given that there are 100 misprints distributed randomly over 100 pages, we can assume that the average rate (λ) of misprints per page is 1. The Poisson probability mass function is given by:

P(X=x)= e^(−λ) * (λ)ˣ / x!

n=100

λ=1

P(X≥2)=1-P(X<2)

= 1-[P(X=0)+P(X=1)]

= 1-[0.3679+0.3679]

= 0.2642
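The Poisson calculation can be checked numerically. This is a minimal Python sketch; the helper `poisson_pmf` is an illustrative name, not something taken from the original solution.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 100 / 100  # 100 misprints spread over 100 pages

# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_at_least_2, 4))
```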

(c) A die is thrown six times. If getting an odd number is a success, find the probability of (i) 5 successes, (ii) at least five successes and (iii) at most five successes.

n = 6. Each throw is an independent trial with the same probability of success, so the number of successes follows a binomial distribution.

P = 3/6 = 0.5, Q = 1 - P = 0.5

P(X=x) = nCx P^x Q^(n-x)

i) P(X=5) = 6C5 (0.5)^5 (0.5)^1 = 0.0938

ii) P(X≥5) = P(X=5) + P(X=6)

= 0.0938 + 6C6 (0.5)^6 (0.5)^0 = 0.0938 + 0.0156 = 0.1094

iii) P(X≤5) = 1 - P(X>5) = 1 - P(X=6) = 1 - 0.0156 = 0.9844
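All three binomial probabilities can be reproduced in a few lines. This is a minimal Python sketch; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.5  # six throws; an odd face appears with probability 3/6

p_exactly_5 = binom_pmf(5, n, p)
p_at_least_5 = binom_pmf(5, n, p) + binom_pmf(6, n, p)
p_at_most_5 = 1 - binom_pmf(6, n, p)
print(p_exactly_5, p_at_least_5, p_at_most_5)
```

The exact values are 6/64, 7/64 and 63/64, which round to the figures quoted in the solution.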

OR

(c) If a random variable x has a Gamma distribution with parameter λ = 3, compute the value of (i) P(x≤1) and (ii) P(1≤x≤2)

f(x) = [λ^r / Γ(r)] x^(r-1) e^(-λx), x>0

= 0, otherwise

Let r = 1 (assumed, since the question gives only λ)

λ = 3 (given)

f(x) = 3 x^0 e^(-3x)

= 3e^(-3x), x>0

i) P(x≤1) = ∫ 3e^(-3x) dx from 0 to 1 = 3[e^(-3x)/(-3)] from 0 to 1 = -e^(-3) + e^0 = 0.9502

ii) P(1≤x≤2) = ∫ 3e^(-3x) dx from 1 to 2 = 3[e^(-3x)/(-3)] from 1 to 2 = -e^(-6) + e^(-3) = 0.0473
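With r = 1 the density reduces to 3e^(-3x), whose cumulative distribution function is 1 - e^(-3x). This is a minimal Python sketch of the two probabilities under that same assumption (r = 1 is the choice made in the solution, not something stated in the question); `cdf` is an illustrative name.

```python
from math import exp

lam = 3.0  # rate parameter given in the question

def cdf(x):
    """P(X <= x) for the density 3 * exp(-3x), i.e. the Gamma density with r = 1."""
    return 1 - exp(-lam * x)

p_up_to_1 = cdf(1)                 # (i)  P(x <= 1)
p_between_1_and_2 = cdf(2) - cdf(1)  # (ii) P(1 <= x <= 2)
print(round(p_up_to_1, 4), round(p_between_1_and_2, 4))
```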

Q.3

(a) Calculate the coefficient of variation for the following data:

Class Interval   0-10   10-20   20-30   30-40   40-50
Frequency        5      7       2       3       3

Class    xi    fi      fi*xi    fi(xi - x̄)^2
0-10     5     5       25       1280
10-20    15    7       105      252
20-30    25    2       50       32
30-40    35    3       105      588
40-50    45    3       135      1728
               ∑=20    ∑=420    ∑=3880

x̄ = ∑fi*xi / ∑fi = 420/20 = 21

σ = √(∑fi(xi - x̄)^2 / ∑fi) = √(3880/20) = 13.9284

Therefore,

CV = (σ / x̄) * 100%

= (13.9284 / 21) * 100% = 66.3257%
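The mean, standard deviation and coefficient of variation can be recomputed from the class midpoints. This is a minimal Python sketch with illustrative variable names; it uses the population form of the standard deviation (division by ∑f), as the solution does.

```python
from math import sqrt

midpoints = [5, 15, 25, 35, 45]  # class marks xi
freqs = [5, 7, 2, 3, 3]          # frequencies fi

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = sqrt(variance)

# Coefficient of variation as a percentage.
cv = sd / mean * 100
print(mean, round(sd, 4), round(cv, 4))
```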

(b) Calculate the median for the following data:

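Class Interval   0-30   30-60   60-90   90-120   120-150   150-180
Frequency        8      13      22      27       18        7

Class         Fi      CF
0-30          8       8
30-60         13      21
60-90         22      43
90-120 (L)    27      70
120-150       18      88
150-180       7       95
              ∑=95

n/2 = ∑f/2 = 95/2 = 47.5, so the median class is 90-120, the first class whose cumulative frequency reaches 47.5.

M = l + [(n/2 - m) / f] * c

where l = 90 is the lower limit of the median class, m = 43 is the cumulative frequency before the median class, f = 27 is the frequency of the median class and c = 30 is the class width.

M = 90 + [(47.5 - 43) / 27] * 30

= 95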
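The median of grouped data can be computed by walking through the cumulative frequencies. This is a minimal Python sketch; `grouped_median` is a hypothetical helper written for this example and assumes equal class widths.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data: l + ((n/2 - m) / f) * c."""
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency before the current class (m)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must be non-empty and positive")

lower_limits = [0, 30, 60, 90, 120, 150]
freqs = [8, 13, 22, 27, 18, 7]
median = grouped_median(lower_limits, 30, freqs)
print(median)
```

Running the loop by hand gives cumulative frequencies 8, 21, 43 and 70, so the median class is 90-120 with m = 43.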

(c) Compute the correlation coefficients between X and Y using following data:

X    2    4    5    6   8   11
Y    18   12   10   8   7   5

X     Y     x-x̄    Y-Ȳ    (x-x̄)(Y-Ȳ)   (x-x̄)^2   (Y-Ȳ)^2
2     18    -4     8      -32          16        64
4     12    -2     2      -4           4         4
5     10    -1     0      0            1         0
6     8     0      -2     0            0         4
8     7     2      -3     -6           4         9
11    5     5      -5     -25          25        25
                          ∑=-67        ∑=50      ∑=106

x̄ = ∑xi / n = 36/6 = 6

Ȳ = ∑yi / n = 60/6 = 10

r = ∑(x-x̄)(Y-Ȳ) / [√∑(x-x̄)^2 * √∑(Y-Ȳ)^2]

= -67 / (√50 * √106) = -0.9203
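The correlation coefficient can be verified directly from the raw data. This is a minimal Python sketch with illustrative variable names.

```python
from math import sqrt

xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)
print(sxy, sxx, syy, round(r, 4))
```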

OR

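Q.3 (a) Obtain correlation coefficient between x and y if two regression lines are 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0.

Ans. r = √(byx * bxy)

  • 5y = 4x + 33:

    y = (4/5)x + 33/5

    byx = 4/5 < 1

  • 20x = 9y + 107:

    x = (9/20)y + 107/20

    bxy = 9/20 < 1

r = √((4/5) * (9/20)) = √(36/100) = 3/5 = 0.6

Both regression coefficients are positive, so r takes the positive root.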

(b) Calculate the mode for the following data:

Class Interval   0-10   10-20   20-30   30-40   40-50
Frequency        10     14      19      7       13

Mode z = l + [(f1 - f0) / (2f1 - f0 - f2)] * c

Class             fi
0-10              10
10-20             14   (f0)
20-30 (l = 20)    19   (highest frequency, f1)
30-40             7    (f2)
40-50             13

Putting l = 20, f1 = 19, f0 = 14, f2 = 7 and c = 10 in the formula,

z = 20 + [(19 - 14) / (2(19) - 14 - 7)] * 10 = 20 + 50/17

= 22.9412
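The mode of grouped data follows mechanically once the modal class is located. This is a minimal Python sketch; `grouped_mode` is a hypothetical helper written for this example and assumes equal class widths and a single modal class.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * c."""
    i = freqs.index(max(freqs))                      # position of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # frequency of the preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency of the following class
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

lower_limits = [0, 10, 20, 30, 40]
freqs = [10, 14, 19, 7, 13]
mode = grouped_mode(lower_limits, 10, freqs)
print(round(mode, 4))
```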

(c) Obtain the regression line of y on x for the following data:

x    100   98   78   85   110   93   80
y    85    90   70   72   95    81   74

Ans.

x      y     x-x̄    Y-Ȳ    (x-x̄)(Y-Ȳ)   (x-x̄)^2
100    85    8      4      32           64
98     90    6      9      54           36
78     70    -14    -11    154          196
85     72    -7     -9     63           49
110    95    18     14     252          324
93     81    1      0      0            1
80     74    -12    -7     84           144
                           ∑=639        ∑=814

x̄ = ∑xi / n = 644/7 = 92

Ȳ = ∑yi / n = 567/7 = 81

Reg y on x,

Y-Ȳ = byx(x-x̄)

byx = ∑(x-x̄)(Y-Ȳ) / ∑(x-x̄)^2

= 639/814

= 0.7850

Therefore, Y - 81 = 0.7850(x - 92), i.e. Y = 0.7850x + 8.78
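The slope and the resulting regression line can be recomputed from the data. This is a minimal Python sketch with illustrative variable names; the intercept is obtained from Ȳ - byx * x̄.

```python
xs = [100, 98, 78, 85, 110, 93, 80]
ys = [85, 90, 70, 72, 95, 81, 74]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Regression coefficient of y on x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx

# The line passes through the point of means.
intercept = mean_y - slope * mean_x
print(round(slope, 4), round(intercept, 2))
```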

Q.4

(i) Null hypothesis

  • Null Hypothesis (H0): It states that there is no significant difference or effect, i.e. any observed variation in the data is due to chance.

(ii) Alternate hypothesis

  • Alternate Hypothesis (H1 or Ha): It proposes that an effect or difference exists, i.e. the observed variation in the data is not due to random chance alone but to a specific cause or factor.

(iii) Errors while accepting or rejecting a hypothesis

Errors in Hypothesis Testing:

  • Type I Error: Occurs when the null hypothesis is true but is incorrectly rejected. It is also known as a "false positive." The probability of this error is the level of significance α.
  • Type II Error: Occurs when the null hypothesis is false but is not rejected. It is also known as a "false negative."

(b) The mean of a sample of 35 measurements of the thermal conductivity of a certain kind of cement brick is 0.343 with a standard deviation of 0.010. Test the hypothesis that the population mean is 0.340 at the 5% level of significance

Ans. n=35

x̄=0.343

S=0.010

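  1. H0: μ = 0.340
  2. H1: μ ≠ 0.340 (two-tailed test)
  3. α = 0.05
  4. z = (x̄ - μ) / (S/√n) = (0.343 - 0.340) / (0.010/√35) = 1.7748
  5. Critical value for a two-tailed test at α = 0.05: Z0.05 = 1.96
  6. MOD(z) = 1.7748 < MOD(Z0.05) = 1.96

Therefore, the null hypothesis is not rejected: at the 5% level the sample is consistent with a population mean of 0.340.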
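The test statistic and the decision can be reproduced as follows. This is a minimal Python sketch; the two-tailed critical value 1.96 is entered as a constant taken from the normal table rather than computed, and the variable names are illustrative.

```python
from math import sqrt

n = 35         # sample size
x_bar = 0.343  # sample mean
s = 0.010      # sample standard deviation
mu_0 = 0.340   # hypothesised population mean

# Large-sample test statistic for a single mean.
z = (x_bar - mu_0) / (s / sqrt(n))

z_critical = 1.96  # two-tailed critical value at the 5% level (table value)
reject_h0 = abs(z) > z_critical
print(round(z, 4), reject_h0)
```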

(c) Fit a binomial distribution for the following data showing the survey of 800 families with 4 children and test the goodness of fit

No. of boys       0    1     2     3     4
No. of girls      4    3     2     1     0
No. of families   32   178   290   238   64

Ans.

No. of boys   f0     fe     (f0-fe)^2/fe
0             32     50     6.48
1             178    200    2.42
2             290    300    0.333
3             238    200    7.22
4             64     50     3.92
                            ∑=20.3733

For expected freq.

P(X=x) = nCx P^x Q^(n-x)

n=4, P=0.5, Q=0.5

P(X=0) = 4C0 (0.5)^0 (0.5)^4

= 0.0625

No. of families = 800 * 0.0625

= 50

P(X=1) = 0.25

No. of families = 0.25 * 800

= 200

P(X=2) = 0.375

No. of families = 300

P(X=3) = 0.25, family = 200

P(X=4) = 0.0625, family = 50

Total=800

Testing the hypothesis:

  1. H0: The birth rates of boys and girls are the same (the binomial distribution with P = 0.5 fits the data)
  2. H1: The birth rates of boys and girls are not the same
  3. α=0.05
  4. χ^2 = ∑ (f0-fe)^2 / fe = 20.3733
  5. v=n-1 = 5-1 = 4

    χ^2 0.05 (v=4) = 9.488
  6. χ^2 > χ^2 0.05 (v=4)

Therefore, H0 is rejected: the binomial distribution with P = 0.5 is not a good fit to the data.
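The expected frequencies and the chi-square statistic can be recomputed as below. This is a minimal Python sketch with illustrative variable names; it follows the solution in taking P = 0.5, in scaling the probabilities by the stated total of 800 families, and in using the observed frequencies as they appear in the table. The critical value 9.488 is entered as a table constant.

```python
from math import comb

observed = [32, 178, 290, 238, 64]  # families with 0..4 boys, as tabulated
n_children, p = 4, 0.5
total_families = 800

# Expected frequencies from the binomial distribution B(4, 0.5).
expected = [
    total_families * comb(n_children, k) * p ** k * (1 - p) ** (n_children - k)
    for k in range(n_children + 1)
]

# Chi-square goodness-of-fit statistic.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi_critical = 9.488  # table value for 4 degrees of freedom at the 5% level
reject_h0 = chi_square > chi_critical
print(expected, round(chi_square, 4), reject_h0)
```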

OR

Q.4 (a) A random sample of size 15 from a bivariate normal distribution gave a correlation coefficient r=0.5. Does this indicate the existence of correlation in the population?

Ans. n=15, r=0.5

Testing the hypothesis:

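  1. H0: There is no correlation in the population, ρ = 0
  2. H1: ρ ≠ 0
  3. α=0.05
  4. t = r√(n-2) / √(1-r^2) = 0.5√13 / √0.75 = 2.0817
  5. v=n-2=15-2=13

    t0.05(v=13) = 2.16

  6. t < t0.05

Therefore, H0 is not rejected: the sample does not provide significant evidence of correlation in the population.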
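The t statistic for testing a sample correlation coefficient can be checked numerically. This is a minimal Python sketch; note the square root in the denominator, and that the critical value 2.16 is entered as a table constant.

```python
from math import sqrt

n = 15   # sample size
r = 0.5  # sample correlation coefficient

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_critical = 2.16  # two-tailed table value for 13 degrees of freedom at the 5% level
reject_h0 = abs(t) > t_critical
print(round(t, 4), reject_h0)
```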

(b) A tire company is suspicious to claim that the average lifetime of certain tires is at least 28000 km. To check the claim, the company takes the sample of 40 tires and gets a mean life time of 27463 km with standard deviation of 1348 km. Test the hypothesis at 1% level of significance

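It is a large-sample test for a single mean.

μ=28000

n=40

x̄=27463

s=1348

Testing the hypothesis:

  1. H0: μ=28000 km
  2. H1: μ<28000 km (left-tailed test, since the claim is "at least 28000 km")
  3. α=0.01
  4. z = (x̄ - μ) / (S/√n) = (27463 - 28000) / (1348/√40) = -2.5195

    MOD(z) = 2.5195

  5. Critical value for a one-tailed test at α = 0.01: Z0.01 = -2.33

    MOD(Z0.01) = 2.33

  6. MOD(z)>MOD(Z0.01)

Therefore, H0 is rejected: the data do not support the claim that the average lifetime is at least 28000 km.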

(c) Fit a Poisson distribution for the following data and test the goodness of fit.

x    0     1    2    3   4
f    112   73   30   4   1

Ans.

x    f0      fixi    fe     (f0-fe)^2/fe
0    112     0       42
1    73      73      76
2    30      60      2.5
3    4       12      6
4    1       4       1
     ∑=220   ∑=149          ∑=0.3997

The final answer will be 220.
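One way to obtain the fitted Poisson frequencies from the data in the question is to estimate λ by the sample mean ∑fx/∑f = 149/220 and to scale the Poisson probabilities by the total frequency 220. The following minimal Python sketch does only that; it makes no assumption about how sparse classes are pooled for the chi-square test, so it stops at the expected frequencies. Variable names are illustrative.

```python
from math import exp, factorial

xs = [0, 1, 2, 3, 4]
observed = [112, 73, 30, 4, 1]

total = sum(observed)
# The mean of the data is the estimate of the Poisson parameter.
lam = sum(x * f for x, f in zip(xs, observed)) / total

# Expected frequency for each x: total * e^(-lam) * lam^x / x!
expected = [total * exp(-lam) * lam ** x / factorial(x) for x in xs]

print(total, round(lam, 4))
print([round(e, 2) for e in expected])
```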

Q.5

(a) In y = a + bx, if ∑x = 50, ∑y = 80, ∑xy = 1030, ∑x^2 = 750 and n = 10, then find a and b

y= a + bx

∑y = na + b∑x, i.e. 80 = 10a + 50b

∑xy = a∑x + b∑x^2, i.e. 1030 = 50a + 750b

Solving the two equations,

a = 1.7

b = 1.26

y = 1.7 + 1.26x
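The two normal equations form a 2 x 2 linear system, which can be solved by Cramer's rule. This is a minimal Python sketch with illustrative variable names.

```python
# Sums given in the question.
n, sum_x, sum_y, sum_xy, sum_x2 = 10, 50, 80, 1030, 750

# Normal equations:
#   sum_y  = n * a     + sum_x  * b
#   sum_xy = sum_x * a + sum_x2 * b
det = n * sum_x2 - sum_x * sum_x

a = (sum_y * sum_x2 - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det
print(a, b)
```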

(b) Fit a curve y = ae^(bx) for the following data:

x    1   2    3    4
y    7   11   17   27

ln y = ln a + bx

Let Y = ln y and A = ln a, so that Y = A + bx

n = 4

∑Y = nA + b∑x

∑xY = A∑x + b∑x^2

x       y     Y = ln y   xY        x^2
1       7     1.9459     1.9459    1
2       11    2.3979     4.7958    4
3       17    2.8332     8.4996    9
4       27    3.2958     13.1832   16
∑=10          ∑=10.47    ∑=28.42   ∑=30

A = 1.4969, b = 0.4485

a = e^A = 4.4678

y = ae^(bx) = (4.4678)e^(0.4485x)
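The exponential fit can be reproduced by fitting a straight line to (x, ln y). This is a minimal Python sketch with illustrative variable names; because it does not round the intermediate logarithms, the value of a may differ from the hand calculation in the fourth decimal place.

```python
from math import exp, log

xs = [1, 2, 3, 4]
ys = [7, 11, 17, 27]

# Linearise: Y = ln y = A + b x, with A = ln a.
Ys = [log(y) for y in ys]

n = len(xs)
sum_x = sum(xs)
sum_Y = sum(Ys)
sum_xY = sum(x * Y for x, Y in zip(xs, Ys))
sum_x2 = sum(x * x for x in xs)

# Solve the two normal equations by Cramer's rule.
det = n * sum_x2 - sum_x ** 2
b = (n * sum_xY - sum_x * sum_Y) / det
A = (sum_Y * sum_x2 - sum_x * sum_xY) / det
a = exp(A)
print(round(A, 4), round(b, 4), round(a, 3))
```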

(c) State properties of the normal distribution. Suppose the marks of 800 students are normally distributed with mean 66 and standard deviation 5. Find number of students getting marks (i) between 65 and 70 (ii) greater than or equal to 72 (Given that P(0≤z≤0.20)=0.0793, that P(0≤z≤0.80)=0.2881 and that P(0≤z≤1.2)=0.3849)

Properties of normal distribution:

  1. It is symmetric
  2. Its mean/median/mode are equal

n=800, μ=66, σ = 5

(i) P(65<x<70)

z1 = (x1 - μ) / σ = (65 - 66) / 5 = -0.20

z2 = (x2 - μ) / σ = (70 - 66) / 5 = 0.80

P(-0.20<Z<0.80) = P(-0.20<Z<0) + P(0<Z<0.80)

= P(0<Z<0.20) + P(0<Z<0.80)

= 0.0793 + 0.2881

= 0.3674

No. of students = 0.3674 * 800 = 294 (approx)

(ii) P(x≥72)

z = (x - μ) / σ = (72 - 66) / 5 = 1.2

P(z≥1.2) = P(0≤z<∞) - P(0≤z≤1.2)

= 0.5 - 0.3849

= 0.1151

No. of students = 0.1151 * 800 = 92 (approx)
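The table look-ups can be cross-checked with the standard normal cumulative distribution function, written here in terms of the error function. This is a minimal Python sketch; `phi` is an illustrative helper name, and the results agree with the table-based working after rounding to whole students.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, students = 66, 5, 800

# (i) marks between 65 and 70
p_between = phi((70 - mu) / sigma) - phi((65 - mu) / sigma)

# (ii) marks greater than or equal to 72
p_at_least_72 = 1 - phi((72 - mu) / sigma)

print(round(students * p_between), round(students * p_at_least_72))
```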

OR

Q.5 (a) A random variable x has the following probability distribution:

xi    0     1     2     3
pi    1/8   3/8   3/8   1/8

Find the standard deviation of x for the given distribution.

Ans.

E(x) = ∑xiPi = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 1.5

E(x^2) = ∑xi^2Pi = (0)^2(1/8) + (1)^2(3/8) + (2)^2(3/8) + (3)^2(1/8) = 3

Standard Deviation = √(E(x^2) - [E(x)]^2) = √(3 - (1.5)^2) = 0.8660
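The moments can be verified with exact fractions. This is a minimal Python sketch that uses the probabilities 1/8, 3/8, 3/8, 1/8 appearing in the working, which sum to 1; variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mean = sum(x * p for x, p in zip(values, probs))                # E(x)
mean_of_squares = sum(x ** 2 * p for x, p in zip(values, probs))  # E(x^2)

variance = mean_of_squares - mean ** 2
sd = sqrt(float(variance))
print(mean, mean_of_squares, variance, round(sd, 4))
```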

Q.5 (b) With usual notations, find the value of p for a binomial random variable x when n=6 and 9P(x=4)=P(x=2).

9 P(X=4) = P(X=2)

P(X=x) = nCx p^x q^(n-x)

9[6C4 p^4 q^2] = 6C2 p^2 q^4

9(15 p^4 q^2) = 15 p^2 q^4

9p^2 = q^2

9p^2 = (1-p)^2

9p^2 = 1 - 2p + p^2

8p^2 + 2p - 1 = 0

(2p+1)(4p-1) = 0

p = -1/2 is not possible because a probability cannot be negative, so

p = 1/4

(c) Fit a parabola y = ax^2 + bx + c for the following data:

x    -1   0   1   2
y    -2   1   2   4

Ans.

x      y      x^2    x^3    x^4     xy      x^2y
..............
∑=2    ∑=5    ∑=6    ∑=8    ∑=18    ∑=12    ∑=16

y = ax^2 + bx + c

The normal equations are (n = 4):

∑y = a∑x^2 + b∑x + nc

∑xy = a∑x^3 + b∑x^2 + c∑x

∑x^2y = a∑x^4 + b∑x^3 + c∑x^2

Substituting the sums and solving,

a = -0.25

b = 2.15

c = 0.55

Therefore, y = -0.25x^2 + 2.15x + 0.55
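The three normal equations can be built from the data and solved exactly. This is a minimal Python sketch that uses Cramer's rule with integer arithmetic; `det3` and `with_column` are hypothetical helpers written for this example.

```python
from fractions import Fraction

xs = [-1, 0, 1, 2]
ys = [-2, 1, 2, 4]
n = len(xs)

# Sums needed for the normal equations.
sx = sum(xs)
sx2 = sum(x ** 2 for x in xs)
sx3 = sum(x ** 3 for x in xs)
sx4 = sum(x ** 4 for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def with_column(m, j, column):
    """Copy of m with column j replaced by the given column."""
    return [[column[i] if k == j else m[i][k] for k in range(3)] for i in range(3)]

# Coefficient matrix for the unknowns (a, b, c) and the right-hand side.
coefficients = [[sx2, sx, n], [sx3, sx2, sx], [sx4, sx3, sx2]]
rhs = [sy, sxy, sx2y]

d = det3(coefficients)
a, b, c = (Fraction(det3(with_column(coefficients, j, rhs)), d) for j in range(3))
print(float(a), float(b), float(c))
```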