Order Statistics

Jo_M.
August 7th 2007, 02:51 PM
Does anyone know of a good website to learn about the theory involved with order statistics?

Also, what are the things about order stats that I should know by heart for the exam (pdf, cdf, etc.)?

The_Czar52
August 7th 2007, 04:30 PM
The important order statistics (for the purposes of the exam) are the min and the max. From what I hear, questions about order statistics don't appear on the exam very often, but it is still something you should know in case it comes up. I haven't had a textbook that really covered this in depth. I purchased the ASM manual and it covered it. Basically, just practice problems that involve order statistics and go through the solutions thoroughly until you understand them. Wikipedia (below) gives a decent explanation and the "SOA 123" (2nd link below) has a lot of practice questions you could do after you have a basic grasp of the information.

For independent X_1, X_2,...,X_n:

P(max(X_1, X_2,...,X_n) <= x) = P(X_1<=x, X_2<=x,...,X_n<=x)
=P(X_1<=x)*P(X_2<=x)*...*P(X_n<=x)
=F_X1(x)*F_X2(x)*...*F_Xn(x)

P(min(X_1, X_2,...,X_n) > x) = P(X_1>x, X_2>x,...,X_n>x)
=P(X_1>x)*P(X_2>x)*...*P(X_n>x)
=(1-P(X_1<=x))*(1-P(X_2<=x))*...*(1-P(X_n<=x))
=S_X1(x)*S_X2(x)*...*S_Xn(x)

(where S_Xi(x) = 1-F_Xi(x) is the survival function of X_i)

These give the CDF of the max and the survival function of the min. You need to memorize these, either by using notecards or by doing a ton of practice problems.
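
A quick way to convince yourself of the two product formulas is to simulate them. The sketch below is only an illustration: the exponential rates, the point x = 1, the seed, and the function names are all assumptions chosen for the example, not anything from the exam material.

```python
import math
import random

def simulate_max_min(rates, x, trials=100_000, seed=0):
    """Estimate P(max <= x) and P(min > x) for independent exponentials.

    `rates` holds one (hypothetical) rate per variable, so the X_i are
    independent but not identically distributed.
    """
    rng = random.Random(seed)
    max_hits = 0
    min_hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(r) for r in rates]
        if max(sample) <= x:
            max_hits += 1
        if min(sample) > x:
            min_hits += 1
    return max_hits / trials, min_hits / trials

def product_formulas(rates, x):
    """Product of the CDFs (for the max) and of the survival functions (for the min)."""
    cdf_max = math.prod(1 - math.exp(-r * x) for r in rates)
    surv_min = math.prod(math.exp(-r * x) for r in rates)
    return cdf_max, surv_min

rates = [0.5, 1.0, 2.0]  # illustrative values only
x = 1.0
estimated = simulate_max_min(rates, x)
exact = product_formulas(rates, x)
print("simulated:", estimated)
print("formula:  ", exact)
```

The simulated frequencies should land close to the formula values, up to ordinary Monte Carlo noise.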

Hope this helps

http://en.wikipedia.org/wiki/Order_statistics

Jo_M.
August 7th 2007, 06:17 PM
Thanks for the definitions. I will look things up. Does anyone know which of Dr. O's or Broverman's sample questions are about order stats?

JGET
August 8th 2007, 12:07 AM
I suggest taking a look at Ross's book "A First Course in Probability." It doesn't have an extensive chapter on order statistics, but it's dense and has good exercises.

JGET

krzysio
August 8th 2007, 06:12 PM
The ASM Manual (i.e., the one written by Dr. O., i.e., me) has extensive coverage of order statistics.
Yours,
Krzys'

sineintegral
August 8th 2007, 11:31 PM
Hi,
I have a question about order statistics and I hope this is as good a place as any to ask it:

Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample from the pdf e^(-x), x > 0. What is P(Y4 > 2)?

My solution:
P(Y4 > 2) = 1 - P(Y4 <= 2) = 1 - [F(2)]^4 = 1 - [1 - e^(-2)]^4.

Manual's solution:
P(Y4 > 2) = P(X > 2)^4 = [1 - P(X <= 2)]^4 = [1 - (1 - e^(-2))]^4 = e^(-8).

Which is correct?

krzysio
August 9th 2007, 12:23 AM
I thought it was yours... but I misread it. Manual. Sorry.

Yours,
Krzys'

cactus smash
August 9th 2007, 07:25 AM
Isn't e^-8 the probability that the minimum is greater than 2?

sineintegral
August 9th 2007, 08:14 PM
Thanks for the check. As for the question whether the manual is saying what I claim it to be saying, anybody who has access to the manual can check page 34 of the practice exams of Averbach & Mehta.

ctperng
August 9th 2007, 08:34 PM
Another way to solve it is to use the order statistic density function.

The highest order statistic, Y4, has density function
4*F(x)^(4-1)*f(x) = 4*(1-e^(-x))^3 *e^(-x),

therefore P(Y4 > 2) = int(2 to infinity) of 4*(1-e^(-x))^3 *e^(-x) dx

ctperng

Please check if there are further typos.

sineintegral
August 10th 2007, 09:16 AM
It's common for people to make mistakes.

As far as the question is concerned, it cannot be solved like that. A reasonable way to solve it is to use the order statistic density function.

The highest order statistic, Y4, has density function
4*F(x)^(4-1)*f(x) = 4*(1-e^(-x))^3 *e^(-x),

therefore P(Y4 > 2) = int(2 to infinity) of 4*(1-e^(-x))^3 *e^(-x) dx

ctperng

Please check if there are further typos.

Thanks, but you are mistaken. If you check the integration of the pdf for the max you have written out you will see that it is equivalent to my original answer. Using 1-[F(x)]^(n) to solve P(max > x) is a shortcut.:)
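
The equivalence claimed here can be checked numerically. The sketch below integrates the density of the max with a hand-rolled Simpson's rule and compares it with the shortcut 1 - F(2)^4. The function names, the cutoff of 60 standing in for infinity, and the step count are assumptions made for the illustration.

```python
import math

def max_density(x, n=4):
    """pdf of the max of n i.i.d. Exp(1) variables: n * F(x)^(n-1) * f(x)."""
    return n * (1 - math.exp(-x)) ** (n - 1) * math.exp(-x)

def simpson(f, a, b, steps=20_000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

upper = 60.0  # assumed stand-in for infinity; the tail beyond it is negligible
by_integration = simpson(max_density, 2.0, upper)
by_shortcut = 1 - (1 - math.exp(-2.0)) ** 4   # P(max > 2)
manual_value = math.exp(-8.0)                 # P(min > 2), for comparison

print(by_integration, by_shortcut, manual_value)
```

The first two numbers agree (about 0.441), while e^(-8) is far smaller, which is consistent with it being the probability that the minimum, not the maximum, exceeds 2.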

ctperng
August 10th 2007, 12:32 PM
I didn't understand what you meant when you said I was mistaken. My original comment was not about your solutions. The formula I provided is actually correct, but I didn't bother to work out the number. Now that I have, I see what you mean: it turns out your approach is correct. I will post a clarification.

ctperng

Anu Dhanuka
August 10th 2007, 11:01 PM
Where are the brackets? I mean, is it [1-F(x)]^n or 1-[F(x)]^n?

.Godspeed.
August 11th 2007, 02:20 AM
For n independent, identically distributed continuous random variables with common CDF F(x) and survival function S(x) = 1-F(x):

1) The CDF of the maximum is F(x)^n; thus, the survival function of the maximum is 1-F(x)^n.
2) The survival function of the minimum is S(x)^n; thus, the CDF of the minimum is 1-S(x)^n = 1-(1-F(x))^n.

Be clear on whether you are dealing with the maximum or the minimum of a set of random variables, as their distributions are very different.

ctperng
August 11th 2007, 03:28 AM
Following up on my previous post: having worked through the pdf for the order statistic, I find that sineintegral's solution is correct and the manual's solution is wrong.

Here is an intuitive interpretation: Let X1, X2, X3, X4 be 4 independent, identically distributed random variables (here we don't attach any order).

The statement that the largest of X1,...,X4 is greater than 2 is equivalent to the negation of the statement that all four are less than or equal to 2.

Written out in mathematical symbols, this means
P(Y4 > 2) = 1 - P(X1 <= 2, X2 <= 2, X3 <= 2, X4 <= 2)
= 1 - P(X1 <= 2)*P(X2 <= 2)*P(X3 <= 2)*P(X4 <= 2) (using independence of the Xi)
= 1 - F(2)^4 (this is sineintegral's result)

I don't understand the manual solution, and that's why I mentioned that it's not uncommon for people to make mistakes, as I sometimes do when trying to understand what other people are saying. :)

By the way, as a note, and hopefully this is not erroneous:

Analogously, P(Y1 <= 2) can be computed as

P(Y1 <= 2) = 1 - (1 - F(2))^4

Logically, the statement that the smallest order statistic is <= 2 is equivalent to the negation of the statement that all four random variables X1, X2, X3, X4 (without ordering) are greater than 2.

Isn't this obvious? You mentioned that this is a shortcut. In fact, this is how we write down the pdfs of the highest and lowest order statistics. For example, if we know P(Y1 <= y) = 1 - (1 - F(y))^4, then by taking the derivative we get the pdf of Y1, which is

4(1 -F(y))^3*f(y)
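
The differentiation step can be sanity-checked numerically. The sketch below assumes the Exp(1) sample from the question (so 1 - F(y) = e^(-y) and f(y) = e^(-y)) and compares a central finite difference of the CDF of Y1 with the closed-form pdf. The helper names and the step size are illustrative choices.

```python
import math

def cdf_min(y, n=4):
    """P(Y1 <= y) = 1 - (1 - F(y))^n for an Exp(1) sample."""
    return 1 - math.exp(-y) ** n

def pdf_min(y, n=4):
    """n * (1 - F(y))^(n-1) * f(y) for an Exp(1) sample."""
    return n * math.exp(-y) ** (n - 1) * math.exp(-y)

def central_difference(g, y, h=1e-5):
    """Numerical derivative of g at y."""
    return (g(y + h) - g(y - h)) / (2 * h)

points = [0.1, 0.5, 1.0, 2.0]
gaps = [abs(central_difference(cdf_min, y) - pdf_min(y)) for y in points]
print(gaps)  # each gap should be tiny
```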

ctperng

ctperng
August 11th 2007, 03:41 AM
Thank you for the clarification. My initial formula was correct, but I didn't compute the number. And I misunderstood what sineintegral was referring to. So just to emphasize (correct me if I am wrong):

P(Y4 > y) = 1 - F(y)^4

ctperng

.Godspeed.
August 11th 2007, 03:42 PM
That is correct.

sineintegral
August 12th 2007, 12:54 AM
Yes, that is the way I understood it as well. The only reason I called it a shortcut was that you had earlier posted that one could only reasonably solve the problem by integrating the pdf (a perfectly valid method), and I assumed that you were not familiar with the faster formula.

I was fairly confident I was right about the answer, but I am hesitant to impute error to the manual. Order statistics were barely touched upon in my classes, and it is daunting when your sole reference contradicts itself many pages later. Anyway, thanks to those who offered genuine clarification.

Jo_M.
August 12th 2007, 06:36 PM
Could someone post their solution to this question?

A friend of ours takes the bus five days per week to her job. The five waiting times until she can board the bus are a random sample from a uniform distribution on the interval from 0 to 10 min.

Suppose you learn that the smallest of the five waiting times is 4 min. What is the conditional density function of the largest waiting time?

My solution:

g(y5 | y1 = 4) = g(y1,y5) / g(y1 = 4)

g(y1,y5) = (5! / (0!*3!*0!)) * (y1/10)^0 * (y5/10 - y1/10)^3 * (1-y5/10)^0 * 1/10 * 1/10

g(y1,y5) = 0.2 * (y5/10 - y1/10)^3

g(y1) = (5! / (0! * 4!)) * (y1/10)^0 * (1 - y1/10)^4 * 1/10

g(y1=4) = 0.5 * 0.6^4

g(y5 | y1=4) = (0.2 * (y5/10 - 4/10)^3) / (0.5 * 0.6^4)

The correct answer: 2/3 * [(y5-4)/6]^3

Where have I gone wrong?

(source: Devore & Berk, Modern Mathematical Statistics with Applications , 2007, chapter 5#67)

ctperng
August 12th 2007, 08:39 PM
The random variables y1 and y5 are clearly dependent, but you are treating them like they are independent.

It turns out the solution to this is very short: given the condition on the smallest, there are now only 4 values to compare, each of which has a new distribution that is uniform on [4,10], i.e. with conditional pdf = 1/6 and conditional CDF = (x-4)/6.

Therefore, under the given condition, the largest of the 4 (and hence of the 5) has pdf 4*[(x-4)/6]^3*(1/6) = (2/3)*[(x-4)/6]^3

ctperng

Jo_M.
August 12th 2007, 10:18 PM
Thank you for your help ctperng!

August 13th 2007, 06:27 PM
Since you get it, can you elaborate on the steps and on which formula(s) you used?

Anu Dhanuka
August 13th 2007, 08:09 PM
I'm eager to know the steps too...
Does anyone know what percentage of the exam questions relate to order statistics?

.Godspeed.
August 13th 2007, 08:30 PM
Again, the CDF of the maximum of n random variables is F(x)^n; thus, the pdf of the maximum of n random variables is d/dx[F(x)^n] = nF(x)^(n-1)*f(x).

Like ctperng stated, conditional on the minimum waiting time being 4, our new f(x) for the four remaining random variables is no longer Uniform(0,10); it is Uniform(4,10) with pdf = 1/6 and CDF = (x-4)/6.

Thus, the CDF of the maximum of the four remaining random variables is ((x-4)/6)^4. The corresponding pdf is 4 *((x-4)/6)^3 * 1/6. This reduces to ctperng's final answer of 2/3 * ((x-4)/6)^3.
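
As a cross-check on the two routes discussed in this thread, the sketch below evaluates both the ratio of the joint (min, max) density to the marginal density of the min, and the Uniform(4,10) shortcut, at a few points. The function names are made up for the illustration; the densities are the standard ones for an i.i.d. Uniform(0,10) sample of size 5.

```python
def joint_min_max(y1, y5, n=5, width=10.0):
    """Joint pdf of the min and max of n i.i.d. Uniform(0, width) draws (y1 < y5)."""
    return n * (n - 1) * ((y5 - y1) / width) ** (n - 2) / width ** 2

def marginal_min(y1, n=5, width=10.0):
    """pdf of the min of n i.i.d. Uniform(0, width) draws."""
    return n * (1 - y1 / width) ** (n - 1) / width

def conditional_by_ratio(y5, y1=4.0):
    """g(y5 | y1) as joint density divided by marginal density."""
    return joint_min_max(y1, y5) / marginal_min(y1)

def conditional_by_shortcut(y5, y1=4.0, width=10.0, remaining=4):
    """pdf of the max of the remaining draws, each Uniform(y1, width)."""
    span = width - y1
    return remaining * ((y5 - y1) / span) ** (remaining - 1) / span

for y5 in (5.0, 7.0, 9.0, 10.0):
    print(y5, conditional_by_ratio(y5), conditional_by_shortcut(y5))
```

The two columns match at every point tried, which suggests that the unsimplified ratio (0.2 * (y5/10 - 4/10)^3) / (0.5 * 0.6^4) is already the same function as the book's 2/3 * [(y5-4)/6]^3, just not reduced to that form.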

ctperng
August 13th 2007, 10:36 PM
I remember they asked about the second order statistic when I was taking the exam. (I suppose they just ask one question then.)

ctperng

Anu Dhanuka
August 14th 2007, 01:11 AM
It might sound stupid to ask this at this point, but what is the second order statistic? Is it joint order statistics or something else?

Jo_M.
August 14th 2007, 07:58 AM
If you take a sample x1,x2,x3,...,xn (for example, the numbers 1, 5, 4, 6, ...) and order them to get y1,y2,y3,...,yn (1, 4, 5, 6, ...), then the second order statistic is y2, the second-smallest value. The pdf of the i-th order statistic yi (take i = 2 for the second) is found using the following equation:

g_i(y) = n! / ((i-1)! (n-i)!) * [F(y)]^(i-1) * [1-F(y)]^(n-i) * f(y)

where f is the common pdf of the xi and F is the corresponding cdf.
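
The general formula translates directly into code. The sketch below is a minimal illustration, assuming an Exp(1) sample of size 4. The helper names, the trapezoid rule, and the cutoff of 50 for the upper limit are choices made for the example.

```python
import math

def order_stat_pdf(y, i, n, pdf, cdf):
    """pdf of the i-th smallest of n i.i.d. draws with the given pdf and cdf."""
    coeff = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coeff * cdf(y) ** (i - 1) * (1 - cdf(y)) ** (n - i) * pdf(y)

def exp_pdf(y):
    return math.exp(-y)

def exp_cdf(y):
    return 1 - math.exp(-y)

def trapezoid(f, a, b, steps=50_000):
    """Plain trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

n = 4
# The density of the second order statistic should integrate to about 1.
area_second = trapezoid(lambda y: order_stat_pdf(y, 2, n, exp_pdf, exp_cdf), 0.0, 50.0)

# With i = n and i = 1 the formula reduces to the max and min densities.
y = 1.3
max_gap = abs(order_stat_pdf(y, n, n, exp_pdf, exp_cdf) - n * exp_cdf(y) ** (n - 1) * exp_pdf(y))
min_gap = abs(order_stat_pdf(y, 1, n, exp_pdf, exp_cdf) - n * (1 - exp_cdf(y)) ** (n - 1) * exp_pdf(y))
print(area_second, max_gap, min_gap)
```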

I hope this helps

Anu Dhanuka
August 14th 2007, 12:18 PM
Seriously, I was so stupid to ask this... as exams are coming nearer, it seems my mind is getting '''''ed more...

Jo_M.
August 14th 2007, 12:32 PM
I guess it's the stress level getting higher! Just relax a bit and give your brain some time to become imbued with all the concepts.

(I say that, but I'm sure I'll be freaking out when my turn comes in November!)

Jo_M.
August 14th 2007, 11:41 PM
Here's a problem I found very interesting. It took me some time to figure out.

It involves both some transformations and order stats:

Let Y1 and Yn be the smallest and largest order statistics, respectively, from a random sample of size n.
Let W2 = Yn-Y1 (this is the sample range).

a) Let W1 = Y1, obtain the joint pdf of the W_i 's and then derive an expression involving an integral for the pdf of the sample range.

ans: f_w2 (w2) = n(n-1) * int[-inf,inf] ( [F(w1 + w2) - F(w1)]^(n-2) * f(w1) * f(w1 + w2)) dw1

(Source: Devore & Berk, Modern Mathematical Statistics with Applications, 2007, chapter 5, #77)

Anu Dhanuka
August 15th 2007, 10:11 AM
No doubt the problem is interesting...
Can you please solve it for me?
The way I approached it is as follows:
W2=Yn-Y1
W1=Y1
g(W1,W2)=f(Y1,Yn) * |J|
The |J| I obtained is 1.
How do you get this answer? Maybe I didn't understand the problem properly...

Jo_M.
August 15th 2007, 11:07 AM
First, determine the joint pdf of the y_i's:

f(y1,yn) = (n! / [(1-1)! * (n-1-1)! * (n-n)!] ) * [F(y1)]^0 * [F(yn) - F(y1)]^(n-2) * [1-F(yn)]^0 * f(y1) * f(yn)

f(y1,yn) = n*(n-1) * [F(yn) - F(y1)]^(n-2) * f(y1) * f(yn)

Now, we have that W1 = Y1 and W2 = Yn-Y1

So Yn = W1 + W2 and Y1 = W1

Then, we have the formula: g(w1,w2) = f(y1,yn) * |J|, where |J| is the absolute value of the Jacobian determinant of the transformation (here |J| = 1)

This is equivalent to g(w1,w2) = f(w1,w1+w2) * 1

So: g(w1,w2) = n*(n-1) * [F(w1+w2) - F(w1)]^(n-2) * f(w1) * f(w1+w2) *1

Finally, since we are looking for the pdf of the sample range (i.e. w2), you need to integrate with respect to w1 over its domain (-infinity to infinity)

This gives the answer: f_w2 (w2) = n(n-1) * int[-inf,inf] ( [F(w1 + w2) - F(w1)]^(n-2) * f(w1) * f(w1 + w2)) dw1
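
One way to test the integral expression is to plug in a distribution where the range density is known in closed form. For a Uniform(0,1) sample, the integrand equals w2^(n-2) for w1 between 0 and 1-w2 and zero elsewhere, so the formula reduces to n(n-1) * w2^(n-2) * (1-w2). The sketch below checks this with a midpoint rule. The Uniform(0,1) choice, n = 5, and the helper names are assumptions for the illustration.

```python
def uniform_pdf(x):
    """pdf of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def uniform_cdf(x):
    """cdf of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)

def range_pdf(w2, n, pdf, cdf, lo, hi, steps=20_000):
    """Midpoint-rule evaluation of
    n(n-1) * int [F(w1+w2) - F(w1)]^(n-2) * f(w1) * f(w1+w2) dw1
    over [lo, hi], which should cover the support of f.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        w1 = lo + (k + 0.5) * h
        total += (cdf(w1 + w2) - cdf(w1)) ** (n - 2) * pdf(w1) * pdf(w1 + w2)
    return n * (n - 1) * total * h

def uniform_range_closed_form(w2, n):
    """Known range density for a Uniform(0, 1) sample of size n."""
    return n * (n - 1) * w2 ** (n - 2) * (1 - w2)

n = 5
for w2 in (0.2, 0.5, 0.8):
    print(w2, range_pdf(w2, n, uniform_pdf, uniform_cdf, 0.0, 1.0),
          uniform_range_closed_form(w2, n))
```

The integration limits are 0 and 1 rather than -inf and inf only because the Uniform(0,1) pdf vanishes outside that interval.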

Anu Dhanuka
August 15th 2007, 12:29 PM
Thanks Jo, I made a silly mistake... my joint pdf was wrong...