Are there 99 percentiles, or 100 percentiles? And are they groups of numbers, or dividers or pointers to individual numbers?
Are there 99 percentiles, or 100 percentiles? And are they groups of numbers, or divider lines, or pointers to individual numbers?
I suppose the same question would apply for quartiles or any quantile.
I have read that, given n items, the index of the number at a particular percentile p is i = (p / 100) * n.
That suggests to me that there are 100 percentiles, because if you have 100 numbers (i = 1 to i = 100), then each percentile from 1 to 100 has its own index.
If you had 200 numbers, there'd still be 100 percentiles, but what would each refer to? It could be a group of two numbers. Or it could be one of 100 dividers, excluding either the far-left or the far-right divider, because otherwise you'd get 101 dividers. Or it could be a pointer to an individual number, so the first percentile would refer to the second number, (1/100)*200 = 2, and the hundredth percentile would refer to the 200th number, (100/100)*200 = 200.
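To check the arithmetic above, here is a minimal Python sketch of the rule i = (p / 100) * n applied to 200 sorted numbers. The helper name percentile_index and the choice to round fractional positions up are assumptions made for illustration; the rule as quoted does not say how to round.

```python
import math

def percentile_index(p, n):
    """1-based position from i = (p / 100) * n, rounded up (rounding is an assumption)."""
    return math.ceil(p * n / 100)

data = list(range(1, 201))       # 200 sorted values: 1, 2, ..., 200
n = len(data)

first = percentile_index(1, n)   # (1/100) * 200 = 2
last = percentile_index(100, n)  # (100/100) * 200 = 200
zeroth = percentile_index(0, n)  # 0, which is not a valid 1-based position

print(first, data[first - 1])    # 2 2
print(last, data[last - 1])      # 200 200
print(zeroth)                    # 0
```

Under this "pointer" reading, p = 1 to 100 all land on valid positions, while p = 0 gives position 0, which does not exist with 1-based counting.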
I have sometimes heard of there being 99 percentiles, though.
Google shows the Oxford dictionary, which gives two definitions of percentile: "each of the 100 equal groups into which a population can be divided according to the distribution of values of a particular variable" and "each of the 99 intermediate values of a random variable which divide a frequency distribution into 100 such groups."
Wikipedia says "the 20th percentile is the value below which 20% of the observations may be found". But does it actually mean "the value below or equal to which 20% of the observations may be found", i.e. "the value that 20% of the values are <= to"? If it were just < and not <=, then the 100th percentile would be the value below which 100% of the values may be found. I have heard that used as an argument that there can be no 100th percentile, because you can't have a number with 100% of the numbers below it. But I think that argument may be incorrect, resting on the mistaken assumption that the definition uses < when it actually uses <= (or > when it actually uses >=). In that case the hundredth percentile would be the final number, which is >= 100% of the numbers.
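The difference between < and <= can be made concrete with a small Python sketch. The function names percent_below and percent_at_or_below are hypothetical, and the data are made up for illustration.

```python
def percent_below(data, x):
    """Percentage of observations strictly less than x."""
    return 100 * sum(v < x for v in data) / len(data)

def percent_at_or_below(data, x):
    """Percentage of observations less than or equal to x."""
    return 100 * sum(v <= x for v in data) / len(data)

distinct = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
largest = max(distinct)
print(percent_below(distinct, largest))        # 90.0
print(percent_at_or_below(distinct, largest))  # 100.0

# With ties, the two readings diverge as far as they possibly can.
tied = [2] * 11
print(percent_below(tied, 2))                  # 0.0
print(percent_at_or_below(tied, 2))            # 100.0
```

With the strict reading, no observation ever reaches 100%; with the <= reading, the largest observation always does.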
quantiles
asked 8 hours ago by barlop (new contributor)
1
$begingroup$
I think it unlikely 100 would be a reasonable answer due to its asymmetric treatment of the extremes. Cases can be made for either 99 (as in the definition you quote) or 101.
$endgroup$
– whuber♦
8 hours ago
2
$begingroup$
Historically quantiles — as we now say generically — were first summary points, and then by extension the bins, classes or intervals they delimit. So three quartiles, including the median, define four bins, and so forth.
$endgroup$
– Nick Cox
7 hours ago
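The "three quartiles define four bins" reading matches how Python's standard library counts, which may be a useful sanity check: statistics.quantiles returns the n - 1 cut points that split the data into n groups. The data below are made up for illustration.

```python
import statistics

data = list(range(1, 201))  # 200 illustrative values

quartile_cuts = statistics.quantiles(data, n=4)      # cut points between 4 bins
percentile_cuts = statistics.quantiles(data, n=100)  # cut points between 100 bins

print(len(quartile_cuts), len(percentile_cuts))      # 3 99
```

This is one library's convention, not a ruling on terminology; it simply shows the "99 dividers, 100 groups" definition in action.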
$begingroup$
@NickCox do you have a source for that?
$endgroup$
– barlop
5 hours ago
$begingroup$
@whuber You write "I think it unlikely 100 would be a reasonable answer due to its asymmetric treatment of the extremes." <-- can you elaborate on that?
$endgroup$
– barlop
5 hours ago
$begingroup$
@whuber You write "Cases can be made for either 99 (as in the definition you quote) or 101". But percent means per 100, so how can you have 101? And if there are 101, would you number them 1st, 2nd, ..., 101st, or 0th, 1st, ..., 100th? 0th seems problematic because the th/st suffix is for counting, and counting starts from 1. Even in computer science, where you index from 0, a count of 0 still means no items and the first item is counted as 1!
$endgroup$
– barlop
5 hours ago
2 Answers
One nice way to treat this is to start with simple math and work backwards to the more complicated case of real data. Let's start with PDFs, CDFs, and inverse CDFs (also known as quantile functions). The $x$th quantile of a distribution with pdf $f$ and cdf $F$ is $F^{-1}(x)$. Suppose the $z$th percentile is $F^{-1}(z/100)$. For a uniform(0,1) distribution, the 100th and 0th percentiles are ill-defined since $F^{-1}$ is only unique when $F$ is not constant. For a normal distribution, they do not exist (or they "are" $\pm\infty$).
For continuous distributions, non-extreme quantiles exist and are unique. For a discrete distribution such as the Poisson distribution, most percentiles don't exist because for most $z/100$, there is no $y$ with $F(y) = z/100$.
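To make the discrete case concrete, here is a small Python sketch for a Poisson distribution with an illustrative rate of 3. The CDF jumps from about 0.42 to about 0.65, so no $y$ has $F(y) = 0.5$ exactly. The helper generalized_quantile is an assumption that is not part of this answer: it uses the common "smallest $y$ with $F(y) \ge q$" convention to return a value anyway.

```python
import math

def poisson_cdf(y, lam):
    """P(Y <= y) for Y ~ Poisson(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(y + 1))

def generalized_quantile(q, lam):
    """Smallest integer y with F(y) >= q (an assumed convention, see lead-in)."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    y = 0
    while poisson_cdf(y, lam) < q:
        y += 1
    return y

lam = 3.0  # illustrative rate
for y in range(4):
    print(y, round(poisson_cdf(y, lam), 4))
# 0 0.0498
# 1 0.1991
# 2 0.4232
# 3 0.6472

print(generalized_quantile(0.5, lam))  # 3
```

The guard on q reflects the point made above about the extremes: the 0th and 100th percentiles need separate treatment.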
When it comes to real data, all distributions are discrete. (The empirical CDF of runif(100) in R or np.random.random(100) in NumPy has 100 increments, one at each sampled value between 0 and 1.) We still want to have a useful concept of quantiles and percentiles. So, we can define a quantile as any consistent estimator of the corresponding theoretical quantile. For example, the median (the 50th percentile or 0.5 quantile) of the sample 3, 4, 5, 6, 7, 8 can be any number between 5 and 6. If you draw 2n values from a unif(3,8) distribution and take any number between the nth and (n+1)th smallest values, you will converge on 5.5 as n increases.
(In practice, I would always compute the median of 3, 4, 5, 6, 7, 8 as 5.5 because that particular median is also a trimmed mean, which means it has other good properties as an estimator.)
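The convergence claim can be checked with a short simulation. This is only a sketch: the function name middle_interval, the seed, and the sample sizes are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def middle_interval(n):
    """Draw 2n values from Uniform(3, 8); return the nth and (n+1)th smallest."""
    draws = sorted(random.uniform(3, 8) for _ in range(2 * n))
    return draws[n - 1], draws[n]

# Any number between lo and hi is a valid sample median; the gap shrinks around 5.5.
for n in (10, 1_000, 100_000):
    lo, hi = middle_interval(n)
    print(n, round(lo, 3), round(hi, 3))

# The worked sample from the text: the midpoint of the interval [5, 6].
print(statistics.median([3, 4, 5, 6, 7, 8]))  # 5.5
```

For even-length data, statistics.median returns the midpoint of the two middle values, which is the 5.5 convention described above.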
answered 7 hours ago by eric_kernfeld (edited 7 hours ago)
1
$begingroup$
Your first paragraph has some incorrect information: $F^{-1}$ is indeed unique in many cases, including for the uniform distribution on $[0,1]$ (when $F$ is restricted to $[0,1]$ itself). This has little to do with $F$ being "constant." I think you are making misleading arguments that mix up the roles of continuity, invertibility, and boundedness of support of distributions. Introducing estimators and referring to them also as "quantiles" is interesting but threatens to make things even more confusing.
$endgroup$
– whuber♦
7 hours ago
$begingroup$
I was taught that an observation in the nth percentile was greater than n% of observations in the dataset under consideration, which to me implies that there is no 0th or 100th percentile. No observation can be greater than 100% of observations because it forms part of that 100% (and a similar logic applies in the case of 0).
Unfortunately, I have no source for this that I can point you to.
$endgroup$
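To make the two readings of "greater than n% of observations" concrete, here is a minimal Python sketch. The function names `strict_rank` and `weak_rank` and the sample data are assumptions made for illustration, not a standard API. The strict version counts observations below a value; the weak version also counts ties, including the value itself.

```python
def strict_rank(data, x):
    """Percentage of observations strictly less than x."""
    return 100 * sum(v < x for v in data) / len(data)

def weak_rank(data, x):
    """Percentage of observations less than or equal to x."""
    return 100 * sum(v <= x for v in data) / len(data)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(strict_rank(scores, max(scores)))  # 90.0: the maximum never counts itself
print(weak_rank(scores, max(scores)))    # 100.0

ties = [2] * 11  # every value identical
print(strict_rank(ties, 2))  # 0.0
print(weak_rank(ties, 2))    # 100.0
```

Under the strict definition the largest observation cannot reach 100; under the weak one it always does, so whether a 100th percentile exists depends on which convention is adopted.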
edited 7 hours ago
answered 8 hours ago
mkt
2
$begingroup$
Do you have an authoritative reference for what you remember being taught? Note that you are implicitly adopting a definition of "percentile" as being a group of numbers. The other definition quoted in the question is that the percentile is a boundary between such groups.
$endgroup$
– whuber♦
8 hours ago
$begingroup$
@whuber Unfortunately not. And yes, I see the distinction.
$endgroup$
– mkt
8 hours ago
$begingroup$
That doesn't make sense to me. Suppose your data is 2,2,2,2,2,2,2,2,2,2,2: an item in one quantile is then equal to an item to its left in a prior quantile, so an item in the nth quantile is not greater than every item in the quantiles to its left. So an item in the nth percentile is not greater than n% of observations in the dataset. It is greater than or equal to n% of the observations, not strictly greater. Hence you can have a 100th percentile. What do you make of that logic?
$endgroup$
– barlop
5 hours ago
$begingroup$
Many definitions come under strain if all values are identical!
$endgroup$
– Nick Cox
3 hours ago
1
$begingroup$
I think it unlikely 100 would be a reasonable answer due to its asymmetric treatment of the extremes. Cases can be made for either 99 (as in the definition you quote) or 101.
$endgroup$
– whuber♦
8 hours ago
2
$begingroup$
Historically quantiles — as we now say generically — were first summary points, and then by extension the bins, classes or intervals they delimit. So three quartiles, including the median, define four bins, and so forth.
$endgroup$
– Nick Cox
7 hours ago
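The cut-point reading can be seen in Python's standard library, offered here only as an illustration: `statistics.quantiles(data, n=k)` (Python 3.8+) returns the `k - 1` cut points that split the data into `k` groups. The sample data below is made up.

```python
from statistics import quantiles

data = list(range(1, 1001))  # made-up sample: 1, 2, ..., 1000

quartile_cuts = quantiles(data, n=4)      # 3 cut points delimit 4 bins
percentile_cuts = quantiles(data, n=100)  # 99 cut points delimit 100 bins

print(len(quartile_cuts), len(percentile_cuts))

# Counting the minimum and maximum as a 0th and 100th percentile gives 101 points.
with_extremes = [min(data)] + percentile_cuts + [max(data)]
print(len(with_extremes))
```

So 99 is the count of interior dividers, 100 is the count of bins they delimit, and 101 arises only if both extremes are treated as percentiles too.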
$begingroup$
@NickCox do you have a source for that?
$endgroup$
– barlop
5 hours ago
$begingroup$
@whuber You write "I think it unlikely 100 would be a reasonable answer due to its asymmetric treatment of the extremes." <-- can you elaborate on that?
$endgroup$
– barlop
5 hours ago
$begingroup$
@whuber You write "Cases can be made for either 99 (as in the definition you quote) or 101" <-- though percent means per 100, so how can you have 101? And if there are 101, would you number them 1st, 2nd, ..., 101st, or 0th, 1st, ..., 100th? 0th seems problematic because the th/st suffix is for counting, and counting starts from 1. Even in computer science, where you index from 0, counting still means 0 = no items and 1 = the first item!
$endgroup$
– barlop
5 hours ago