How to set the columns in pandas
Here is my DataFrame:
           Dec-18  Jan-19  Feb-19  Mar-19  Apr-19  May-19
Saturday   2540.0  2441.0  3832.0  4093.0  1455.0  2552.0
Sunday     1313.0  1891.0  2968.0  2260.0  1454.0  1798.0
Monday     1360.0  1558.0  2967.0  2156.0  1564.0  1752.0
Tuesday    1089.0  2105.0  2476.0  1577.0  1744.0  1457.0
Wednesday  1329.0  1658.0  2073.0  2403.0  1231.0   874.0
Thursday    798.0  1195.0  2183.0  1287.0  1460.0  1269.0
I have tried some pandas operations, but I could not get the result I want.
This is what I want to produce:
            items
Saturday   2540.0
Sunday     1313.0
Monday     1360.0
Tuesday    1089.0
Wednesday  1329.0
Thursday    798.0
Saturday   2441.0
Sunday     1891.0
Monday     1558.0
Tuesday    2105.0
Wednesday  1658.0
Thursday   1195.0
... and so on
I want to stack the columns one below the other, so that all values end up in a single column with the day names repeated.
How can I do that?
Thanks in advance.
Tags: python pandas dataframe
asked 8 hours ago by jony
2
Try: df.reset_index().melt() by index
– political scientist
8 hours ago
3 Answers
df.reset_index().melt(id_vars='index').drop(columns='variable')
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: I just noticed a comment suggesting the same thing; I will delete my post if requested :)
answered 8 hours ago by Yuca
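The melt approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: the small `df` is a hypothetical three-day, two-month excerpt of the question's data, and the `value_name`, `set_index` and `rename_axis` steps are additions (not part of the answer) that bring the result closer to the layout requested in the question.

```python
import pandas as pd

# Hypothetical excerpt of the question's data (first three days, two months).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

# reset_index() turns the unnamed index into a column called "index";
# melt() then places the month columns one below the other.
long_df = df.reset_index().melt(id_vars="index", value_name="items")

# Drop the month labels and move the day names back into the index
# (these two steps are an assumption about the desired final layout).
result = long_df.drop(columns="variable").set_index("index").rename_axis(None)
print(result)
```

Keeping the `variable` column instead of dropping it would preserve which month each value came from, which may be useful depending on what happens next with the data.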
Create it with numpy by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
             index=np.tile(df.index, df.shape[1]),
             columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
answered 8 hours ago by ALollz (edited 3 hours ago)
1
My answer was virtually identical to this. a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
Minor fix: the argument to np.tile should be df.shape[1] instead of df.shape[0], which only happens to work on this example data because it is square!
– Peter Leimbigler
3 hours ago
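To illustrate the point about `df.shape[1]`, here is a minimal sketch on a deliberately non-square frame. The `df` below is a hypothetical three-row, two-column excerpt of the question's data, not the full table; with it, tiling the index `df.shape[0]` times would produce nine labels for six values and the constructor would raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical non-square frame: 3 rows (days) by 2 columns (months).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

# Column-major ("F") flattening walks down each column in turn.
values = df.to_numpy().flatten("F")

# The row labels must repeat once per column, hence df.shape[1].
labels = np.tile(df.index, df.shape[1])

result = pd.DataFrame(values, index=labels, columns=["items"])
print(result)
```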
You can do:
df = df.stack().sort_index(level=1).reset_index(level=1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it was the fastest in my timing:
import time
# timed on the original (wide) df
start = time.time()
df.stack().sort_index(level=1).reset_index(level=1, drop=True).to_frame('items')
end = time.time()
print("time taken {}".format(end - start))
yields: time taken 0.006181955337524414
while this:
# assumes the index of df is named 'days'
start = time.time()
df.reset_index().melt(id_vars='days').drop(columns='variable')
end = time.time()
print("time taken {}".format(end - start))
yields: time taken 0.010072708129882812
And my output format matches what the OP requested exactly.
answered 8 hours ago by d_kennetz (edited 7 hours ago)
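A single `time.time()` pair on a frame this small is quite noisy, so a repeated measurement may give a steadier comparison. The sketch below uses the standard-library `timeit` module; the small `df`, the helper names `with_stack` and `with_melt`, and the repeat counts are all assumptions made for illustration, and the numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for the question's frame (excerpt of its values).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)


def with_stack():
    # The stack-based reshaping from the answer above.
    return (
        df.stack()
        .sort_index(level=1)
        .reset_index(level=1, drop=True)
        .to_frame("items")
    )


def with_melt():
    # The melt-based reshaping; the unnamed index becomes a column "index".
    return df.reset_index().melt(id_vars="index").drop(columns="variable")


# Run each function many times and keep the best of several rounds,
# which is less sensitive to one-off noise than a single measurement.
NUMBER = 100
for fn in (with_stack, with_melt):
    best = min(timeit.repeat(fn, number=NUMBER, repeat=3))
    print(f"{fn.__name__}: {best / NUMBER:.6f} s per call")
```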
Interesting: why does this work? I would expect df.stack().sort_index(level=1) to lexicographically sort the strings Dec-18, Jan-19, etc., but in fact they get sorted in date order, even if they're strings and not datetime objects. df.stack().index.get_level_values(1).sort_values() lexsorts.
– Peter Leimbigler
8 hours ago
@PeterLeimbigler it is sorting based on the order of the columns, not datetime or string. If Jan-19 were the first column, it would have been sorted first. Try it using this setup:
df = pd.DataFrame({'days': ['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'], 'Dec-18': [400, 300, 200, 100, 1000, 1200], 'Jan-19': [500, 300, 200, 800, 900, 1000]})
then: df = df.set_index('days')
then: df.stack().sort_index(level=1)
Then go back and change the order of the columns and see what appears first.
– d_kennetz
7 hours ago
Thanks for the explanation! This is unexpected behaviour to me. From my testing, it appears that if you stack a DataFrame's columns into a MultiIndex and the result is a Series, then the index remembers the order of the columns, and sorts according to that order. But if the .stack() returns a DataFrame (or if you convert to DataFrame using .stack().to_frame()), the index no longer remembers the order of the original columns.
– Peter Leimbigler
7 hours ago
2
@d_kennetz sometimes they do not. I usually think of answers as general ideas. I judge them accordingly. I give credit for ingenuity and presentation/explanation. I like to see the output from proposed solutions because all too often answers provide a solution that doesn't produce correct output. This doesn't show the results. Also, most of the time, DataFrames aren't big enough for performance to matter. OP goes with what is most understandable to them. Keep up the good fight and answer questions that are beneficial long term. (-:
– piRSquared
7 hours ago
Also, use df.unstack().reset_index(0, drop=True).to_frame('items'). By unstack-ing rather than stack-ing, you save yourself from the sorting shenanigans.
– piRSquared
7 hours ago
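The `unstack` suggestion from the last comment can be tried with a short sketch like the one below. The `df` is again a hypothetical excerpt of the question's data; the rest follows the one-liner quoted in the comment, split into two steps for readability.

```python
import pandas as pd

# Hypothetical excerpt of the question's data (first three days, two months).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

# unstack() on a frame with a flat index returns a Series indexed by
# (column label, row label), laid out one column after another.
stacked = df.unstack()

# Drop the month level (level 0) and name the single remaining column.
result = stacked.reset_index(level=0, drop=True).to_frame("items")
print(result)
```

Because the values already come out column by column, no `sort_index` call is needed here.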
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
df.reset_index().melt(id_vars='index').drop('variable',1)
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: just noted a commented suggesting to do the same thing, I will delete my post if requested :)
add a comment |
df.reset_index().melt(id_vars='index').drop('variable',1)
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: just noted a commented suggesting to do the same thing, I will delete my post if requested :)
add a comment |
df.reset_index().melt(id_vars='index').drop('variable',1)
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: just noted a commented suggesting to do the same thing, I will delete my post if requested :)
df.reset_index().melt(id_vars='index').drop('variable',1)
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
Note: just noted a commented suggesting to do the same thing, I will delete my post if requested :)
answered 8 hours ago
YucaYuca
3,8032 gold badges10 silver badges27 bronze badges
3,8032 gold badges10 silver badges27 bronze badges
add a comment |
add a comment |
Create it with numpy
by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
1
My answer was virtually identical to this.a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
Minor fix: the argument tonp.tile
should bedf.shape[1]
instead ofdf.shape[0]
, which only happens to work on this example data because it is square!
– Peter Leimbigler
3 hours ago
add a comment |
Create it with numpy
by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
1
My answer was virtually identical to this.a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
Minor fix: the argument tonp.tile
should bedf.shape[1]
instead ofdf.shape[0]
, which only happens to work on this example data because it is square!
– Peter Leimbigler
3 hours ago
add a comment |
Create it with numpy
by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
Create it with numpy
by reshaping the data.
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
Output:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
edited 3 hours ago
answered 8 hours ago
ALollzALollz
19.8k5 gold badges18 silver badges40 bronze badges
19.8k5 gold badges18 silver badges40 bronze badges
1
My answer was virtually identical to this.a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
Minor fix: the argument tonp.tile
should bedf.shape[1]
instead ofdf.shape[0]
, which only happens to work on this example data because it is square!
– Peter Leimbigler
3 hours ago
add a comment |
1
My answer was virtually identical to this.a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
Minor fix: the argument tonp.tile
should bedf.shape[1]
instead ofdf.shape[0]
, which only happens to work on this example data because it is square!
– Peter Leimbigler
3 hours ago
1
1
My answer was virtually identical to this.
a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
My answer was virtually identical to this.
a = df.to_numpy(); pd.DataFrame(np.reshape(a, (-1, 1), 'F'), np.resize(df.index, a.size), ['items'])
– piRSquared
8 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared my answer was faster than the accepted answer and matches the output requested exactly, while the accepted answer does not. Mine was also first posted. Sometimes things just don't make sense do they :P.
– d_kennetz
7 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
@piRSquared want me to add yours to this solution? (or feel free to edit yourself :D)
– ALollz
5 hours ago
1
1
Minor fix: the argument to
np.tile
should be df.shape[1]
instead of df.shape[0]
, which only happens to work on this example data because it is square!– Peter Leimbigler
3 hours ago
Minor fix: the argument to
np.tile
should be df.shape[1]
instead of df.shape[0]
, which only happens to work on this example data because it is square!– Peter Leimbigler
3 hours ago
add a comment |
You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it is the fastest:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.006181955337524414
while this:
start = time.time()
df.reset_index().melt(id_vars='days').drop('variable',1)
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.010072708129882812
Any my output format matches OP's requested exactly.
Interesting: why does this work? I would expectdf.stack().sort_index(level=1)
to lexicographically sort the stringsDec-18
,Jan-19
, etc., but in fact they get sorted in date order, even if they're strings and not datetime objects.df.stack().index.get_level_values(1).sort_values()
lexsorts.
– Peter Leimbigler
8 hours ago
@PeterLeimbigler it is sorting based on the order of the columns, not datetime or string. If jan-19 was the first column that would've been sorted first. try it using this setup:df = pd.DataFrame('days': ['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'], 'Dec-18': [400,300,200,100,1000,1200], 'Jan-19': [500, 300, 200, 800, 900, 1000])
then:df = df.set_index('days')
then:df.stack().sort_index(level=1)
Then go back and change the order of the columns and see what appears first.
– d_kennetz
7 hours ago
Thanks for the explanation! This is unexpected behaviour to me. From my testing, it appears that if you stack a DataFrame's columns into a MultiIndex and the result is a Series, then the index remembers the order of the columns, and sorts according to that order. But if the.stack()
returns a DataFrame (or if you convert to DataFrame using.stack().to_frame()
), the index no longer remembers the order of the original columns.
– Peter Leimbigler
7 hours ago
2
@d_kennetz sometimes they do not. I usually think of answers as general ideas. I judge them accordingly. I give credit for ingenuity and presentation/explanation. I like to see the output from proposed solutions because all to often answers provide a solution that doesn't produce correct output. This doesn't show the results. Also, most of the time, DataFrames aren't big enough for performance to matter. OP goes with what is most understandable to them. Keep up the good fight and answer questions that are beneficial long term. (-:
– piRSquared
7 hours ago
Also, usedf.unstack().reset_index(0, drop=True).to_frame('items')
. Byunstack
-ing rather thanstack
-ing, you save yourself from the sorting shenanigans.
– piRSquared
7 hours ago
|
show 2 more comments
You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it is the fastest:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.006181955337524414
while this:
start = time.time()
df.reset_index().melt(id_vars='days').drop('variable',1)
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.010072708129882812
Any my output format matches OP's requested exactly.
Interesting: why does this work? I would expectdf.stack().sort_index(level=1)
to lexicographically sort the stringsDec-18
,Jan-19
, etc., but in fact they get sorted in date order, even if they're strings and not datetime objects.df.stack().index.get_level_values(1).sort_values()
lexsorts.
– Peter Leimbigler
8 hours ago
@PeterLeimbigler it is sorting based on the order of the columns, not datetime or string. If jan-19 was the first column that would've been sorted first. try it using this setup:df = pd.DataFrame('days': ['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'], 'Dec-18': [400,300,200,100,1000,1200], 'Jan-19': [500, 300, 200, 800, 900, 1000])
then:df = df.set_index('days')
then:df.stack().sort_index(level=1)
Then go back and change the order of the columns and see what appears first.
– d_kennetz
7 hours ago
Thanks for the explanation! This is unexpected behaviour to me. From my testing, it appears that if you stack a DataFrame's columns into a MultiIndex and the result is a Series, then the index remembers the order of the columns, and sorts according to that order. But if the.stack()
returns a DataFrame (or if you convert to DataFrame using.stack().to_frame()
), the index no longer remembers the order of the original columns.
– Peter Leimbigler
7 hours ago
2
@d_kennetz sometimes they do not. I usually think of answers as general ideas. I judge them accordingly. I give credit for ingenuity and presentation/explanation. I like to see the output from proposed solutions because all to often answers provide a solution that doesn't produce correct output. This doesn't show the results. Also, most of the time, DataFrames aren't big enough for performance to matter. OP goes with what is most understandable to them. Keep up the good fight and answer questions that are beneficial long term. (-:
– piRSquared
7 hours ago
Also, usedf.unstack().reset_index(0, drop=True).to_frame('items')
. Byunstack
-ing rather thanstack
-ing, you save yourself from the sorting shenanigans.
– piRSquared
7 hours ago
|
show 2 more comments
You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it is the fastest:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.006181955337524414
while this:
start = time.time()
df.reset_index().melt(id_vars='days').drop('variable',1)
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.010072708129882812
Any my output format matches OP's requested exactly.
You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
It is interesting that this method got overlooked even though it is the fastest:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.006181955337524414
while this:
start = time.time()
df.reset_index().melt(id_vars='days').drop('variable',1)
end = time.time()
print("time taken ".format(end-start))
yields: time taken 0.010072708129882812
Any my output format matches OP's requested exactly.
edited 7 hours ago
answered 8 hours ago
d_kennetzd_kennetz
2,8014 gold badges9 silver badges28 bronze badges
2,8014 gold badges9 silver badges28 bronze badges
Interesting: why does this work? I would expectdf.stack().sort_index(level=1)
to lexicographically sort the stringsDec-18
,Jan-19
, etc., but in fact they get sorted in date order, even if they're strings and not datetime objects.df.stack().index.get_level_values(1).sort_values()
lexsorts.
– Peter Leimbigler
8 hours ago
@PeterLeimbigler it is sorting based on the order of the columns, not datetime or string. If jan-19 was the first column that would've been sorted first. try it using this setup:df = pd.DataFrame('days': ['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'], 'Dec-18': [400,300,200,100,1000,1200], 'Jan-19': [500, 300, 200, 800, 900, 1000])
then:df = df.set_index('days')
then:df.stack().sort_index(level=1)
Then go back and change the order of the columns and see what appears first.
– d_kennetz
7 hours ago
Thanks for the explanation! This is unexpected behaviour to me. From my testing, it appears that if you stack a DataFrame's columns into a MultiIndex and the result is a Series, then the index remembers the order of the columns, and sorts according to that order. But if the.stack()
returns a DataFrame (or if you convert to DataFrame using.stack().to_frame()
), the index no longer remembers the order of the original columns.
– Peter Leimbigler
7 hours ago
2
@d_kennetz sometimes they do not. I usually think of answers as general ideas. I judge them accordingly. I give credit for ingenuity and presentation/explanation. I like to see the output from proposed solutions because all to often answers provide a solution that doesn't produce correct output. This doesn't show the results. Also, most of the time, DataFrames aren't big enough for performance to matter. OP goes with what is most understandable to them. Keep up the good fight and answer questions that are beneficial long term. (-:
– piRSquared
7 hours ago
Also, usedf.unstack().reset_index(0, drop=True).to_frame('items')
. Byunstack
-ing rather thanstack
-ing, you save yourself from the sorting shenanigans.
– piRSquared
7 hours ago
|
show 2 more comments
2
Try: df.reset_index().melt() by index
– political scientist
8 hours ago
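One possible reading of the melt suggestion, assuming "by index" means keeping the former index column as the identifier. The sample frame is the one from the comments (with the dict braces restored); the names 'month' and 'items' are illustrative choices, not part of the original comment:

```python
import pandas as pd

# Sample data from the comment thread, with the dict braces restored.
df = pd.DataFrame({
    'days': ['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'],
    'Dec-18': [400, 300, 200, 100, 1000, 1200],
    'Jan-19': [500, 300, 200, 800, 900, 1000],
}).set_index('days')

# reset_index() turns 'days' back into a regular column so that melt()
# can use it as the identifier; the remaining columns are unpivoted
# one after another, in their existing order.
melted = df.reset_index().melt(id_vars='days', var_name='month', value_name='items')

# Optional: go back to a days-indexed frame holding only the values.
items = melted.set_index('days')[['items']]
```

melted keeps the original column label of each value in 'month', which can be handy if that information is still needed later.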