split a six digits number column into separated columns with one digit
How can I, using pandas or numpy, separate one column of six-digit integers into six columns with one digit each?
import pandas as pd
import numpy as np
df = pd.Series(range(123456,123465))
df = pd.DataFrame(df)
df.head()
What I have looks like this:
Number
654321
223344
The desired outcome should look like this:
Number | x1 | x2 | x3 | x4 | x5 | x6 |
654321 | 6 | 5 | 4 | 3 | 2 | 1 |
223344 | 2 | 2 | 3 | 3 | 4 | 4 |
python pandas numpy
asked 8 hours ago by msalem85, edited 7 hours ago
If you don't have to use numpy or pandas -
for num in str(my_number): print(num)
– wcarhart
8 hours ago
What is the source of your data? Are you given a numpy.array or a pandas.DataFrame, or are you just getting text with numbers separated by newlines?
– Daweo
7 hours ago
8 Answers
MCVE
Here is a simple suggestion:
import pandas as pd

# MCVE dataframe:
df = pd.DataFrame([123456, 456789, 135797, 123, 123456789], columns=['number'])

def digit(x, n):
    """Return the n-th digit of integer in base 10 (n = 0 is the least significant)"""
    return (x // 10**n) % 10

def digitize(df, key, n):
    """Extract the n least significant digits from an integer in base 10"""
    for i in range(n):
        df['x%d' % i] = digit(df[key], n-i-1)

# Apply function on dataframe (inplace):
digitize(df, 'number', 6)
For the trial dataframe, it returns:
number x0 x1 x2 x3 x4 x5
0 123456 1 2 3 4 5 6
1 456789 4 5 6 7 8 9
2 135797 1 3 5 7 9 7
3 123 0 0 0 1 2 3
4 123456789 4 5 6 7 8 9
Observations
This method avoids the need to cast to str and then cast back to int.
It relies on modular integer arithmetic; the operations are detailed below:
10**3 # int: 1000 (integer power)
54321 // 10**3 # int: 54 (quotient of integer division)
(54321 // 10**3) % 10 # int: 4 (remainder of integer division, modulo)
Last but not least, it is fail-safe and exact for numbers with fewer or more than n digits (note that it returns the n least significant digits in the latter case).
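As a quick check of the edge cases mentioned in the observations, here is a minimal sketch that applies the same `(x // 10**n) % 10` arithmetic to a Series containing a short and a long number. The helper name `digits_frame` and the sample values are assumptions made for illustration; they are not part of the answer.

```python
import pandas as pd

def digits_frame(s, n):
    """Return a DataFrame with the n least significant base-10 digits of s.

    `digits_frame` is a hypothetical helper. Columns are named x0..x{n-1},
    most significant digit first, mirroring the naming used in the answer.
    """
    return pd.DataFrame(
        {'x%d' % i: (s // 10 ** (n - i - 1)) % 10 for i in range(n)},
        index=s.index,
    )

# One six-digit number, one shorter number, one longer number.
s = pd.Series([123456, 123, 123456789])
out = digits_frame(s, 6)
print(out)
```

The shorter number is padded with leading zeros, and the longer one keeps only its last six digits, which matches the behaviour described above.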
Get rid of apply; you can simply do digit(df['Number'], i).
– Quang Hoang
8 hours ago
@QuangHoang Thank you for pointing this out. Is there any performance benefit, in addition to code compactness and readability?
– jlandercy
8 hours ago
Without apply, it's vectorized, so you would see a big improvement in speed.
– Quang Hoang
8 hours ago
@QuangHoang Updated, thank you.
– jlandercy
8 hours ago
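To illustrate the point made in these comments, the sketch below computes the same digit once with `Series.apply` and once by passing the whole column to `digit`. The sample data and the variable names `via_apply` and `vectorized` are assumptions; no timing figures are claimed here.

```python
import pandas as pd

def digit(x, n):
    """Return the n-th base-10 digit of x (n = 0 is the least significant)."""
    return (x // 10 ** n) % 10

df = pd.DataFrame({'number': [123456, 456789, 135797]})

# Element-wise: Python calls digit() once per row.
via_apply = df['number'].apply(lambda v: digit(v, 2))

# Vectorized: the same arithmetic runs on the whole column at once.
vectorized = digit(df['number'], 2)

print(via_apply.equals(vectorized))
```

Both calls produce the same values; the vectorized form simply lets pandas and NumPy do the loop internally.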
Some fun with views, assuming that each number has 6 digits:
u = df[['Number']].to_numpy().astype('U6').view('U1').astype(int)
df.join(pd.DataFrame(u).rename(columns=lambda c: f'x{c+1}'))
Number x1 x2 x3 x4 x5 x6
0 654321 6 5 4 3 2 1
1 223344 2 2 3 3 4 4
Impressive one-liner, although it breaks if there are numbers with different number of digits.
– jdehesa
8 hours ago
Yea, that assumption has to be made, definitely more of a trick than something to use.
– user3483203
8 hours ago
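One possible way to relax the equal-length assumption discussed in these comments is to zero-pad the strings before reinterpreting them as single characters. This is only a sketch: the `width` parameter and the intermediate names are assumptions, and numbers with more than `width` digits would be silently truncated.

```python
import pandas as pd

df = pd.DataFrame({'Number': [654321, 123]})  # 123 has fewer than 6 digits

width = 6  # assumed fixed number of digit columns
# Zero-pad on the left so every string is exactly `width` characters long.
padded = df['Number'].astype(str).str.zfill(width)
# Fixed-width unicode array, viewed as single characters, one row per number.
chars = padded.to_numpy().astype(f'U{width}').view('U1').reshape(len(df), width)
digits = chars.astype(int)

result = df.join(pd.DataFrame(digits, index=df.index,
                              columns=[f'x{i + 1}' for i in range(width)]))
print(result)
```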
Turn it into a string first!
I also included a zfill in case not all numbers have 6 digits.
dat = [list(map(int, str(x).zfill(6))) for x in df.Number]
d = pd.DataFrame(dat, df.index).rename(columns=lambda x: f'x{x + 1}')
df.join(d)
Number x1 x2 x3 x4 x5 x6
0 654321 6 5 4 3 2 1
1 223344 2 2 3 3 4 4
Details
This gets the digits
dat = [list(map(int, str(x).zfill(6))) for x in df.Number]
dat
[[6, 5, 4, 3, 2, 1], [2, 2, 3, 3, 4, 4]]
This creates a new dataframe with the same index as df and renames the columns so that they have an 'x' in front and begin with 'x1' rather than 'x0'
d = pd.DataFrame(dat, df.index).rename(columns=lambda x: f'x{x + 1}')
d
x1 x2 x3 x4 x5 x6
0 6 5 4 3 2 1
1 2 2 3 3 4 4
While string-based solutions are simpler and probably good enough in most cases, you can do this with math which, if you have a big data set, can make a significant difference in speed.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Number': [654321, 223344]})
# Number of digits of the largest value (assumes positive integers);
# subtracting 1 before log10 would drop a digit when the maximum is a power of 10
num_cols = int(np.log10(df['Number'].max())) + 1
vals = (df['Number'].values[:, np.newaxis] // (10 ** np.arange(num_cols - 1, -1, -1))) % 10
df_digits = pd.DataFrame(vals, columns=[f'x{i + 1}' for i in range(num_cols)])
df2 = pd.concat([df, df_digits], axis=1)
print(df2)
# Number x1 x2 x3 x4 x5 x6
# 0 654321 6 5 4 3 2 1
# 1 223344 2 2 3 3 4 4
I definitely like this approach. I'm trying to make this prettier (-:
– piRSquared
7 hours ago
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1]
Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization.
– piRSquared
6 hours ago
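Both the answer and the shortened variant in the comment rely on NumPy broadcasting: a column of numbers is divided by a row of powers of ten. The sketch below spells that out for the two sample values; the names `powers`, `digits` and `digits_rev` are assumptions, and six digits per number are assumed throughout.

```python
import numpy as np

numbers = np.array([654321, 223344])
powers = 10 ** np.arange(5, -1, -1)  # [100000, 10000, 1000, 100, 10, 1]

# Shape (2, 1) divided by shape (6,) broadcasts to (2, 6):
# one row per number, one column per power of ten.
digits = (numbers[:, np.newaxis] // powers) % 10

# The variant from the comment uses ascending powers and then
# reverses the column order.
digits_rev = (numbers[:, np.newaxis] // 10 ** np.arange(6) % 10)[:, ::-1]

print(digits)
```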
You could use np.unravel_index
import numpy as np
import pandas as pd
from timeit import timeit

df = pd.DataFrame({'Number': [654321,223344]})

def split_digits(df):
    # get data as numpy array
    numbers = df['Number'].to_numpy()
    # extract digits
    digits = np.unravel_index(numbers, 6*(10,))
    # create column headers
    columns = ['Number', *(f'x{i}' for i in "123456")]
    # build and return new data frame
    return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)
split_digits(df)
# Number x1 x2 x3 x4 x5 x6
# 0 654321 6 5 4 3 2 1
# 1 223344 2 2 3 3 4 4
timeit(lambda:split_digits(df),number=1000)
# 0.3550272472202778
Thanks @GZ0 for some pandas tips.
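Why `np.unravel_index` yields digits may not be obvious, so here is a small sketch. A number below 10**6 is treated as a flat index into an array of shape (10, 10, 10, 10, 10, 10), and its coordinates in that array are its six base-10 digits. The variable names are assumptions; note that a value of 10**6 or more would raise a ValueError because it is out of bounds for that shape.

```python
import numpy as np

# A single number: the six coordinates are its six digits.
coords = np.unravel_index(654321, (10,) * 6)
digits = [int(c) for c in coords]

# An array of numbers: each returned array holds one digit position,
# so stacking them along axis 1 gives one row of digits per number.
cols = np.unravel_index(np.array([654321, 223344]), (10,) * 6)
table = np.stack(cols, axis=1)

print(digits)
print(table)
```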
This is an excellent trick and one-liner, @Paul, +1. What does ** do in assign? Would you mind explaining the code?
– Karn Kumar
5 hours ago
@KarnKumar ** "unrolls" the dictionary, so each key-value pair becomes a keyword argument to the function (assign in this case). Btw., I don't know much about pandas, so this part of the code may be far from optimal.
– Paul Panzer
5 hours ago
@KarnKumar I've made an annotated version in case you are interested.
– Paul Panzer
5 hours ago
One alternative way to return a new data frame using digits is df.assign(**dict(zip((f'x{i}' for i in range(1,7)), digits))). Also, df['Number'] can be used as a numpy array directly without explicitly accessing the .values attribute.
– GZ0
4 hours ago
@PaulPanzer Your solution is indeed a lot more performant. df.assign makes a copy of the original dataframe and then adds columns one by one. The df.copy() call actually takes a lot more time than adding columns, for some unknown reason. IMO there are two things that could be improved in your solution though: (1) in pandas version >= 0.24.0, df.to_numpy() is recommended in favor of df.values; (2) the index of the original data frame should be preserved by passing index=df.index into the constructor function.
– GZ0
2 hours ago
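The `**` unpacking discussed in these comments can be seen in isolation in the sketch below. The dictionary `new_cols` and its two precomputed columns are assumptions chosen to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({'Number': [654321, 223344]})

# Hypothetical precomputed digit columns, keyed by the new column names.
new_cols = {
    'x1': df['Number'] // 100000 % 10,
    'x2': df['Number'] // 10000 % 10,
}

# ** expands the dict into keyword arguments: assign(x1=..., x2=...).
unpacked = df.assign(**new_cols)
explicit = df.assign(x1=new_cols['x1'], x2=new_cols['x2'])

print(unpacked)
```

Because `assign` returns a new data frame, `df` itself keeps only the `Number` column.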
Assuming that all numbers have the same length (an equal number of digits), I would do it the following way using numpy:
import numpy as np
a = np.array([[654321],[223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x:list(x[0]),1,str_a)
print(out)
Output:
[['6' '5' '4' '3' '2' '1']
['2' '2' '3' '3' '4' '4']]
Note that out is currently an np.array of strs; you can convert it to int if the need arises.
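The conversion mentioned in the note can be done with `astype`. The sketch below repeats the answer's steps and adds that one call; the name `out_int` is an assumption.

```python
import numpy as np

a = np.array([[654321], [223344]])
str_a = a.astype(str)
# Split each number's string into a row of single characters.
out = np.apply_along_axis(lambda x: list(x[0]), 1, str_a)

# Convert the single-character strings to integers.
out_int = out.astype(int)
print(out_int)
```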
I really liked @user3483203's answer. I think .str.findall could work with any number of digits:
df = pd.DataFrame({
    'Number' : [65432178888, 22334474343]
})
u = df['Number'].astype(str).str.findall(r'(\w)')
df.join(pd.DataFrame(list(u)).rename(columns=lambda c: f'x{c+1}')).apply(pd.to_numeric)
Number x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11
0 65432178888 6 5 4 3 2 1 7 8 8 8 8
1 22334474343 2 2 3 3 4 4 7 4 3 4 3
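When the numbers do not all have the same number of digits, the `findall` approach still runs, but shorter rows end up with missing values. The sketch below shows this under assumed sample data; it uses `r'\d'` (one decimal digit) as the pattern, which is a substitution for illustration rather than the pattern from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Number': [65432178888, 223]})

# findall returns one list of single-digit strings per row.
u = df['Number'].astype(str).str.findall(r'\d')
digits = pd.DataFrame(u.tolist(), index=df.index).rename(columns=lambda c: f'x{c + 1}')

# Shorter numbers leave trailing cells empty; to_numeric turns those into NaN.
result = df.join(digits.apply(pd.to_numeric))
print(result)
```

Whether left-aligned digits with trailing NaN are acceptable depends on the use case; zero-padding before splitting is one alternative.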
Simple way around:
>>> df
number
0 123456
1 456789
2 135797
First convert the column into string
>>> df['number'] = df['number'].astype(str)
Create the new columns using string indexing
>>> df['x1'] = df['number'].str[0]
>>> df['x2'] = df['number'].str[1]
>>> df['x3'] = df['number'].str[2]
>>> df['x4'] = df['number'].str[3]
>>> df['x5'] = df['number'].str[4]
>>> df['x6'] = df['number'].str[5]
>>> df
number x1 x2 x3 x4 x5 x6
0 123456 1 2 3 4 5 6
1 456789 4 5 6 7 8 9
2 135797 1 3 5 7 9 7
>>> df.drop('number', axis=1, inplace=True)
>>> df
x1 x2 x3 x4 x5 x6
0 1 2 3 4 5 6
1 4 5 6 7 8 9
2 1 3 5 7 9 7
Another trick with str.split():
>>> df = df['number'].str.split(r'(\d{1})', expand=True).add_prefix('x').drop(columns=['x0', 'x2', 'x4', 'x6', 'x8', 'x10', 'x12'])
>>> df
x1 x3 x5 x7 x9 x11
0 1 2 3 4 5 6
1 4 5 6 7 8 9
2 1 3 5 7 9 7
>>> df.rename(columns={'x3':'x2', 'x5':'x3', 'x7':'x4', 'x9':'x5', 'x11':'x6'})
x1 x2 x3 x4 x5 x6
0 1 2 3 4 5 6
1 4 5 6 7 8 9
2 1 3 5 7 9 7
OR
>>> df = df['number'].str.split(r'(\d{1})', expand=True).T.replace('', np.nan).dropna().T
>>> df
1 3 5 7 9 11
0 1 2 3 4 5 6
1 4 5 6 7 8 9
2 1 3 5 7 9 7
>>> df.rename(columns={1:'x1', 3:'x2', 5:'x3', 7:'x4', 9:'x5', 11:'x6'})
x1 x2 x3 x4 x5 x6
0 1 2 3 4 5 6
1 4 5 6 7 8 9
2 1 3 5 7 9 7
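The six near-identical string-indexing lines from the first part of this answer can also be written as a loop. This is a sketch under the same assumption that every number has exactly six digits; the name `as_text` is an assumption, and the digits are converted to integers here rather than left as strings.

```python
import pandas as pd

df = pd.DataFrame({'number': [123456, 456789, 135797]})
as_text = df['number'].astype(str)

# One column per character position; .str[i] picks the i-th character
# of each string, and astype(int) turns it back into a number.
for i in range(6):
    df[f'x{i + 1}'] = as_text.str[i].astype(int)

print(df)
```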
Answer using modular arithmetic (digit/digitize): answered 8 hours ago by jlandercy, edited 7 hours ago
Answer using NumPy views: answered 8 hours ago by user3483203
Answer using str and zfill: answered 8 hours ago by piRSquared
Answer using log10 and broadcasting: answered 8 hours ago by jdehesa
I definitely like this approach. I'm trying to make this prettier (-:
– piRSquared
7 hours ago
1
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1]
Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization.
– piRSquared
6 hours ago
You could use np.unravel_index:
import numpy as np
import pandas as pd
from timeit import timeit
df = pd.DataFrame({'Number': [654321, 223344]})
def split_digits(df):
    # get data as numpy array
    numbers = df['Number'].to_numpy()
    # extract digits
    digits = np.unravel_index(numbers, 6*(10,))
    # create column headers
    columns = ['Number', *(f'x{i}' for i in "123456")]
    # build and return new data frame
    return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)
split_digits(df)
#    Number  x1  x2  x3  x4  x5  x6
# 0  654321   6   5   4   3   2   1
# 1  223344   2   2   3   3   4   4
timeit(lambda: split_digits(df), number=1000)
# 0.3550272472202778
Thanks @GZ0 for some pandas tips.
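Why np.unravel_index yields digits may not be obvious, so here is a small sketch on single numbers. The idea is that a flat index into an array of shape (10, 10, 10, 10, 10, 10) corresponds to a multi-index whose entries are the six decimal digits. The values 42 and 1_000_000 are made up to show the edge cases.

```python
import numpy as np

shape = 6 * (10,)  # (10, 10, 10, 10, 10, 10)

# The multi-index of flat position 654321 is its sequence of decimal digits.
digits = tuple(int(d) for d in np.unravel_index(654321, shape))
print(digits)

# Shorter numbers are left-padded with zeros.
padded = tuple(int(d) for d in np.unravel_index(42, shape))
print(padded)

# Numbers with more than six digits do not fit the shape and raise an error.
try:
    np.unravel_index(1_000_000, shape)
    overflowed = False
except ValueError:
    overflowed = True
print(overflowed)
```

So this approach needs the maximum number of digits up front, and it rejects larger values rather than truncating them.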
edited 56 mins ago
answered 6 hours ago
Paul Panzer
This is an excellent trick and a one-liner, @Paul, +1. What does ** do in assign? Would you mind explaining the code?
– Karn Kumar
5 hours ago
@KarnKumar ** "unrolls" the dictionary, so each key-value pair becomes a keyword argument to the function (assign in this case). Btw., I don't know much about pandas, so this part of the code may be far from optimal.
– Paul Panzer
5 hours ago
@KarnKumar I've made an annotated version in case you are interested.
– Paul Panzer
5 hours ago
One alternative way to return a new data frame using digits is df.assign(**dict(zip((f'x{i}' for i in range(1,7)), digits))). Also, df['Number'] can be used as a numpy array directly without explicitly accessing the .values attribute.
– GZ0
4 hours ago
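A short sketch of the assign variant discussed in these comments, showing what the ** unpacking does. The name new_columns is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number': [654321, 223344]})
digits = np.unravel_index(df['Number'].to_numpy(), 6 * (10,))

# Map the column names x1..x6 to the six digit arrays.
new_columns = dict(zip((f'x{i}' for i in range(1, 7)), digits))

# ** unpacks the dict, so this is the same as df.assign(x1=..., x2=..., ...).
result = df.assign(**new_columns)
print(result)
```

Note that assign returns a new data frame and leaves df itself unchanged.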
@PaulPanzer Your solution is indeed a lot more performant. df.assign makes a copy of the original dataframe and then adds columns one by one. The df.copy() call actually takes a lot more time than adding the columns, for reasons I don't know. IMO there are two things that could be improved in your solution though: (1) in pandas version >= 0.24.0, df.to_numpy() is recommended in favor of df.values; (2) the index of the original data frame should be preserved by passing index=df.index into the constructor function.
– GZ0
2 hours ago
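To illustrate point (2), here is a minimal sketch with a non-default index. The labels 'a' and 'b' are made up for the example.

```python
import numpy as np
import pandas as pd

# A frame with a non-default index (labels are made up for illustration).
df = pd.DataFrame({'Number': [654321, 223344]}, index=['a', 'b'])
numbers = df['Number'].to_numpy()
digits = np.stack(np.unravel_index(numbers, 6 * (10,)), axis=1)
columns = [f'x{i}' for i in range(1, 7)]

without_index = pd.DataFrame(digits, columns=columns)
with_index = pd.DataFrame(digits, columns=columns, index=df.index)

print(list(without_index.index))  # default integer labels
print(list(with_index.index))     # original labels kept

# With the original index in place, joining on the index lines up with df.
joined = df.join(with_index)
print(joined)
```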
Assuming that all numbers have the same number of digits, I would do it the following way using numpy:
import numpy as np
a = np.array([[654321],[223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x:list(x[0]),1,str_a)
print(out)
Output:
[['6' '5' '4' '3' '2' '1']
 ['2' '2' '3' '3' '4' '4']]
Note that out is currently an np.array of strs; you can convert it to int if the need arises.
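The conversion to int mentioned at the end could look like the following sketch, which repeats the steps above and adds one astype call. The name out_int is illustrative.

```python
import numpy as np

a = np.array([[654321], [223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x: list(x[0]), 1, str_a)

# Convert the array of one-character strings to integers.
out_int = out.astype(int)
print(out_int)
print(out_int.dtype.kind)
```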
answered 8 hours ago
Daweo
I really liked @user3483203's answer. I think .str.findall could work with any number of digits:
import pandas as pd
df = pd.DataFrame({
    'Number' : [65432178888, 22334474343]
})
u = df['Number'].astype(str).str.findall(r'(\w)')
df.join(pd.DataFrame(list(u)).rename(columns=lambda c: f'x{c+1}')).apply(pd.to_numeric)
        Number  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11
0  65432178888   6   5   4   3   2   1   7   8   8    8    8
1  22334474343   2   2   3   3   4   4   7   4   3    4    3
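As a rough check of what "any number of digits" means when the rows differ in length, here is a sketch with made-up values of three and five digits. It uses \d instead of \w to match digits only, which is a small deviation from the answer.

```python
import pandas as pd

# Numbers with different digit counts (made-up values).
df = pd.DataFrame({'Number': [123, 45678]})

# One list of single-character strings per row.
u = df['Number'].astype(str).str.findall(r'\d')

digits = pd.DataFrame(list(u), index=df.index).rename(columns=lambda c: f'x{c + 1}')
digits = digits.apply(pd.to_numeric)

result = df.join(digits)
print(result)
```

The shorter number fills the leftmost columns and the remaining positions come out as missing values, so the digits are aligned from the most significant end rather than by place value.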
edited 7 hours ago
answered 8 hours ago
political scientist
A simple way around:
>>> df
   number
0  123456
1  456789
2  135797
First convert the column to string:
>>> df['number'] = df['number'].astype(str)
Then create the new columns using string indexing:
>>> df['x1'] = df['number'].str[0]
>>> df['x2'] = df['number'].str[1]
>>> df['x3'] = df['number'].str[2]
>>> df['x4'] = df['number'].str[3]
>>> df['x5'] = df['number'].str[4]
>>> df['x6'] = df['number'].str[5]
>>> df
   number x1 x2 x3 x4 x5 x6
0  123456  1  2  3  4  5  6
1  456789  4  5  6  7  8  9
2  135797  1  3  5  7  9  7
Finally, drop the original column if it is no longer needed:
>>> df.drop('number', axis=1, inplace=True)
>>> df
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
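The six near-identical assignments above can also be written as a loop. This is only a sketch under the same assumption that every number has the same length; the names as_text and width are illustrative, and the digits are converted to int here, unlike in the session above.

```python
import pandas as pd

df = pd.DataFrame({'number': [123456, 456789, 135797]})
as_text = df['number'].astype(str)

# One column per character position; the width is taken from the longest value.
width = as_text.str.len().max()
for i in range(width):
    df[f'x{i + 1}'] = as_text.str[i].astype(int)

print(df)
```

If the numbers had different lengths, as_text.str[i] would produce missing values for the shorter ones and the int conversion would fail.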
Another trick, with str.split() (starting again from the string-typed number column):
>>> df = df['number'].str.split(r'(\d{1})', expand=True).add_prefix('x').drop(columns=['x0', 'x2', 'x4', 'x6', 'x8', 'x10', 'x12'])
>>> df
  x1 x3 x5 x7 x9 x11
0  1  2  3  4  5   6
1  4  5  6  7  8   9
2  1  3  5  7  9   7
>>> df.rename(columns={'x3':'x2', 'x5':'x3', 'x7':'x4', 'x9':'x5', 'x11':'x6'})
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
OR
>>> df = df['number'].str.split(r'(\d{1})', expand=True).T.replace('', np.nan).dropna().T
>>> df
  1  3  5  7  9  11
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
>>> df.rename(columns={1:'x1', 3:'x2', 5:'x3', 7:'x4', 9:'x5', 11:'x6'})
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
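Why only the odd-numbered columns survive can be seen from how a regex split with a capturing group behaves. The sketch below uses Python's re module to show the raw pieces and then selects every second column by position instead of listing the names; the variable names are illustrative.

```python
import re
import pandas as pd

# A capturing group makes re.split keep the separators (the digits);
# the empty strings are the "gaps" before, between and after the digits.
pieces = re.split(r'(\d)', '123456')
print(pieces)

s = pd.Series(['123456', '456789', '135797'], name='number')
parts = s.str.split(r'(\d)', expand=True)

# Keep the odd-numbered columns (the digits) and number them x1..x6.
digits = parts.iloc[:, 1::2]
digits.columns = [f'x{i + 1}' for i in range(digits.shape[1])]
print(digits)
```

Selecting by position avoids hard-coding the column names, so the same lines work for other digit counts as long as all values have the same length.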
edited 5 hours ago
answered 7 hours ago
Karn Kumar
If you don't have to use numpy or pandas: for num in str(my_number): print(num)
– wcarhart
8 hours ago
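For completeness, the plain-Python idea from this comment can be wrapped in a small helper. The function name split_digits is made up here, and the sketch assumes non-negative integers.

```python
def split_digits(number):
    """Return the decimal digits of a non-negative integer as a list of ints."""
    return [int(ch) for ch in str(number)]


for my_number in (654321, 223344):
    print(split_digits(my_number))
```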
What is the source of your data? Is a numpy.array or pandas.DataFrame delivered to you, or are you getting just text with numbers separated by newlines?
– Daweo
7 hours ago