split 1 column input into 5 column bed file


In f1 below, I am trying to split the single column ($1) of each line to create a BED file.

If the line is a SNP, the line is split on the : and the text before it is $1, the position minus 1 is $2, the position is $3, the letter to the left of the > is $4, and the letter to the right is $5 (lines 1 and 2).

If the line has dup, the line is split on the : and the text before it is $1, the position is both $2 and $3, $4 is -, and the letter after dup is $5 (line 4).

If the line has del with nothing else, the line is split on the : and the text before it is $1, the position is both $2 and $3, the letter after del is $4, and $5 is - (line 5).

I am not sure how to format line 3 (the complex indel). Maybe split on the :, so that the text before it is $1, the two positions are $2 and $3, and the inserted and deleted letters are $4 and $5, as in the desired output. Thank you :)

f1

    chr7:140453145A>T
    chr7:140453136A>T
    chr7:140453135_140453136delCAinsTT
    chr20:31022287dupA
    chr19:13054614delG

desired

    chr7 140453144 140453145 A T
    chr7 140453135 140453136 A T
    chr7 140453135 140453136 TT CA
    chr20 31022287 31022287 - A
    chr19 13054614 13054614 G -

awk

    awk 'BEGIN{OFS="\t"} {sub(/[^0-9]+$/, "", $1)}' f1

    awk 'BEGIN {FS = "[ -]"; OFS="\t"} NF==3 {print $1, $2 - 1, $2, $3} NF==4 {print $1, $2, $3, $4}' f1









  • Do you have to do this in awk for some reason? Sure, it's possible, but there are enough cases that it'd make more sense to use Python, which is easier to write longer scripts in.
    – Devon Ryan, 9 hours ago

  • Python is fine, I just don't know it as well... only the basics, but I am interested to see what it would look like, as I can learn from it. Thank you :)
    – justaguy, 8 hours ago

















asked 9 hours ago by justaguy (edited 9 hours ago)










3 Answers
The following Python script seems to do the job.

    #!/usr/bin/env python3

    from __future__ import print_function
    import re
    import sys

    for line in sys.stdin:
        if '_' in line:
            # Complex indel, e.g. chr7:140453135_140453136delCAinsTT
            match = re.search(r'(\S+):(\d+)_(\d+)del([ACGT]+)ins([ACGT]+)', line)
            print(match.group(1), match.group(2), match.group(3), match.group(5), match.group(4), sep='\t')
        elif 'dup' in line or 'del' in line:
            # Simple deletion or duplication, e.g. chr19:13054614delG
            match = re.search(r'(\S+):(\d+)(del|dup)([ACGT]+)', line)
            if match.group(3) == 'del':
                print(match.group(1), match.group(2), match.group(2), match.group(4), '-', sep='\t')
            else:
                print(match.group(1), match.group(2), match.group(2), '-', match.group(4), sep='\t')
        else:
            # SNP, e.g. chr7:140453145A>T; the start column is position - 1
            match = re.search(r'(\S+):(\d+)([ACGT]+)>([ACGT]+)', line)
            coord = int(match.group(2))
            print(match.group(1), coord - 1, coord, match.group(3), match.group(4), sep='\t')

Invoked like so on the command line.

    [standage@lappy ~] $ ./transform < f1
    chr7 140453144 140453145 A T
    chr7 140453135 140453136 A T
    chr7 140453135 140453136 TT CA
    chr20 31022287 31022287 - A
    chr19 13054614 13054614 G -
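One caveat with the script above: re.search returns None for a line that fits none of the three patterns, and the following .group call then raises AttributeError. Here is a minimal sketch of one way to guard against that, assuming the same three patterns. The names to_bed and convert are hypothetical and not part of the original answer.

```python
import re

# The same three patterns as the script above, compiled once.
DELINS = re.compile(r'(\S+):(\d+)_(\d+)del([ACGT]+)ins([ACGT]+)')
DEL_DUP = re.compile(r'(\S+):(\d+)(del|dup)([ACGT]+)')
SNP = re.compile(r'(\S+):(\d+)([ACGT]+)>([ACGT]+)')


def to_bed(line):
    """Return the five output fields as strings, or None if nothing matches."""
    m = DELINS.search(line)
    if m:
        return (m.group(1), m.group(2), m.group(3), m.group(5), m.group(4))
    m = DEL_DUP.search(line)
    if m:
        if m.group(3) == 'del':
            return (m.group(1), m.group(2), m.group(2), m.group(4), '-')
        return (m.group(1), m.group(2), m.group(2), '-', m.group(4))
    m = SNP.search(line)
    if m:
        pos = int(m.group(2))
        return (m.group(1), str(pos - 1), str(pos), m.group(3), m.group(4))
    return None


def convert(lines):
    """Return (rows, rejected): converted rows and the lines that matched no pattern."""
    rows, rejected = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = to_bed(line)
        if fields is None:
            rejected.append(line)
        else:
            rows.append(fields)
    return rows, rejected
```

Passing sys.stdin as lines and writing '\t'.join(row) for each row would recover the command-line behaviour of the original script, with the rejected lines available for reporting on sys.stderr.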














  • Thank you very much: print(match.group(1), match.group(2), match.group(3), match.group(5), match.group(4), sep='\t') ^ SyntaxError: invalid syntax ~$ python --version Python 2.7.9 (not sure if this helps, sorry, new to Python :)
    – justaguy, 7 hours ago

  • It looks like you're running Python 2, which unfortunately is the default version on many operating systems even though it's at its end-of-life. I'll update the script so that it's compatible with Python 2.
    – Daniel Standage, 7 hours ago

  • "print" was a statement in Python 2 but is a function in Python 3, hence the syntax error.
    – haci, 7 hours ago
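The comments above trace the SyntaxError to print being a statement in Python 2 and a function in Python 3. Besides the __future__ import, one way to sidestep the difference is to build each line with str.join and hand it to sys.stdout.write, which is an ordinary method call in both versions. This is a small sketch, and format_row is a hypothetical helper name:

```python
import sys


def format_row(fields):
    """Join fields with tabs; non-string fields such as ints are converted first."""
    return '\t'.join(str(field) for field in fields) + '\n'


# Writes one tab-separated line, the same way under Python 2 and Python 3.
sys.stdout.write(format_row(['chr7', 140453144, 140453145, 'A', 'T']))
```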
































The Python script in my first response is heavy on regex matching, which is pretty clunky in Python. I like Python much better than Perl overall, but a throwaway script like this will be clearer and more concise in Perl.

    #!/usr/bin/env perl
    use strict;

    while(<STDIN>) {
        # Complex indel, e.g. chr7:140453135_140453136delCAinsTT
        if (m/(\S+):(\d+)_(\d+)del([ACGT]+)ins([ACGT]+)/) {
            print("$1\t$2\t$3\t$5\t$4\n");
        }
        # Simple deletion or duplication
        elsif (m/(\S+):(\d+)(del|dup)([ACGT]+)/) {
            if ($3 eq "del") {
                print("$1\t$2\t$2\t$4\t-\n");
            }
            else {
                print("$1\t$2\t$2\t-\t$4\n");
            }
        }
        # SNP; the start column is position - 1
        elsif (m/(\S+):(\d+)([ACGT]+)>([ACGT]+)/) {
            my $coord2 = $2;
            my $coord1 = $2 - 1;
            print("$1\t$coord1\t$coord2\t$3\t$4\n")
        }
    }

It's been several years since I wrote Perl on a regular basis, so there are probably ways to make the script even clearer and more concise. (Although I have a love/hate relationship with the default/hidden variables, which can make a script more concise but also harder for a newcomer to understand.)

Invoked like so on the command line.

    [standage@lappy ~] $ ./transform < f1
    chr7 140453144 140453145 A T
    chr7 140453135 140453136 A T
    chr7 140453135 140453136 TT CA
    chr20 31022287 31022287 - A
    chr19 13054614 13054614 G -
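On the point about regex matching being clunky in Python: named groups can take some of the edge off, since the fields are then read by name rather than by number. The sketch below covers only the complex indel pattern. The group names and the function delins_to_bed are illustrative choices, not something from the answers.

```python
import re

# Same delins pattern as in the answers, with each capture given a name.
DELINS = re.compile(
    r'(?P<chrom>\S+):(?P<start>\d+)_(?P<end>\d+)'
    r'del(?P<deleted>[ACGT]+)ins(?P<inserted>[ACGT]+)'
)


def delins_to_bed(line):
    """Return the tab-separated output line for a delins entry, or None."""
    m = DELINS.search(line)
    if m is None:
        return None
    g = m.groupdict()
    # Column order follows the desired output: inserted bases, then deleted bases.
    return '\t'.join([g['chrom'], g['start'], g['end'], g['inserted'], g['deleted']])


print(delins_to_bed('chr7:140453135_140453136delCAinsTT'))
```

The other two patterns could be named the same way, at the cost of longer regexes.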




































Here is an R solution; it would probably be slower than the suggested Python solution, though:

    library(stringr)   # str_extract()
    library(magrittr)  # %>%

    bed <- readLines("bed.txt")

    res <- data.frame()

    for (i in bed) {

      # trimws() guards against trailing whitespace breaking the "$" anchors
      temp <- strsplit(trimws(i), ":") %>% unlist()

      # case N>N
      if (grepl(">", temp[[2]])) {
        begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer() - 1
        end <- begin + 1
        ref <- str_extract(temp[[2]], pattern = "[A-Z]")
        alt <- str_extract(temp[[2]], pattern = "[A-Z]$")

        temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
        res <- rbind(res, temp_res)
      }

      # case del_ins
      if (grepl("ins", temp[[2]])) {
        begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
        end <- str_extract(temp[[2]], pattern = "_[:digit:]+")
        end <- substring(end, 2, nchar(end)) %>% as.integer()
        ref <- str_extract(temp[[2]], pattern = "[A-Z]+$")
        alt <- str_extract(temp[[2]], pattern = "[A-Z]+")

        temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
        res <- rbind(res, temp_res)
      }

      # case dup
      if (grepl("dup", temp[[2]])) {
        begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
        end <- begin
        ref <- "-"
        alt <- str_extract(temp[[2]], pattern = "[A-Z]$")

        temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
        res <- rbind(res, temp_res)
      }

      # case del only
      if (grepl("del", temp[[2]]) & !grepl("ins", temp[[2]])) {
        begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
        end <- begin
        ref <- str_extract(temp[[2]], pattern = "[A-Z]$")
        alt <- "-"

        temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
        res <- rbind(res, temp_res)
      }
    }

    print(res)

      temp..1..     begin       end ref alt
    1      chr7 140453144 140453145   A   T
    2      chr7 140453135 140453136   A   T
    3      chr7 140453135 140453136  TT  CA
    4     chr20  31022287  31022287   -   A
    5     chr19  13054614  13054614   G   -
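The R answer works split-first: it cuts each entry at the colon and then decides the case by looking for marker substrings. For comparison with the regex-driven scripts, here is a rough Python sketch of that same first step. The function name classify and the kind labels are hypothetical. The elif chain makes the cases mutually exclusive, so a del-only entry needs no separate "and not ins" check.

```python
def classify(entry):
    """Split an entry at the first ':' and label the kind of change."""
    chrom, _, change = entry.strip().partition(':')
    if '>' in change:
        kind = 'snp'
    elif 'ins' in change:
        kind = 'delins'
    elif 'dup' in change:
        kind = 'dup'
    elif 'del' in change:
        kind = 'del'
    else:
        kind = 'unknown'
    return chrom, kind, change


for entry in ['chr7:140453145A>T', 'chr7:140453135_140453136delCAinsTT',
              'chr20:31022287dupA', 'chr19:13054614delG']:
    print(classify(entry))
```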














  • Thank you all very much :).
    – justaguy, 1 hour ago













Python answer: answered 8 hours ago by Daniel Standage (edited 7 hours ago)














        answered 7 hours ago









        Daniel Standage







































            $begingroup$

            Here is an R solution, though it will probably be slower than the suggested Python solution:



            library(magrittr)  # provides %>%
            library(stringr)   # provides str_extract()

            bed <- readLines("bed.txt")
            res <- data.frame()

            for (i in bed) {
              # temp[[1]] is the chromosome, temp[[2]] is everything after the colon
              temp <- strsplit(i, ":") %>% unlist()

              # case N>N
              if (grepl(">", temp[[2]])) {
                begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer() - 1
                end <- begin + 1
                ref <- str_extract(temp[[2]], pattern = "[A-Z]")
                alt <- str_extract(temp[[2]], pattern = "[A-Z]$")

                temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
                res <- rbind(res, temp_res)
              }

              # case del_ins
              if (grepl("ins", temp[[2]])) {
                begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
                end <- str_extract(temp[[2]], pattern = "_[:digit:]+")
                end <- substring(end, 2, nchar(end)) %>% as.integer()
                ref <- str_extract(temp[[2]], pattern = "[A-Z]+$")
                alt <- str_extract(temp[[2]], pattern = "[A-Z]+")

                temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
                res <- rbind(res, temp_res)
              }

              # case dup
              if (grepl("dup", temp[[2]])) {
                begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
                end <- begin
                ref <- "-"
                alt <- str_extract(temp[[2]], pattern = "[A-Z]$")

                temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
                res <- rbind(res, temp_res)
              }

              # case del only
              if (grepl("del", temp[[2]]) && !grepl("ins", temp[[2]])) {
                begin <- str_extract(temp[[2]], pattern = "^[:digit:]+") %>% as.integer()
                end <- begin
                ref <- str_extract(temp[[2]], pattern = "[A-Z]$")
                alt <- "-"

                temp_res <- data.frame(temp[[1]], begin, end, ref, alt)
                res <- rbind(res, temp_res)
              }
            }

            print(res)

            temp..1.. begin end ref alt
            1 chr7 140453144 140453145 A T
            2 chr7 140453135 140453136 A T
            3 chr7 140453135 140453136 TT CA
            4 chr20 31022287 31022287 - A
            5 chr19 13054614 13054614 G -














            $endgroup$














            • Thank you all very much :). – justaguy, 1 hour ago






































            answered 7 hours ago









            haci
