How to get the SMILES of all compounds on PubChem?

I would like to download all the compounds from ChEMBL and PubChem. For ChEMBL this is easy using their web UI. For PubChem, however, it isn't clear how to download all the compounds in the database, including their SMILES representations.



I am also not sure why PubChem has 33,000,000 compounds whereas ChEMBL has "only" 2,000,000.










aromatic-compounds cheminformatics






asked 14 hours ago by 0x90 (edited 10 hours ago)










Have a look at github.com/mcs07/PubChemPy; you may find a use for it. For ChemSpider, see the related github.com/mcs07/ChemSpiPy. – Martin - マーチン, 10 hours ago












1 Answer
The best way to download bulk data from PubChem is via FTP, as described in the PubChem documentation.



For example, if you want the unfiltered SMILES of every CID in PubChem, the URL is ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/CID-SMILES.gz
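Because the file is large, it is usually better to stream it than to load it all at once. Below is a minimal standard-library sketch, assuming each line of CID-SMILES.gz holds a CID and a SMILES string separated by a single tab (worth checking with `zcat CID-SMILES.gz | head` first). The function names and the default local filename are illustrative, not part of any PubChem tool:

```python
import gzip
import urllib.request
from typing import BinaryIO, Iterable, Iterator, Tuple, Union

# URL from the answer above.
CID_SMILES_URL = "ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/CID-SMILES.gz"


def download(url: str = CID_SMILES_URL, dest: str = "CID-SMILES.gz") -> str:
    """Save the remote file to `dest` and return the local path."""
    # urllib's default opener handles ftp:// as well as http(s):// URLs.
    filename, _headers = urllib.request.urlretrieve(url, dest)
    return filename


def iter_cid_smiles(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (CID, SMILES) pairs, assuming one tab-separated pair per line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:  # tolerate blank lines
            continue
        cid, smiles = line.split("\t", 1)
        yield int(cid), smiles


def read_cid_smiles_gz(source: Union[str, BinaryIO]) -> Iterator[Tuple[int, str]]:
    """Stream records from the gzipped file without decompressing it to disk."""
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        yield from iter_cid_smiles(handle)
```

For example, `for cid, smiles in read_cid_smiles_gz(download()): ...` handles one record at a time, so memory use stays flat regardless of how many compounds the file contains.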



You can also download subsets using the PubChem Structure Download service.



As mentioned in the comments above, there are Python and other tools that access both PubChem and ChemSpider through their documented web APIs.
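For a modest subset of CIDs, one scriptable alternative is PubChem's PUG REST web service (the service PubChemPy wraps). The stdlib-only sketch below assumes the `compound/cid/<list>/property/<name>/CSV` URL pattern and the `IsomericSMILES` property name; PubChem has been revising its SMILES property names, so check the PUG REST documentation for the current one. The batch size of 100 and the 0.2 s pause (to stay under PubChem's stated limit of 5 requests per second) are illustrative choices:

```python
import csv
import io
import time
import urllib.request
from typing import Iterable, Iterator, List, Sequence, Tuple

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def batched(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Split a sequence of CIDs into consecutive chunks of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def property_url(cids: Iterable[int], prop: str = "IsomericSMILES") -> str:
    """Build a URL requesting one property for several CIDs as CSV."""
    cid_list = ",".join(str(cid) for cid in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/property/{prop}/CSV"


def parse_property_csv(text: str) -> List[Tuple[int, str]]:
    """Parse the CSV reply into (CID, value) pairs, reading columns by position."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    return [(int(row[0]), row[1]) for row in reader if row]


def fetch_smiles(cids: Sequence[int], batch_size: int = 100,
                 pause: float = 0.2) -> List[Tuple[int, str]]:
    """Fetch SMILES for `cids` in batches, pausing between requests."""
    results: List[Tuple[int, str]] = []
    for chunk in batched(cids, batch_size):
        with urllib.request.urlopen(property_url(chunk)) as response:
            results.extend(parse_property_csv(response.read().decode("utf-8")))
        time.sleep(pause)  # keep well below 5 requests per second
    return results
```

Reading columns by position rather than by header name keeps the parser working if the SMILES column is renamed. For the whole database, the FTP file is still the practical route; this approach is better suited to thousands of CIDs than to all of them.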






answered 9 hours ago by Geoff Hutchison










One thing to keep in mind: PubChem intentionally contains duplicates. There are SMILES with no stereochemical information alongside the same SMILES with correct stereochemistry, so depending on your use, you may want to do some level of standardization, filtering, and validation. – Geoff Hutchison, 9 hours ago
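One way to act on this is to group CIDs whose structures become identical once stereochemistry is ignored. The sketch below assumes RDKit is installed for the default key (`Chem.MolFromSmiles` and `Chem.MolToSmiles(..., isomericSmiles=False)`); RDKit is an assumption here, not something the comment prescribes, and any toolkit with canonical SMILES output would do. Enantiomers and diastereomers also share a stereo-free key, so each group is a set of candidates to inspect, not a list of records to delete:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def rdkit_stereo_free_key(smiles: str) -> Optional[str]:
    """Canonical SMILES without stereochemistry, or None if parsing fails."""
    from rdkit import Chem  # lazy import: the grouping helper works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid record: report it instead of guessing
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=False)


def group_stereo_variants(
    records: Iterable[Tuple[int, str]],
    key_fn: Callable[[str], Optional[str]] = rdkit_stereo_free_key,
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Return (groups of 2+ CIDs sharing a key, CIDs whose SMILES failed to parse)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    invalid: List[int] = []
    for cid, smiles in records:
        key = key_fn(smiles)
        if key is None:
            invalid.append(cid)
        else:
            groups[key].append(cid)
    duplicates = {key: cids for key, cids in groups.items() if len(cids) > 1}
    return duplicates, invalid
```

Passing `key_fn` explicitly makes it easy to swap in a different standardization step (for example, salt stripping before canonicalization) without touching the grouping logic.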











