How to get the SMILES of all compounds on PubChem?
I would like to download all the compounds from ChEMBL and PubChem. For ChEMBL this is easy using their web UI. For PubChem, however, it isn't clear how to download every compound in the database together with its SMILES representation.
I am also not sure why PubChem has 33,000,000 compounds while ChEMBL has "only" 2,000,000.
Tags: aromatic-compounds, cheminformatics
asked 14 hours ago by 0x90 (edited 10 hours ago)
Comment (Martin - マーチン♦, 10 hours ago): Have a look at PubChemPy (github.com/mcs07/PubChemPy); it may be useful here. The related package for ChemSpider is ChemSpiPy (github.com/mcs07/ChemSpiPy).
1 Answer
The best way to download bulk data from PubChem is via FTP, as described in the PubChem documentation.
For example, if you want the unfiltered SMILES of every CID in PubChem, the URL is ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/CID-SMILES.gz
You can also download subsets using the PubChem Structure Download service.
As mentioned in the comment above, there are also Python and other tools for accessing both PubChem and ChemSpider through their documented web APIs.
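Once the bulk file has been downloaded, it can be read without unpacking it to disk. The following is a minimal sketch, not an official PubChem tool. It assumes that CID-SMILES.gz holds one record per line in the form CID, a tab character, then the SMILES string. The function name `iter_cid_smiles` and the local file path are hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def iter_cid_smiles(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (CID, SMILES) pairs from a gzip-compressed file.

    Assumption: each line is "<CID><TAB><SMILES>".
    """
    # Open in text mode so lines are decoded and read one at a time,
    # which keeps memory use low even for a very large file.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first tab; the rest is the SMILES string.
            cid, smiles = line.split("\t", 1)
            yield int(cid), smiles
```

Looping with `for cid, smiles in iter_cid_smiles("CID-SMILES.gz"):` streams the records one at a time, so the whole table never has to fit in memory.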
answered 9 hours ago by Geoff Hutchison
Comment (Geoff Hutchison, 9 hours ago): One thing to keep in mind: PubChem intentionally contains duplicates. There are SMILES with no stereochemical information alongside the same SMILES with the correct stereochemistry. Depending on your use case, you may want to apply some level of standardization, filtering, and validation.