IBM chemical dump to boost PubChem

IBM is contributing a vast chemical data store of 2.4 million compounds to the National Institutes of Health to help accelerate drug discovery. The data were extracted from 4.7 million patents and 11 million biomedical journal abstracts published from 1976 to 2000, and will be added to PubChem, according to C&EN.

Some observers have pointed out that there will inevitably be much duplication, while others have suggested that redundancy is not a problem. It will be interesting to see whether ChemSpider can scrape this data into its system too.