Crystallography database flags nearly 1000 structures linked to a paper mill

A chemistry database of crystal structures has marked nearly 1000 entries with expressions of concern after finding they were linked to articles identified as products of a paper mill. 

The Cambridge Crystallographic Data Centre (CCDC) added notes to 992 structures in its database, according to a notice posted to its website in May. And a crystallography researcher tells us the impact on the field could be significant.

The notes state: 

This structure is currently under review following a 2022 study of a prolific papermill https://doi.org/10.21203/rs.3.rs-1537438/v1.

A description of the paper mill, published as a preprint in April, identifies 648 crystallography papers that commenters on PubPeer flagged between December 2021 and January 2022, and an additional 162 papers in journals that do not give DOIs. A list of the papers involved can be found here. Nine papers and 12 crystal structures have been retracted. 

In the preprint’s abstract, David Bimler wrote: 

Here we propose that at least 800 publications in crystallography and exotic-chemistry journals, from the period 2015-2022, are also the work of a prolific papermill specialising in imaginary Metal-Organic Frameworks and their wholly invented therapeutic applications. The mill is characterised by recycled images, and by oddities of wording in Methods sections, but its most obvious hallmarks appear in Reference sections, with citations to irrelevant research from remote fields of science.  

The preprint is marked as “under revision” at a journal whose editors recommended a major revision after peer review.

The crystal structures from the papers – sometimes more than one in an article – were deposited in the CCDC’s Cambridge Structural Database, which its website describes as “the world’s repository for small-molecule organic and metal-organic crystal structures” with “over one million structures from x-ray and neutron diffraction analyses.” 

On April 28th, the CCDC posted a notice saying it had begun investigating “potentially fraudulent data”: 

Following the recent pre-print publication on Research Square (https://doi.org/10.21203/rs.3.rs-1537438/v1) we have begun investigations into potentially fraudulent data in the Cambridge Structural Database (CSD).

At the time of writing, we have so far identified 620 structures in the CSD which are associated to publications named in the pre-print. Our database team and data integrity scientist are working with our contacts at the other databases and the journals involved to fully understand the issue.

We take this matter very seriously. The CSD aims to be an accurate reflection of the literature, and so we will follow developments closely with the journals involved, and retract data when appropriate.

We are also working to add a note to all affected structures which will be visible on WebCSD once complete.

Our investigation to understand the issue and the root cause continues, internally and with our partners. We will post updates here as we learn more.

We acknowledge and appreciate the work the author of the pre-print has undertaken, as well as others that have taken efforts to alert us when issues like this arise.

The May update states the new number of structures linked to the papermill (992) and includes an example of the cautionary notes. 

We emailed the CCDC with some questions and received a detailed statement from Suzanna Ward, Head of Data and Community. After reiterating thanks to the author of the preprint, Ward said:

We had already become suspicious of some structures during the course of our normal checks, but on learning about the pre-print from Retraction Watch and the community, we launched a larger investigation.

We included a link to the preprint in Weekend Reads on April 23, and began tweeting about it on April 25. The CCDC investigation began on April 28th, with the addition of notes to the structures as an “initial step.”

The CCDC currently retracts data from the CSD only if the scientific article that goes with the entry has been retracted, according to its statement. Out of nearly 1.2 million entries in the database, fewer than 400 have been retracted along with related papers. [See update at end of post.]

Regarding the paper mill investigation: 

Of the 992 structures initially flagged, 12 have been retracted from the CSD as the accompanying 9 papers have been retracted. When data is retracted from the CSD the entry with associated bibliographic details still exists, but the scientific content is removed and the entry is linked to the retraction notice in the literature. We have relationships and workflows with all the major publishers to help keep us informed on retractions and we also monitor resources like Retraction Watch…

For the remaining 980 structures implicated by the papermill pre-print, although we reflect data in scientific articles, we are assessing if any structures require further investigation. We have extended our existing data integrity checks to do more extensive analysis and will be following COPE guidelines and communicating with publishers and authors when appropriate depending on the outcomes of this analysis. Our investigations include more in-depth analysis of the underlying datasets and comparisons between the datasets and the CSD.

Our team of PhD-level editors and deposition coordinators are involved in the investigation, including our dedicated Data Integrity Scientist. We have also reached out to other experts in the crystallographic community who are supporting us in these efforts. Going forward we are adding more automated checks which will help us to identify and prioritize structures which need further manual examination.

Since publication retraction timescales can take time and our more extensive checks will also take time, we made the decision to add the additional editorial comment so our users can be informed while the investigation is ongoing. We can also provide a complete refcode list of the structures affected on request, should users wish to exclude these from their work.

We emailed Springer Nature, which had published 130 papers implicated in the paper mill, and Taylor & Francis, which had published 150 of the papers in the journal Inorganic and Nano-Metal Chemistry, asking if they had plans to act on the findings. 

A Springer Nature spokesperson provided this statement from research integrity director Chris Graf:

Our integrity team hadn’t, prior to your email, been informed by the authors of the paper about their concerns. Now we are aware we will look into the authors’ concerns.

A Taylor & Francis spokesperson confirmed the publisher is “actively investigating a large number of articles published in the journal Inorganic and Nano-Metal Chemistry.”

Two implicated articles published in the journal have already been retracted and one has been corrected, all at the authors’ request, the spokesperson said. Those articles are part of the ongoing investigation:

Our investigation originated with an internal audit we ran in 2021 and was expanded following concerns raised to us by researchers and by the Problematic Paper Screener disclosure earlier this year. We continue to work through the papers with due diligence and following COPE guidelines.

The field has had issues with fraud before. A decade ago, the journal Acta Crystallographica Section E retracted 80 papers with fake structures. In another case, the federal Office of Research Integrity banned a prominent protein crystallographer from receiving federal funding for 10 years for fabricating research. 

Retraction Watch asked Sylvain Bernès, a researcher at the Benemérita Universidad Autónoma de Puebla in Mexico who posted on PubPeer about the expressions of concern, how common it is for crystallography databases to issue expressions of concern for structures.

He told us: 

To the best of my knowledge, this is the first time a crystallographic database releases EOCs. This is by no way a common practice, since, generally speaking, an entry is present or not present in a database. A fuzzy mid-point status makes little sense. However, I think that the CCDC was surprised by the amount of involved structures: 99% of the papers from this particular papermill report one or several X-ray structures, which were systematically deposited in the CSD.

Bernès says that if the CCDC does end up retracting the structures, the fallout for the field could be significant: 

Potentially, the full community of chemical crystallographers could be impacted if these papers are eventually retracted, because the confidence in the validation processes normally used for any article including X-ray structures will be seriously deteriorated. On the other hand, a database tainted with wrong data loses its value. For example, data mining is an important sub-field in chemical crystallography, related to structure prediction, calibration and benchmarking for computational chemistry software, etc. For obvious reasons, data mining strongly rely on large and unpolluted databases.

Again, as of now, the CCDC only issued EOCs. Hopefully, none of the flagged structures will be retracted. However, from what I have seen in the structures I reviewed to date, the opposite is much more likely to occur…

Update, 2130 UTC, 7/28/22: We updated the number of retracted CCDC entries after the organization told us it had re-run the numbers following its statement and realized it had provided an incorrect figure.

9 thoughts on “Crystallography database flags nearly 1000 structures linked to a paper mill”

  1. “Inorganic and nano-metal chemistry” is an absurd title for a journal from a (supposedly) reputable publisher.

  2. It is hard to overstate how bad the 800-odd papers are, and also hard to understand how they were accepted for publication without some kind of insider assistance. Journals accepted paper after paper that describe animal vivisection without meaningful ethical approval. Either the editors don’t think that ethical approval matters, or they knew that the experiments were purely imaginary but accepted the papers anyway.

    1. Perhaps the most outrageous is that the vast majority of these articles claim progress in treatments for a myriad of conditions, like viral myocarditis, sepsis, schizophrenia, Alzheimer disease, Parkinson’s disease, periodontitis, myofascial pain syndrome, chronic obstructive pulmonary disease, acute tracheobronchitis, tuberculosis, optic neuritis, chronic enteritis, endometriosis, coronary artery disease, renal failure, any kind of cancer, influenza A, atherosclerosis, and so on (the list is VERY long!). This papermill is seriously offending people suffering from such illnesses (and btw, my understanding is that many authors are physicians; so, what about their Hippocratic oath?). Smut Clyde is questioning the accountability of editors, however, I would go even further: I cannot understand why the Editorial Boards of these journals don’t resign en masse.

    2. “or they knew that the experiments were purely imaginary but accepted the papers anyway.”

      In this light, one is led to reconsider Marianne Moore’s poem that concludes

      nor is it valid
      to discriminate against “business documents and

      school-books”; all these phenomena are important. One must make a distinction
      however: when dragged into prominence by half poets, the result is not poetry,
      nor till the autocrats among us can be
      “literalists of
      the imagination”—above
      insolence and triviality and can present

      for inspection, imaginary gardens with real toads in them, shall we have
      it.

      Of course, her topic is Poetry, not Crystallography,
