Continuing from my article An R Script to Automatically download PubMed Citation Counts By Year of Publication, I’ve now looked at article counts for journals. I did this by extending my script to download the complete article records in XML from inside R. Then I simply extracted the journal name from each article and counted how many times each journal occurred. What’s cool about my new script is that it can extract any PubMed field, so more data will surely follow in subsequent articles.
Specifically, I looked at journals related to Cognitive Behavior Therapy. I did three searches: one for 2010, one for 2011 and lastly one for the total year range. By doing this I could visualize which journals had the most publications in total and compare that to recent years.
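To make the extract-and-count step concrete, here’s a minimal sketch in base R. It is not my actual script: the XML string is a tiny made-up stand-in for a PubMed download, the journal names are placeholders, and I’m assuming each record stores the journal name in a `<Title>` element inside `<Journal>`. A regular expression keeps the example free of dependencies; for real downloads a proper XML parser is the safer choice.

```r
# Toy stand-in for downloaded PubMed XML (hypothetical records, not real data).
records_xml <- paste0(
  "<PubmedArticleSet>",
  "<PubmedArticle><Article><Journal><Title>Journal A</Title></Journal>",
  "<ArticleTitle>First paper</ArticleTitle></Article></PubmedArticle>",
  "<PubmedArticle><Article><Journal><Title>Journal B</Title></Journal>",
  "<ArticleTitle>Second paper</ArticleTitle></Article></PubmedArticle>",
  "<PubmedArticle><Article><Journal><Title>Journal A</Title></Journal>",
  "<ArticleTitle>Third paper</ArticleTitle></Article></PubmedArticle>",
  "</PubmedArticleSet>"
)

# Match the first <Title> that follows each <Journal> opening tag.
pattern <- "<Journal>.*?<Title>(.*?)</Title>"
matches <- regmatches(records_xml, gregexpr(pattern, records_xml, perl = TRUE))[[1]]

# Keep only the captured journal name from each match.
journals <- sub(pattern, "\\1", matches, perl = TRUE)

# Count how often each journal occurs, most frequent first.
journal_counts <- sort(table(journals), decreasing = TRUE)
print(journal_counts)
```

Swapping `<Title>` for another tag in the pattern is all it takes to count a different field in this toy setup.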
I didn’t care to use a really comprehensive query, as I was more interested in testing my script. Nonetheless, the base query looked like this:
*"cognitive behavior therapy" OR "cognitive behavioral therapy" OR "cognitive therapy"*
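The three searches can be sketched as three search URLs built from that base query. This is only an illustration, assuming the searches go through the NCBI E-utilities `esearch` endpoint and that the year restriction is applied with its `datetype`, `mindate` and `maxdate` parameters; the helper `build_search_url` is a made-up name, and nothing is downloaded here.

```r
base_query <- paste(
  '"cognitive behavior therapy"',
  '"cognitive behavioral therapy"',
  '"cognitive therapy"',
  sep = " OR "
)

# Hypothetical helper: build an esearch URL, optionally limited to a
# range of publication years (both years must be given to apply the limit).
build_search_url <- function(query, min_year = NULL, max_year = NULL) {
  url <- paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    "?db=pubmed&term=", utils::URLencode(query, reserved = TRUE)
  )
  if (!is.null(min_year) && !is.null(max_year)) {
    url <- paste0(url, "&datetype=pdat&mindate=", min_year, "&maxdate=", max_year)
  }
  url
}

# One search per year of interest, plus one without any date limit.
search_urls <- c(
  y2010     = build_search_url(base_query, 2010, 2010),
  y2011     = build_search_url(base_query, 2011, 2011),
  all_years = build_search_url(base_query)
)
print(search_urls)
```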
The total number of hits for all years was 14342, from which I extracted 1667 different journals. After extracting the top 20 journals with the most publications over time, I made these two graphs.
As you can see, the Journal of Consulting and Clinical Psychology has had the most publications over time, but Behaviour Research and Therapy has had more publications in recent years. This is not really that surprising, since the Journal of Consulting and Clinical Psychology has existed since 1937 whilst Behaviour Research and Therapy started out in 1963.
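Picking the top journals and lining them up against a recent year could look roughly like this. The counts are invented toy numbers, not my results, and I use a top 3 instead of a top 20 to keep it short; `total_counts` and `counts_2011` are assumed to be named vectors of publications per journal.

```r
# Made-up counts per journal (placeholders, not the real PubMed numbers).
total_counts <- c("Journal A" = 120, "Journal B" = 95, "Journal C" = 40, "Journal D" = 12)
counts_2011  <- c("Journal B" = 14, "Journal A" = 9, "Journal D" = 3)

# Journals with the most publications over the whole year range.
top_n <- 3
ranked <- names(sort(total_counts, decreasing = TRUE))
top_journals <- ranked[seq_len(min(top_n, length(ranked)))]

# Look up the same journals in the recent year; absent journals count as 0.
recent <- counts_2011[top_journals]
recent[is.na(recent)] <- 0

comparison <- data.frame(
  journal = top_journals,
  total   = unname(total_counts[top_journals]),
  in_2011 = unname(recent),
  stringsAsFactors = FALSE
)
print(comparison)
```

A table like `comparison` is enough to draw one bar chart for the total counts and one for the recent year, with the journals in the same order in both.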
Where’s the R code?
My intention is always to include my complete R code in my posts, but I didn’t have time to clean up the code today. Until I post it, please play around with my original script to look at annual PubMed citations.
Update: Here's the R script: How to download complete XML records from PubMed and extract data