<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R Psychologist</title>
	<atom:link href="http://rpsychologist.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://rpsychologist.com</link>
	<description></description>
	<lastBuildDate>Sat, 01 Jun 2013 14:52:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Visualizing a One-Way ANOVA using D3.js</title>
		<link>http://rpsychologist.com/d3-one-way-anova/</link>
		<comments>http://rpsychologist.com/d3-one-way-anova/#comments</comments>
		<pubDate>Fri, 31 May 2013 15:06:08 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[D3.js]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1617</guid>
		<description><![CDATA[A while ago I was playing around with the javascript package D3.js, and I began with this visualization—that I never really finished—of how a one-way ANOVA is calculated. I tried to make it look like a plot from ggplot2 except with interactive elements. Take a look at it after the jump <a href="http://rpsychologist.com/d3-one-way-anova/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>A while ago I was playing around with the JavaScript package <a href="http://d3js.org" target="_blank">D3.js</a>, and I began this visualization (which I never really finished) of how a one-way ANOVA is calculated. I wanted to make the visualization interactive, and I did integrate some interactive elements. For instance, if you hover over a data point it will show the residual, and its value will be highlighted in the combined computation. The circle diagram shows the partitioning of the sums of squares, and if you hover over a part it will show where the variation comes from. I tried to make the plots look like plots from the R package ggplot2. </p>
<p><em>These plots are not designed to work on mobile phones.</em><br />
<iframe src="http://rpsychologist.com/d3/one-way-ANOVA/total_ss.html" id="d3" scrolling="yes" frameborder="0" allowtransparency="true" style="border: none; max-width: 100%; min-width: 180px;" title="d3" width=100% height=570px></iframe><br />
<iframe src="http://rpsychologist.com/d3/one-way-ANOVA/within_ss.html" id="d3" scrolling="yes" frameborder="0" allowtransparency="true" style="border: none; max-width: 100%; min-width: 180px;" title="d3" width=100% height=550px></iframe><br />
<iframe src="http://rpsychologist.com/d3/one-way-ANOVA/between_ss.html" id="d3" scrolling="yes" frameborder="0" allowtransparency="true" style="border: none; max-width: 100%; min-width: 180px;" title="d3" width=100% height=550px></iframe><br />
<h2>Let&#8217;s check the calculations in R</h2>
<p>To see if this works, let&#8217;s compute the ANOVA as I have described it here.</p>
<pre class="brush: r; title: ; notranslate">
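# data
grp1 &lt;- c(1,2,3,4)
grp2 &lt;- c(5,6,7,8)
grp3 &lt;- c(9,10,11,12)
# all observations in one data frame (needed for the grand mean)
df &lt;- data.frame(y=c(grp1,grp2,grp3))

# total SS
total_SS &lt;- sum((df$y - mean(df$y))^2)
total_SS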
</pre>
<pre>[1] 143</pre>
<pre class="brush: r; title: ; notranslate">
# within groups SS
within_SS &lt;- sum((c(grp1 - mean(grp1), grp2 - mean(grp2), grp3 - mean(grp3)))^2)
within_SS
</pre>
<pre>[1] 15</pre>
<pre class="brush: r; title: ; notranslate">
# between groups SS: group size times the squared deviations
# of the group means from the grand mean
between_SS &lt;- 4*sum((c(mean(grp1), mean(grp2), mean(grp3)) - mean(c(grp1, grp2, grp3)))^2)
between_SS
</pre>
<pre>[1] 128</pre>
<pre class="brush: r; title: ; notranslate">
# check calculation
between_SS + within_SS == total_SS
</pre>
<pre>[1] TRUE</pre>
<p>We see that <em>total_SS</em>, <em>between_SS</em> and <em>within_SS</em> are identical to what is shown above in the visualization. </p>
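<p>The same partitioning can be wrapped in a small function. The sketch below is not part of the original analysis: <tt>ss_partition()</tt> is a hypothetical helper name, and it assumes the groups are passed as a list of numeric vectors. Unlike the hard-coded <tt>4*</tt> above, it weights each group by its own size, so it should also cover unequal group sizes.</p>
<pre class="brush: r; title: ; notranslate">
# hypothetical helper, for illustration only
ss_partition &lt;- function(groups) {
  y &lt;- unlist(groups)
  grand_mean &lt;- mean(y)
  n &lt;- sapply(groups, length)          # group sizes
  group_means &lt;- sapply(groups, mean)  # one mean per group
  total &lt;- sum((y - grand_mean)^2)
  within &lt;- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
  between &lt;- sum(n * (group_means - grand_mean)^2)
  c(total = total, within = within, between = between)
}

ss &lt;- ss_partition(list(c(1,2,3,4), c(5,6,7,8), c(9,10,11,12)))
ss
# all.equal() tolerates floating point rounding, unlike ==
isTRUE(all.equal(ss[[&quot;between&quot;]] + ss[[&quot;within&quot;]], ss[[&quot;total&quot;]]))
</pre>
<p>For the data used here this should reproduce the three sums of squares computed above.</p>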
<pre class="brush: r; title: ; notranslate">
df1 &lt;- 3-1  # number of groups - 1
df2 &lt;- 12 - 3 # N - number of groups
F &lt;-(between_SS/df1) / (within_SS/df2)
F
</pre>
<pre>[1] 38.4</pre>
<pre class="brush: r; title: ; notranslate">
pf(F, df1, df2, lower.tail=FALSE) # p-value (upper tail of the F distribution)
</pre>
<pre>[1] 3.921015e-05</pre>
<p>Let&#8217;s compare this to <tt>anova()</tt></p>
<pre class="brush: r; title: ; notranslate">
df &lt;- data.frame(y=c(grp1,grp2,grp3))
df$group &lt;- gl(3,4)
anova(lm(y ~ group, df))
</pre>
<pre>
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)    
group      2    128  64.000    38.4 3.921e-05 ***
Residuals  9     15   1.667                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</pre>
<p>We have identical results. </p>
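<p>Instead of comparing the printed tables by eye, the values can also be pulled out of the table that <tt>anova()</tt> returns. This is only a sketch: the names <tt>d</tt>, <tt>tab</tt>, <tt>F_manual</tt> and <tt>p_manual</tt> are introduced here for illustration, and the sums of squares and degrees of freedom are typed in from the results above.</p>
<pre class="brush: r; title: ; notranslate">
grp1 &lt;- c(1,2,3,4)
grp2 &lt;- c(5,6,7,8)
grp3 &lt;- c(9,10,11,12)
d &lt;- data.frame(y=c(grp1,grp2,grp3), group=gl(3,4))
tab &lt;- anova(lm(y ~ group, d))

# manual values: (between_SS/df1) / (within_SS/df2)
F_manual &lt;- (128/2) / (15/9)
p_manual &lt;- pf(F_manual, 2, 9, lower.tail=FALSE)

# the first row of the table belongs to the group effect
f_ok &lt;- isTRUE(all.equal(tab[[&quot;F value&quot;]][1], F_manual))
p_ok &lt;- isTRUE(all.equal(tab[[&quot;Pr(&gt;F)&quot;]][1], p_manual))
c(f_ok, p_ok)
</pre>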
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/d3-one-way-anova/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are the Current Criteria for Empirically Supported Treatments Too Lenient?</title>
		<link>http://rpsychologist.com/are-the-current-criteria-for-empirically-supported-treatments-too-lenient/</link>
		<comments>http://rpsychologist.com/are-the-current-criteria-for-empirically-supported-treatments-too-lenient/#comments</comments>
		<pubDate>Thu, 30 May 2013 09:36:37 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[Psychology]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Evidence Based Treatments]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1758</guid>
		<description><![CDATA[The practice of classifying treatments as empirically supported has been widely debated for a long time. In this post I write about a recent article that raises several concerns and suggestions regarding the current use of EST criteria—which can be summarized as the current criteria being too lenient, something that I wholeheartedly agree with. <a href="http://rpsychologist.com/are-the-current-criteria-for-empirically-supported-treatments-too-lenient/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>The practice of classifying treatments as empirically supported (ESTs) has been widely debated for a long time. Recently Jessica Nasser published an article in the <em>Journal of Contemporary Psychotherapy</em> named <em>“Empirically Supported Treatments and Efficacy Trials: What Steps Do We Still Need to Take?”</em>. In the article the author raises several concerns and suggestions regarding the current use of EST criteria, which can be summarized as the current criteria being too lenient, something that I wholeheartedly agree with. Currently a treatment is regarded as “probably efficacious” if two different experiments show the treatment’s superiority over a wait-list condition, at least according to the criteria proposed by Division 12 (Clinical Psychology) of the American Psychological Association.</p>
<p>In the article, Nasser outlines three main concerns and suggestions regarding the current criteria for ESTs, which are:</p>
<p>1) Wait-list and placebo control conditions do not provide useful information. Instead, active control conditions should be used. </p>
<p>2) The EST criteria do not take negative findings into consideration, nor do the criteria provide any provisions for removing treatments from the list. Nasser argues that this could be remedied by including all published findings in a meta-analysis, which would also provide a means of systematically updating the EST lists. </p>
<p>3) ESTs identified in RCTs lack external validity and clinical utility. Nasser’s concern is that trials are neglecting outcomes related to patients’ quality of life, interpersonal and work functioning and so on. The author’s suggestion is that more trials should link “&#8230; outcome measures, effect sizes, and statistical and clinical significance to real-life functioning and practical significance”.</p>
<p>I think these points are fair. However, I would like to add that the criteria should seriously consider whether there is evidence for the proposed mechanism of change. Currently, treatments can claim to be working by magic and still qualify as an EST, even though the improvements seen in patients are obviously mediated by some other mechanism. The classic example of this is Eye Movement Desensitization Therapy, which Nasser mentions, where the active mechanism probably is traditional desensitization. Moreover, I believe that the raw data should be made public before a treatment is considered empirically supported, so that the analyses can be validated and replicated. </p>
<p>Despite the shortcomings of the current EST criteria, I do believe that it is a worthy pursuit, mostly as a type of research synthesis to inform clinicians and decision-makers. But in the criteria’s current form it is hard not to get the feeling that the epithet of “well-established” is basically meaningless.</p>
<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/></a></span><span class="Z3988" title="ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=Journal+of+Contemporary+Psychotherapy&#038;rft_id=info%3Adoi%2F10.1007%2Fs10879-013-9236-x&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Empirically+Supported+Treatments+and+Efficacy+Trials%3A+What+Steps+Do+We+Still+Need+to+Take%3F&#038;rft.issn=0022-0116&#038;rft.date=2013&#038;rft.volume=&#038;rft.issue=&#038;rft.spage=&#038;rft.epage=&#038;rft.artnum=http%3A%2F%2Flink.springer.com%2F10.1007%2Fs10879-013-9236-x&#038;rft.au=Nasser%2C+J.&#038;rfe_dat=bpr3.included=1;bpr3.tags=Medicine%2CPsychology">Nasser, J. (2013). Empirically Supported Treatments and Efficacy Trials: What Steps Do We Still Need to Take? <span style="font-style: italic;">Journal of Contemporary Psychotherapy</span> DOI: <a rev="review" href="http://dx.doi.org/10.1007/s10879-013-9236-x">10.1007/s10879-013-9236-x</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/are-the-current-criteria-for-empirically-supported-treatments-too-lenient/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Creating a typical textbook illustration of statistical power using either ggplot or base graphics</title>
		<link>http://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics/</link>
		<comments>http://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics/#comments</comments>
		<pubDate>Sun, 26 May 2013 10:03:02 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[normal distribution]]></category>
		<category><![CDATA[Power analysis]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1667</guid>
		<description><![CDATA[A common way of illustrating the idea behind statistical power in null hypothesis significance testing is by plotting the sampling distributions of the null hypothesis and the alternative hypothesis. Typically, these illustrations highlight the regions that correspond to making a type II error, a type I error and correctly rejecting the null hypothesis (i.e. the test's power). In this post I will show how to create such "power plots" using both ggplot and R's base graphics. 
 <a href="http://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>A common way of illustrating the idea behind statistical power in null hypothesis significance testing is by plotting the sampling distributions of the null hypothesis (<img src="//s0.wp.com/latex.php?latex=H_0&#038;bg=ffffff&#038;fg=000&#038;s=0" alt="H_0" title="H_0" class="latex" />) and the alternative hypothesis (<img src="//s0.wp.com/latex.php?latex=H_A&#038;bg=ffffff&#038;fg=000&#038;s=0" alt="H_A" title="H_A" class="latex" />). Typically, these illustrations highlight the regions that correspond to making a type II error (<img src="//s0.wp.com/latex.php?latex=%5Cbeta&#038;bg=ffffff&#038;fg=000&#038;s=0" alt="&#92;beta" title="&#92;beta" class="latex" />), a type I error (<img src="//s0.wp.com/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000&#038;s=0" alt="&#92;alpha" title="&#92;alpha" class="latex" />) and correctly rejecting the null hypothesis (i.e. the test&#8217;s power; <img src="//s0.wp.com/latex.php?latex=1+-+%5Cbeta&#038;bg=ffffff&#038;fg=000&#038;s=0" alt="1 - &#92;beta" title="1 - &#92;beta" class="latex" />). </p>
<p>In this post I will show how to create such &#8220;power plots&#8221; using R. Typically, I prefer to use <tt>ggplot</tt> for plotting, but tasks such as this are among the few cases where I think R&#8217;s base graphics have some merit, especially for creating black-and-white plots, since <tt>ggplot</tt> does not support fill patterns. Thus, I will present code both for <tt>ggplot</tt> and base graphics.  </p>
<p>Creating these plots is pretty straightforward. You only need to be vaguely familiar with the mechanics of plotting polygons. For instance, say we want to plot a triangle with the following coordinates. </p>
<pre>
       (2,3)
        /\
       /  \
      /    \
     /      \
(1,2) ------ (3,2)
</pre>
<p>Then we just specify x and y as vectors, like this:</p>
<pre class="brush: r; title: ; notranslate">
# ggplot polygon example
library(ggplot2)
# the three corners of the triangle, given as x and y vectors
ggplot(data.frame(x=c(1,2,3),y=c(2,3,2)), aes(x,y)) + geom_polygon()
</pre>
<p><a href="http://rpsychologist.com/wp-content/uploads/2013/05/polygon_ggplot_example.png"><img src="http://rpsychologist.com/wp-content/uploads/2013/05/polygon_ggplot_example-300x271.png" alt="Example of plotting polygons with ggplot" width="300" height="271" class="aligncenter size-medium wp-image-1733" /></a><br />
So, let us begin by creating the data for the two distributions and three polygons that we will need. </p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
library(grid) # needed for arrow() and unit()
m1 &lt;- 0  # mu H0
sd1 &lt;- 1.5 # sigma H0
m2 &lt;- 3.5 # mu HA
sd2 &lt;- 1.5 # sigma HA

z_crit &lt;- qnorm(1-(0.05/2), m1, sd1)

# set length of tails
min1 &lt;- m1-sd1*4
max1 &lt;- m1+sd1*4
min2 &lt;- m2-sd2*4
max2 &lt;- m2+sd2*4          
# create x sequence
x &lt;- seq(min(min1,min2), max(max1, max2), .01)
# generate normal dist #1
y1 &lt;- dnorm(x, m1, sd1)
# put in data frame
df1 &lt;- data.frame(&quot;x&quot; = x, &quot;y&quot; = y1)
# generate normal dist #2
y2 &lt;- dnorm(x, m2, sd2)
# put in data frame
df2 &lt;- data.frame(&quot;x&quot; = x, &quot;y&quot; = y2)

# Alpha polygon
y.poly &lt;- pmin(y1,y2)
poly1 &lt;- data.frame(x=x, y=y.poly)
poly1 &lt;- poly1[poly1$x &gt;= z_crit, ] 
poly1&lt;-rbind(poly1, c(z_crit, 0))  # add lower-left corner

# Beta polygon
poly2 &lt;- df2
poly2 &lt;- poly2[poly2$x &lt;= z_crit,] 
poly2&lt;-rbind(poly2, c(z_crit, 0))  # add lower-right corner

# power polygon; 1-beta
poly3 &lt;- df2
poly3 &lt;- poly3[poly3$x &gt;= z_crit,] 
poly3 &lt;-rbind(poly3, c(z_crit, 0))  # add lower-left corner

# combine polygons. 
poly1$id &lt;- 3 # alpha, give it the highest number to make it the top layer
poly2$id &lt;- 2 # beta
poly3$id &lt;- 1 # power; 1 - beta
poly &lt;- rbind(poly1, poly2, poly3)
poly$id &lt;- factor(poly$id,  labels=c(&quot;power&quot;,&quot;beta&quot;,&quot;alpha&quot;))
</pre>
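<p>The shaded polygons correspond to probabilities that can also be computed directly with <tt>pnorm()</tt>. The following is a minimal sketch, assuming the same parameters as above and ignoring the negligible rejection region in the lower tail of H0. The names <tt>alpha_half</tt>, <tt>beta_err</tt> and <tt>power</tt> are introduced here for illustration only.</p>
<pre class="brush: r; title: ; notranslate">
m1 &lt;- 0    # mu H0
sd1 &lt;- 1.5 # sigma H0
m2 &lt;- 3.5  # mu HA
sd2 &lt;- 1.5 # sigma HA
z_crit &lt;- qnorm(1-(0.05/2), m1, sd1)

# area under H0 to the right of z_crit (the alpha polygon)
alpha_half &lt;- pnorm(z_crit, m1, sd1, lower.tail=FALSE)
# area under HA to the left of z_crit (the beta polygon)
beta_err &lt;- pnorm(z_crit, m2, sd2)
# area under HA to the right of z_crit (the power polygon)
power &lt;- 1 - beta_err

round(c(alpha_half=alpha_half, beta=beta_err, power=power), 3)
</pre>
<p>Because the plotted tails are cut off at four standard deviations, the polygon areas will only approximate these values.</p>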
<p>Now that we have all the data that we need, let us create the first plot using <tt>ggplot</tt>. The annotation is set manually, so it will be a bit tedious to change these plots. </p>
<pre class="brush: r; title: ; notranslate"> 
# plot with ggplot2
ggplot(poly, aes(x,y, fill=id, group=id)) +
  geom_polygon(show.legend=FALSE, alpha=I(8/10)) +
  # add line for the H0 distribution
  geom_line(data=df1, aes(x,y, color=&quot;H0&quot;, group=NULL, fill=NULL), size=1.5, show.legend=FALSE) + 
  # add line for the HA distribution. These lines could be combined into one dataframe.
  geom_line(data=df2, aes(color=&quot;HA&quot;, group=NULL, fill=NULL),size=1.5, show.legend=FALSE) +
  # add vlines for z_crit
  geom_vline(xintercept = z_crit, size=1, linetype=&quot;dashed&quot;) +
  # change colors 
  scale_color_manual(&quot;Group&quot;, 
                     values= c(&quot;HA&quot; = &quot;#981e0b&quot;,&quot;H0&quot; = &quot;black&quot;)) +
  scale_fill_manual(&quot;test&quot;, values= c(&quot;alpha&quot; = &quot;#0d6374&quot;,&quot;beta&quot; = &quot;#be805e&quot;,&quot;power&quot;=&quot;#7cecee&quot;)) +
  # beta arrow
  annotate(&quot;segment&quot;, x=0.1, y=0.045, xend=1.3, yend=0.01, arrow = arrow(length = unit(0.3, &quot;cm&quot;)), size=1) +
  annotate(&quot;text&quot;, label=&quot;beta&quot;, x=0, y=0.05, parse=T, size=8) +
  # alpha arrow
  annotate(&quot;segment&quot;, x=4, y=0.043, xend=3.4, yend=0.01, arrow = arrow(length = unit(0.3, &quot;cm&quot;)), size=1) +
  annotate(&quot;text&quot;, label=&quot;frac(alpha,2)&quot;, x=4.2, y=0.05, parse=T, size=8) +
  # power arrow
  annotate(&quot;segment&quot;, x=6, y=0.2, xend=4.5, yend=0.15, arrow = arrow(length = unit(0.3, &quot;cm&quot;)), size=1) +
  annotate(&quot;text&quot;, label=&quot;1-beta&quot;, x=6.1, y=0.21, parse=T, size=8) +
  # H_0 title
  annotate(&quot;text&quot;, label=&quot;H[0]&quot;, x=m1, y=0.28, parse=T, size=8) +
  # H_a title
  annotate(&quot;text&quot;, label=&quot;H[a]&quot;, x=m2, y=0.28, parse=T, size=8) +
  ggtitle(&quot;Statistical Power Plots, Textbook-style&quot;) +
  # remove some elements
  theme(panel.grid.minor = element_blank(),
             panel.grid.major = element_blank(),
             panel.background = element_blank(),
             plot.background = element_rect(fill=&quot;#f9f0ea&quot;),
             panel.border = element_blank(),
             axis.line = element_blank(),
             axis.text.x = element_blank(),
             axis.text.y = element_blank(),
             axis.ticks = element_blank(),
             axis.title.x = element_blank(),
             axis.title.y = element_blank(),
             plot.title = element_text(size=22))

ggsave(&quot;stat_power_ggplot.png&quot;, height=8, width=13, dpi=72)
</pre>
<p><a href="http://rpsychologist.com/wp-content/uploads/2013/05/stat_power_ggplot.png"><img src="http://rpsychologist.com/wp-content/uploads/2013/05/stat_power_ggplot.png" alt="Illustrating the concept of statistical power using ggplot" width="936" height="576" class="aligncenter size-full wp-image-1736" /></a><br />
Now, if we want a more &#8220;classical looking&#8221; black-and-white plot, we need to use base graphics. </p>
<pre class="brush: r; title: ; notranslate">
# example with base graphics
png(&quot;stat_power_base.png&quot;, width=900, height=600, units=&quot;px&quot;) # save as png
# reset
plot.new()
# set window size
plot.window(xlim=range(x), ylim=c(-0.01,0.3))
# add polygons
polygon(poly3,  density=10) # 1-beta
polygon(poly2, density=3, angle=-45, lty=&quot;dashed&quot;) # beta
polygon(poly1, density=10, angle=0) # alpha
# add h_a dist
lines(df2,lwd=3)
# add h_0 dist
lines(df1,lwd=3)
### annotations
# h_0 title
text(m1, 0.3, expression(H[0]), cex=1.5)
# h_a title
text(m2, 0.3, expression(H[a]), cex=1.5)
# beta annotation
arrows(x0=-1, y0=0.045, x1=1, y1=0.01,lwd=2,length=0.15)
text(-1.2, 0.045, expression(beta), cex=1.5)
# alpha annotation
arrows(x0=4, y0=-0.01, x1=3.5, y1=0.01, lwd=2, length=0.15)
text(x=4.1, y=-0.015, expression(alpha/2), cex=1.5)
# 1-beta 
arrows(x0=6, y0=0.15, x1=5, y1=0.1, lwd=2,length=0.15)
text(x=7, y=0.155, expression(paste(1-beta, &quot;  (\&quot;power\&quot;)&quot;)), cex=1.5)
# show z_crit; start of rejection region
abline(v=z_crit)
# add bottom line
abline(h=0)
title(&quot;Statistical Power&quot;)
dev.off()
</pre>
<p><a href="http://rpsychologist.com/wp-content/uploads/2013/05/stat_power_base.png"><img src="http://rpsychologist.com/wp-content/uploads/2013/05/stat_power_base.png" alt="Illustrating statistical power using R&#039;s base graphics" width="900" height="600" class="aligncenter size-full wp-image-1738" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Working with shapefiles, projections and world maps in ggplot</title>
		<link>http://rpsychologist.com/working-with-shapefiles-projections-and-world-maps-in-ggplot/</link>
		<comments>http://rpsychologist.com/working-with-shapefiles-projections-and-world-maps-in-ggplot/#comments</comments>
		<pubDate>Thu, 23 May 2013 16:36:00 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[map projection]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[rgdal]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1666</guid>
		<description><![CDATA[In this post I show some different examples of how to work with map projections and how to plot the maps using ggplot. Many maps that are using the default projection are shown in the longlat-format, which is far from optimal. Here I show how to use either the Robinson or Winkel Tripel projection. <a href="http://rpsychologist.com/working-with-shapefiles-projections-and-world-maps-in-ggplot/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>In this post I will show some different examples of how to work with map projections and how to plot the maps using <tt>ggplot</tt>. Many maps are shown in their default longlat format, which is far from optimal. For plotting world maps I prefer either the Robinson or the Winkel Tripel projection (many more are available), and I will show how to use both. </p>
<p>Before we get started you need to download a couple of shapefiles that we will use. You can find them here:</p>
<ul>
<li><a href="http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-land/" title="http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-land/" target="_blank">http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-land/</a></li>
<li><a href="http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/" title="http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/" target="_blank">http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/</a></li>
<li><a href="http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/" title="http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/" target="_blank">http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/</a></li>
<li><a href="http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-graticules/" title="http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-graticules/" target="_blank">http://www.naturalearthdata.com/downloads/110m-physical-vectors/110m-graticules/</a></li>
</ul>
<p>Put them directly inside your working directory. We will use functions from the <tt>rgdal</tt>-package to read the shapefiles into R, so if you do not have it, you need to install it before you continue. </p>
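<p>Before reading anything, it can help to confirm that the unzipped folders are where the code expects them. This is only a sketch: it assumes the working directory has already been set to the folder holding the downloads, the folder names are taken from the <tt>readOGR()</tt> calls used in this post, and <tt>needed</tt> and <tt>missing_dirs</tt> are names introduced here for illustration.</p>
<pre class="brush: r; title: ; notranslate">
# folders the readOGR() calls in this post point at
needed &lt;- c(&quot;ne_110m_land&quot;,
            &quot;ne_110m_admin_0_countries&quot;,
            &quot;ne_110m_populated_places&quot;,
            &quot;ne_110m_graticules_all&quot;)
missing_dirs &lt;- needed[!dir.exists(needed)]
if (length(missing_dirs) &gt; 0) {
  message(&quot;Missing shapefile folders: &quot;, paste(missing_dirs, collapse=&quot;, &quot;))
}

# the plots are saved to maps/, so make sure that folder exists
if (!dir.exists(&quot;maps&quot;)) dir.create(&quot;maps&quot;)
</pre>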
<pre class="brush: r; title: ; notranslate">
library(rgdal)
library(ggplot2)
setwd(&quot;/Users/kris/maps_ggplot&quot;)

# read shapefile
wmap &lt;- readOGR(dsn=&quot;ne_110m_land&quot;, layer=&quot;ne_110m_land&quot;)
# convert to dataframe
wmap_df &lt;- fortify(wmap)

# create a blank ggplot theme
theme_opts &lt;- list(theme(panel.grid.minor = element_blank(),
                        panel.grid.major = element_blank(),
                        panel.background = element_blank(),
                        plot.background = element_rect(fill=&quot;#e6e8ed&quot;),
                        panel.border = element_blank(),
                        axis.line = element_blank(),
                        axis.text.x = element_blank(),
                        axis.text.y = element_blank(),
                        axis.ticks = element_blank(),
                        axis.title.x = element_blank(),
                        axis.title.y = element_blank(),
                        plot.title = element_text(size=22)))

# plot map
ggplot(wmap_df, aes(long,lat, group=group)) + 
  geom_polygon() + 
  labs(title=&quot;World map (longlat)&quot;) + 
  coord_equal() + 
  theme_opts

ggsave(&quot;maps/map1.png&quot;,  width=12.5, height=8.25, dpi=72)
</pre>
<p>This will create a longlat-projected world map.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map1.png" alt="World map in ggplot" width="900" height="594" class="aligncenter size-full wp-image-1675" /></p>
<pre class="brush: r; title: ; notranslate">
# reproject from longlat to robinson
wmap_robin &lt;- spTransform(wmap, CRS(&quot;+proj=robin&quot;))
wmap_df_robin &lt;- fortify(wmap_robin)
ggplot(wmap_df_robin, aes(long,lat, group=group)) + 
  geom_polygon() + 
  labs(title=&quot;World map (robinson)&quot;) + 
  coord_equal() +
  theme_opts

ggsave(&quot;maps/map2.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Here the world map is shown using the Robinson projection.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map2.png" alt="World map in ggplot with robinson projection" width="900" height="594" class="aligncenter size-full wp-image-1677" /></p>
<pre class="brush: r; title: ; notranslate">
# show hole
ggplot(wmap_df_robin, aes(long,lat, group=group, fill=hole)) +
  geom_polygon() + 
  labs(title=&quot;World map (robin)&quot;) +
  coord_equal() + 
  theme_opts
ggsave(&quot;maps/map3.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>However, the Caspian Sea is missing. This is because of how <tt>ggplot</tt> handles polygon holes: it plots each hole as a separate polygon, so we need to make it pseudo-transparent by changing its fill color.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map3.png" alt="World map in ggplot2 polygon hole example" width="900" height="594" class="aligncenter size-full wp-image-1678" /></p>
<pre class="brush: r; title: ; notranslate">
# change colors
ggplot(wmap_df_robin, aes(long,lat, group=group, fill=hole)) + 
  geom_polygon() + 
  labs(title=&quot;World map (Robinson)&quot;) + 
  coord_equal() + 
  theme_opts +
  scale_fill_manual(values=c(&quot;#262626&quot;, &quot;#e6e8ed&quot;), guide=&quot;none&quot;) # change colors &amp; remove legend

ggsave(&quot;maps/map4.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Now the Caspian Sea is visible.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map4.png" alt="World map in ggplot polygon hole fix" width="900" height="594" class="aligncenter size-full wp-image-1679" /></p>
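<p>The hole trick is easier to see in isolation. The sketch below uses a made-up toy polygon rather than the shapefile: <tt>outer</tt>, <tt>inner</tt> and <tt>toy</tt> are hypothetical names, and the columns only mimic the <tt>long</tt>, <tt>lat</tt>, <tt>group</tt> and <tt>hole</tt> columns used in the plots above.</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)

# outer ring and hole stored as two separate groups
outer &lt;- data.frame(long=c(0,4,4,0), lat=c(0,0,4,4), group=&quot;1.1&quot;, hole=FALSE)
inner &lt;- data.frame(long=c(1,3,3,1), lat=c(1,1,3,3), group=&quot;1.2&quot;, hole=TRUE)
toy &lt;- rbind(outer, inner)

p &lt;- ggplot(toy, aes(long, lat, group=group, fill=hole)) +
  geom_polygon() +
  coord_equal() +
  theme(panel.background = element_rect(fill=&quot;white&quot;)) +
  # the hole is drawn on top in the background color, so it only looks transparent
  scale_fill_manual(values=c(&quot;FALSE&quot;=&quot;#262626&quot;, &quot;TRUE&quot;=&quot;white&quot;), guide=&quot;none&quot;)
p
</pre>
<p>Nothing is actually cut out of the outer square. The inner square is simply painted over it, which is why the fill color has to match whatever lies behind the map.</p>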
<pre class="brush: r; title: ; notranslate">
# add graticule and bounding box (longlat)
grat &lt;- readOGR(&quot;ne_110m_graticules_all&quot;, layer=&quot;ne_110m_graticules_15&quot;) 
grat_df &lt;- fortify(grat)

bbox &lt;- readOGR(&quot;ne_110m_graticules_all&quot;, layer=&quot;ne_110m_wgs84_bounding_box&quot;) 
bbox_df&lt;- fortify(bbox)

ggplot(bbox_df, aes(long,lat, group=group)) + 
  geom_polygon(fill=&quot;white&quot;) +
  geom_polygon(data=wmap_df, aes(long,lat, group=group, fill=hole)) + 
  geom_path(data=grat_df, aes(long, lat, group=group, fill=NULL), linetype=&quot;dashed&quot;, color=&quot;grey50&quot;) +
  labs(title=&quot;World map + graticule (longlat)&quot;) + 
  coord_equal() + 
  theme_opts +
  scale_fill_manual(values=c(&quot;black&quot;, &quot;white&quot;), guide=&quot;none&quot;) # change colors &amp; remove legend

ggsave(&quot;maps/map5.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>If we want, we can also add a graticule and a bounding box. The bounding box is useful if we want to make the sea blue, especially when using some form of curved projection. Here I have added a graticule and bounding box to the longlat-map.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map5.png" alt="World map in ggplot plus graticule and bounding box" width="900" height="594" class="aligncenter size-full wp-image-1680" /></p>
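<p>A graticule is just a set of meridians and parallels, so a longlat version can also be generated by hand. The sketch below is an alternative to the shapefile, not what was used for the maps here: <tt>grat_manual</tt> and the helper variables are hypothetical names, and the 15 degree spacing is chosen to match the <tt>ne_110m_graticules_15</tt> layer.</p>
<pre class="brush: r; title: ; notranslate">
lons &lt;- seq(-180, 180, by=15)
lats &lt;- seq(-90, 90, by=15)

# one path per meridian, sampled every degree of latitude
meridians &lt;- do.call(rbind, lapply(lons, function(lon) {
  data.frame(long=lon, lat=seq(-90, 90, by=1), group=paste0(&quot;m&quot;, lon))
}))
# one path per parallel, sampled every degree of longitude
parallels &lt;- do.call(rbind, lapply(lats, function(lat) {
  data.frame(long=seq(-180, 180, by=1), lat=lat, group=paste0(&quot;p&quot;, lat))
}))

grat_manual &lt;- rbind(meridians, parallels)
head(grat_manual)
</pre>
<p>The resulting data frame has the same <tt>long</tt>, <tt>lat</tt> and <tt>group</tt> columns that <tt>geom_path()</tt> is given above, so it could stand in for <tt>grat_df</tt> in the longlat plot. Sampling every degree keeps the lines smooth if the coordinates are later reprojected.</p>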
<pre class="brush: r; title: ; notranslate">
# graticule (Robin)
grat_robin &lt;- spTransform(grat, CRS(&quot;+proj=robin&quot;))  # reproject graticule
grat_df_robin &lt;- fortify(grat_robin)
bbox_robin &lt;- spTransform(bbox, CRS(&quot;+proj=robin&quot;))  # reproject bounding box
bbox_robin_df &lt;- fortify(bbox_robin)

ggplot(bbox_robin_df, aes(long,lat, group=group)) + 
  geom_polygon(fill=&quot;white&quot;) +
  geom_polygon(data=wmap_df_robin, aes(long,lat, group=group, fill=hole)) + 
  geom_path(data=grat_df_robin, aes(long, lat, group=group, fill=NULL), linetype=&quot;dashed&quot;, color=&quot;grey50&quot;) +
  labs(title=&quot;World map (Robinson)&quot;) + 
  coord_equal() + 
  theme_opts +
  scale_fill_manual(values=c(&quot;black&quot;, &quot;white&quot;), guide=&quot;none&quot;) # change colors &amp; remove legend

ggsave(&quot;maps/map6.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Robinson projection with added graticule and bounding box.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map6.png" alt="World map in ggplot using robinson projection with graticule and bounding box" width="900" height="594" class="aligncenter size-full wp-image-1681" /></p>
<pre class="brush: r; title: ; notranslate">
# add country borders
countries &lt;- readOGR(&quot;ne_110m_admin_0_countries&quot;, layer=&quot;ne_110m_admin_0_countries&quot;) 
countries_robin &lt;- spTransform(countries, CRS(&quot;+init=ESRI:54030&quot;))
countries_robin_df &lt;- fortify(countries_robin)

ggplot(bbox_robin_df, aes(long,lat, group=group)) + 
  geom_polygon(fill=&quot;white&quot;) +
  geom_polygon(data=countries_robin_df, aes(long,lat, group=group, fill=hole)) + 
  geom_path(data=countries_robin_df, aes(long,lat, group=group, fill=hole), color=&quot;white&quot;, size=0.3) +
  geom_path(data=grat_df_robin, aes(long, lat, group=group, fill=NULL), linetype=&quot;dashed&quot;, color=&quot;grey50&quot;) +
  labs(title=&quot;World map (Robinson)&quot;) + 
  coord_equal() + 
  theme_opts +
  scale_fill_manual(values=c(&quot;black&quot;, &quot;white&quot;), guide=&quot;none&quot;) # change colors &amp; remove legend

ggsave(&quot;maps/map7.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Here I have added country borders to the previous map plot.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map7.png" alt="World map in ggplot in robinson projection with country borders" width="900" height="594" class="aligncenter size-full wp-image-1682" /></p>
<pre class="brush: r; title: ; notranslate">
# bubble plot
places &lt;- readOGR(&quot;ne_110m_populated_places&quot;, layer=&quot;ne_110m_populated_places&quot;) 
places_df &lt;- as(places, &quot;data.frame&quot;)
places_robin_df &lt;- project(cbind(places_df$LONGITUDE, places_df$LATITUDE), proj=&quot;+init=ESRI:54030&quot;) 
places_robin_df &lt;- as.data.frame(places_robin_df)
names(places_robin_df) &lt;- c(&quot;LONGITUDE&quot;, &quot;LATITUDE&quot;)
places_robin_df$POP2000 &lt;- places_df$POP2000 

ggplot(bbox_robin_df, aes(long,lat, group=group)) + 
  geom_polygon(fill=&quot;white&quot;) +
  geom_polygon(data=countries_robin_df, aes(long,lat, group=group, fill=hole)) + 
  geom_point(data=places_robin_df, aes(LONGITUDE, LATITUDE, group=NULL, fill=NULL, size=POP2000), color=&quot;#32caf6&quot;, alpha=I(8/10)) +
  geom_path(data=countries_robin_df, aes(long,lat, group=group, fill=hole), color=&quot;white&quot;, size=0.3) +
  geom_path(data=grat_df_robin, aes(long, lat, group=group, fill=NULL), linetype=&quot;dashed&quot;, color=&quot;grey50&quot;) +
  labs(title=&quot;World map (Robinson)&quot;) + 
  coord_equal() + 
  theme_opts +
  scale_fill_manual(values=c(&quot;black&quot;, &quot;white&quot;), guide=&quot;none&quot;)+
  scale_size_continuous(range=c(1,20), guide=&quot;none&quot;)# change colors &amp; remove legend
ggsave(&quot;maps/map8.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Bubble plots are a popular way of displaying information on maps. Here I used project() to reproject the bubbles&#8217; coordinates into the Robinson projection.<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map8.png" alt="World map in ggplot using robinson projection plus bubble plot" width="900" height="594" class="aligncenter size-full wp-image-1683" />
<pre class="brush: r; title: ; notranslate">
# Winkel tripel projection
countries_wintri &lt;- spTransform(countries, CRS(&quot;+proj=wintri&quot;))
bbox_wintri &lt;- spTransform(bbox, CRS(&quot;+proj=wintri&quot;))
wmap_wintri &lt;- spTransform(wmap, CRS(&quot;+proj=wintri&quot;))
grat_wintri &lt;- spTransform(grat, CRS(&quot;+proj=wintri&quot;))

p&lt;-ggplot(bbox_wintri, aes(long,lat, group=group)) + 
  geom_polygon(fill=&quot;white&quot;) +
  geom_polygon(data=countries_wintri, aes(long,lat, group=group, fill=hole)) + 
  geom_path(data=countries_wintri, aes(long,lat, group=group, fill=hole), color=&quot;white&quot;, size=0.3) +
  geom_path(data=grat_wintri, aes(long, lat, group=group, fill=NULL), linetype=&quot;dashed&quot;, color=&quot;grey50&quot;) +
  labs(title=&quot;World map (Winkel Tripel)&quot;) + 
  coord_equal(ratio=1) + 
  theme_opts +
  scale_fill_manual(values=c(&quot;black&quot;, &quot;white&quot;), guide=&quot;none&quot;) # change colors &amp; remove legend

ggsave(plot=p, &quot;maps/map9.png&quot;, width=12.5, height=8.25, dpi=72)
</pre>
<p>Lastly, here is an example of the Winkel tripel projection. This projection became popular after 1998, when the National Geographic Society chose it for their world maps, replacing the Robinson projection they had used previously.<br />
<img src="http://rpsychologist.com/wp-content/uploads/2013/05/map9.png" alt="World map in ggplot using winkel tripel projection" width="900" height="594" class="aligncenter size-full wp-image-1684" /></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/working-with-shapefiles-projections-and-world-maps-in-ggplot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analytical and simulation-based power analyses for mixed-design ANOVAs</title>
		<link>http://rpsychologist.com/analytical-and-simulation-based-power-analyses-for-mixed-design-anovas/</link>
		<comments>http://rpsychologist.com/analytical-and-simulation-based-power-analyses-for-mixed-design-anovas/#comments</comments>
		<pubDate>Wed, 22 May 2013 04:19:18 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[Monte Carlo]]></category>
		<category><![CDATA[Power analysis]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1624</guid>
		<description><![CDATA[In this post I show some R-examples on how to perform power analyses for mixed-design ANOVAs. The first example is analytical—and adapted from formulas used in G*Power (Faul et al., 2007), and the second example is a Monte Carlo simulation. <a href="http://rpsychologist.com/analytical-and-simulation-based-power-analyses-for-mixed-design-anovas/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>In this post I show some R-examples on how to perform power analyses for mixed-design ANOVAs. The first example is analytical — adapted from formulas used in G*Power (Faul et al., 2007), and the second example is a Monte Carlo simulation. The source code is embedded at the end of this post.</p>
<p>Both functions require a data frame containing the parameters that will be used in the power calculations. Here is an example using three groups and three time points. </p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2) # needed for the plot below

# design -------
# cell means (mus) for each group at the three time points
CT &lt;- c(34.12, 21, 17.44)
BA &lt;- c(36.88, 16.82, 8.75) 
ADM &lt;- c(35.61, 14.39, 7.78)

study &lt;- data.frame(&quot;group&quot; = gl(3,3, labels=c(&quot;CT&quot;, &quot;BA&quot;, &quot;ADM&quot;)))
study$time &lt;- gl(3,1,9, labels=c(&quot;Intake&quot;, &quot;8 weeks&quot;, &quot;16 weeks&quot;))

study$DV &lt;- c(CT, BA, ADM) 
study$SD &lt;- 10

ggplot(study, aes(time, DV, group=group, linetype=group, shape=group)) + 
    geom_line() + 
    geom_point()
</pre>
<p>Here is a plot of our hypothetical study design.<br />
<img src="http://rpsychologist.com/wp-content/uploads/2013/05/design-1024x897.png" alt="Study design for power analysis for mixed-design ANOVA" width="840" height="735" class="aligncenter size-large wp-image-1639" /><br />
Now, we will use this design to perform a power analysis using <tt>anova.pwr.mixed</tt> and <tt>anova.pwr.mixed.sim</tt>. </p>
<pre class="brush: r; title: ; notranslate">
# analytical ----------
anova.pwr.mixed(data = study, Formula = &quot;DV ~ time*group&quot;,
 n=10, m=3, rho=0.5)
</pre>
<pre>
   Terms      power n.needed
b  group      0.197       NA
w1 time       1.000       NA
w2 time:group 0.617       NA</pre>
<pre class="brush: r; title: ; notranslate">
# monte carlo ------------
anova.pwr.mixed.sim(data=study, Formula=&quot;DV ~ time*group + Error(subjects)&quot;,
 FactorA=&quot;group&quot;, n=10, rho=0.5, sims=100)
</pre>
<pre>
       terms power
1  group      0.19
2 time        1.00
3 time:group  0.64</pre>
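<p>The two estimates for the interaction (0.617 and 0.64) are not identical, which is expected because a Monte Carlo estimate based on only 100 simulations carries sampling error. As a rough sanity check, and not something the two functions do themselves, the simulated power can be treated as a binomial proportion. The variable names below are introduced only for this illustration.</p>
<pre class="brush: r; title: ; notranslate">
# values taken from the two outputs above (time:group interaction)
power_mc &lt;- 0.64   # Monte Carlo estimate
power_an &lt;- 0.617  # analytical estimate
sims &lt;- 100        # number of simulations used

# binomial standard error of a proportion based on `sims` draws
se_mc &lt;- sqrt(power_mc * (1 - power_mc) / sims)

# approximate 95 % interval (normal approximation)
ci_mc &lt;- power_mc + c(-1, 1) * qnorm(0.975) * se_mc
round(se_mc, 3)
round(ci_mc, 2)

# TRUE if the analytical value lies inside the Monte Carlo interval
power_an &gt;= ci_mc[1] &amp; power_an &lt;= ci_mc[2]
</pre>
<p>With a standard error of roughly 0.05, a difference of about 0.02 between the methods is well within simulation noise; increasing <tt>sims</tt> shrinks the interval.</p>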
<h2>Comparison of analytical and Monte Carlo power analysis</h2>
<p>Now let&#8217;s compare the two functions&#8217; results on the time x group-interaction. Hopefully, the two methods will yield the same result. </p>
<pre class="brush: r; title: ; notranslate">
# compare
samples &lt;- seq(10,50,3) # n's to use
analytical &lt;- matrix(ncol=2, nrow=length(samples))
colnames(analytical) &lt;- c(&quot;power&quot;, &quot;n&quot;)
for(i in samples) { 
  j &lt;- which(samples == i)
  analytical[j,1] &lt;- anova.pwr.mixed(data = study, Formula = &quot;DV ~ time*group&quot;, n=i, m=3, rho=0.5)$power[3]
  analytical[j,2] &lt;- i
}
  
MC &lt;- matrix(ncol=2, nrow=length(samples))
colnames(MC) &lt;- c(&quot;power&quot;, &quot;n&quot;)
for(i in samples) { 
  j &lt;- which(samples == i)
  MC[j,1] &lt;- anova.pwr.mixed.sim(data=study, Formula=&quot;DV ~ time*group + Error(subjects)&quot;, FactorA=&quot;group&quot;, n=i, rho=0.5, sims=500)$power[3]
  MC[j,2] &lt;- i
}

# plot
MC &lt;- data.frame(MC)
MC$method &lt;- &quot;MC&quot;
analytical &lt;- data.frame(analytical)
analytical$method &lt;- &quot;analytical&quot;
df &lt;- rbind(analytical, MC)

ggplot(df, aes(n, power, group=method, color=method)) + geom_smooth(se=FALSE) + geom_point()
</pre>
<p><img src="http://rpsychologist.com/wp-content/uploads/2013/05/ana_vs_mc-1024x897.png" alt="Comparison of analytical versus monte carlo power analysis for mixed design anova" width="840" height="735" class="size-large wp-image-1642" /><br />
Luckily, the analytical results are consistent with the Monte Carlo results. </p>
<h2> References </h2>
<p>Faul, F., Erdfelder, E., Lang, A. G., &#038; Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. <em>Behavior Research Methods</em>, 39(2), 175-191.</p>
<h2> Source code </h2>
<p><script src="https://gist.github.com/rpsychologist/5618891.js"></script><br />
<script src="https://gist.github.com/rpsychologist/5618888.js"></script></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/analytical-and-simulation-based-power-analyses-for-mixed-design-anovas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cognitive behavior therapy outperformed psychodynamic therapy on all outcomes in a randomized controlled trial</title>
		<link>http://rpsychologist.com/cognitive-behavior-therapy-outperformed-psychodynamic-therapy-on-all-outcomes-in-a-randomized-controlled-trail/</link>
		<comments>http://rpsychologist.com/cognitive-behavior-therapy-outperformed-psychodynamic-therapy-on-all-outcomes-in-a-randomized-controlled-trail/#comments</comments>
		<pubDate>Mon, 30 Jul 2012 07:54:01 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[Psychology]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Cognitive Behavior Therapy]]></category>
		<category><![CDATA[cohen's d]]></category>
		<category><![CDATA[Dodo bird]]></category>
		<category><![CDATA[Psychotherapy]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1452</guid>
		<description><![CDATA[The dodo bird might be extinct in the real world, but in the world of psychotherapy research it refuses to die. However, a group of German researchers recently put forward an article where they had randomized patients to either a PDT or CBT condition and measured the relative proficiency of the two orientations, and they found that their results delivered a convincing blow to the dodo bird verdict. <a href="http://rpsychologist.com/cognitive-behavior-therapy-outperformed-psychodynamic-therapy-on-all-outcomes-in-a-randomized-controlled-trail/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p class="intro">The dodo bird might be extinct in the real world, but in the world of psychotherapy research it refuses to die. However, a group of German researchers recently put forward an article where they had randomized patients to either a PDT or CBT condition and measured the relative proficiency of the two orientations, and they found that their results delivered a convincing blow to the dodo bird verdict.</p>
<h3>Introduction</h3>
<p>The dodo bird verdict is the long-held belief by some researchers and clinicians that all psychological interventions produce the same outcome. The proponents of this theory often attribute the efficacy of a therapy to “common factors”, such as the alliance between therapist and client. This is in opposition to crediting the success of a therapy to the specific techniques used by the therapist, e.g. exposure or cognitive restructuring. </p>
<p>It’s no secret that the main rivalry has been between cognitive behavior therapy (CBT) and psychodynamic therapy (PDT) practitioners. However, one of the issues with testing the dodo bird verdict has been the lack of quality studies on the efficacy of PDT. More importantly, very few direct comparisons have been made within randomized controlled trials. Therefore it’s always exciting when a research group puts forward precisely that. </p>
<h3>The study</h3>
<p>Watzke et al. (2012) recruited 189 patients and randomized them to either CBT or PDT. They used a 3:2 ratio because the facilities had less capacity to give PDT. The study took place in a natural setting at an inpatient unit in Germany, and because of the naturalistic setting very broad inclusion criteria were used. The treatments were administered as brief group therapies, with 3-4 sessions per week and an average treatment length of 6 weeks. Both treatment groups additionally received one individual session per week. No treatment manuals were used and the therapists received no special training for this study. </p>
<p>The primary outcome used in the study was <em>the General Severity Index of the Symptom Checklist-14 (SCL-14)</em>. The secondary outcomes were the <em>mental component summary</em> of <em>the SF-8</em>, and <em>the Inventory of Interpersonal Problems (IIP-C)</em>. The outcomes were assessed at intake and at 6-month follow-up. </p>
<p>You can see their results in Figure 1, where I’ve made a plot of the results and calculated Cohen’s <em>d</em> with 95 % CIs. </p>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/07/CBT_vs_PDT_in_a_RCT.png.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/07/CBT_vs_PDT_in_a_RCT.png.png" alt="Cognitive behavioral therapy (CBT) vs psychodynamic therapy (PDT) in a randomized controlled trial (RCT)" title="Cognitive behavioral therapy (CBT) vs psychodynamic therapy (PDT) in a randomized controlled trial (RCT)" width="600" class="aligncenter size-full wp-image-1453" /></a><br />
<em>Figure 1.</em> A comparison of cognitive behavioral therapy (CBT) and psychodynamic therapy (PDT) in a randomized controlled trial (RCT).</p>
<h3>Conclusion</h3>
<p>In this direct comparison between CBT and PDT, CBT clearly performed better, and quite convincingly so. Clearly the dodo bird did not fare well in this study, but more research like this is needed before the dodo bird can finally be put to rest. If indeed CBT is more effective than PDT, then this is incredibly valuable research for all the patients out there. Hopefully we’ll see more randomized controlled trials that compare two <em>bona fide</em> psychotherapies in the future. </p>
<div class="yellow-box">
<h3>Quality of the evidence</h3>
<p>This study is a randomized controlled trial and as such the evidence has the potential to be of high quality. Some aspects of the study are a bit unclear though; for instance, the authors never describe how the allocation sequence was concealed. And I couldn’t see any information on why they only had two time points (baseline and 6 months); more time points would’ve provided more information. Additionally, I think they should’ve analyzed their data as a multilevel model, especially if they used many different therapists and different hospital units. Also, they did not state how many therapists were used in the study, and consequently they did not test for any therapist interaction effect. However, the authors explicitly state that the treatments were not recorded and hence no assessment of adherence or competence was made. Though, the <em>“results of a prior study including independent expert raters (video ratings) describe the main interventions of both treatments (Watzke et al, 2004, 2008) and suggest that there was sufficient treatment differentiation between the two treatments in the unit”</em>. So it’s possible that the outcome is due to the CBT therapists being more competent, and not due to CBT being more effective as a specific intervention. However, to me this seems redundant, as from my perspective it’s a sign of competence to choose CBT before PDT. This statement might seem unnecessarily polemical, but CBT is the treatment of choice for many disorders today with a vast amount of research supporting its efficacy, so perhaps one must be a bit scientifically naïve to administer PDT for diagnoses where research support is lacking. </p>
<p>The researchers used unequal groups, with a 3:2 ratio (CBT having more patients). It’s hard to estimate if this had any effect on the outcome; statistically this might affect the significance test if the assumption of homogeneous variance is violated. However, by looking at the standard deviations reported in the study this doesn’t seem to be a problem. Moreover, the authors performed analyses to test if any confounders might have been unequally distributed between the treatments, and found no evidence for this. Also, attrition didn’t seem to be a problem; the authors performed sensitivity analyses and intention-to-treat analyses, which did not reveal any cause for concern. </p>
<p>I’m not sold on the outcomes they assessed. I’m thinking that they could’ve used more outcome measures, for instance they could’ve assessed depression and anxiety separately. However, there’s evidence that the SCL-scales are quite good at detecting general symptom severity.  </p>
<p>Overall, I find the results of this study interesting and I don’t think any of the study’s shortcomings invalidate its findings. But as always, more studies are needed before any robust conclusion can be made.<br />
</div><br />
<div class="white-box"><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0; box-shadow: 0 0 0px; -moz-box-shadow: 0 0 0px;"/></a><br />
</span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=Behaviour+research+and+therapy&#038;rft_id=info%3Apmid%2F22750189&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Longer+term+outcome+of+cognitive-behavioural+and+psychodynamic+psychotherapy+in+routine+mental+health+care%3A+Randomised+controlled+trial.&#038;rft.issn=0005-7967&#038;rft.date=2012&#038;rft.volume=50&#038;rft.issue=9&#038;rft.spage=580&#038;rft.epage=587&#038;rft.artnum=&#038;rft.au=Watzke+B&#038;rft.au=R%C3%BCddel+H&#038;rft.au=J%C3%BCrgensen+R&#038;rft.au=Koch+U&#038;rft.au=Kriston+L&#038;rft.au=Grothgar+B&#038;rft.au=Schulz+H&#038;rfe_dat=bpr3.included=1;bpr3.tags=Psychology%2CSocial+Science%2CHealth">Watzke B, Rüddel H, Jürgensen R, Koch U, Kriston L, Grothgar B, &#038; Schulz H (2012). Longer term outcome of cognitive-behavioural and psychodynamic psychotherapy in routine mental health care: Randomised controlled trial. <span style="font-style: italic;">Behaviour research and therapy, 50</span> (9), 580-587 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22750189">22750189</a></span><br />
</span><br />
</div></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/cognitive-behavior-therapy-outperformed-psychodynamic-therapy-on-all-outcomes-in-a-randomized-controlled-trail/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to tell when error bars correspond to a significant p-value</title>
		<link>http://rpsychologist.com/how-to-tell-when-error-bars-correspond-to-a-significant-p-value/</link>
		<comments>http://rpsychologist.com/how-to-tell-when-error-bars-correspond-to-a-significant-p-value/#comments</comments>
		<pubDate>Tue, 24 Jul 2012 19:21:25 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[confidence interval]]></category>
		<category><![CDATA[error bars]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[standard error]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1426</guid>
		<description><![CDATA[Can you tell when error bars based on 95 % CIs or standard errors correspond to a significant p-value? Don’t fret if you think it’s hard, a study from 2005 showed that researchers in psychology, behavioral neuroscience and medicine had a hard time judging when error bars from two independent groups signified a significant difference.  <a href="http://rpsychologist.com/how-to-tell-when-error-bars-correspond-to-a-significant-p-value/">Read more</a>]]></description>
				<content:encoded><![CDATA[<h1>Introduction</h1>
<p>Belia, Fidler, Williams, and Cumming (2005) found that researchers in psychology, behavioral neuroscience and medicine are really bad at judging when error bars signify that two means are significantly different (<em>p</em> = 0.05). What they did was to email a bunch of researchers and invite them to take a web-based test, and they got 473 usable responses. The test consisted of an interactive plot with error bars for two independent groups, and the participants were asked to move the error bars to a position they believed would represent a significant <em>t</em>-test at <em>p</em> = 0.05. They did this for error bars based on the 95 % CI and on the groups’ standard errors. On average, the participants set the 95 % CIs too far apart, with their mean placement corresponding to a <em>p</em>-value of .009. They did the opposite with the SE error bars, which they put too close together, yielding placements corresponding to <em>p</em> = 0.109. And if you’re wondering, they found no difference between the three disciplines. </p>
<h1>Plots</h1>
<p>I wanted to pull my weight, and I have therefore created some plots in R that show error bars that are significant at various <em>p</em>-values. </p>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_05.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_05-1024x714.png" alt="Interpreting error bars and confidence intervals p = .05" title="Interpreting error bars and confidence intervals p = .05" width="600" class="aligncenter size-large wp-image-1427" /></a></p>
<h5>Figure 1. Error bars corresponding to a significant difference at p = .05 (equal group sizes and equal variances)</h5>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_01.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_01-1024x714.png" alt="Interpreting error bars and confidence intervals p = .01" title="Interpreting error bars and confidence intervals p = .01" width="600" class="aligncenter size-large wp-image-1428" /></a></p>
<h5>Figure 2. Error bars corresponding to a significant difference at p = .01 (equal group sizes and equal variances)</h5>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_001.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/07/errorbar_001-1024x714.png" alt="Interpreting error bars and confidence intervals p = .001" title="Interpreting error bars and confidence intervals p = .001" width="600"  class="aligncenter size-large wp-image-1429" /></a></p>
<h5>Figure 3. Error bars corresponding to a significant difference at p = .001 (equal group sizes and equal variances)</h5>
<p>Based on the first plot we see that an overlap of about one third of the 95 % CIs corresponds to <em>p</em> = 0.05. For the SE error bars we see that they are about 1 SE apart when <em>p</em> = 0.05. </p>
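<p>These two rules of thumb can also be checked analytically, without searching for the right group means. The following is a minimal sketch that assumes the same setup as the figures (two independent groups, 20 per group, equal standard deviations) and expresses everything in units of one group&#8217;s standard error. The variable names are introduced here only for illustration.</p>
<pre class="brush: r; title: ; notranslate">
n &lt;- 20       # n per group (same as in the plots)
alpha &lt;- 0.05 # two-tailed significance level

# smallest significant mean difference, in SE units
# (the SE of the difference is sqrt(2) times the SE of one mean)
diff_se &lt;- qt(1 - alpha/2, df = 2*n - 2) * sqrt(2)

# half-width of one group's 95 % CI, in SE units
ci_half &lt;- qt(0.975, df = n - 1)

# overlap of the two CIs as a proportion of one full CI
ci_overlap &lt;- (2 * ci_half - diff_se) / (2 * ci_half)

# gap between the two SE bars, in SE units
se_gap &lt;- diff_se - 2

round(c(ci_overlap = ci_overlap, se_gap = se_gap), 2)
# roughly 0.32 and 0.86: about one third overlap, and just under 1 SE apart
</pre>
<p>Changing <tt>alpha</tt> to 0.01 or 0.001 gives the corresponding values for the other two figures, and changing <tt>n</tt> shows how much the rules depend on group size.</p>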
<h2>R Code</h2>
<p>Here&#8217;s the complete R code used to produce these plots:</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
library(plyr)
m2 &lt;- 100 # initial mean group 2, should be the same as m1
p &lt;- 1 # starting p-value
m1 &lt;- 100 # mean group 1
sd1 &lt;- 10 # sd group 1
sd2 &lt;- 10 # sd group 2
n &lt;- 20 # n per group
s &lt;- sqrt(0.5 * (sd1^2 + sd2^2)) # pooled sd
while(p&gt;0.05) { # loop until p drops to 0.05
  t &lt;- (min(c(m1,m2)) - max(c(m1,m2))) / (s * sqrt(2/n)) # t statistic
  df &lt;- (n*2)-2 # degrees of freedom
  p &lt;-pt(t, df)*2 # two-tailed p value
  m2 &lt;- m2 - (m2/10000) # adjust mean for group 2
}
get_CI &lt;- function(x, sd, CI) { # calculate error bars
  se &lt;- sd/sqrt(n) # standard error
  lwr &lt;- c(x - qt((1 + CI)/2, n - 1) * se, x - se) # 95 % CI and SE lower limit
  upr &lt;- c(x + qt((1 + CI)/2, n - 1) * se, x + se) # 95 % CI and SE upper limit
  data.frame(&quot;lwr&quot; = lwr, &quot;upr&quot; = upr, &quot;se&quot; = se) # result
}
plot_df &lt;- data.frame(&quot;mu&quot; = rep(c(m1,m2), each=2)) # means
plot_df$group &lt;- gl(2,2, labels=c(&quot;group1&quot;, &quot;group2&quot;)) # group factor
plot_df$type &lt;- gl(2,1,4, labels=c(&quot;95 % CI&quot;, &quot;se errorbars&quot;)) # type of errorbar
plot_df &lt;- cbind(plot_df, rbind(get_CI(m1, sd1, .95), get_CI(m2, sd2, .95))) # put it all together

get_overlap &lt;- function(arg) { # calculate overlap %
  x &lt;-subset(plot_df, type == arg) # subset for type of errorbar
  x_range &lt;- abs(mean(x$lwr - x$upr)) # length of error bar
  x_lwr &lt;- max(x$lwr) # lwr limit for group with highest lwr limit
  x_upr &lt;- min(x$upr) # upr limit for group with lowest upr limit
  overlap &lt;- abs( (x_upr - x_lwr) / x_range) # % overlap
  data.frame(&quot;type&quot;=arg, &quot;range&quot; = x_range, &quot;lwr&quot; = x_lwr, &quot;upr&quot; = x_upr, &quot;overlap&quot; = round(overlap, 2)) # result
}
overlap &lt;-ldply(levels(plot_df$type), get_overlap) # get overlap and put into dataframe
overlap$text &lt;- paste(overlap$overlap * 100, &quot;% of errorbar&quot;) # label text
overlap$text_y &lt;- c(overlap[1,4], overlap[2,3]) # quick-fix

ggplot(plot_df, aes(group, mu, group=group)) + 
  geom_point(size=3) + # point for group mean
  geom_errorbar(aes(ymax=upr, ymin=lwr), width=0.2) + # error bars for means
  labs(title=paste(&quot;Illustration of errorbars for a significant 2-sample t-test, p =&quot;, round(p,3))) +  # plot title; labs() replaces the older opts()
  facet_grid(. ~ type) + # split plot after error bar type
  geom_errorbar(data=overlap, aes(ymax=upr, ymin=lwr, x=1.5, y=NULL, group=type), width=0.1, color=&quot;red&quot;) + # add overlap error bar
  geom_text(data=overlap, aes(label = text, group=type, y=text_y, x=1.5, vjust=-1)) + # annotate overlap
  ylab(expression(bar(x))) # change y label</pre>
<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0; box-shadow: 0 0 0px; -moz-box-shadow: 0 0 0px;"/></a><br />
</span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=Psychological+methods&#038;rft_id=info%3Apmid%2F16392994&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Researchers+misunderstand+confidence+intervals+and+standard+error+bars.&#038;rft.issn=1082-989X&#038;rft.date=2005&#038;rft.volume=10&#038;rft.issue=4&#038;rft.spage=389&#038;rft.epage=96&#038;rft.artnum=&#038;rft.au=Belia+S&#038;rft.au=Fidler+F&#038;rft.au=Williams+J&#038;rft.au=Cumming+G&#038;rfe_dat=bpr3.included=1;bpr3.tags=Medicine%2CPsychology%2CSocial+Science%2CNeuroscience">Belia S, Fidler F, Williams J, &#038; Cumming G (2005). Researchers misunderstand confidence intervals and standard error bars. <span style="font-style: italic;">Psychological methods, 10</span> (4), 389-96 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/16392994">16392994</a><br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/how-to-tell-when-error-bars-correspond-to-a-significant-p-value/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>The Higgs boson: 5-sigma and the concept of p-values</title>
		<link>http://rpsychologist.com/the-higgs-boson-sigma-5-and-the-concept-of-p-values/</link>
		<comments>http://rpsychologist.com/the-higgs-boson-sigma-5-and-the-concept-of-p-values/#comments</comments>
		<pubDate>Wed, 04 Jul 2012 11:56:47 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[Higgs]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1385</guid>
		<description><![CDATA[Why are physicists talking about 5-sigma, and what's it got to do with statistics? In this short post I'll explain what 5-sigma is  and why it's not a measure of how certain scientist are that they've found the Higgs boson <a href="http://rpsychologist.com/the-higgs-boson-sigma-5-and-the-concept-of-p-values/">Read more</a>]]></description>
				<content:encoded><![CDATA[<p>Today’s announcement at CERN of the latest research on the Higgs boson was truly extraordinary. Not only was the scientific achievement remarkable, but the media’s reporting of 5-sigma as a measure of “certainty” was also truly remarkable. For instance, the science editor at the Swedish newspaper Dagens Nyheter reported that a sigma of 4.9 equals a certainty of 99.99994 %, which obviously isn’t true, simply because <em>p</em>( <em>D</em> | <em>H0</em> ) is not the same as <em>p</em>( <em>H0</em> | <em>D</em> ). In plain English this means that a <em>p</em>-value represents the conditional probability of getting the data given that the null hypothesis is true. Nothing more, and it surely doesn&#8217;t give the probability of the alternative hypothesis being true, i.e. the &#8220;certainty&#8221; that something&#8217;s been found that&#8217;s not a random fluctuation.  </p>
<p>So what do physicists mean when they report 5-sigma? Well, it’s just another convention for reporting alpha levels. Sigma refers to the population standard deviation, and 5-sigma means that they accept events as statistically significant if they fall more than 5 standard deviations away from the mean, given that the null hypothesis is true. And here the null hypothesis is that the event is simply due to random noise or fluctuations.  You can get the <em>p</em>-value for 5-sigma by first taking the area under the normal curve that’s to the left of +5 sigma.</p>
<pre class="toolbar:2 lang:r decode:true">
&gt; pnorm(5)
[1] 0.9999997
</pre>
<p>And then take 1 &#8211; 0.9999997 to get the <em>p</em>-value, which is 0.0000003, as the CERN researchers performed a one-tailed test. I imagine physicists say 5-sigma because saying &#8220;point zero zero zero zero zero zero three&#8221; might become quite tiresome, so it&#8217;s quite ironic that journalists all over the world seem to be converting sigma back to percent. </p>
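<p>As a small aside, subtracting the rounded printout from 1 loses precision. The upper-tail area can be requested directly by passing <tt>lower.tail = FALSE</tt> to <tt>pnorm()</tt>, and <tt>qnorm()</tt> converts a one-tailed <em>p</em>-value back to sigma. The variable name below is just for illustration.</p>
<pre class="lang:r decode:true">
# one-tailed p-value for 5-sigma, computed directly as the upper tail
p_one_tailed &lt;- pnorm(5, lower.tail = FALSE)
p_one_tailed
# about 2.87e-07, which rounds to the 0.0000003 quoted above

# going the other way: the sigma that corresponds to a one-tailed p-value
qnorm(p_one_tailed, lower.tail = FALSE)
</pre>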
<p>If we want, we can also use R and ggplot2 to illustrate 5-sigma by plotting the normal distribution and superimposing a line at 5 sigma:</p>
<pre class="lang:r decode:true">
library(ggplot2)
x <- seq(-6,6,length=200) # sigmas
y <- dnorm(x) # curve

df <- data.frame("sigma" = x,"y" = y) # create data frame

# plot
text_block <- "A confidence level = 5-sigma represents \nthe probability of getting a result from your \nexperiment, simply from random fluctuations \nalone, equal to the area under the curve \nthat’s to the right of the dotted line. That’s an \nexceptionally rare event. However, the area to \nthe left of 5-sigma does not represent the \nprobability or certainty that the Higgs boson \nhas been found."
ggplot(df, aes(sigma,y)) +
geom_line(size=1) +
annotate("text", x=1.7, y=0.2, label=text_block, size=4, hjust=0) +
annotate("segment", x = 5, xend=5, y = 0, yend = 0.05, linetype="dashed") +
annotate("text", x=5, y=0.05, label="5-sigma", vjust=-0.5)
</pre>
<p><em>Note: The actual plot below has been fine-tuned in Illustrator</em>.<br />
<a href="http://rpsychologist.com/wp-content/uploads/2012/07/higgs_P-values_sigma_5.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/07/higgs_P-values_sigma_5.png" alt="Higgs boson cern 5-sigma and p-values" title="Higgs boson cern 5-sigma and p-values" width="600" class="aligncenter size-full wp-image-1416" /></a></p>
<p>The area under the curve that's to the right of the dotted line represents the p-value for 5-sigma. We see that observations in that area are highly unlikely to occur if we assume that the null hypothesis is true. </p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/the-higgs-boson-sigma-5-and-the-concept-of-p-values/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Effect of sample size on the accuracy of Cohen&#8217;s d estimates (95 % CI)</title>
		<link>http://rpsychologist.com/effect-of-sample-size-on-the-accuracy-of-cohens-d-estimates-95-ci/</link>
		<comments>http://rpsychologist.com/effect-of-sample-size-on-the-accuracy-of-cohens-d-estimates-95-ci/#comments</comments>
		<pubDate>Wed, 27 Jun 2012 07:06:08 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[Psychology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[AIPE]]></category>
		<category><![CDATA[cohen's d]]></category>
		<category><![CDATA[effect sizes]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[Sample size planning]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1360</guid>
		<description><![CDATA[When talking about confidence intervals, Jacob Cohen famously said: “I suspect that the main reason they are not reported is that they are so embarrassingly large!” (Cohen, 1994). In this post I'll take a look at the relationship between the 95 % CI for Cohen's <em>d</em> and its corresponding sample size.  <a href="http://rpsychologist.com/effect-of-sample-size-on-the-accuracy-of-cohens-d-estimates-95-ci/">Read more</a>]]></description>
				<content:encoded><![CDATA[<h2>Introduction</h2>
<p>I&#8217;ve been incredibly busy the last month; amongst other things I&#8217;ve moved about 400 miles from Umeå, in the north of Sweden, to Stockholm. However, I&#8217;ve been working a lot with R, especially with power analysis, both via Monte Carlo simulations and via analytical approaches. I will not write about power analysis here, but about a closely related concept: sample size planning for <em>accuracy in parameter estimation (AIPE)</em>. Whereas traditional power analysis is used to plan for a sample size that is adequate to reject the null hypothesis at the desired alpha level, sample size planning for AIPE is used to plan for a desired width of the CI. AIPE functions are implemented in the MBESS package by Kelley and Lai. If you&#8217;re interested, you can read more about AIPE in Maxwell, Kelley, and Rausch (2008).</p>
<h2>Graphs</h2>
<p>Here are two graphs I made in R to illustrate the connection between sample size and confidence interval. In Figure 2 you can see that the required sample size depends on both the desired width <em>and</em> the magnitude of the effect, i.e. the larger the effect size, the larger the sample you need to achieve a certain CI width. You can also see that this increase in sample size is less apparent the wider the CI becomes. </p>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/06/cohens_d_confidence_interval_vs_sample_size.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/06/cohens_d_confidence_interval_vs_sample_size-1024x628.png" alt="cohens d confidence interval vs sample size" title="cohens d confidence interval vs sample size" width="600" class="aligncenter size-large wp-image-1361" /></a> Figure 1. 95% confidence interval for Cohen&#8217;s <em>d</em> of 0.8 in relation to sample size (per group). The values above the error bars represent the CI&#8217;s range. </p>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/06/size_of_cohens_d_sample_size_and_confidence_interval.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/06/size_of_cohens_d_sample_size_and_confidence_interval-1024x628.png" alt="size of cohens d vs sample size and confidence interval" title="size of cohens d vs sample size and confidence interval" width="600" class="aligncenter size-large wp-image-1365" /></a> Figure 2. 95% confidence interval for different magnitudes of Cohen&#8217;s <em>d</em> in relation to sample sizes (per group) and width of confidence intervals. </p>
<h2>R code</h2>
<pre class="lang:r decode:true " >
library(MBESS)
library(ggplot2)

# CI for d = 0.8 ----------------------------------------------------------
smd_plot <- data.frame()
for(i in seq(10,400, by=10)) { # loop
x.ci <- ci.smd(smd=0.8, n.1=i,n.2=i)
smd_plot <- rbind(smd_plot, data.frame("lwr" = x.ci$Lower.Conf.Limit.smd,
"upr" = x.ci$Upper.Conf.Limit.smd, "smd"=0.8, "n" = i))
}
smd_plot$range <- round(smd_plot$upr - smd_plot$lwr,2)

# ggplot --------------------------------------------------------------------

ggplot(smd_plot, aes(n, smd)) +
geom_point() +
geom_errorbar(aes(ymin=lwr, ymax=upr)) +
geom_text(aes(label=range, y=upr), hjust=-0.4, angle=45, size=4) +
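# set breaks and limits in one y scale; a separate ylim() call would
# replace scale_y_continuous() and discard the custom breaks
scale_y_continuous(breaks=seq(0,2, by=0.25), limits=c(-.2,1.8)) +
scale_x_continuous(breaks=seq(0,400, by=20))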

# different ds, widths and sample sizes -----------------------------------
ss2 <- NULL
# nested loops to run ss.aipe.smd with different deltas and widths
for(j in seq(0.2, 1, by=0.2)) {
ss <- NULL
for(i in seq(0.1,2,by=0.2)) {
ss <- c(ss, ss.aipe.smd(delta=i, width=j))
}
if(j == 0.2) {
ss2 <- data.frame("n" = ss)
ss2$width <- j
} else
{
ss_tmp <- data.frame("n" = ss)
ss_tmp$width <- j
ss2 <- rbind(ss2, ss_tmp)
}
}
ss2$delta <- rep(seq(0.1,2,by=0.2), times=5) # add deltas used in loop

# ggplot ------------------------------------------------------------------
ggplot(ss2, aes(delta, n, group=factor(width), linetype=factor(width))) + geom_line()
</pre>
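<p>The pattern in Figure 2 can also be sanity-checked without MBESS. The sketch below uses the common large-sample approximation for the variance of <em>d</em> with equal group sizes, var(<em>d</em>) ≈ 2/<em>n</em> + <em>d</em>²/(4<em>n</em>). This is not the noncentral <em>t</em> method that MBESS uses, so the numbers will differ somewhat, and the function names <tt>approx_ci_width</tt> and <tt>approx_n</tt> are made up for this illustration.</p>
<pre class="lang:r decode:true">
# Approximate full CI width for Cohen's d with n observations per group
approx_ci_width <- function(d, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  se <- sqrt(2 / n + d^2 / (4 * n)) # large-sample standard error of d
  2 * z * se
}

# Approximate n per group needed to reach a given full CI width
approx_n <- function(d, width, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  ceiling((2 + d^2 / 4) * (2 * z / width)^2)
}

approx_ci_width(d = 0.8, n = c(10, 100, 400)) # width shrinks as n grows
approx_n(d = c(0.2, 0.8, 1.4), width = 0.5)   # n grows with d at a fixed width
</pre>
<p>Because <em>d</em>² enters the variance, a larger effect needs a larger sample for the same width, which is the trend the MBESS results show.</p>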
<h2>References</h2>
<p>Cohen, J. (1994). The earth is round (p < .05). <em>Am. Psychol.</em>, 49(12), 997–1003.<br />
Maxwell, S. E., Kelley, K., &#038; Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. <em>Annu. Rev. Psychol.</em>, 59, 537–563.</p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/effect-of-sample-size-on-the-accuracy-of-cohens-d-estimates-95-ci/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>PubMed publications in 2011 by 202 world countries: who’s the winner?</title>
		<link>http://rpsychologist.com/pubmed-publications-in-2011-by-202-world-countries-whos-the-winner/</link>
		<comments>http://rpsychologist.com/pubmed-publications-in-2011-by-202-world-countries-whos-the-winner/#comments</comments>
		<pubDate>Mon, 07 May 2012 10:27:14 +0000</pubDate>
		<dc:creator>Kristoffer Magnusson</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Ggplot2]]></category>
		<category><![CDATA[PubMed]]></category>

		<guid isPermaLink="false">http://rpsychologist.com/?p=1301</guid>
		<description><![CDATA[Which country had the most PubMed citations in 2011? To find out I used R statistical software to analyze the affiliation of 986 427 articles.  <a href="http://rpsychologist.com/pubmed-publications-in-2011-by-202-world-countries-whos-the-winner/">Read more</a>]]></description>
				<content:encoded><![CDATA[<h2>Introduction</h2>
<p>I had this idea that it’d be fun to look at all of PubMed&#8217;s articles from 2011 and extract the country affiliation for each one. So I set out to do just that, but in addition to looking at 2011 I also looked at the proportional change in publications 1980–2010 for the top 20 countries. The data for 2011 is visualized on a world map both as a bubble plot and as a heat map. </p>
<p>It turned out that this project wasn’t as straightforward as I had first anticipated, mainly because PubMed’s affiliation field is a veritable mess with no apparent reporting standard. I imagine there are databases that are much better suited to this task than PubMed. </p>
<h2>Method</h2>
<p>There were 986 427 articles published in PubMed in 2011; so I, naturally, used R to extract national publication counts. I did this by downloading all citations into one 8.37 GB XML file, importing the affiliation strings into <tt>MySQL</tt>, and then using R to extract country affiliation with <tt>grep</tt> and <tt>regular expressions</tt>. </p>
<p>To avoid unnecessary manual work I used lists of country names, U.S. state &#038; university names, Indian states and Japanese universities. I also looked at word frequencies for the affiliation strings that couldn’t be matched, and used this to make additional pattern lists. Lastly, I also used e-mail suffixes to extract affiliation. </p>
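<p>To give an idea of what this kind of matching looks like, here is a minimal sketch. The affiliation strings and the three patterns are invented for illustration; they are not the actual lists used for the analysis, which were much longer.</p>
<pre class="lang:r decode:true">
# Hypothetical affiliation strings
affiliations <- c(
  "Department of Psychology, Stockholm University, Stockholm, Sweden.",
  "Dept. of Physics, University of Tokyo, Japan. someone@example.jp",
  "School of Medicine, Stanford, CA 94305, USA."
)

# Hypothetical pattern list: country names plus an e-mail suffix
patterns <- c(
  Sweden = "\\bSweden\\b",
  Japan  = "\\bJapan\\b|\\.jp\\b",
  USA    = "\\bUSA\\b|\\bUnited States\\b"
)

# One logical column per country: TRUE if the pattern matches the string
matches <- sapply(patterns, grepl, x = affiliations, perl = TRUE)

colSums(matches)           # number of strings matched per country
sum(rowSums(matches) == 0) # strings that matched no country
</pre>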
<h2>Reliability</h2>
<p>To find out how many mismatches my script produced, I drew a random sample (n = 2000) and manually screened it for errors. 22 errors were found, and all of them entailed the string being matched to the correct country plus one incorrect country. For example, this string was matched to both UK and US (because &#8220;Bristol&#8221; is matched to UK):</p>
<p><tt>Department of Biotransformation, Bristol-Myers Squibb, Route 206 and Province Line Road, Princeton, NJ 08543, USA. anthony.barros@bms.com</tt></p>
<p>It’s not really a big problem since it only occurs in 1.1 % of the sample. The following countries had erroneous extra matches in my random screening sample:</p>
<pre>
            x  freq
2    Australia    1
3      Austria    1
4        China    3
5       France    1
6        India    2
7        Japan    1
8         Oman    1
9  Saint Lucia    2
10          UK    8
11         USA    2
</pre>
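<p>The double match for the Bristol-Myers Squibb string is easy to reproduce. The two patterns below are hypothetical stand-ins for the real pattern lists; they only assume that the UK list contains the city name &#8220;Bristol&#8221;.</p>
<pre class="lang:r decode:true">
aff <- "Department of Biotransformation, Bristol-Myers Squibb, Route 206 and Province Line Road, Princeton, NJ 08543, USA. anthony.barros@bms.com"

# Hypothetical patterns: a short UK city list and a plain USA pattern
patterns <- c(UK = "\\b(Bristol|London|Oxford)\\b", USA = "\\bUSA\\b")

hits <- sapply(patterns, grepl, x = aff, perl = TRUE)
names(hits)[hits] # both "UK" and "USA"

# Share of the screened sample with an erroneous extra match
22 / 2000 # 0.011, i.e. 1.1 %
</pre>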
<p>Moreover, 1.8% of the affiliation strings couldn’t be matched to any country. By analyzing the word frequencies for the unmatched strings, I concluded that there didn’t appear to be any words that could be used to identify a significant number of countries.</p>
<p>Additionally, I compared the number of hits for my top 20 countries to the corresponding hits when searching PubMed using rudimentary country queries. These were the results:</p>
<pre>
                                                                   search      R PubMed   dif error
1  United States of America[ad] OR United States[ad] OR US[ad] OR USA[ad] 252796 242050 10746  0.04
2                                                               China[ad]  77614  76359  1255  0.02
3                             UK[ad] OR United Kingdom[ad] OR England[ad]  56069  54661  1408  0.03
4                                                               Japan[ad]  51740  48518  3222  0.06
5                                          Germany[ad] OR Deutschland[ad]  48183  44405  3778  0.08
6                                                              Canada[ad]  31926  29386  2540  0.08
7                                                               Italy[ad]  31883  30971   912  0.03
8                                                              France[ad]  31233  28832  2401  0.08
9                                                               Spain[ad]  24901  21268  3633  0.15
10                                                          Australia[ad]  23807  22891   916  0.04
11                                                              Korea[ad]  23796  23778    18  0.00
12                                                              India[ad]  23371  23093   278  0.01
13                                                        Netherlands[ad]  20002  19602   400  0.02
14                                               Brazil[ad] OR Brasil[ad]  18868  18223   645  0.03
15                                                             Taiwan[ad]  12324  12321     3  0.00
16                                                        Switzerland[ad]  11685  10320  1365  0.12
17                                                             Sweden[ad]  11018  10506   512  0.05
18                                                            Belgium[ad]   8551   8146   405  0.05
19                                                             Poland[ad]   7914   6526  1388  0.18
</pre>
<p>The measurement error is a bit high in countries like Poland, Switzerland and Spain. Nonetheless, I decided to use these PubMed queries to look at annual publications for these countries 1980–2010, using my <a href="http://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication/" title="An R Script to Automatically download PubMed Citation Counts By Year of Publication" target="_blank">PubMed trend script</a>.</p>
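<p>For reference, the <tt>dif</tt> and <tt>error</tt> columns in the comparison table are consistent with <tt>dif</tt> being the R count minus the PubMed count and <tt>error</tt> being <tt>dif</tt> divided by the R count. A small sketch with three rows copied from the table (the data frame name <tt>counts</tt> is just for illustration):</p>
<pre class="lang:r decode:true">
# Three rows copied from the comparison table
counts <- data.frame(
  country = c("China", "Spain", "Poland"),
  R       = c(77614, 24901, 7914),
  PubMed  = c(76359, 21268, 6526)
)

counts$dif   <- counts$R - counts$PubMed           # extra hits found by the R script
counts$error <- round(counts$dif / counts$R, 2)    # difference relative to the R count
counts
</pre>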
<h2>Results</h2>
<p>In total 202 countries were extracted, with the publication distribution looking like this:<br />
<a href="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_plot1.jpg"><img src="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_plot1-1024x625.jpg" alt="PubMed publications world map plot" title="PubMed publications world map plot" width="1024" height="625" class="aligncenter size-large wp-image-1335" /></a><br />
<a href="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_bubble_plot.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_bubble_plot-1024x488.png" alt="PubMed publications top 20 countries world map bubble plot" title="PubMed publications top 20 countries world map bubble plot" width="1024" height="488" class="aligncenter size-large wp-image-1339" /></a><br />
The same plot as above, but with the bubble size representing publications per capita.<br />
<a href="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_bubble_plot_per_capita.png"><img src="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_world_map_bubble_plot_per_capita-1024x546.png" alt="PubMed publications per capita top 20 countries world map bubble plot" title="PubMed publications per capita top 20 countries world map bubble plot" width="1024" height="546" class="aligncenter size-large wp-image-1341" /></a> </p>
<p>And a plot of the top 20 countries&#8217; publication percentages 1980–2010:</p>
<p><a href="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_top_20_country_year.jpg"><img src="http://rpsychologist.com/wp-content/uploads/2012/05/PubMed_publications_top_20_country_year.jpg" alt="PubMed publications top 20 country year 1980–2010" title="PubMed publications top 20 country year 1980–2010" width="720" height="636" class="aligncenter size-full wp-image-1307" /></a></p>
<p><del datetime="2012-05-08T13:01:56+00:00">I really don&#8217;t know why USA had such a boost in the 1990s, perhaps it got something to do with PubMed&#8217;s indexing or maybe it&#8217;s a consequence of the <a href="http://en.wikipedia.org/wiki/1990s_United_States_boom" title="http://en.wikipedia.org/wiki/1990s_United_States_boom" target="_blank">&#8220;1990s United States boom&#8221;</a>?</del> The reason for the sudden increase in US citations in the 90s is that prior to 1995 MEDLINE only recorded institution, city, and state (including zip code) for authors affiliated with the US. So naturally, my queries will miss most US publications prior to 1995. However, the obvious question is: when will China surpass the US in scientific output?</p>
<p>PS 1. Thanks to Allan Just for telling me how to extract centroid values from the country polygons. </p>
<p>PS 2. My plan is to do some more in-depth analyses of this data, e.g. to look at publications per capita (in a vain attempt to increase Sweden&#8217;s rankings) and some traditional statistical analysis. <strong>Update:</strong> Publications per capita added. </p>
]]></content:encoded>
			<wfw:commentRss>http://rpsychologist.com/pubmed-publications-in-2011-by-202-world-countries-whos-the-winner/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>
