Background
View code on GitHub
This repository presents a randomized A/B test
examining whether managers’ expectations about relationship consequences
influence communication with employees.
The core question is whether managers are more likely to send a
dominant message, using coercion and threat, when they believe it will
improve—rather than harm—their relationship with an employee. The goal
is to isolate this mechanism using a clean experimental intervention and
a concrete incentive-compatible behavioral decision.
Design Overview
Participants were instructed to act as managers assigning a
challenging task to an employee whose performance affected the manager’s
bonus.
Before making a communication choice, participants were randomly
assigned to one of two conditions:
- Positive relationship impact: reflecting on the
positive impact a dominant message will have on their
relationship with the employee.
- Negative relationship impact: reflecting on the
negative impact a dominant message will have on their
relationship with the employee.
All other elements of the scenario—including task demands,
incentives, and message options—were held constant.
As the primary outcome of the experiment, participants chose one of
two messages to send to motivate the employee: a dominant message or a
non-dominant message.
Participants
# Packages assumed loaded earlier in the report: tidyverse, tidytext
# (get_sentiments, unnest_tokens), kableExtra (kbl, kable_styling),
# plus helpers providing cohens_d() and apa_print().

# Sample sizes before and after the preregistered eligibility screen
total_n <- df %>%
  nrow()
df_elg <- df %>%
  filter(is_elg == 1)
eligible_n <- df_elg %>%
  nrow()

# Number of white participants in the eligible sample
n_white <- df_elg %>%
  group_by(race) %>%
  summarise(N = n()) %>%
  ungroup() %>%
  filter(race == "white") %>%
  select(N) %>%
  unlist() %>%
  unname()

# Number of men (missing/blank gender recoded to "other")
n_man <- df_elg %>%
  mutate(gender = ifelse(is.na(gender) | gender == "", "other", gender)) %>%
  group_by(gender) %>%
  summarise(N = n()) %>%
  ungroup() %>%
  filter(gender == "man") %>%
  select(N) %>%
  unlist() %>%
  unname()

# Mean age
mean_age <- df_elg %>%
  summarise(age_mean = round(mean(age, na.rm = T), 2)) %>%
  unlist() %>%
  unname()

# Median income bracket (ordered factor converted to numeric for the median)
median_income_num <- df_elg %>%
  mutate(income = factor(income, c("$0-$20,000",
                                   "$20,001-$40,000",
                                   "$40,001-$60,000",
                                   "$60,001-$80,000",
                                   "$80,001-$100,000",
                                   "$100,001-$120,000",
                                   "$120,001-$140,000",
                                   "$140,001-$160,000",
                                   "$160,001-$180,000",
                                   "$180,001-$200,000",
                                   "Over $200,000")),
         income_num = as.numeric(income)) %>%
  summarise(median = median(income_num, na.rm = T))

# Map the numeric median back onto its income bracket label
median_income <- median_income_num %>%
  mutate(income_char = case_when(median == 1 ~ "$0-$20,000",
                                 median == 2 ~ "$20,001-$40,000",
                                 median == 3 ~ "$40,001-$60,000",
                                 median == 4 ~ "$60,001-$80,000",
                                 median == 5 ~ "$80,001-$100,000",
                                 median == 6 ~ "$100,001-$120,000",
                                 median == 7 ~ "$120,001-$140,000",
                                 median == 8 ~ "$140,001-$160,000",
                                 median == 9 ~ "$160,001-$180,000",
                                 median == 10 ~ "$180,001-$200,000",
                                 median == 11 ~ "Over $200,000")) %>%
  select(income_char) %>%
  unlist() %>%
  unname()
I recruited 503 participants from Connect by
CloudResearch (online U.S. panel) on May 27th, 2025. After preregistered
attention and bot checks, 492 eligible responses
remained in the final sample (N white = 335; N men =
249; M age = 38.99; Median income =
$80,001-$100,000).
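A quick randomization check on these demographics is a natural complement to the sample description. Below is a minimal sketch, reusing df_elg and the cond, age, and gender columns from above:

```r
# Randomization check: compare demographics across the two conditions.
# Assumes df_elg with cond ("pos"/"neg"), age (numeric), and gender (character), as above.
balance <- df_elg %>%
  mutate(gender = ifelse(is.na(gender) | gender == "", "other", gender))

balance %>%
  group_by(cond) %>%
  summarise(N = n(),
            mean_age = round(mean(age, na.rm = TRUE), 2),
            share_men = round(100 * mean(gender == "man"), 2)) %>%
  ungroup()

# Formal checks: Welch t-test for age, chi-squared test for gender composition
t.test(age ~ cond, data = balance)
chisq.test(table(balance$cond, balance$gender))
```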
Treatment
All participants saw both the dominant message and the
non-dominant message:
Dominant Message
By now you know the task at hand. It’s time to get in there and
do your absolute best across all rounds. If you don’t complete it and do
it well, you will not get the full bonus.
Non-Dominant Message
Your job in this task is to select the shapes that match the
description. Please make sure you look at them carefully. It would be
great if you can get as many of them right as possible.
Those in the Positive Relationship Impact Condition were
told to reflect on how the employee might react
positively to the dominant message, whereas those in
the Negative Relationship Impact Condition were told to reflect
on how the employee might react negatively to the
dominant message.
Before you make your choice of which message you want to send,
think for a moment about how your employee might react
[positively/negatively] to this message:
[dominant message]
How and why might the employee have a [positive/negative] (or at
least not [negative/positive]) reaction to that message, affecting their
attitude towards the manager?
In the space below, please write 1-2 sentences about positive
thoughts or feelings they could have about the manager and their
relationship with them.
Treatment check
I validated that participants engaged with the treatment as intended
using three checks: (1) a single-item self-report of the expected
employee attitude toward them; (2) a lexical valence analysis of the
open-ended responses; and (3) a comparison of word counts across
conditions, to confirm that participants engaged with both prompts to a
similar degree.
(1) Self-report item
Immediately after the treatment, participants estimated the
employee’s attitude toward them if they were to send the dominant
message (1 = Extremely Negative to 7 = Extremely
Positive).
# Order condition levels so the positive condition is listed first
df_elg <- df_elg %>%
  mutate(cond = factor(cond, levels = c("pos", "neg")))

# Descriptives of the predicted employee attitude (pred_att), by condition
df_elg %>%
  group_by(cond) %>%
  summarise(N = n(),
            Mean = round(mean(pred_att, na.rm = T), 2),
            SD = round(sd(pred_att, na.rm = T), 2)) %>%
  ungroup() %>%
  kbl() %>%
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
| cond | N   | Mean | SD   |
|------|-----|------|------|
| pos  | 247 | 4.39 | 1.65 |
| neg  | 245 | 3.31 | 1.67 |
# Welch t-test on the self-report item (pred_att), plus Cohen's d for the difference
m <- t.test(pred_att ~ cond, data = df_elg)
d_mod <- cohens_d(m)
d <- d_mod[1,1]
Confirming that the manipulation shifted relational expectations as
intended, participants in the Positive Relationship Impact
Condition expected a more positive reaction to the dominant message
than those in the Negative Relationship Impact Condition
(t(489.73) = 7.25, p < .001, 95% CI [0.79, 1.38], d = 0.66).
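Because the self-report item is a 7-point ordinal scale, a rank-based test is a reasonable complement to the t-test; a minimal sketch on the same data:

```r
# Nonparametric sensitivity check on the ordinal self-report item
wilcox.test(pred_att ~ cond, data = df_elg)
```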
(2) Lexical valence analysis
First, these are the top-10 most frequent sentiment-scored words
(AFINN), by condition:
Positive Relationship Impact Condition
# AFINN sentiment lexicon (word-level valence scores from -5 to +5)
afinn <- get_sentiments("afinn")

# Top-10 most frequent AFINN-scored words in the positive condition
df_elg %>%
  select(PID, cond, reflection) %>%
  unnest_tokens(word, reflection) %>%
  inner_join(afinn, by = "word") %>%
  group_by(cond, word) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  arrange(cond, desc(n)) %>%
  group_by(cond) %>%
  slice(1:10) %>%
  ungroup() %>%
  filter(cond == "pos") %>%
  select(-cond) %>%
  kbl() %>%
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
| word       | n  |
|------------|----|
| best       | 76 |
| positive   | 70 |
| like       | 36 |
| positively | 27 |
| want       | 25 |
| motivated  | 19 |
| good       | 18 |
| appreciate | 17 |
| hard       | 16 |
| clear      | 15 |
Negative Relationship Impact Condition
# Top-10 most frequent AFINN-scored words in the negative condition
df_elg %>%
select(PID,cond,reflection) %>%
unnest_tokens(word,reflection) %>%
inner_join(afinn,by = "word") %>%
group_by(cond,word) %>%
summarise(n = n()) %>%
ungroup() %>%
arrange(cond,desc(n)) %>%
group_by(cond) %>%
slice(1:10) %>%
ungroup() %>%
filter(cond == "neg") %>%
select(-cond) %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| word        | n  |
|-------------|----|
| negative    | 70 |
| like        | 68 |
| threatening | 39 |
| threat      | 34 |
| pressure    | 31 |
| pressured   | 23 |
| demanding   | 21 |
| best        | 20 |
| want        | 15 |
| good        | 13 |
Now, let’s compare the valence of the responses across the two
conditions. Each participant receives a valence score equal to the sum
of the AFINN values of the sentiment-bearing words they used. Unlike a
per-word mean, this sum reflects both the valence and the number of
sentiment-bearing words, and so better captures the overall sentiment of
a response. Participants who did not use any words in the AFINN lexicon
are dropped from this analysis. Below are the mean sum scores per
condition.
# Per-participant valence score: sum of AFINN values across sentiment-bearing words
PID_valence <- df_elg %>%
  select(PID, cond, reflection) %>%
  unnest_tokens(word, reflection) %>%
  inner_join(afinn, by = "word") %>%
  group_by(PID) %>%
  summarise(value = sum(value)) %>%
  ungroup()
# Participants per condition who retain a valence score (used at least one AFINN word)
remaining_pos <- df_elg %>%
select(PID,cond) %>%
inner_join(PID_valence,by = "PID") %>%
group_by(cond) %>%
summarise(N = n()) %>%
ungroup() %>%
filter(cond == "pos") %>%
select(N) %>%
unlist() %>%
unname()
remaining_neg <- df_elg %>%
select(PID,cond) %>%
inner_join(PID_valence,by = "PID") %>%
group_by(cond) %>%
summarise(N = n()) %>%
ungroup() %>%
filter(cond == "neg") %>%
select(N) %>%
unlist() %>%
unname()
# Total participants per condition (denominators for coverage percentages)
total_pos <- df_elg %>%
select(PID,cond) %>%
group_by(cond) %>%
summarise(N = n()) %>%
ungroup() %>%
filter(cond == "pos") %>%
select(N) %>%
unlist() %>%
unname()
total_neg <- df_elg %>%
select(PID,cond) %>%
group_by(cond) %>%
summarise(N = n()) %>%
ungroup() %>%
filter(cond == "neg") %>%
select(N) %>%
unlist() %>%
unname()
# Coverage: share of reflections with at least one AFINN word, overall and by condition
perc_overall <- round(100*nrow(PID_valence)/eligible_n, 2)
perc_pos <- round(100*remaining_pos/total_pos, 2)
perc_neg <- round(100*remaining_neg/total_neg, 2)
df_elg %>%
select(PID,cond) %>%
inner_join(PID_valence,by = "PID") %>%
group_by(cond) %>%
summarise(N = n(),
Mean = round(mean(value),2),
SD = round(sd(value),2)) %>%
ungroup() %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| cond | N   | Mean  | SD   |
|------|-----|-------|------|
| pos  | 238 | 4.47  | 3.33 |
| neg  | 238 | -0.58 | 3.62 |
This analysis covers participants who used at least one word present
in the AFINN lexicon (96.75% of total reflections; 96.36% of positive
condition reflections; 97.14% of negative condition reflections).
# Welch t-test on the sum valence scores, by condition, plus Cohen's d
m <- t.test(value ~ cond, data = df_elg %>%
              select(PID, cond) %>%
              inner_join(PID_valence, by = "PID"))
d_mod <- cohens_d(m)
d <- d_mod[1,1]
Indeed, those in the Positive Relationship Impact Condition
responded using more positively valenced language on average than those
in the Negative Relationship Impact Condition
(t(470.64) = 15.84, p < .001, 95% CI [4.42, 5.68], d = 1.46).
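As a sensitivity check on the sum-based scoring choice, the same comparison can be run on a per-word mean valence score; a minimal sketch reusing the AFINN pipeline above:

```r
# Alternative scoring: mean AFINN value per sentiment-bearing word, per participant
PID_valence_mean <- df_elg %>%
  select(PID, cond, reflection) %>%
  unnest_tokens(word, reflection) %>%
  inner_join(afinn, by = "word") %>%
  group_by(PID) %>%
  summarise(value_mean = mean(value)) %>%
  ungroup()

# Same Welch t-test as above, on the mean-based scores
t.test(value_mean ~ cond,
       data = df_elg %>%
         select(PID, cond) %>%
         inner_join(PID_valence_mean, by = "PID"))
```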
(3) Word-count
I also wanted to make sure that participants in the two conditions
did not differ substantially in the length of their responses. To that
end, I calculate the word count of each reflection and compare it across
the two conditions.
# Word count per reflection (whitespace-delimited tokens)
PID_wordcount <- df_elg %>%
  transmute(PID,
            n_words = str_count(str_squish(reflection), "\\S+"))
df_elg %>%
select(PID,cond) %>%
left_join(PID_wordcount,by = "PID") %>%
group_by(cond) %>%
summarise(Mean = round(mean(n_words),2),
SD = round(sd(n_words),2)) %>%
ungroup() %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| cond | Mean  | SD    |
|------|-------|-------|
| pos  | 27.34 | 10.73 |
| neg  | 28.40 | 13.20 |
# Welch t-test on reflection word counts, by condition, plus Cohen's d
m <- t.test(n_words ~ cond, data = df_elg %>%
              select(PID, cond) %>%
              left_join(PID_wordcount, by = "PID"))
d_mod <- cohens_d(m)
d <- d_mod[1,1]
The difference in word count between participants in the
Positive Relationship Impact Condition and participants in the
Negative Relationship Impact Condition is not statistically
significant (t(468.85) = -0.98, p = 0.325, 95% CI [-3.2, 1.06],
d = -0.09), suggesting that participants in both conditions engaged with
the treatment similarly. Because response lengths were similar across
conditions, the sum-based valence scores are less likely to be
mechanically driven by response length.
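A complementary way to rule out a length artifact is to adjust for word count directly; a minimal sketch using the PID_valence and PID_wordcount objects defined above:

```r
# Valence difference by condition, adjusting for reflection length
valence_length <- df_elg %>%
  select(PID, cond) %>%
  inner_join(PID_valence, by = "PID") %>%
  left_join(PID_wordcount, by = "PID")

summary(lm(value ~ cond + n_words, data = valence_length))
```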
Primary DV
After the treatment, participants were asked which message they wanted
to send to the employee. Below are the shares, within each condition,
who chose to send the dominant message.
Descriptives
# Share (in %) choosing the dominant message (choicedom = 1), by condition
pos_share <- df_elg %>%
group_by(cond) %>%
summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>%
ungroup() %>%
filter(cond == "pos") %>%
select(Share) %>%
unlist() %>%
unname()
neg_share <- df_elg %>%
group_by(cond) %>%
summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>%
ungroup() %>%
filter(cond == "neg") %>%
select(Share) %>%
unlist() %>%
unname()
df_elg %>%
group_by(cond) %>%
summarise(N = n(),
Mean = round(100*mean(choicedom,na.rm = T),2),
SD = round(100*sd(choicedom,na.rm = T),2)) %>%
ungroup() %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| cond | N   | Mean  | SD    |
|------|-----|-------|-------|
| pos  | 247 | 29.55 | 45.72 |
| neg  | 245 | 11.02 | 31.38 |
Logistic Regression
# Logistic regression of message choice on condition (reference level: "neg")
m1 <- glm(choicedom ~ cond,
          family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg", "pos"))))

# Profile-likelihood CI for the condition coefficient, converted to an odds ratio
ci_low <- confint(m1)[2,1]
or_ci_low <- exp(ci_low)
ci_high <- confint(m1)[2,2]
or_ci_high <- exp(ci_high)
apa_lm <- apa_print(m1)
kbl(apa_lm$table) %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| term      | estimate | conf.int       | statistic | p.value |
|-----------|----------|----------------|-----------|---------|
| Intercept | -2.09    | [-2.51, -1.71] | -10.24    | < .001  |
| Condpos   | 1.22     | [0.75, 1.72]   | 4.94      | < .001  |
A logistic regression indicates that the odds of sending the dominant
message were 3.39 times higher for participants in the Positive
Relationship Impact Condition than for those in the Negative
Relationship Impact Condition (odds ratio = 3.39, 95% CI
[2.11, 5.58]; logit coefficient = 1.22).
In probability terms, the share choosing the dominant message
increased from 11.02% in the Negative Relationship
Impact Condition to 29.55% in the Positive
Relationship Impact Condition.
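The same contrast can also be summarized as a simple difference in proportions; a minimal sketch using a two-sample proportion test on the raw counts:

```r
# Two-proportion test on the share sending the dominant message, by condition
choice_tab <- df_elg %>%
  filter(!is.na(choicedom)) %>%
  group_by(cond) %>%
  summarise(dominant = sum(choicedom),
            total = n()) %>%
  ungroup()

# Rows follow the factor order of cond ("pos" first, then "neg")
prop.test(x = choice_tab$dominant, n = choice_tab$total)
```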
# Figure: share who sent the dominant message, by condition
# (points = condition means; error bars = bootstrapped CIs via mean_cl_boot, which requires Hmisc)
fig1 <- df_elg %>%
mutate(cond_char = ifelse(cond == "pos","Positive Relationship\nImpact Condition","Negative Relationship\nImpact Condition")) %>%
ggplot(aes(x = cond_char,y = choicedom)) +
stat_summary(fun.data = "mean_cl_boot",
size = 0.5,
geom = "errorbar",
width = 0.05,
color = "#080807",
position = position_nudge(0)) +
stat_summary(fun = "mean",
geom = "point",
size = 2.3,
fill = "black",
color = "black",
position = position_nudge(0)) +
stat_summary(fun = "mean",
shape = 1,
geom = "point",
color = "black",
fill = "black",
position = position_nudge(0)) +
scale_y_continuous(limits = c(-.02,1.02),
breaks = seq(0,1,0.2),
labels = c("0%","20%","40%","60%","80%","100%")) +
ylab("Share Who Sent\n the Dominant Message") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
panel.grid.major.y = element_line(color = "grey80",
linetype = "dashed"),
axis.ticks = element_blank(),
axis.line = element_line(color = "grey66"),
axis.text.x = element_text(color = "black",
face = "bold",
size = 12),
axis.text.y = element_text(color = "grey30",
size = 10),
axis.title.y = element_text(color = "black",
face = "bold",
size = 12),
axis.title.x = element_blank(),
legend.position = "none",
title = element_text(color = "black",
size = 12,
face = "bold"))
#png("treatment_effect.png",width = 360,height = 320,units = "px")
fig1
(Figure: Share Who Sent the Dominant Message, by condition.)
Secondary DV
To capture more variation and add nuance to the treatment effect, I
also created a continuous dependent variable: preference for the
selected message. After selecting a message, participants indicated how
strongly they preferred the message they selected (1 = Slightly
preferred to 3 = Strongly preferred). I then coded this response, in
combination with the message selection, as a 6-point dominance scale
(1 = Strongly preferred the non-dominant message to 6 = Strongly
preferred the dominant message), as sketched below.
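A sketch of that coding step is below; choicedom (the 0/1 message choice) is the variable used elsewhere in this report, while pref_strength (the 1-3 preference rating) is a hypothetical column name used for illustration:

```r
# Illustrative recode: combine message choice with preference strength into the
# 6-point dominance scale. pref_strength (1-3) is a hypothetical column name.
df_elg %>%
  mutate(pref_cont_check = ifelse(choicedom == 1,
                                  3 + pref_strength,   # 4-6: preferred the dominant message
                                  4 - pref_strength))  # 1-3: preferred the non-dominant message
```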
Descriptives
df_elg %>%
group_by(cond) %>%
summarise(N = n(),
Mean = round(mean(pref_cont,na.rm = T),2),
SD = round(sd(pref_cont,na.rm = T),2)) %>%
ungroup() %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| cond | N   | Mean | SD   |
|------|-----|------|------|
| pos  | 247 | 2.54 | 1.63 |
| neg  | 245 | 1.83 | 1.18 |
Two-sample t-test
# Welch t-test on the 6-point dominance-preference scale, plus Cohen's d
m <- t.test(pref_cont ~ cond, data = df_elg)
d_mod <- cohens_d(m)
d <- d_mod[1,1]
Participants in the Positive Relationship Impact Condition scored
higher on the dominance scale than those in the Negative Relationship
Impact Condition (t(447.92) = 5.51, p < .001, 95% CI [0.45, 0.96],
d = 0.52). Results were thus consistent when using the continuous
6-point preference scale as the outcome rather than the binary choice.
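Because the 6-point scale is ordinal, a proportional-odds model is a natural complementary analysis; a minimal sketch assuming the MASS package is available:

```r
# Ordinal (proportional-odds) regression on the 6-point dominance-preference scale.
# Note: MASS masks dplyr::select(), so call dplyr::select() explicitly afterwards.
library(MASS)

m_ord <- polr(factor(pref_cont, ordered = TRUE) ~ cond,
              data = df_elg, Hess = TRUE)
summary(m_ord)
```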
Robustness check
To make sure that the treatment effect is not driven by expected
compliance, I also asked participants to indicate how much of the task
they believe the employee would complete if they were to receive the
dominant message (0 = minimum score on the task to 50 =
maximum score on the task). Expected compliance, after all, could be the
main motivator for message selection because it directly impacts
participants’ bonus.
Descriptives
df_elg %>%
group_by(cond) %>%
summarise(N = n(),
Mean = round(mean(pred_comp,na.rm = T),2),
SD = round(sd(pred_comp,na.rm = T),2)) %>%
ungroup() %>%
kbl() %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| cond | N   | Mean  | SD    |
|------|-----|-------|-------|
| pos  | 247 | 37.06 | 9.54  |
| neg  | 245 | 34.48 | 10.46 |
Two-sample t-test
# Welch t-test on predicted compliance (pred_comp), plus Cohen's d
m <- t.test(pred_comp ~ cond, data = df_elg)
d_mod <- cohens_d(m)
d <- d_mod[1,1]
Participants in the Positive Relationship Impact Condition
expected somewhat greater compliance with the dominant message than
those in the Negative Relationship Impact Condition
(t(485.15) = 2.86, p = 0.004, 95% CI [0.81, 4.35], d = 0.26). Because
condition also shifted expected compliance, I add it as a control
variable to the logistic regression model.
Logistic Regression
# Logistic regression adding predicted compliance (pred_comp) as a covariate
m2 <- glm(choicedom ~ cond + pred_comp,
          family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg", "pos"))))

# Profile-likelihood CI for the condition coefficient, converted to an odds ratio
ci_low <- confint(m2)[2,1]
or_ci_low <- exp(ci_low)
ci_high <- confint(m2)[2,2]
or_ci_high <- exp(ci_high)
apa_lm <- apa_print(m2)
kbl(apa_lm$table) %>%
kable_styling(bootstrap_options = "hover",
full_width = F,
position = "left")
| term      | estimate | conf.int       | statistic | p.value |
|-----------|----------|----------------|-----------|---------|
| Intercept | -4.08    | [-5.26, -3.02] | -7.16     | < .001  |
| Condpos   | 1.14     | [0.66, 1.64]   | 4.53      | < .001  |
| Pred comp | 0.05     | [0.03, 0.08]   | 3.94      | < .001  |
Controlling for predicted compliance, the condition effect on message
selection remained strong and statistically significant. In the adjusted
model, the odds of sending the dominant message were 3.12
times higher for participants in the Positive Relationship Impact
Condition than for those in the Negative Relationship Impact
Condition (odds ratio = 3.12, 95% CI [1.93, 5.18]; logit
coefficient = 1.14).
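To express the adjusted estimate on the probability scale, predicted probabilities can be computed at a fixed value of predicted compliance; a minimal sketch using the fitted m2:

```r
# Model-implied probability of sending the dominant message, by condition,
# holding predicted compliance at its sample mean
newdat <- data.frame(cond = factor(c("neg", "pos"), levels = c("neg", "pos")),
                     pred_comp = mean(df_elg$pred_comp, na.rm = TRUE))
predict(m2, newdata = newdat, type = "response")
```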
Summary & Takeaways
- The reflection manipulation successfully shifted relational
expectations about how an employee might respond to a dominant
message.
- This shift led to a substantially higher likelihood of sending the
dominant message: participants in the Positive Relationship
Impact Condition had 3.39 times the odds of selecting it
compared with those in the Negative Relationship Impact
Condition (roughly 11% vs. 30% in probability terms).
- The effect remained robust when controlling for expected task
compliance and when using a continuous dominance-preference scale as the
outcome.
- This analysis illustrates a typical A/B testing workflow with a
binary behavioral outcome, including treatment checking, effect
estimation, visualization, and robustness checks.
- The results highlight how framing leaders’ expectations about
relational consequences can causally shift communication choices, even
when financial incentives are held constant.