Background

View code on GitHub

This repository presents a randomized A/B test examining whether managers’ expectations about relationship consequences influence communication with employees.

The core question is whether managers are more likely to send a dominant message, using coercion and threat, when they believe it will improve—rather than harm—their relationship with an employee. The goal is to isolate this mechanism using a clean experimental intervention and a concrete incentive-compatible behavioral decision.

Design Overview

Participants were instructed to act as managers assigning a challenging task to an employee whose performance affected the manager’s bonus.

Before making a communication choice, participants were randomly assigned to one of two conditions:

  • Positive relationship impact: reflecting on the positive impact a dominant message will have on their relationship with the employee.
  • Negative relationship impact: reflecting on the negative impact a dominant message will have on their relationship with the employee.

All other elements of the scenario—including task demands, incentives, and message options—were held constant.

As the primary outcome of the experiment, participants could send one of two messages to motivate the employee: a dominant message and a non-dominant message.

Participants

# Full sample size
total_n <- df %>% 
  nrow()

# Keep only responses that passed the preregistered eligibility checks
df_elg <- df %>% 
  filter(is_elg == 1) 

# Eligible sample size
eligible_n <- df_elg %>% 
  nrow()

n_white <- df_elg %>% 
  group_by(race) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(race == "white") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

n_man <- df_elg %>% 
  mutate(gender = ifelse(is.na(gender) | gender == "","other",gender)) %>% 
  group_by(gender) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(gender == "man") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

mean_age <- df_elg %>% 
  summarise(age_mean = round(mean(age,na.rm = T),2)) %>% 
  unlist() %>% 
  unname()

median_income_num <- df_elg %>% 
  mutate(income = factor(income,c("$0-$20,000",
                                  "$20,001-$40,000",
                                  "$40,001-$60,000",
                                  "$60,001-$80,000",
                                  "$80,001-$100,000",
                                  "$100,001-$120,000",
                                  "$120,001-$140,000",
                                  "$140,001-$160,000",
                                  "$160,001-$180,000",
                                  "$180,001-$200,000",
                                  "Over $200,000")),
         income_num = as.numeric(income)) %>% 
  summarise(median = median(income_num,na.rm = T))

median_income <- median_income_num %>% 
  mutate(income_char = case_when(median == 1 ~ "$0-$20,000",
                                 median == 2 ~ "$20,001-$40,000",
                                 median == 3 ~ "$40,001-$60,000",
                                 median == 4 ~ "$60,001-$80,000",
                                 median == 5 ~ "$80,001-$100,000",
                                 median == 6 ~ "$100,001-$120,000",
                                 median == 7 ~ "$120,001-$140,000",
                                 median == 8 ~ "$140,001-$160,000",
                                 median == 9 ~ "$160,001-$180,000",
                                 median == 10 ~ "$180,001-$200,000",
                                 median == 11 ~ "Over $200,000")) %>% 
  select(income_char) %>% 
  unlist() %>% 
  unname()

I recruited 503 participants from Connect by CloudResearch (online U.S. panel) on May 27th, 2025. After preregistered attention and bot checks, 492 eligible responses remained in the final sample (N white = 335; N men = 249; M age = 38.99; Median income = $80,001-$100,000).
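The eligibility flag is_elg used above is computed upstream of this document; a minimal sketch of how such a flag could be derived, using attention_pass and bot_flag as purely hypothetical stand-ins for the preregistered checks:

# Hypothetical check columns; the actual preregistered criteria live in the data-prep step
df <- df %>% 
  mutate(is_elg = as.integer(attention_pass == 1 & bot_flag == 0))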

Treatment

All participants saw both the dominant message and the non-dominant message:

Dominant Message

By now you know the task at hand. It’s time to get in there and do your absolute best across all rounds. If you don’t complete it and do it well, you will not get the full bonus.

Non-Dominant Message

Your job in this task is to select the shapes that match the description. Please make sure you look at them carefully. It would be great if you can get as many of them right as possible.

Those in the Positive Relationship Impact Condition were told to reflect on how the employee might react positively to the dominant message, whereas those in the Negative Relationship Impact Condition were told to reflect on how the employee might react negatively to the dominant message.

Before you make your choice of which message you want to send, think for a moment about how your employee might react [positively/negatively] to this message:

[dominant message]

How and why might the employee have a positive/negative (or at least not negative/positive) reaction to that message, affecting their attitude towards the manager?

In the space below, please write 1-2 sentences about positive/negative thoughts or feelings they could have about the manager and their relationship with them.

Treatment check

I validated that participants engaged with the treatment as intended using three checks: (1) a single-item self-report of the expected employee attitude toward them; (2) a lexical valence analysis of the open-ended responses; and (3) a word-count comparison by condition, to verify that participants engaged with both prompts to a similar degree.

(1) Self-report item

Immediately after the treatment, participants estimated the employee’s attitude toward them if they were to send the dominant message (1 = Extremely Negative to 7 = Extremely Positive).

df_elg <- df_elg %>% 
  mutate(cond = factor(cond,levels = c("pos","neg")))

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pred_att,na.rm = T),2),
            SD = round(sd(pred_att,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean   SD
pos    247   4.39   1.65
neg    245   3.31   1.67

m <- t.test(pred_att ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]

Confirming that the manipulation shifted relational expectations as intended, participants in the Positive Relationship Impact Condition expected a more positive reaction to the dominant message than those in the Negative Relationship Impact Condition (t(489.73) = 7.25, p < .001, 95% CI [0.79, 1.38], d = 0.66).
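The inline statistics above can be pulled directly from the htest object returned by t.test() (the CI reported is the 95% CI for the mean difference); a minimal sketch:

t_val <- round(unname(m$statistic), 2)   # t = 7.25
df_w  <- round(unname(m$parameter), 2)   # Welch degrees of freedom = 489.73
p_val <- m$p.value                       # reported as p < .001 when very small
ci    <- round(m$conf.int, 2)            # 95% CI for the mean difference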

(2) Lexical valence analysis

First, these are the top-10 most frequent sentiment-scored words (AFINN), by condition:

Positive Relationship Impact Condition

afinn <- get_sentiments("afinn")

df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(cond,word) %>% 
  summarise(n = n()) %>% 
  ungroup() %>% 
  arrange(cond,desc(n)) %>% 
  group_by(cond) %>% 
  slice(1:10) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(-cond) %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
word         n
best         76
positive     70
like         36
positively   27
want         25
motivated    19
good         18
appreciate   17
hard         16
clear        15

Negative Relationship Impact Condition

df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(cond,word) %>% 
  summarise(n = n()) %>% 
  ungroup() %>% 
  arrange(cond,desc(n)) %>% 
  group_by(cond) %>% 
  slice(1:10) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(-cond) %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
word          n
negative      70
like          68
threatening   39
threat        34
pressure      31
pressured     23
demanding     21
best          20
want          15
good          13

Now, let’s compare the valence of the reflections across the two conditions. Each participant receives a valence score equal to the sum of AFINN values across the detectable words in their response (i.e., the words they used that appear in the AFINN lexicon). Unlike a mean, the sum reflects how many sentiment-bearing words a participant used and therefore better captures the overall sentiment of the response. Participants who used no detectable words are dropped from this analysis. Below are the mean sum scores per condition.

PID_valence <- df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(PID) %>% 
  summarise(value = sum(value)) %>% 
  ungroup()

remaining_pos <- df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

remaining_neg <- df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

total_pos <- df_elg %>% 
  select(PID,cond) %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

total_neg <- df_elg %>% 
  select(PID,cond) %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

perc_overall = round(100*nrow(PID_valence)/eligible_n,2)
perc_pos = round(100*remaining_pos/total_pos,2)
perc_neg = round(100*remaining_neg/total_neg,2)


df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(value),2),
            SD = round(sd(value),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    238    4.47   3.33
neg    238   -0.58   3.62

This analysis covers participants who used at least one word present in the AFINN lexicon (96.75% of total reflections; 96.36% of positive condition reflections; 97.14% of negative condition reflections).

m <- t.test(value ~ cond,data = df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID"))

d_mod <- cohens_d(m)
d = d_mod[1,1]

Indeed, those in the Positive Relationship Impact Condition used more positively valenced language, on average, than those in the Negative Relationship Impact Condition (t(470.64) = 15.84, p < .001, 95% CI [4.42, 5.68], d = 1.46).

(3) Word-count

I also wanted to make sure that participants in the two conditions did not differ markedly in the length of their responses. To that end, I calculate the word count of each reflection and compare the two conditions.

PID_wordcount <- df_elg %>%
  transmute(PID,
            n_words = str_count(str_squish(reflection), "\\S+"))

df_elg %>% 
  select(PID,cond) %>% 
  left_join(PID_wordcount,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(Mean = round(mean(n_words),2),
            SD = round(sd(n_words),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   Mean    SD
pos    27.34   10.73
neg    28.40   13.20

m <- t.test(n_words ~ cond,data = df_elg %>%
  select(PID,cond) %>% 
  left_join(PID_wordcount,by = "PID"))

d_mod <- cohens_d(m)
d = d_mod[1,1]

As intended, the difference in word count between participants in the Positive Relationship Impact Condition and participants in the Negative Relationship Impact Condition is not statistically significant (t(468.85) = -0.98, p = 0.325, 95% CI [-3.2, 1.06], d = -0.09), suggesting that participants in both conditions engaged with the treatment to a similar degree. Because response lengths were similar across conditions, the sum-based valence scores are unlikely to be mechanically driven by response length.

Primary DV

After the treatment, participants were asked which message they wanted to send to the employee. Below are the shares, within each condition, who chose to send the dominant message.

Descriptives

pos_share <- df_elg %>% 
  group_by(cond) %>% 
  summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(Share) %>% 
  unlist() %>% 
  unname()

neg_share <- df_elg %>% 
  group_by(cond) %>% 
  summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(Share) %>% 
  unlist() %>% 
  unname()

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(100*mean(choicedom,na.rm = T),2),
            SD = round(100*sd(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    247   29.55   45.72
neg    245   11.02   31.38

Logistic Regression

m1 <- glm(choicedom ~ cond, family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg","pos"))))

ci_low = confint(m1)[2,1]
or_ci_low <- exp(ci_low)
ci_high = confint(m1)[2,2]
or_ci_high <- exp(ci_high)

apa_lm <- apa_print(m1)
 
kbl(apa_lm$table) %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
term        estimate   conf.int         statistic   p.value
Intercept   -2.09      [-2.51, -1.71]   -10.24      < .001
Condpos      1.22      [0.75, 1.72]       4.94      < .001

A logistic regression indicates that the odds of sending the dominant message were 3.39 times higher for participants in the Positive Relationship Impact Condition than for those in the Negative Relationship Impact Condition (odds ratio = 3.39; logit coefficient = 1.22; 95% CI for the odds ratio = [2.11, 5.58]).
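For reference, the odds ratio is simply the exponentiated logit coefficient; a minimal sketch, assuming the treatment term is labeled condpos as in the regression table:

or    <- exp(coef(m1)["condpos"])        # exp(1.22) = 3.39 (rounded)
or_ci <- exp(confint(m1)["condpos", ])   # profile CI on the odds-ratio scale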

In probability terms, the share choosing the dominant message increased from 11.02% in the Negative Relationship Impact Condition to 29.55% in the Positive Relationship Impact Condition.
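Because condition is the only predictor, the model’s fitted probabilities reproduce these observed shares; a quick sketch to verify:

predict(m1,
        newdata = data.frame(cond = c("neg","pos")),
        type = "response")
# returns roughly 0.11 and 0.30, matching the descriptive shares above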

fig1 <- df_elg %>% 
  mutate(cond_char = ifelse(cond == "pos","Positive Relationship\nImpact Condition","Negative Relationship\nImpact Condition")) %>% 
  ggplot(aes(x = cond_char,y = choicedom)) +
  stat_summary(fun.data = "mean_cl_boot",
               size = 0.5,
               geom = "errorbar",
               width = 0.05,
               color = "#080807",
               position = position_nudge(0)) +
  stat_summary(fun = "mean",
               geom = "point",
               size = 2.3,
               fill = "black",
               color = "black",
               position = position_nudge(0)) +
  stat_summary(fun = "mean",
               shape = 1,
               geom = "point",
               color = "black",
               fill = "black",
               position = position_nudge(0)) +
  scale_y_continuous(limits = c(-.02,1.02),
                     breaks = seq(0,1,0.2),
                     labels = c("0%","20%","40%","60%","80%","100%")) +
  ylab("Share Who Sent\n the Dominant Message") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(color = "grey80",
                                          linetype = "dashed"),
        axis.ticks = element_blank(),
        axis.line = element_line(color = "grey66"),
        axis.text.x = element_text(color = "black",
                                   face = "bold",
                                   size = 12),
        axis.text.y = element_text(color = "grey30",
                                   size = 10),
        axis.title.y = element_text(color = "black",
                                   face = "bold",
                                   size = 12),
        axis.title.x = element_blank(),
        legend.position = "none",
        title = element_text(color = "black",
                             size = 12,
                             face = "bold"))

#png("treatment_effect.png",width = 360,height = 320,units = "px")
fig1

Secondary DV

To capture finer-grained variation beyond the binary choice, I also created a continuous dependent variable: preference for the selected message. After selecting a message, participants indicated how strongly they preferred the message they selected (1 = Slightly preferred to 3 = Strongly preferred). I then combined this response with the message-selection response into a 6-point dominance scale (1 = Strongly preferred the non-dominant message to 6 = Strongly preferred the dominant message), as sketched below.
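The combined 6-point variable is stored as pref_cont in the data; a minimal sketch of the recode, assuming a hypothetical column pref_strength holding the 1-3 preference-strength response:

df_elg <- df_elg %>% 
  mutate(pref_cont = ifelse(choicedom == 1,
                            3 + pref_strength,    # dominant chosen: 4 = slightly ... 6 = strongly
                            4 - pref_strength))   # non-dominant chosen: 3 = slightly ... 1 = strongly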

Descriptives

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pref_cont,na.rm = T),2),
            SD = round(sd(pref_cont,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean   SD
pos    247   2.54   1.63
neg    245   1.83   1.18

Two-sample t-test

m <- t.test(pref_cont ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]


Participants in the Positive Relationship Impact Condition reported a stronger preference for the dominant message than those in the Negative Relationship Impact Condition (t(447.92) = 5.51, p < .001, 95% CI [0.45, 0.96], d = 0.52).

Results were consistent when using a continuous 6-point preference scale as the outcome rather than a simple binary choice.

Robustness check

To make sure that the treatment effect is not driven by expected compliance, I also asked participants to indicate how much of the task they believed the employee would complete if they received the dominant message (0 = minimum score on the task to 50 = maximum score on the task). Expected compliance, after all, could be the main motivator for message selection, because it directly affects participants’ bonus.

Descriptives

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pred_comp,na.rm = T),2),
            SD = round(sd(pred_comp,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    247   37.06    9.54
neg    245   34.48   10.46

Two-sample t-test

m <- t.test(pred_comp ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]

Participants in the Positive Relationship Impact Condition expected higher task completion than those in the Negative Relationship Impact Condition (t(485.15) = 2.86, p = 0.004, 95% CI [0.81, 4.35], d = 0.26). There is thus a treatment effect of condition on expected compliance, so let’s add it as a control variable to the logistic regression model.

Logistic Regression

m2 <- glm(choicedom ~ cond + pred_comp, family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg","pos"))))

ci_low = confint(m2)[2,1]
or_ci_low <- exp(ci_low)
ci_high = confint(m2)[2,2]
or_ci_high <- exp(ci_high)

apa_lm <- apa_print(m2)
 
kbl(apa_lm$table) %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
term        estimate   conf.int         statistic   p.value
Intercept   -4.08      [-5.26, -3.02]   -7.16       < .001
Condpos      1.14      [0.66, 1.64]      4.53       < .001
Pred comp    0.05      [0.03, 0.08]      3.94       < .001

Controlling for predicted compliance, the condition effect on message selection remained strong and statistically significant. In the adjusted model, the odds of sending the dominant message were 3.12 times higher for participants in the Positive Relationship Impact Condition than for those in the Negative Relationship Impact Condition (odds ratio = 3.12; logit coefficient = 1.14; 95% CI for the odds ratio = [1.93, 5.18]).

Summary & Takeaways

  • The reflection manipulation successfully shifted relational expectations about how an employee might respond to a dominant message.
  • This shift led to a substantially higher likelihood of sending the dominant message: the odds of selecting it were 3.39 times higher in the Positive Relationship Impact Condition than in the Negative Relationship Impact Condition.
  • The effect remained robust when controlling for expected task compliance and when using a continuous dominance-preference scale as the outcome.
  • This analysis illustrates a typical A/B testing workflow with a binary behavioral outcome, including treatment checking, effect estimation, visualization, and robustness checks.
  • The results highlight how framing leaders’ expectations about relational consequences can causally shift communication choices, even when financial incentives are held constant.