Background

View code on GitHub

This repository presents a randomized A/B test examining whether managers’ expectations about relationship consequences influence communication with employees.

The core question is whether managers are more likely to send a dominant message, using coercion and threat, when they believe it will improve—rather than harm—their relationship with an employee. The goal is to isolate this mechanism using a clean experimental intervention and a concrete incentive-compatible behavioral decision.

Design Overview

Participants were instructed to act as managers assigning a challenging task to an employee whose performance affected the manager’s bonus.

Before making a communication choice, participants were randomly assigned to one of two conditions:

  • Positive relationship impact: reflecting on the positive impact a dominant message will have on their relationship with the employee.
  • Negative relationship impact: reflecting on the negative impact a dominant message will have on their relationship with the employee.

All other elements of the scenario—including task demands, incentives, and message options—were held constant.

As the primary outcome of the experiment, participants could send one of two messages to motivate the employee: a dominant message and a non-dominant message.

Participants

# Full sample size
total_n <- df %>% 
  nrow()

# Keep only responses that passed the preregistered eligibility checks
df_elg <- df %>% 
  filter(is_elg == 1) 

# Eligible sample size
eligible_n <- df_elg %>% 
  nrow()

n_white <- df_elg %>% 
  group_by(race) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(race == "white") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

n_man <- df_elg %>% 
  mutate(gender = ifelse(is.na(gender) | gender == "","other",gender)) %>% 
  group_by(gender) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(gender == "man") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

mean_age <- df_elg %>% 
  summarise(age_mean = round(mean(age,na.rm = T),2)) %>% 
  unlist() %>% 
  unname()

median_income_num <- df_elg %>% 
  mutate(income = factor(income,c("$0-$20,000",
                                  "$20,001-$40,000",
                                  "$40,001-$60,000",
                                  "$60,001-$80,000",
                                  "$80,001-$100,000",
                                  "$100,001-$120,000",
                                  "$120,001-$140,000",
                                  "$140,001-$160,000",
                                  "$160,001-$180,000",
                                  "$180,001-$200,000",
                                  "Over $200,000")),
         income_num = as.numeric(income)) %>% 
  summarise(median = median(income_num,na.rm = T))

median_income <- median_income_num %>% 
  mutate(income_char = case_when(median == 1 ~ "$0-$20,000",
                                 median == 2 ~ "$20,001-$40,000",
                                 median == 3 ~ "$40,001-$60,000",
                                 median == 4 ~ "$60,001-$80,000",
                                 median == 5 ~ "$80,001-$100,000",
                                 median == 6 ~ "$100,001-$120,000",
                                 median == 7 ~ "$120,001-$140,000",
                                 median == 8 ~ "$140,001-$160,000",
                                 median == 9 ~ "$160,001-$180,000",
                                 median == 10 ~ "$180,001-$200,000",
                                 median == 11 ~ "Over $200,000")) %>% 
  select(income_char) %>% 
  unlist() %>% 
  unname()

I recruited 503 participants from Connect by CloudResearch (online U.S. panel) on May 27th, 2025. After preregistered attention and bot checks, 492 eligible responses remained in the final sample (N white = 335; N men = 249; M age = 38.99; Median income = $80,001-$100,000).
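The eligibility flag is_elg used above is computed upstream of this document; a minimal sketch of how such a flag could be derived, using attention_pass and bot_flag as purely hypothetical stand-ins for the preregistered checks:

# Hypothetical check columns; the actual preregistered criteria live in the data-prep step
df <- df %>% 
  mutate(is_elg = as.integer(attention_pass == 1 & bot_flag == 0))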

Treatment

All participants saw both the dominant message and the non-dominant message:

Dominant Message

By now you know the task at hand. It’s time to get in there and do your absolute best across all rounds. If you don’t complete it and do it well, you will not get the full bonus.

Non-Dominant Message

Your job in this task is to select the shapes that match the description. Please make sure you look at them carefully. It would be great if you can get as many of them right as possible.

Those in the Positive Relationship Impact Condition were told to reflect on how the employee might react positively to the dominant message, whereas those in the Negative Relationship Impact Condition were told to reflect on how the employee might react negatively to the dominant message.

Before you make your choice of which message you want to send, think for a moment about how your employee might react [positively/negatively] to this message:

[dominant message]

How and why might the employee have a positive/negative (or at least not negative/positive) reaction to that message, affecting their attitude towards the manager?

In the space below, please write 1-2 sentences about positive/negative thoughts or feelings they could have about the manager and their relationship with them.

Treatment check

I validated that participants engaged with the treatment as intended using three checks: (1) a single-item self-report of the expected employee attitude toward them; (2) a lexical valence analysis of the open-ended responses; and (3) a word-count comparison by condition, to verify that participants engaged with both prompts to a similar degree.

(1) Self-report item

Immediately after the treatment, participants estimated the employee’s attitude toward them if they were to send the dominant message (1 = Extremely Negative to 7 = Extremely Positive).

df_elg <- df_elg %>% 
  mutate(cond = factor(cond,levels = c("pos","neg")))

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pred_att,na.rm = T),2),
            SD = round(sd(pred_att,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean   SD
pos    247   4.39   1.65
neg    245   3.31   1.67

m <- t.test(pred_att ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]

Confirming that the manipulation shifted relational expectations as intended, participants in the Positive Relationship Impact Condition expected a more positive reaction to the dominant message than those in the Negative Relationship Impact Condition (t(489.73) = 7.25, p < .001, 95% CI [0.79, 1.38], d = 0.66).
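The inline statistics above can be pulled directly from the htest object returned by t.test() (the CI reported is the 95% CI for the mean difference); a minimal sketch:

t_val <- round(unname(m$statistic), 2)   # t = 7.25
df_w  <- round(unname(m$parameter), 2)   # Welch degrees of freedom = 489.73
p_val <- m$p.value                       # reported as p < .001 when very small
ci    <- round(m$conf.int, 2)            # 95% CI for the mean difference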

(2) Lexical valence analysis

First, these are the top-10 most frequent sentiment-scored words (AFINN), by condition:

Positive Relationship Impact Condition

afinn <- get_sentiments("afinn")

df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(cond,word) %>% 
  summarise(n = n()) %>% 
  ungroup() %>% 
  arrange(cond,desc(n)) %>% 
  group_by(cond) %>% 
  slice(1:10) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(-cond) %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
word         n
best         76
positive     70
like         36
positively   27
want         25
motivated    19
good         18
appreciate   17
hard         16
clear        15

Negative Relationship Impact Condition

df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(cond,word) %>% 
  summarise(n = n()) %>% 
  ungroup() %>% 
  arrange(cond,desc(n)) %>% 
  group_by(cond) %>% 
  slice(1:10) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(-cond) %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
word          n
negative      70
like          68
threatening   39
threat        34
pressure      31
pressured     23
demanding     21
best          20
want          15
good          13

Now, let’s compare the valence of the reflections across the two conditions. Each participant receives a valence score equal to the sum of AFINN values across the detectable words in their response (i.e., the words they used that appear in the AFINN lexicon). Unlike a mean, the sum reflects how many sentiment-bearing words a participant used and therefore better captures the overall sentiment of the response. Participants who used no detectable words are dropped from this analysis. Below are the mean sum scores per condition.

PID_valence <- df_elg %>% 
  select(PID,cond,reflection) %>% 
  unnest_tokens(word,reflection) %>% 
  inner_join(afinn,by = "word") %>% 
  group_by(PID) %>% 
  summarise(value = sum(value)) %>% 
  ungroup()

remaining_pos <- df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

remaining_neg <- df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

total_pos <- df_elg %>% 
  select(PID,cond) %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

total_neg <- df_elg %>% 
  select(PID,cond) %>% 
  group_by(cond) %>% 
  summarise(N = n()) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(N) %>% 
  unlist() %>% 
  unname()

perc_overall = round(100*nrow(PID_valence)/eligible_n,2)
perc_pos = round(100*remaining_pos/total_pos,2)
perc_neg = round(100*remaining_neg/total_neg,2)


df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(value),2),
            SD = round(sd(value),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    238    4.47   3.33
neg    238   -0.58   3.62

This analysis covers participants who used at least one word present in the AFINN lexicon (96.75% of total reflections; 96.36% of positive condition reflections; 97.14% of negative condition reflections).

m <- t.test(value ~ cond,data = df_elg %>% 
  select(PID,cond) %>% 
  inner_join(PID_valence,by = "PID"))

d_mod <- cohens_d(m)
d = d_mod[1,1]

Indeed, those in the Positive Relationship Impact Condition used more positively valenced language, on average, than those in the Negative Relationship Impact Condition (t(470.64) = 15.84, p < .001, 95% CI [4.42, 5.68], d = 1.46).

(3) Word-count

I also wanted to make sure that participants in the two conditions did not differ markedly in the length of their responses. To that end, I calculate the word count of each reflection and compare the two conditions.

PID_wordcount <- df_elg %>%
  transmute(PID,
            n_words = str_count(str_squish(reflection), "\\S+"))

df_elg %>% 
  select(PID,cond) %>% 
  left_join(PID_wordcount,by = "PID") %>% 
  group_by(cond) %>% 
  summarise(Mean = round(mean(n_words),2),
            SD = round(sd(n_words),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   Mean    SD
pos    27.34   10.73
neg    28.40   13.20

m <- t.test(n_words ~ cond,data = df_elg %>%
  select(PID,cond) %>% 
  left_join(PID_wordcount,by = "PID"))

d_mod <- cohens_d(m)
d = d_mod[1,1]

As intended, the difference in word count between participants in the Positive Relationship Impact Condition and participants in the Negative Relationship Impact Condition is not statistically significant (t(468.85) = -0.98, p = 0.325, 95% CI [-3.2, 1.06], d = -0.09), suggesting that participants in both conditions engaged with the treatment to a similar degree. Because response lengths were similar across conditions, the sum-based valence scores are unlikely to be mechanically driven by response length.

Primary DV

After the treatment, participants were asked which message they wanted to send to the employee. Below are the shares, within each condition, who chose to send the dominant message.

Descriptives

pos_share <- df_elg %>% 
  group_by(cond) %>% 
  summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  filter(cond == "pos") %>% 
  select(Share) %>% 
  unlist() %>% 
  unname()

neg_share <- df_elg %>% 
  group_by(cond) %>% 
  summarise(Share = round(100*mean(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  filter(cond == "neg") %>% 
  select(Share) %>% 
  unlist() %>% 
  unname()

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(100*mean(choicedom,na.rm = T),2),
            SD = round(100*sd(choicedom,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    247   29.55   45.72
neg    245   11.02   31.38

Logistic Regression

m1 <- glm(choicedom ~ cond, family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg","pos"))))

ci_low = confint(m1)[2,1]
or_ci_low <- exp(ci_low)
ci_high = confint(m1)[2,2]
or_ci_high <- exp(ci_high)

apa_lm <- apa_print(m1)
 
kbl(apa_lm$table) %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
term        estimate   conf.int         statistic   p.value
Intercept   -2.09      [-2.51, -1.71]   -10.24      < .001
Condpos      1.22      [0.75, 1.72]       4.94      < .001

A logistic regression indicates that the odds of sending the dominant message were 3.39 times higher for participants in the Positive Relationship Impact Condition than for those in the Negative Relationship Impact Condition (odds ratio = 3.39; logit coefficient = 1.22; 95% CI for the odds ratio = [2.11, 5.58]).
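For reference, the odds ratio is simply the exponentiated logit coefficient; a minimal sketch, assuming the treatment term is labeled condpos as in the regression table:

or    <- exp(coef(m1)["condpos"])        # exp(1.22) = 3.39 (rounded)
or_ci <- exp(confint(m1)["condpos", ])   # profile CI on the odds-ratio scale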

In probability terms, the share choosing the dominant message increased from 11.02% in the Negative Relationship Impact Condition to 29.55% in the Positive Relationship Impact Condition.
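Because condition is the only predictor, the model’s fitted probabilities reproduce these observed shares; a quick sketch to verify:

predict(m1,
        newdata = data.frame(cond = c("neg","pos")),
        type = "response")
# returns roughly 0.11 and 0.30, matching the descriptive shares above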

fig1 <- df_elg %>% 
  mutate(cond_char = ifelse(cond == "pos","Positive Relationship\nImpact Condition","Negative Relationship\nImpact Condition")) %>% 
  ggplot(aes(x = cond_char,y = choicedom)) +
  stat_summary(fun.data = "mean_cl_boot",
               size = 0.5,
               geom = "errorbar",
               width = 0.05,
               color = "#080807",
               position = position_nudge(0)) +
  stat_summary(fun = "mean",
               geom = "point",
               size = 2.3,
               fill = "black",
               color = "black",
               position = position_nudge(0)) +
  stat_summary(fun = "mean",
               shape = 1,
               geom = "point",
               color = "black",
               fill = "black",
               position = position_nudge(0)) +
  scale_y_continuous(limits = c(-.02,1.02),
                     breaks = seq(0,1,0.2),
                     labels = c("0%","20%","40%","60%","80%","100%")) +
  ylab("Share Who Sent\n the Dominant Message") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(color = "grey80",
                                          linetype = "dashed"),
        axis.ticks = element_blank(),
        axis.line = element_line(color = "grey66"),
        axis.text.x = element_text(color = "black",
                                   face = "bold",
                                   size = 12),
        axis.text.y = element_text(color = "grey30",
                                   size = 10),
        axis.title.y = element_text(color = "black",
                                   face = "bold",
                                   size = 12),
        axis.title.x = element_blank(),
        legend.position = "none",
        title = element_text(color = "black",
                             size = 12,
                             face = "bold"))

#png("treatment_effect.png",width = 360,height = 320,units = "px")
fig1

Secondary DV

To capture finer-grained variation beyond the binary choice, I also created a continuous dependent variable: preference for the selected message. After selecting a message, participants indicated how strongly they preferred the message they selected (1 = Slightly preferred to 3 = Strongly preferred). I then combined this response with the message-selection response into a 6-point dominance scale (1 = Strongly preferred the non-dominant message to 6 = Strongly preferred the dominant message), as sketched below.
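The combined 6-point variable is stored as pref_cont in the data; a minimal sketch of the recode, assuming a hypothetical column pref_strength holding the 1-3 preference-strength response:

df_elg <- df_elg %>% 
  mutate(pref_cont = ifelse(choicedom == 1,
                            3 + pref_strength,    # dominant chosen: 4 = slightly ... 6 = strongly
                            4 - pref_strength))   # non-dominant chosen: 3 = slightly ... 1 = strongly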

Descriptives

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pref_cont,na.rm = T),2),
            SD = round(sd(pref_cont,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean   SD
pos    247   2.54   1.63
neg    245   1.83   1.18

Two-sample t-test

m <- t.test(pref_cont ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]


Participants in the Positive Relationship Impact Condition reported a stronger preference for the dominant message than those in the Negative Relationship Impact Condition (t(447.92) = 5.51, p < .001, 95% CI [0.45, 0.96], d = 0.52).

Results were consistent when using a continuous 6-point preference scale as the outcome rather than a simple binary choice.

Robustness check

To make sure that the treatment effect is not driven by expected compliance, I also asked participants to indicate how much of the task they believed the employee would complete if they received the dominant message (0 = minimum score on the task to 50 = maximum score on the task). Expected compliance, after all, could be the main motivator for message selection, because it directly affects participants’ bonus.

Descriptives

df_elg %>% 
  group_by(cond) %>% 
  summarise(N = n(),
            Mean = round(mean(pred_comp,na.rm = T),2),
            SD = round(sd(pred_comp,na.rm = T),2)) %>% 
  ungroup() %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
cond   N     Mean    SD
pos    247   37.06    9.54
neg    245   34.48   10.46

Two-sample t-test

m <- t.test(pred_comp ~ cond,data = df_elg)
d_mod <- cohens_d(m)
d = d_mod[1,1]

Participants in the Positive Relationship Impact Condition expected higher task completion than those in the Negative Relationship Impact Condition (t(485.15) = 2.86, p = 0.004, 95% CI [0.81, 4.35], d = 0.26). There is thus a treatment effect of condition on expected compliance, so let’s add it as a control variable to the logistic regression model.

Logistic Regression

m2 <- glm(choicedom ~ cond + pred_comp, family = binomial(link = "logit"),
          data = df_elg %>% mutate(cond = factor(cond, levels = c("neg","pos"))))

ci_low = confint(m2)[2,1]
or_ci_low <- exp(ci_low)
ci_high = confint(m2)[2,2]
or_ci_high <- exp(ci_high)

apa_lm <- apa_print(m2)
 
kbl(apa_lm$table) %>% 
  kable_styling(bootstrap_options = "hover",
                full_width = F,
                position = "left")
term        estimate   conf.int         statistic   p.value
Intercept   -4.08      [-5.26, -3.02]   -7.16       < .001
Condpos      1.14      [0.66, 1.64]      4.53       < .001
Pred comp    0.05      [0.03, 0.08]      3.94       < .001

Controlling for predicted compliance, the condition effect on message selection remained strong and statistically significant. In the adjusted model, the odds of sending the dominant message were 3.12 times higher for participants in the Positive Relationship Impact Condition than for those in the Negative Relationship Impact Condition (odds ratio = 3.12; logit coefficient = 1.14; 95% CI for the odds ratio = [1.93, 5.18]).

Summary & Takeaways

  • The reflection manipulation successfully shifted relational expectations about how an employee might respond to a dominant message.
  • This shift led to a substantially higher likelihood of sending the dominant message: the odds of selecting it were 3.39 times higher in the Positive Relationship Impact Condition than in the Negative Relationship Impact Condition.
  • The effect remained robust when controlling for expected task compliance and when using a continuous dominance-preference scale as the outcome.
  • This analysis illustrates a typical A/B testing workflow with a binary behavioral outcome, including treatment checking, effect estimation, visualization, and robustness checks.
  • The results highlight how framing leaders’ expectations about relational consequences can causally shift communication choices, even when financial incentives are held constant.