Wikipedia:Wikipedia Signpost/2022-06-26/Essay

Essay

RfA trend line haruspicy: fact or fancy?

This user essay, originally titled "RFA trend lines", was started in 2020. You may edit it, but please do so on the original page and not The Signpost. – E

Trends in support percentage during a request for adminship are rarely informative, and these trends are difficult to interpret even when they might be informative.

As a first-order approximation, let's assume an RfA in which no new information comes to light over the course of the request and everyone !votes independently of one another. In this case, if we were to poll every Wikipedian, there would be some global, unobserved support percentage for the population; call it p. Given an RfA with n participants, each !vote can be considered a Bernoulli trial with probability p of support. The number of supports, s, at any given time is the sum of the Bernoulli trials so far, so it can be modeled as a binomial distribution with n trials and probability p.
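Written out, this is the standard binomial setup. These are textbook results, not formulas from the original essay. The key property is that the standard deviation of the observed support fraction shrinks as the number of !votes grows:

$$
s \sim \operatorname{Binomial}(n, p), \qquad
\Pr(s = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n
$$

$$
\hat{p} = \frac{s}{n}, \qquad
\operatorname{E}[\hat{p}] = p, \qquad
\operatorname{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
$$

For instance, take p = 0.76, the pre-switch value from the example later in this essay. The observed support fraction then typically wanders by about 9.5 percentage points after 20 !votes, but by only about 3.5 points after 150:

$$
\sqrt{\frac{0.76 \times 0.24}{20}} \approx 0.095, \qquad
\sqrt{\frac{0.76 \times 0.24}{150}} \approx 0.035
$$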

RfAs run for multiple days and are among the most attended discussions on the project, which suggests that the final support percentage is a reliable stand-in for the population support percentage. By contrast, the trend line tells us almost nothing and may in fact be misleading. Our binomial model is the same one we would use to model the proportion of heads in successive coin flips. Imagine we are going to flip a coin for a contest and we want to prove that the coin is fair. We flip it 150 times and track the number and order of heads and tails. After 150 flips, the proportion of heads would be very informative: if it is far from a 50% split, then the coin is likely not fair. The order in which the flips occur, however, is uninformative, and using it as evidence is a logical fallacy known as the gambler's fallacy.
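A short simulation can check the coin-flip argument. This is a minimal R sketch under some assumptions. It uses a constant support probability of 0.76, borrowed from the example below, and 150 independent !votes. The helper `runningSupport` and the choice of 2,000 replicates are illustrative and not part of the original essay. The sketch compares how much the running support fraction varies after 20 !votes with how much it varies at the close. It then confirms that shuffling the order of !votes leaves the final tally unchanged.

```r
# Running support fraction after each !vote, assuming every !vote is an
# independent Bernoulli trial with the same probability of support p.
runningSupport <- function(n, p) {
  votes <- rbinom(n, 1, p)        # 1 = support, 0 = oppose
  cumsum(votes) / seq_len(n)      # support fraction after each !vote
}

set.seed(1)
n <- 150
p <- 0.76
# n x 2000 matrix: one column per simulated RfA, one row per !vote number
sims <- replicate(2000, runningSupport(n, p))

# Spread of the observed support fraction early on versus at the close
earlySD <- sd(sims[20, ])
finalSD <- sd(sims[n, ])
print(c(early = earlySD, final = finalSD))

# Order does not affect the final tally: shuffling the !votes changes
# the trend line but not the closing support fraction.
votes <- rbinom(n, 1, p)
shuffled <- sample(votes)
stopifnot(mean(votes) == mean(shuffled))
```

The two printed standard deviations should land near the theoretical values of roughly 0.095 and 0.035.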

“If, however, there is a fairly steady state until a piece of evidence comes out, and there is a massive shift, that is useful information. The trend isn't important, and it should not be the reason for success/no success, but it can be a useful tool for identifying when such an inflection point has happened.”

— Amorymeltzer 25 February 2020

Our first-order approximation of RfA trend lines represents a hypothesis about !voting behavior. Absent evidence to the contrary, we assume that editors review the candidate and comment independently of one another, just as the result of a coin flip does not depend on prior results. But an RfA is not a series of independent trials. The information available to a !voter includes other participants' comments, answers to new questions, and summary statistics such as the current support percentage. These can consciously or unconsciously affect how a participant !votes. That justifies an alternative hypothesis: each !vote is related to the ones that came before it (and maybe even after it). If the population support percentage, p, does not change, however, this distinction is immaterial to our model.

Reconsider the coin-flip example. Suppose the probability of getting heads depends on the previous result, but getting heads changes the probability of the next heads from 50% to 50% (i.e., no change). Then the dependent model and the independent model will produce exactly the same results. Differences arise only if the dependence changes the underlying probability. In statistical terms, the binomial distribution is robust against violations of the independence assumption as long as the sample is much smaller than the population. For example, suppose that getting heads increased the likelihood of getting heads on the next flip. Our independent-trial model would then be accurate at first but would grow less accurate as the number of trials increased, because the dependence keeps compounding and makes heads more and more likely. Bringing this back to RfA, the influence of earlier !votes on later ones is not a serious threat to the binomial (independent-trial) model. It would only affect our model if there were thousands of !voters or a major shift in the underlying probability.
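The compounding effect can be illustrated with a simulated "reinforced" coin. This is a minimal R sketch built on a hypothetical rule: every heads raises the probability of the next heads by a fixed step, capped at 1. The step of 0.001 is an arbitrary choice, not something from the essay, and `reinforcedFlips` is an illustrative helper. The sketch compares the average proportion of heads for independent fair flips and reinforced flips at several sequence lengths.

```r
# Flip a coin whose probability of heads starts at p0 and rises by `step`
# after every heads (a hypothetical reinforcement rule, capped at 1).
reinforcedFlips <- function(n, p0 = 0.5, step = 0.001) {
  flips <- integer(n)
  p <- p0
  for (i in seq_len(n)) {
    flips[i] <- rbinom(1, 1, p)
    if (flips[i] == 1) p <- min(1, p + step)
  }
  flips
}

set.seed(2)
for (n in c(20, 150, 2000)) {
  # Average proportion of heads over 200 simulated sequences of length n
  indep <- mean(replicate(200, mean(rbinom(n, 1, 0.5))))
  dep   <- mean(replicate(200, mean(reinforcedFlips(n))))
  cat(sprintf("n = %4d  independent: %.3f  reinforced: %.3f\n", n, indep, dep))
}
```

Under this rule, the reinforced coin should stay close to 50% heads over short runs and drift well away from it over long ones. How large the drift is depends entirely on the assumed step.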

Editors look at trend lines because they believe (or want to evaluate whether) earlier !votes influenced later ones enough to cause a major shift in the underlying probability. Given that !votes are not independent, this intuition makes sense, but it is flawed. Essentially, this is a model selection problem, and the starting assumption ought to be the null hypothesis. As discussed above, this means that without evidence, we should assume that the order of !votes is not meaningful, just like the order of coin flips. Claiming that a coin is unfair because of the order of heads and tails is fallacious, so we cannot reject the null hypothesis on the basis of the trend line alone; we need some other kind of evidence. What is critical to understand in the context of RfA is that the trend line cannot tell us whether a change in the underlying support percentage occurred. It is only useful if we already assume that such a change happened, and even then it can only help us determine when.
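Suppose we already assume that the support probability changed exactly once. The trend line can then at least help estimate when. One standard way to do this is sketched here in R as an illustration, not as anything from the original essay. The sketch tries every possible split point and keeps the one that maximizes the combined Bernoulli log-likelihood of the two segments. The function names and the `minSegment` guard are hypothetical. The test data reuse the parameters from the example below.

```r
# Bernoulli log-likelihood of a 0/1 vector at its own maximum-likelihood p.
bernLogLik <- function(x) {
  s <- sum(x); n <- length(x)
  ph <- s / n
  # Treat 0 * log(0) as 0 so all-support or all-oppose segments are handled.
  (if (s > 0) s * log(ph) else 0) + (if (s < n) (n - s) * log(1 - ph) else 0)
}

# Assuming exactly one change, return the split k (last !vote of the first
# segment) that maximizes the two-segment log-likelihood.
estimateChangePoint <- function(votes, minSegment = 5) {
  n <- length(votes)
  stopifnot(n >= 2 * minSegment)
  candidates <- minSegment:(n - minSegment)
  ll <- sapply(candidates, function(k)
    bernLogLik(votes[1:k]) + bernLogLik(votes[(k + 1):n]))
  candidates[which.max(ll)]
}

set.seed(3)
votes <- rbinom(150, 1, ifelse(seq_len(150) <= 90, 0.76, 0.60))
print(estimateChangePoint(votes))
```

The scan always returns some split, even when p never changed. In line with the paragraph above, it only locates a change under the assumption that one exists. With a shift this small, the estimate can also vary considerably from one random run to the next.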

Like any hypothesis testing tool, a trend line is only useful if we already have a hypothesis. Unless there is an independent reason to believe the information available to participants has changed, the trend line is most likely to reflect randomness in the sample rather than a meaningful pattern. Without a rational argument as to why early !voters did not have the same information as late !voters, an argument from trend-line data is weak.

Example

A simulated RfA with 150 !votes. Can you tell where the underlying support percentage changed?

The accompanying image shows a trend line for the support percentage in a simulated RfA that ended within the discretionary range. It is a series of 150 Bernoulli trials, but at some point the underlying probability of support changed from just above the 75% threshold for an outright pass (76 percent) to well below the 65% threshold for an outright fail (60 percent). The point at which this change occurred is difficult to determine from the trend line alone. In fact, the graph looks like other simulations in which the underlying support percentage was above the discretionary range the entire time. The change occurred after the 90th !vote, yet the trend line alone offers little evidence of it. These simulations can be replicated (in spirit, since they are random) using the following R code:

# Config variables
N <- 150            # How many !votes to simulate
switchPoint <- 90   # Last !vote drawn with p.start; later !votes use p.end
p.start <- 0.76     # Probability of support up to and including switchPoint
p.end <- 0.60       # Probability of support after switchPoint

# Simulation: each !vote is one Bernoulli trial (1 = support, 0 = oppose)
voteNumber <- seq_len(N)
p <- ifelse(voteNumber <= switchPoint, p.start, p.end)
voteList <- rbinom(N, 1, p)
meanSeries <- cumsum(voteList) / voteNumber   # running support fraction

# Plot the result as a percentage; dashed lines mark the 65% and 75% thresholds
plot(voteNumber, 100 * meanSeries, xlab = '!vote number', ylab = 'Support percentage', type = 'l')
abline(h = c(65, 75), lty = 2)
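The essay also observes that the switched run looks like runs in which support never changed. Drawing both kinds of run on the same axes checks that observation. This is a minimal sketch with a few assumptions. The switched run uses 76% support through the 90th !vote and 60% afterwards, as in the example. Five comparison runs hold support at a constant 76%. The helper `simulateTrend` is hypothetical, and passing `switchPoint = 150` is simply a way of saying the switch never happens.

```r
# Running support percentage for a run whose support probability is
# p.start up to and including !vote switchPoint and p.end afterwards.
simulateTrend <- function(N = 150, switchPoint = 90, p.start = 0.76, p.end = 0.60) {
  p <- ifelse(seq_len(N) <= switchPoint, p.start, p.end)
  100 * cumsum(rbinom(N, 1, p)) / seq_len(N)
}

set.seed(4)
switched <- simulateTrend()
constant <- replicate(5, simulateTrend(switchPoint = 150))  # no switch: 0.76 throughout

# Switched run in red, constant-support runs in grey
matplot(1:150, cbind(switched, constant), type = 'l', lty = 1,
        col = c('red', rep('grey50', 5)),
        xlab = '!vote number', ylab = 'Support percentage')
abline(h = c(65, 75), lty = 2)   # edges of the discretionary range
```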


Discuss this story


A better way of looking at RFA trends is to look at some of the RFAs that have spectacularly changed direction. Usually these are negative trends - someone comes up with a reason to oppose the candidate that gets traction among the !voting community, and the RFA changes direction. Usually these are very obvious when you read the subsequent vote rationales. Sometimes it happens because of a mistake the candidate or their nominator made during the RFA - one of the classics being when the nominator picked up the wrong laptop and made a comment while logged in as his nominee/girlfriend. Other times it happens when one of the few people who actually review the candidate's edits spots a problem and details it in their oppose. The minority of RFA participants who actually spend an hour or so checking the candidate's edits have a huge influence on RFAs. Ϣere SpielChequers 15:24, 30 June 2022 (UTC)

It seems to me that there is an error here somewhere in the argumentation re independence of samples. Assuming the value of each new sample in this process is partly dependent on the previous sample, then there is no requirement for the population mean to be affected to get biased results - we are not 'selecting samples of a certain value' from the pool, but 'modifying a drawn sample post selection'; 'p' doesn't seem to come into it. - Maybe a better approach to analysing this series would be treatment as an autocorrelated time-series. That would allow identification of inflection points, with some estimate of confidence. --Elmidae (talk · contribs) 15:27, 2 July 2022 (UTC)
- I expect that an autocorrelated time-series would still assume that the populations of people who !vote on the first day, the second day and so on are roughly identical. But there is a confounding variable in RfA trend lines: the people with the strongest support !vote first. For instance, if I see the candidate's name every week, almost always in a context where they are creating good content or making comments that I agree with, then I'm going to vote immediately; if I see it in a more negative context then I might take more time to decide or take a while to build up my oppose rationale. If I've watchlisted the candidate's talk page then I will see the RfA quicker.
  The message to me is: trend line "haruspicy" is doomed to fail. We should just take the tallies as they are and crats should decide consensus on factors other than trend lines in cases where the tally isn't clear enough. (The exception is if there's a clear information gap e.g. undisclosed COI editing by the candidate was only unearthed on day 6.) And all of this trend line analysis adds to the excruciating pressure and overanalysis that discourages prospective RfA candidates. — Bilorv (talk) 13:53, 13 July 2022 (UTC)
