One of my first SwordSTEM articles was Is It Important To Weigh Target Values?. In that article I tried to correlate a fighter’s win percentage with their preference for scoring high- or low-value points. In the end it showed that there wasn’t really a correlation between fighters who hit higher-value targets and those who won more.
The data was based on 10 events which had a large number of overlapping participants. But times have changed. I have some better ideas on how to test this, and a much, much bigger data set to work with.
A New Adventure
My new approach is to see how many match results would be overturned by flat scoring versus how many would retain the same winners. The methodology was simple:
- Look up how many times each fighter hit each other.
- Determine who got more hits in.
- Compare that against the actual match winner.
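The three steps above can be sketched in code. The exchange records here are a hypothetical format (fighter name, points awarded), not the actual database schema:

```python
from collections import Counter

def flat_winner(exchanges):
    """exchanges: list of (fighter, points) tuples for one match.
    Returns the fighter who landed more scoring exchanges, ignoring
    point values, or None on a tie under flat scoring."""
    hits = Counter(f for f, pts in exchanges if pts > 0)
    ranked = hits.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie under flat scoring
    return ranked[0][0]

# A wins 5-3 on weighted points, but B landed more hits,
# so flat scoring would overturn this match:
match = [("A", 3), ("A", 2), ("B", 1), ("B", 1), ("B", 1)]
print(flat_winner(match))  # "B"
```

Comparing `flat_winner` against the recorded winner of each match gives the overturned/unchanged tallies used below.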
Which is all well and good, but how do I know which tournaments to select?
The advantage of working with the 10 events in the first article was that they all had similar rules, allowing them to be treated as the same. Now I am looking at a database of 128 events (541 tournaments), which means that there is a ton of variation.
It is important to filter out tournaments which don’t use weighted scoring, as those will show 100% correlation (barring data-entry errors) between the number of times hit and the match winner. My first thought was to use the highest point value seen in the tournament. But I know for a fact that scorekeepers make mistakes, and sometimes organizers add extra exchanges to correct things. A tournament with a few hundred 1-point exchanges and a single 3-point exchange is still a non-weighted tournament. And thus I moved to looking at the average number of points awarded per exchange in a tournament.
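The average-value filter can be sketched as below. The threshold of 1.2 is illustrative only, not the cutoff used for this article:

```python
def is_weighted(exchange_values, threshold=1.2):
    """Treat a tournament as weighted if the average points per
    scoring exchange exceeds the threshold. exchange_values is a
    flat list of points awarded across the tournament."""
    scoring = [p for p in exchange_values if p > 0]
    return bool(scoring) and sum(scoring) / len(scoring) > threshold

# A single fat-fingered 3-point exchange no longer flips the call:
print(is_weighted([1] * 300 + [3]))   # False (average ~1.007)
print(is_weighted([1, 2, 3, 2, 2]))   # True  (average 2.0)
```

Unlike a max-value check, one stray high-value exchange barely moves the average, which is the whole point of the switch.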
But what should I set the threshold at? I decided to sweep through and see at what point the number of tournaments drops off, and…
Why am I seeing about 50 tournaments with an average point value (on scoring exchanges) of less than 1 point? It turns out there are several full-afterblow tournaments which are not recording their exchanges correctly. Instead of inputting the proper point values, they enter 0 points awarded when both fighters score the same value. So I tried again, looking only at clean hits, and things look much better.
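A sketch of the fix, assuming each exchange record carries a (hypothetical) flag for whether it was a clean hit:

```python
def avg_exchange_value(exchanges, clean_only=True):
    """exchanges: list of dicts with 'points' and 'clean' keys
    (hypothetical schema). Full-afterblow doubles logged as 0-point
    exchanges drag the average below 1 unless we restrict the
    calculation to clean hits."""
    pool = [e for e in exchanges if e["clean"] or not clean_only]
    if not pool:
        return 0.0
    return sum(e["points"] for e in pool) / len(pool)

bouts = [{"points": 2, "clean": True}, {"points": 0, "clean": False},
         {"points": 1, "clean": True}, {"points": 0, "clean": False}]
print(avg_exchange_value(bouts, clean_only=False))  # 0.75, sub-1-point
print(avg_exchange_value(bouts))                    # 1.5, sensible again
```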
You Haven’t Told Us Anything Useful Yet
Now that I know how I’m going to get my dataset, it’s time to mine some data.
Bam. It looks like very little changes if we remove weighted scoring. If you disallow ties you will get the same result 96% of the time. But does it matter how heavily you weight your tournaments? Surely a tournament which awards points from 1-4 will be more likely to overturn results than a tournament which only gives 1-2 points?
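The agreement figure comes from comparing actual winners against flat-scoring winners across all eligible matches, dropping the matches that flat scoring turns into ties. A minimal sketch:

```python
def agreement_rate(matches):
    """matches: list of (actual_winner, flat_winner) pairs, where
    flat_winner is None if flat scoring produces a tie. Ties are
    excluded, matching the 'disallow ties' figure in the text."""
    decided = [(a, f) for a, f in matches if f is not None]
    same = sum(1 for a, f in decided if a == f)
    return same / len(decided)

# 4 decided matches, 3 unchanged -> 75% agreement on this toy data
results = [("A", "A"), ("B", "B"), ("A", "B"), ("C", None), ("B", "B")]
print(agreement_rate(results))  # 0.75
```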
Effect of Point Spread
So we repeat the same experiment, while varying the threshold of which tournaments we include.
Everything seems somewhat stable until we get to an average exchange value of greater than 2.5. After that the weighting seems to become significantly more important. Or does it?
Around 2.5 points per exchange is the region where we get into really small sample sizes. With a few tournaments dominating the narrative the numbers certainly can’t be taken to be indicative of an overall trend. So the original observations stand.
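The sweep in this section amounts to bucketing matches by their tournament’s average exchange value and recomputing the unchanged-result share per bucket, keeping the sample size visible since the thin buckets past ~2.5 are exactly where the numbers get noisy. A sketch, with a hypothetical input format:

```python
from collections import defaultdict

def sweep(matches, step=0.5):
    """matches: list of (avg_value, actual_winner, flat_winner)
    triples. Returns {bucket: (share_unchanged, sample_size)},
    bucketing by tournament average exchange value."""
    buckets = defaultdict(list)
    for avg, actual, flat in matches:
        buckets[round(avg / step) * step].append(actual == flat)
    return {b: (sum(v) / len(v), len(v))
            for b, v in sorted(buckets.items())}

toy = [(1.1, "A", "A"), (1.2, "A", "B"), (2.6, "B", "B")]
print(sweep(toy))  # {1.0: (0.5, 2), 2.5: (1.0, 1)}
```

Note the lone match in the 2.5 bucket: a single tournament can swing that bucket’s percentage from 0% to 100%, which is why the apparent jump past 2.5 shouldn’t be trusted.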
Earlier I said that I didn’t want to use the maximum points awarded in a single exchange as a criterion to determine whether a tournament used weighted or unweighted scoring. But we can use it as a point of comparison with this data, to make sure it is all sensible.
And it looks about the same. After 4 points we are getting into small sample size territory, and we start to see the number of ties increase as a result of our point flattening changes. But overall the same numbers check out.
This data shows fairly conclusively that weighted scoring really doesn’t affect the outcome of the match very much. Nineteen times out of twenty you will end up with the same overall result.
But is that even important? Even if the weighting doesn’t affect the match outcome it can still have significant impact on participant behavior. Looking back to Data Mining – SoCal Swordfight Longsword we can see significant changes in the targeting when point values were changed year over year. And this is minor, shuffling the point values within a fairly stable ruleset. Going from tournament to tournament I would expect to see even more variation (as soon as I figure out a good way to parse and wrap my head around it).
Because of its ability to change participant behavior, I still strongly advocate having weighted targets. But it has become very obvious by now that better fencers will get it done regardless of how you award the points.
Stuff for Nerds
What happens if I look at tournaments with really big score discrepancies?
Number of Eligible Tournaments:
| Average Point Threshold | All Scoring Exchanges | Clean Exchanges Only |
| --- | --- | --- |
Number of Matches Affected By Flat Scoring
| Average Score Value | No Change | Tie | Different Result |
| --- | --- | --- | --- |
Number of Matches Affected By Flat Scoring
| Max Score Value Greater Than | No Change | Tie | Different Result |
| --- | --- | --- | --- |