Before diving into the correlations, let’s first clarify the more nuanced attributes we consider:
The two plots below show the correlation of our article attributes with our two composite scores. The first plot shows scores that emphasize a minimal number of clicks, while the second shows scores that emphasize a short path completion time. None of the attributes based on in-article features correlated strongly with either score. However, the graph-based attributes correlated well with utility across both composite scores. The failure of the in-article attributes to correlate with our article metrics suggests that there is no such thing as a "good" article, at least as judged from its textual attributes alone. Instead, the only features of a given article about which we can make generalizable statements are those removed from the text itself: the graph attributes. This raises the question: what is to blame, our attributes or the dataset? To get a firmer understanding of our attributes' shortcomings, we decided to create our own dataset of traversed paths.
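For a rough sense of how a correlation table like the one behind these plots could be produced, consider the sketch below. It is only an illustration, not our actual pipeline: the DataFrame, the attribute column names, and the composite-score column names are all assumed placeholders.

```python
import pandas as pd

# Hypothetical per-article table: one row per article with its attribute
# values and the two composite scores derived from the player-made paths.
attribute_cols = ["link_density", "num_characters", "in_degree", "pagerank"]
score_cols = ["composite_clicks", "composite_time"]

def attribute_score_correlations(articles: pd.DataFrame) -> pd.DataFrame:
    # Spearman captures monotonic association without assuming linearity.
    corr = articles[attribute_cols + score_cols].corr(method="spearman")
    return corr.loc[attribute_cols, score_cols]
```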
In the Wild West, disputes were often settled with a battle of reflexes: the fast draw. Anecdotally, this is how many people play the Wikispeedia game. These players don't pore over articles for outgoing links, and they certainly don't take the time to read them. Instead, they rely on their own assumptions about the structure of the graph. When choosing a next article, they don't necessarily vocalize a strategy of navigating to a larger hub and then incrementally refining their article choice: they simply choose the next article that is most similar to their target. Our current dataset of player-made paths does contain a few of these sharpshooters, but they are lost in the noise. What if we could play the Wikispeedia game like these players? Would our attributes correlate any differently on these kinds of paths?
The first step is to define our strategy for selecting the next article given a current article and a target article. We could use the categories and their tiers, but we run into problems if we rely only on the binary classifications the categories provide: there is no clear way to evaluate the distance between completely different category types. Thankfully, our embeddings already give us a numeric representation of semantic distance! This makes our algorithm for playing Wikispeedia much simpler:
For each (start, end) in paths:
    current_article = start
    while current_article != end:
        out of the possible next articles, choose the one that:
            1. we have not yet visited, and
            2. has the smallest cosine distance to the end of the path
        if no article satisfies both conditions, discard the path
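In Python, this greedy "quick-draw" bot might look roughly like the sketch below. It is a minimal illustration rather than our exact implementation: the `out_links` adjacency mapping, the `embeddings` dictionary of article vectors, and the function name are assumptions for the sake of the example.

```python
from scipy.spatial.distance import cosine

def play_quick_draw(start, end, out_links, embeddings):
    """Greedily walk from start toward end, always taking the unvisited
    out-link whose embedding is closest (cosine distance) to the target."""
    path, current = [start], start
    while current != end:
        candidates = [a for a in out_links[current] if a not in path]
        if not candidates:
            return None  # dead end: discard this path
        current = min(candidates,
                      key=lambda a: cosine(embeddings[a], embeddings[end]))
        path.append(current)
    return path
```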
Using this algorithm, we can play any Wikispeedia path we want! To keep the comparison of attribute correlations fair, we use it to play the same filtered paths as before. After playing those paths, we compute the corresponding utility and quality scores for the articles in our bot-generated paths. Before computing these scores, we filter the bot-generated paths the same way we filtered the player-made paths: downsampling and IQR filtering. Note that utility and quality here are just the sum-of-centered-weights and average-weight scores detailed above; time and composite scores are not relevant in this case.
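For reference, the IQR filtering step could be sketched along these lines; the `scores` Series and the conventional 1.5 x IQR fences are assumptions, not necessarily the exact parameters used earlier.

```python
import pandas as pd

def iqr_filter(scores: pd.Series, k: float = 1.5) -> pd.Series:
    # Drop values outside the conventional k * IQR fences.
    q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
    iqr = q3 - q1
    return scores[scores.between(q1 - k * iqr, q3 + k * iqr)]
```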
Yee-Haw! Looks like this quick-draw way of playing is a much better fit for our attributes! The graph-based attributes in particular correlate strongly with the scores computed on these bot-generated paths. One possible interpretation of this change in the correlations is that fixing a single strategy for the Wikispeedia game results in more consistent data. That is to say, the player-made paths (even our filtered version) come from many different players, each playing in their own way with their own strategy. The increased correlation we see after generating paths with just one strategy supports this reading of the dataset.
At the same time, the fact that article-specific attributes become relevant only when a single player archetype plays the game implies that no such thing as a universally good article exists. One might think of a "good" article in terms of a specific path: the idea that you can only evaluate an article given a goal. However, this increase in correlation reveals there is something else that makes a "good" article: the player. What a player considers good about an article differs from player to player. If a player prefers to take their time and read through all of the different out-links, then a high link density is what matters. If a player is not playing to win but rather to explore Wikipedia, then a large character count is a desirable attribute. After all, Wikispeedia is a game, and people play games to have fun. What one person considers "fun" may be entirely different from what another does.
At the end of the day: there are no "good" or "bad" (or even "ugly") horses: just cowboys riding the trail together.