Graph theory based ranking

When analyzing the network of articles, where the articles serve as nodes and the links as edges, graph theory provides a powerful tool for identifying those that play a central role in navigation. Using a graph theory-based metric, we evaluate an article’s importance by examining how often it appears in the shortest paths connecting all possible source and target pairs. To ensure fairness, we normalize these counts by the total number of shortest paths for each pair, preventing any bias toward articles that dominate only specific routes.

Interesting! This shows that topics tied to powerful countries (like the United States and the United Kingdom), vast regions (such as Europe and Africa), and major historical events (like World War II) are riding high in the rankings. We also saw this reflected in the number of incoming hyperlinks these articles receive, which is to be expected, as their centrality and broad relevance make them key connectors in the network. These geopolitical, regional, and historical themes are therefore proven to be the backbone of the network, acting as key links between all kinds of different topics.

Played path based scoring

Graph theory alone doesn’t capture the full picture. To understand which articles are most valuable in carrying cowboys to their targets, we analyze the trails they actually ride on; the played paths. The metrics we introduce reflect what articles tend to lead players most effectively to their destination, derived directly from their navigational choices and the strategies they employ.

Wikispeedia challenges cowboys with a dual objective: to reach the target as quickly as possible or to do so while taking the shortest route. While shorter trails are often faster, that’s not always the case. Sometimes, a longer trail across familiar terrain, articles more widely known among players, can lead to quicker success. To capture these dynamics, we separately calculate scores that reward both minimal path length (fewer clicks) and minimal path duration (faster navigation).

Scores Rewarding Minimal Clicks

Path weights are our first metric for determining trail efficiency. These weights measure how closely a cowboy’s chosen trail matches the shortest possible path by calculating the ratio of the optimal distance to the simplified trail length. The closer a weight is to 1 the more efficient a path is. From these path scores, we derive two types of article scores:

Average of the Weights: Average weight of the paths an article appears in.
Sum of Centered Weights: This score centers the path weights around 0 by substracting their mean weight, then sums these centered weights for all paths were theh article appers.

One might think that these two scores should basically be the same, however, this is not the case. The distinction is important: articles with the highest average weights are those that reliably appear in near-optimal paths, representing article quality. On the other hand, articles with the highest sum of centered weights can just have above average weights, but appear a lot more frequently, reflecting article utility.

While the weight scores capture efficiency, they don’t tell the whole story. Two additional metrics help us assess how articles affect navigation in other ways. The Detour Ratio Score measures the proportion of times an article appears in detours, parts of the trail that required backtracking, compared to its total appearances in finished paths. Additionally, the Unfinished Ratio Score tracks how an article shows up in incomplete paths versus all paths. In both cases, a high score suggests the article might mislead cowboys, nudging them onto less effective paths.

Finally, we bring it all together with Composite Scores , combining one of the weight-based scores with the detour and unfinished ratio scores. This weighted sum prioritizes the weight score while giving smaller emphasis to the two ratio scores, providing a sophisticated measure of an article’s performance as a guide on the trails. Below we present the top 5 articles with the largest composite quality scores (weighted sum of average weight, detour ratio, and unfinished ratio after scaling) and utility scores (weighted sum of sum of centered weight, detour ratio, and unfinished ratio after scaling).

Geography rides tall in the saddle once again, partner. For the utility score, nearly every top-ranking article is tied to geographical themes. From sprawling regions to powerful nations, these landmarks continue to steer cowboys across the vast network. On the other hand, the quality score highlights seemingly more random articles like "Tennis" and "Harry Potter". We suspect these to be precise waypoints, less common but crucial stops for cowboys who knew exactly where they were heading. There is a very clear difference in the number of appearances of the top articles between the two scores, further emphasizing the distinction between the average and sum of centered weight scores, as we discussed above

Scores Rewarding Fast Path Completion

Just as article weights rely on path distances, article times can only be determined after calculating the times for all trails. Instead of simply using the raw times taken to complete a path, we compute adjusted path times that factor in the optimal distance of each trail, scaling the times accordingly. Specifically, since the most common start-target distance is 3, times for other distances are scaled to align with the median time of distance 3. This ensures that differences in path times caused by varying levels of difficulty, reflected in differing path distances, are accurately accounted for.

Using these adjusted path times, we calculate adjusted article times, mirroring the two methods used for article weight scoring. As before, the distinction between these two scores reflects the balance between quality and utility of an article to the players.

Average of the Adjusted Times: Represents the average adjusted time for each article
Sum of Centered Adjusted Times: This score centers the adjusted path times by substrating their mean, then sums these centered values for all paths where the article appears.

It’s interesting to see how the top articles for the two time scores highlight different roles in the journey. For the centered sum score, the standout articles are those that appear frequently on well-traveled trails, like "United States" and "Europe." These are followed by others with high usage, such as "Periodic Table" and "Presidents of the US." Their frequent presence on many trails reflects their utility, making them dependable companions for players navigating diverse paths. On the other hand, the average time scores reveal a different kind of value. Articles like "Great Lakes" and "Harry Potter" may not appear on as many trails, but when they do, the journey tends to be efficient, with players reaching their destinations quickly. This contrast illustrates the dual nature of article value again. Some are reliable workhorses, widely used and essential for many routes, while others offer high-quality, efficient contributions in specific situations. Together, these scores provide a nuanced view of how articles support the player’s pathfinding experience.

Given the clear parallels between the click-based utility and quality scores and the time-based metrics, specifically, the weighted sum of times and the average time scores, it’s worth examining these connections more closely. Therfore, we compare them in the plots below.

Make sure to use the scrollbar, to understand the meaning of the dots' colors! It is also important to mention that the scores are standarized and computed in a way that bigger is better. So, for example, a large average adjusted time scores reflects a fast completion time.

The metnioned parallels turn out to be quite clear correlations. It’s not surprising that completing a path in fewer steps often results in faster times. What’s particularly interesting, however, is the strong correlation between the composite quality score (based primarily on the average weight score) and the average time scores. Similarly, the utility composite score, which is mainly driven by the sum of centered weight scores, aligns well with the sum of centered time scores. This highlights how both quality and utility contribute consistently to the efficiency of a player's journey.