Just as any cowboy needs his trusty steed, a brave Wikispeedia adventurer needs a good article. But what makes an article good, or better than another? Can such qualities even be quantified?

Defining article attributes

Just as a good horse helps our valiant cowboy cross the expanse of the Mojave Desert, a good article should be able to get you anywhere. The cowboy initially hypothesises that hyperlink density could be one of the most important article attributes: the more articles you can reach, the better.

Hyperlink density was computed by dividing the number of hyperlinks in an article by the article's total length in characters. This normalisation helps weed out list articles, which naturally have a high hyperlink density. The graph below highlights the results of this analysis.
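The density computation above can be sketched as follows. This is a minimal sketch, assuming the plain text of each article is available as a dict and the hyperlinks as an iterable of (source, target) pairs; the function name `hyperlink_densities` is hypothetical. Articles with empty text are skipped to avoid dividing by zero.

```python
from collections import Counter

def hyperlink_densities(texts, links):
    """Hyperlinks per character for each article.

    texts: dict mapping article name -> plain text of the article (assumed input).
    links: iterable of (source, target) hyperlink pairs (assumed input).
    """
    # Count outgoing hyperlinks per source article.
    out_links = Counter(source for source, _ in links)
    # Divide by text length in characters; skip empty articles.
    return {
        article: out_links[article] / len(text)
        for article, text in texts.items()
        if text
    }

# Usage with toy data: "A" has 2 links in 100 characters.
densities = hyperlink_densities(
    {"A": "x" * 100, "B": "y" * 50, "C": ""},
    [("A", "B"), ("A", "C"), ("B", "A")],
)
```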

Like hyperlink density, an article's position in the graph of all Wikispeedia articles is a powerful tool that can help our cowboy choose his steed. The more often an article lies on the shortest path between two other articles, the better its chances of getting our cowboy to his destination.

This attribute can be quantified by dividing the number of shortest paths between pairs of articles that pass through a given article by the total number of shortest paths between all pairs of articles. The results of this analysis are presented below.
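The ratio described above can be sketched in plain Python. This is a minimal sketch, assuming the article graph is a dict mapping each article to the articles it links to (directed, unweighted), and assuming that "appears in a shortest path" means appearing as an intermediate stop (endpoints excluded). A breadth-first search from every article counts the shortest paths; an article v lies on a shortest s-to-t path exactly when d(s, v) + d(v, t) = d(s, t), and the number of such paths is σ(s, v) · σ(v, t). The quantity is closely related to betweenness centrality, which instead normalises pair by pair. The loop is cubic in the number of articles, so for the full Wikispeedia graph a dedicated routine such as NetworkX's `betweenness_centrality` would be the more practical choice.

```python
from collections import deque

def bfs_counts(graph, source):
    """Distances and numbers of shortest paths from `source` to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, ()):
            if w not in dist:  # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def shortest_path_share(graph):
    """Share of all shortest paths that pass through each article as an intermediate stop."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    info = {s: bfs_counts(graph, s) for s in nodes}
    total = 0
    through = {v: 0 for v in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for t, d_st in dist_s.items():
            if t == s:
                continue
            total += sigma_s[t]  # every shortest s -> t path
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == d_st:
                    through[v] += sigma_s[v] * sigma_v[t]
    return {v: (through[v] / total if total else 0.0) for v in nodes}

# Toy graph: A reaches C via B or via D.
share = shortest_path_share({"A": ["B", "D"], "B": ["C"], "D": ["C"]})
```

In the toy graph there are six shortest paths in total, and B and D each lie on one of them.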

The semantic distance between article titles can also be taken into account. It is hypothesised that articles generally link to other articles on similar topics. Article titles are represented as vector embeddings. Dimensionality reduction followed by t-SNE is then used to obtain a three-dimensional graph representing the links between all articles.
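The projection step above might look like the following with scikit-learn. This is a sketch under stated assumptions: the embedding model is not specified here, so random vectors stand in for real title embeddings, and PCA with 20 components is an assumed choice for the unspecified "dimensionality reduction" step before t-SNE.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Placeholder: random vectors standing in for real title embeddings
# (hypothetical 50 titles x 384 dimensions); replace with actual model output.
rng = np.random.default_rng(0)
title_embeddings = rng.normal(size=(50, 384))

# Step 1 (assumed): linear reduction with PCA to denoise and speed up t-SNE.
reduced = PCA(n_components=20, random_state=0).fit_transform(title_embeddings)

# Step 2: non-linear projection to three dimensions with t-SNE.
# Perplexity must be smaller than the number of samples.
coords_3d = TSNE(
    n_components=3, perplexity=10.0, init="pca", random_state=0
).fit_transform(reduced)
```

Each row of `coords_3d` is one title's position in the 3D plot; links between articles can then be drawn as segments between these points.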

But how do we translate horse into English and maths? The answer was given to us over two thousand years ago by a Greek man named Euclid. Taking the United States as an example, one could compute the Euclidean distance between its embedding and that of each of its semantic neighbours.
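A minimal sketch of that distance computation, using the standard library's `math.dist`. The 3D coordinates and the neighbour list below are hypothetical toy values, not results from the analysis; in practice they would come from the projected title embeddings.

```python
import math

# Hypothetical 3-D coordinates standing in for projected title embeddings.
coords = {
    "United_States": (0.0, 0.0, 0.0),
    "Canada": (1.0, 2.0, 2.0),
    "Mexico": (3.0, 4.0, 0.0),
}

def neighbour_distances(coords, article, neighbours):
    """Euclidean distance from `article` to each of its (assumed) semantic neighbours."""
    origin = coords[article]
    return {n: math.dist(origin, coords[n]) for n in neighbours}

distances = neighbour_distances(coords, "United_States", ["Canada", "Mexico"])
# One possible per-article summary: the mean distance to its neighbours.
mean_distance = sum(distances.values()) / len(distances)
```

Here the distances are 3.0 and 5.0, giving a mean of 4.0; a smaller mean would indicate an article sitting among semantically close neighbours.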

Despite the harsh realities of the Wild West, some cowboys remain attached to the finer and more sophisticated things in life. Such cowboys might prefer a steed with a rich and deep knowledge of the English language. To help them choose, an article's vocabulary richness can be taken as a potential attribute.
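The measure of vocabulary richness is not specified above; one common choice is the type-token ratio, the number of distinct words divided by the total number of words. The sketch below assumes that measure and a simple lowercase regex tokeniser, both of which are illustrative choices. Note that the type-token ratio tends to fall as texts get longer, so comparing articles of very different lengths may call for a length-corrected variant.

```python
import re

def type_token_ratio(text):
    """Distinct words / total words, using a simple lowercase regex tokeniser."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

# "the" appears twice among five words, so 4 distinct / 5 total.
ratio = type_token_ratio("The cat and the hat")
```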

Scoring articles

Our cowboy is in a hurry to protect his herd of cattle from marauding raiders! He has no time to pore over all of this data and would prefer a single, quantifiable metric for judging how good his steed is.

To this end, several article scores were defined.

Let's take a look at how these scores performed!

Hmm, our cowboy is not convinced by these scores: they do not seem to be correlated with one another at all.

He decides to define his own composite score that combines the weighted average score, the detour ratio and the unfinished ratio. Being in a hurry, the cowboy picks the weights himself.

With this new composite score, the cowboy decides to take a look at the top horses before traversing the desert.
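The composite score and the ranking of top articles can be sketched as a weighted sum followed by a sort. The weights below are hypothetical stand-ins for the cowboy's hand-picked ones, and the sign convention is an assumption: the detour and unfinished ratios are treated as "lower is better" and therefore get negative weights. The sketch also assumes the three scores are already on comparable scales.

```python
# Hypothetical hand-picked weights; negative weights penalise scores where lower is better.
WEIGHTS = {"weighted_average": 0.5, "detour_ratio": -0.25, "unfinished_ratio": -0.25}

def composite_score(article_scores, weights=WEIGHTS):
    """Weighted sum of an article's individual scores."""
    return sum(weights[name] * article_scores[name] for name in weights)

def top_articles(all_scores, k=5, weights=WEIGHTS):
    """Names of the k articles with the highest composite score."""
    ranked = sorted(
        all_scores,
        key=lambda a: composite_score(all_scores[a], weights),
        reverse=True,
    )
    return ranked[:k]

# Toy placeholder scores for three articles.
toy_scores = {
    "A": {"weighted_average": 0.8, "detour_ratio": 0.2, "unfinished_ratio": 0.1},
    "B": {"weighted_average": 0.6, "detour_ratio": 0.1, "unfinished_ratio": 0.1},
    "C": {"weighted_average": 0.9, "detour_ratio": 0.9, "unfinished_ratio": 0.8},
}
best = top_articles(toy_scores, k=2)
```

Article C has the best weighted average but is pushed to the bottom by its high detour and unfinished ratios, which is exactly the trade-off the weights encode.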

"Interesting," thinks the cowboy. He picks the (insert top article name) horse and rides off into the wild.

Hold on...

As the cowboy rides off into the wasteland, he begins to wonder whether his composite score correlates with the attributes he defined earlier. In other words, did he choose the right score?