The Algorithm That Automatically Creates Wikibooks

Wikibooks are fascinating things.  At the time of writing, there are over 3,000 of them, and they aim to pull together content from Wikipedia on a wide range of topics to provide the reader with a comprehensive overview of a particular field.

Whilst some are mere pamphlets, others, such as the Machine Learning book, run to several thousand pages.  It’s hard enough reading such a tome, but compiling one is equally difficult, as you’d expect for a book with over 550 chapters.

Researchers from Ben-Gurion University of the Negev believe they’ve found an answer.  In a recently published paper, they describe their AI-driven approach to automating the generation of Wikibooks.

Automating Wikibooks

There are nearly 7,000 Wikibooks already in existence.  These books are freely available from Wikipedia for this kind of research; the researchers kept only those of reasonable quality, meaning each had been viewed at least 1,000 times.  After applying additional filters, such as a minimum length, they were left with over 400 books with which to train the algorithm, with the remainder used for testing.
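
To give a flavour of that filtering step, here is a minimal sketch in Python.  The record fields and thresholds are assumptions for illustration only; the paper’s actual criteria may differ.

```python
# A minimal sketch of the kind of quality/length filtering described above.
# The record fields and thresholds are assumptions, not taken from the paper.

def filter_wikibooks(books, min_views=1000, min_chapters=5):
    """Keep only books that pass simple quality and length filters."""
    kept = []
    for book in books:
        if book["views"] < min_views:
            continue  # not popular enough to serve as a training example
        if len(book["chapters"]) < min_chapters:
            continue  # too short to reveal useful structure
        kept.append(book)
    return kept

# Toy usage:
books = [
    {"title": "Machine Learning", "views": 250_000, "chapters": list(range(550))},
    {"title": "Tiny Pamphlet", "views": 120, "chapters": [1, 2]},
]
print([b["title"] for b in filter_wikibooks(books)])  # ['Machine Learning']
```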

The task of creating a book was then split into subtasks, each requiring a distinct skill.  For instance, a title has to be chosen, then the book’s overall concept defined, and so on, before the articles that best fit each section are selected.
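
As a rough illustration of that decomposition, the subtasks might be wired together into a pipeline something like the sketch below.  Every step here is a trivial stand-in; in the paper each stage is a learned model, not a one-liner.

```python
# A rough, self-contained sketch of the subtask pipeline described above.
# Each step is a trivial stand-in for what is, in the paper, a learned model.

def pick_title(topic):
    return topic.title()

def select_articles(topic, candidates):
    # Stand-in: keep candidate articles whose name mentions the topic.
    return [a for a in candidates if topic.lower() in a.lower()]

def generate_wikibook(topic, candidates):
    title = pick_title(topic)                      # subtask: choose a title
    articles = select_articles(topic, candidates)  # subtask: pick the content
    return {"title": title, "articles": articles}

print(generate_wikibook("machine learning",
                        ["Machine learning", "Outline of machine learning", "Baking"]))
# {'title': 'Machine Learning', 'articles': ['Machine learning', 'Outline of machine learning']}
```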

The best articles were selected based on their relative popularity, measured by how many other articles link to them.  This was then built upon by comparing the network structure of the emerging Wikibook to that of human-curated books: if including an article made the book more like a human-curated one, it was kept; if it made the book less like one, it was culled.
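
A hedged sketch of that greedy, network-guided selection is given below, using networkx.  The popularity measure (inbound links) follows the description above, but the structural comparison, here a simple gap in average clustering coefficient, is an assumption standing in for whatever features the paper actually compares.

```python
# A sketch of greedy, popularity-ranked article selection guided by a
# network-structure comparison. The clustering-coefficient gap is an
# illustrative stand-in for the paper's real structural features.

import networkx as nx

def structural_gap(graph, reference_clustering):
    """Distance between this book's link graph and a 'typical' human book."""
    return abs(nx.average_clustering(graph) - reference_clustering)

def select_articles(link_graph, candidates, reference_clustering, k=10):
    # Rank candidates by popularity: how many other articles link to them.
    ranked = sorted(candidates, key=link_graph.in_degree, reverse=True)
    chosen = []
    for article in ranked:
        trial = link_graph.subgraph(chosen + [article]).to_undirected()
        current = link_graph.subgraph(chosen).to_undirected()
        # Keep the article only if it moves the book closer to the reference
        # structure (the first pick is always kept).
        if not chosen or structural_gap(trial, reference_clustering) <= \
                         structural_gap(current, reference_clustering):
            chosen.append(article)
        if len(chosen) == k:
            break
    return chosen

# Toy usage: "Machine learning" is the most linked-to article, so it is picked first.
G = nx.DiGraph([("Deep learning", "Machine learning"),
                ("Statistics", "Machine learning"),
                ("Baking", "Bread")])
print(select_articles(G, list(G), reference_clustering=0.3, k=3))
```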

The team then set about organizing the articles into chapters.  They did this by looking for clusters within the network of articles, checking whether there were clear thematic similarities.  Last, but not least, the researchers determined the order in which the articles appeared within each chapter by comparing articles in pairs and using network models to decide which of the two should come first.
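
Below is a hedged sketch of the chaptering and ordering steps, again with networkx.  Greedy modularity communities stand in for whatever clustering the paper uses, and the pairwise “which comes first” model is reduced to a simple heuristic (better-linked articles come earlier); both are assumptions for illustration.

```python
# A sketch of chaptering via community detection and ordering via pairwise
# comparisons. The community algorithm and the "comes first" rule are
# illustrative stand-ins, not the paper's actual models.

from functools import cmp_to_key
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_chapters(link_graph):
    """Cluster the selected articles into thematically similar chapters."""
    undirected = link_graph.to_undirected()
    return [list(c) for c in greedy_modularity_communities(undirected)]

def comes_first(link_graph, a, b):
    """Pairwise decision: negative if article a should precede article b."""
    # Assumption: better-linked (more foundational) articles come earlier.
    return link_graph.in_degree(b) - link_graph.in_degree(a)

def order_chapter(link_graph, chapter):
    return sorted(chapter, key=cmp_to_key(lambda a, b: comes_first(link_graph, a, b)))

# Toy usage: two thematic clusters fall out as two chapters.
G = nx.DiGraph([("Neural networks", "Machine learning"),
                ("Decision trees", "Machine learning"),
                ("Yeast", "Bread"), ("Flour", "Bread")])
for chapter in build_chapters(G):
    print(order_chapter(G, chapter))
```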

Put to the test

So how did the AI do?  Well, the algorithm was able to produce automated versions of Wikibooks that were pretty similar to those already created by human curators.  These books contained much of the same material, presented in much the same order.

The next step is to scale up the project and produce a number of Wikibooks on topics that don’t yet have one.  The team will then publish these and gauge their quality and popularity by the response from readers, measured in page views and the number of edits they attract.

It’s a fascinating project, and the books that come out of it will be well worth watching to see just how good the algorithm really is.  What is unknown at this stage is whether the books will be marked as having been produced by AI, and whether that would influence readership numbers.  Either way, they’ll be ones to look out for.
