Automating over the data skills gap

A few weeks ago I chaired a session at EdTechX in London on the skills gap and which skills might prove crucial in the future. One skill already in short supply is data science, with a recent report from CrowdFlower revealing the ongoing shortage of people with data skills. It found that a staggering 83% of respondents were struggling to find people to fill vacancies in data science-related roles.

Of course, this is not a new trend. Gartner highlighted the issue back in 2012, but the CrowdFlower data suggests things are getting worse, not better.

Automating the problem away

To try to rectify matters, DARPA have recently set up a new program called Data-Driven Discovery of Models (D3M). D3M aims to develop automated ways of bridging the data skills gap, allowing non-experts to build their own complex models. They will be able to do this because much of the back-end work behind such models will be automated.

In many ways, therefore, it’s doing for data science what WYSIWYG editors did for web development 20 years ago and what visual programming environments have done for coding. DARPA believe it will allow relative novices to act as virtual data scientists.

“We have an urgent need to develop machine-based modeling for users with no data-science background. We believe it’s possible to automate certain aspects of data science, and specifically to have machines learn from prior example how to construct new models,” DARPA say.

The overall aim is to open up to non-specialists the ability to create complex empirical models in areas where they have subject matter expertise but little in the way of data science capabilities.

“This capability will enable subject matter experts to create empirical models without the need for data scientists, and will increase the productivity of expert data scientists via automation. The automated model discovery systems developed by the D3M Program will be tested on real-world problems that will progressively get harder during the course of the program. Toward the end of the program, D3M will target problems that are both unsolved and underspecified in terms of data and instances of outcomes available for modeling,” they conclude.
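To give a flavour of what automating the back-end work could look like, here is a minimal sketch of automated model selection in Python. It is not D3M's method, which the source does not describe in detail. It simply assumes a handful of toy candidate models and picks whichever makes the smallest error on held-out data. All the names (`fit_mean`, `fit_line`, `fit_nearest`, `auto_select`) are hypothetical and exist only for this illustration.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline: always predict the average of the training outcomes."""
    avg = statistics.fmean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares for a straight line y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_nearest(xs, ys):
    """1-nearest-neighbour: copy the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical candidate pool; a real system would search a far larger space.
CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def auto_select(xs, ys, holdout=0.25, seed=0):
    """Fit every candidate and return (name, model, scores) for the one
    with the lowest error on a randomly held-out slice of the data."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = max(1, int(len(idx) * holdout))
    held, train = idx[:cut], idx[cut:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    vx, vy = [xs[i] for i in held], [ys[i] for i in held]
    scores = {name: mse(fit(tx, ty), vx, vy) for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    # Refit the winning candidate on all of the data before handing it back.
    return best, CANDIDATES[best](xs, ys), scores


if __name__ == "__main__":
    xs = list(range(20))
    ys = [3 + 2 * x for x in xs]  # made-up data with a linear trend
    name, model, scores = auto_select(xs, ys)
    print(name, scores)
```

The point of the sketch is that the subject matter expert supplies only the data, while the choice of model is made by the machine. A real system would also need to automate data cleaning, feature construction and tuning, none of which is attempted here.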


