![](https://i2.wp.com/www.eurasiareview.com/wp-content/uploads/2018/10/c-53.jpg?resize=830%2C510)
Researcher in the KIDS research group from the University of Cordoba’s Department of Computer Science and Numerical Analysis were able to improve the models that predict several variables simultaneously based on the same set of input variables, thus reducing the size of data necessary for the forecast to be exact. One example of this is a method that predicts several parameters related to soil quality based on a set of variables such as crops planted, tillage and the use of pesticides.
“When you are dealing with a large volume of data, there are two solutions. You either increase computer performance, which is very expensive, or you reduce the quantity of information needed for the process to be done properly,” says researcher Sebastian Ventura, one of the authors of the research article.
When building a predictive model there are two issues that need to be dealt with: the number of variables that come into play and the number of examples entered into the system for the most reliable results. With the idea that less is more, the study has been able to reduce the number of examples, by eliminating those that are redundant or “noisy,” and that therefore do not contribute any useful information for the creation of a better predictive model.
As Oscar Reyes, the lead author of the research, points out “we have developed a technique that can tell you which set of examples you need so that the forecast is not only reliable but could even be better.” In some databases, of the 18 that were analyzed, they were able to reduce the amount of information by 80% without affecting the predictive performance, meaning that less than half the original data was used. All of this, says Reyes, “means saving energy and money in the building of a model, as less computing power is required.” In addition, it also means saving time, which is interesting for applications that work in real-time, since “it doesn’t make sense for a model to take half an hour to run if you need a prediction every five minutes.”
As pointed out by the authors of the research, these systems that predict several variables simultaneously (which could be related to one another), based on several variables -known as multi-output regression models,- are gaining more notable importance due to the wide range of applications that “could be analyzed under this paradigm of automatic learning,” such as for example those related to healthcare, water quality, cooling systems for buildings and environmental studies.
No comments:
Post a Comment