
The journal WIREs Data Mining and Knowledge Discovery accepts a scientific paper from the CADC

The article, “Evaluation and Comparison of Open Source Software Suites for Data Mining and Knowledge Discovery”, was written by Miguel Angel Vallejo, a CADC data scientist, in collaboration with other international researchers. WIREs Data Mining and Knowledge Discovery, published by Wiley, is a reference journal in the scientific field, with an impact factor of 1.579.

This study evaluates 19 open source data mining tools to give companies and the scientific community an extensive comparison, based on a wide set of characteristics that any data mining tool should possess. Each characteristic is assessed either subjectively (by comparing the tools with one another) or objectively (by whether or not a tool has that specific characteristic).

The results show that RapidMiner, KNIME, and WEKA are the tools that have the greatest number of these characteristics.

The growing interest in the extraction of useful knowledge from data favors the emergence of several data mining tools. The scientific community is aware of the importance that open source data mining software has in facilitating the diffusion of new algorithms. The availability of these free-of-charge tools, along with the possibility of understanding the different approaches by examining the source code, provides a great opportunity to refine and improve algorithms.

Value and applications for companies and the scientific community:

Value

  • Publicizing no-cost and open source tools.
  • Knowing what functionalities these types of tools can provide.
  • Knowing the functionalities and performance of each tool.
  • Verifying the “last update date” (which shows whether the project is still active) and the license type.

Applications

  • Selecting the best tool for a PoC or project. This helps avoid the cost of switching tools mid-project because the one chosen lacks characteristics that are needed.
  • Replacing paid tools with free ones that offer the same functionalities and good performance.
  • Creating training plans on the most important tools and determining the groups that can benefit from being trained on each tool, according to their characteristics and the functions of the employees.
  • Finding weaknesses in the set of free applications, so that the missing capabilities can be added as value in our business tools (such as eMoriarty).
  • When a particular functionality is needed, looking for the tools that possess it and studying their source code.
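
As a rough illustration of the objective criterion (a tool either has a characteristic or it does not) and of the tool-selection use described above, the following is a minimal Python sketch. The tool names, the feature names, and the helpers `rank_by_feature_count` and `tools_with` are hypothetical placeholders chosen for this example. They are not data or methods taken from the published study.

```python
# Hypothetical feature matrix: tool and feature names are placeholders,
# NOT results from the published study.
FEATURES = {
    "tool_a": {"gui", "clustering", "text_mining"},
    "tool_b": {"gui", "clustering"},
    "tool_c": {"gui", "clustering", "text_mining", "time_series"},
}


def rank_by_feature_count(features):
    """Return (tool, count) pairs, most features first; ties broken by name."""
    return sorted(
        ((tool, len(present)) for tool, present in features.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def tools_with(features, required):
    """Return, sorted by name, the tools that have every required feature."""
    required = set(required)
    return sorted(
        tool for tool, present in features.items() if required <= present
    )


if __name__ == "__main__":
    # Objective comparison: count how many characteristics each tool has.
    print(rank_by_feature_count(FEATURES))
    # Tool selection: which tools cover everything a project needs?
    print(tools_with(FEATURES, {"gui", "text_mining"}))
```

A presence/absence table like this only captures the objective side of the comparison. Subjective assessments such as usability would need a separate scoring scheme.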