FenTech

Evaluation of Big Data platforms

A few years ago, many companies set out to implement Big Data platforms. These platforms are generally intended to support process improvement within an organization by processing huge amounts of data in a reasonably short time.

Today, several years have passed since the launch of these Big Data platforms, making it necessary to take stock of the exact status of these projects. However, due to the complexity of these projects and the lack of maturity in their management, project managers are faced with a series of unprecedented issues: how to define the various metrics to evaluate a platform of this scale, how to monitor and control the rapid evolution of the technologies used in these projects, and how to identify the obstacles, opportunities, and improvements related to the ongoing development of the project. FenTech proposes to address these issues through a series of articles, published on a regular basis on our website.

Scissors Effect

Big Data analysis methods involve three areas of knowledge: Business, Analysis and Technology. In the first area, business managers, department heads and other stakeholders must define the business objectives that the data analysis is meant to serve. In the second, these objectives become the basis for a precise project architecture, which defines the methods and tools for collecting, storing, analyzing and finally visualizing the data needed to achieve them.

Finally, business leaders, in collaboration with the IT department, must make technology choices that can support the project's infrastructure needs, namely distributed storage and parallel processing of massive structured and unstructured data. But as the years go by, Big Data technologies keep evolving at an increasing pace, and Big Data project managers must review their technology choices in order to replace those that are becoming obsolete. For example, to overcome the major drawbacks of Hadoop MapReduce, several companies have turned to Spark, which is often an excellent alternative to MapReduce.

In particular, Spark supports stream processing in near real time, whereas MapReduce works in batch mode only. Thus, if a company does not put in place a system to evaluate the various components of its Big Data project on a continuous basis, it will struggle to evolve the technologies that support its Big Data processing.
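To make the difference between batch and stream processing concrete, here is a minimal sketch in plain Python. It is not Spark or MapReduce code: the function names `batch_average` and `streaming_average` and the sample values are hypothetical, chosen only to show that a batch job produces one result once all the data is available, while a streaming job updates its result as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(records: List[float]) -> float:
    """Batch style: the full data set must be available before any result is produced."""
    return sum(records) / len(records)


def streaming_average(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated result after every incoming record."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count


# Hypothetical response times (ms) arriving over time.
latencies = [120.0, 80.0, 100.0, 140.0]

print(batch_average(latencies))            # one result, after all data has arrived
print(list(streaming_average(latencies)))  # one result per record, as data arrives
```

Both approaches end on the same final value; what changes is when intermediate results become available, which is the property that matters for real-time use cases.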

The challenges of Big Data

Before turning to these challenges, we first explain the three major characteristics of Big Data, a colossal set of data that is very difficult, if not impossible, to process with traditional data management tools. These three characteristics, also called the three V’s of Big Data, are Volume, Velocity and Variety:

  • A considerable volume of data, requiring suitable processing tools and infrastructure.
  • A high velocity of data creation, requiring advanced technologies to collect, process and share this data in real time.
  • A variety of data and data sources, defying traditional data processing tools, which expect a clear and easily managed structure.

Now that we know the principles of Big Data, it becomes easier to understand the challenges related to Big Data platforms:

  • The absence or inadequacy of the objectives formulated at the launch of the project
  • The lack of skills required in the various sectors of Big Data
  • The exponential increase in the size of the data
  • The undefined structure of the data
  • The difficulty of imposing data consistency
  • Data security
  • Finding and applying good data management and integration practices
  • Data veracity (how to manage uncertainty, imprecision, missing information…)
  • Technical challenges for Big Data analysis
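Some of these challenges can be turned into simple metrics. As an illustration of the veracity item above, the following sketch computes a per-field completeness ratio, that is, the share of records in which a field is actually filled in. The function name `completeness`, the field names and the sample records are all hypothetical; a real platform would run such a check on its own schemas and at scale.

```python
from typing import Any, Dict, List, Optional


def completeness(records: List[Dict[str, Optional[Any]]], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the share of records where the field is present and not None."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is not None) / len(records)
        for field in fields
    }


# Hypothetical sample of collected records with gaps.
sample = [
    {"customer_id": 1, "country": "FR", "amount": 25.0},
    {"customer_id": 2, "country": None, "amount": 40.0},
    {"customer_id": 3, "amount": None},
    {"customer_id": 4, "country": "DE", "amount": 10.0},
]

scores = completeness(sample, ["customer_id", "country", "amount"])
print(scores)
```

Tracking such a ratio over time gives a first, coarse indicator of whether data quality is improving or degrading.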

Audit of Big Data platforms

In recent years, exploiting large amounts of data to generate value has become one of the main technological and strategic priorities of organizations. Concepts such as RDBMS (Relational Database Management Systems), BI (Business Intelligence) and data warehouses are not new. However, several aspects, notably the 3 V’s of Big Data, make Big Data projects unique and increase the risk of failure for this type of project.

For example, the obsolescence of certain technologies, or their replacement by more efficient or less expensive ones, can lead project managers to review the application architecture set up at the start of the project. This calls for a complete evaluation of the technologies in use, together with a comparison against the technologies that could potentially replace them.
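One simple way to structure such a comparison is a weighted scoring matrix. The sketch below is only an illustration: the criteria, the weights, the 1 to 5 scores and the names `current_engine` and `candidate_engine` are hypothetical placeholders, not benchmark results for any real technology.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[criterion] * weight for criterion, weight in weights.items()) / total_weight


# Hypothetical criteria and weights chosen by the project team.
weights = {"performance": 4, "operating_cost": 3, "team_skills": 2, "community_support": 1}

# Hypothetical scores from 1 (poor) to 5 (excellent), not real benchmark results.
candidates = {
    "current_engine": {"performance": 2, "operating_cost": 3, "team_skills": 5, "community_support": 2},
    "candidate_engine": {"performance": 4, "operating_cost": 4, "team_skills": 3, "community_support": 4},
}

# Rank the technologies from best to worst weighted score.
ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)
for name in ranking:
    print(name, round(weighted_score(candidates[name], weights), 2))
```

The value of this kind of matrix lies less in the final number than in forcing the team to make its criteria and priorities explicit before deciding on a migration.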

Another example is the scalability of the system. The volume of data stored and/or processed at the different steps of the Big Data pipeline can increase very quickly, sometimes exponentially. When this happens, the project’s infrastructure is put under strain, and incidents may occur while the platform is in use. Their severity depends on the robustness of the technical architecture and on how well it matches the data processing needs: they range from simple slowdowns of the platform to a total shutdown of the solution.

In view of the above, particular attention must be paid to auditing and evaluating Big Data platforms, in order to propose techniques, tools and methodologies that enable organizations that have already implemented Big Data projects to evolve these platforms.
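As a rough illustration of the scalability risk described above, an audit can estimate how long the current infrastructure will last under exponential growth. The sketch below assumes a constant compound growth rate per period, so that volume(t) = current_volume × (1 + growth_rate)^t, and solves for the number of periods until a given capacity is reached. The function name `periods_until_full` and all figures are hypothetical; real growth is rarely this regular.

```python
import math


def periods_until_full(current_volume: float, capacity: float, growth_rate: float) -> float:
    """Number of periods until volume reaches capacity under compound growth.

    Assumes volume(t) = current_volume * (1 + growth_rate) ** t.
    """
    if current_volume <= 0 or capacity <= 0:
        raise ValueError("volumes must be positive")
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    if current_volume >= capacity:
        return 0.0  # capacity is already reached or exceeded
    return math.log(capacity / current_volume) / math.log(1 + growth_rate)


# Hypothetical figures: 100 TB stored today, 400 TB usable capacity,
# volume doubling every period (growth_rate = 1.0).
print(periods_until_full(100.0, 400.0, 1.0))
```

Even such a simple projection helps decide whether an infrastructure upgrade is needed within months or within years.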

In the next few articles, we will go deeper into the different aspects of Big Data, especially the application and technical architectures used in Big Data projects and the optimizations that could be made to them. We will then rely on a selective literature review to present the scientific advances in the field.
