Abstract: Today, the growing use of social networks, IoT, and other devices generates massive amounts of heterogeneous, high-velocity data, otherwise known as Big Data. Big Data is characterized by the fundamental 3V paradigm: volume, velocity, and variety. Faced with increasing demand for Big Data, organizations have difficulty finding an efficient solution to store and process high volumes of low-density semi-structured and unstructured data. Therefore, the main goal of this paper is to analyze data storages that can handle the requirements of Big Data and to offer real-time Big Data pipelines based on the Lambda and Kappa architectures. The analysis of storage compatibility with semi-structured data, unstructured repetitive and non-repetitive data, horizontal and vertical scaling, and the Kappa and Lambda architectures leads to the conclusion that older technologies and strategies are not enough to store and process semi-structured and unstructured Big Data; hence, new platforms and technologies should be used instead. Also, NoSQL storages and Data Lakes proved to be a good solution for storing unstructured data, whereas relational data model storages have to be integrated with other technologies to store unstructured data efficiently. Lastly, the developed Kappa and Lambda real-time Big Data processing pipelines were tested only with semi-structured data because of the organization's limited data sources. However, the Kappa real-time Big Data pipeline is theoretically fully compatible with unstructured data because of the storages that were selected.