
CHAPTER 1

INTRODUCTION

The internet is used far more widely today than it was even a few years ago, and it has become a core part of our lives. Billions of people across the globe use social media and social networking every day, generating a flood of data that has become very complex to manage. The term Big Data was coined to describe this enormous amount of data, and the concept is spreading rapidly all over the world.

In this chapter, we’ll discuss the definition, categories and characteristics of big data.

  • INTRODUCTION TO BIG DATA

Big data refers to the practice of storing and analysing very large data sets to extract useful meaning for an organization. In simple terms, data that is very large in size and still growing exponentially with time is called Big Data.

Fig 1.1: Big Data [1]

Big Data refers to large volumes of data, structured or unstructured, that require new technologies and techniques to handle. An organized form of data is known as structured data, while an unorganized form of data is known as unstructured data. Big data sets are so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.

There are five dimensions to big data, known as Volume, Variety and Velocity, along with the recently added Veracity and Value. A consensual definition states that "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value". Because these data sets cannot be handled with traditional application software, frameworks such as Hadoop have been designed to process them; such techniques are also used to extract useful insights from data through predictive analysis and analysis of user behaviour.
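The contrast between structured and unstructured data is easy to see in a few lines of code. The sketch below, written in Python, is only an illustration with made-up field names and a made-up social-media post: the structured records follow a fixed schema and can be queried by field name, while the unstructured text has to be parsed before anything useful can be extracted from it.

    import csv
    import io
    import re

    # Structured data: every record follows the same schema,
    # so each field can be addressed by name.
    structured = io.StringIO(
        "user_id,country,signup_year\n"
        "101,IN,2018\n"
        "102,US,2020\n"
    )
    for row in csv.DictReader(structured):
        print(row["user_id"], row["country"], row["signup_year"])

    # Unstructured data: free text with no fixed schema.
    # Any structure (here, hashtags) must be extracted by parsing.
    post = "Loving the new phone! #tech #gadgets - uploaded 3 photos today"
    hashtags = re.findall(r"#\w+", post)
    print(hashtags)  # ['#tech', '#gadgets']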

For an application that contains a limited amount of data, we normally use a relational database such as SQL Server, PostgreSQL, Oracle or MySQL. But what about large applications like Facebook, Google or YouTube? Their data is so large and complex that none of the traditional data management systems can store and process it.

Facebook generates 500+ TB of data per day as people upload images, videos, posts and so on. Similarly, sending text and multimedia messages, updating a Facebook or WhatsApp status, and posting comments all generate huge amounts of data. Handling this with traditional data processing applications (SQL, Oracle, MySQL) leads to a loss of efficiency, so analysing this exponentially growing data becomes a necessary task. To overcome this problem, we use big data techniques, which cover both structured and unstructured data.
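To give a feel for how a framework such as Hadoop approaches data that a single relational server cannot handle, the following sketch imitates the MapReduce pattern on a toy data set in plain Python. The chunks, the word-count task and the function names are assumptions made only for illustration; a real Hadoop job would distribute the map and reduce steps across many machines instead of running them in one process.

    from collections import Counter
    from itertools import chain

    # Toy stand-in for a data set split into blocks across many machines.
    chunks = [
        "big data includes structured and unstructured data",
        "traditional systems struggle with big data",
        "hadoop processes big data in parallel",
    ]

    # Map step: each chunk is turned into (word, 1) pairs independently,
    # so this work could run on different nodes at the same time.
    def map_chunk(text):
        return [(word, 1) for word in text.split()]

    mapped = [map_chunk(chunk) for chunk in chunks]

    # Reduce step: counts for the same word are merged into one total.
    totals = Counter()
    for word, one in chain.from_iterable(mapped):
        totals[word] += one

    print(totals.most_common(3))  # e.g. [('data', 4), ('big', 3), ...]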

Traditional data management systems and existing tools struggle to process such big data. R is one of the main computing tools used in statistical education and research, and it is also widely used for data analysis and numerical computing in scientific research.

  • WHERE DOES BIG DATA COME FROM?
  • Social data: This could

