Big Data and Its 5 V’s
What is Big Data?
Big data can be defined as the huge amount of data available from various sources, in varying complexity, generated at different velocities and with different degrees of ambiguity, that cannot be processed with traditional technologies and processing methods. In kind, it is the same as the datasets used for data analysis, machine learning, etc.; the main difference is the quantity. The data is collected from all available sources, combined, and then stored as a single source, which increases the complexity of the dataset. Hence, advanced tools and techniques are needed to handle big data.
The amount of data keeps growing rapidly, and at this scale it becomes impractical to work on it using a single computer. Hence, many tools have been developed to handle big data. Some of them are:
- Apache Hadoop (https://hadoop.apache.org/)
- Atlas.ti (https://prf.hn/l/6lo4Nnx)
- Apache Storm (https://storm.apache.org/) and many more
Importance of Big Data
Even though it is difficult to handle with traditional techniques, big data plays an important role in forecasting, decision-making, sales analysis, etc. for any organization or domain. The large volume of data and features lets a model capture more of the properties of the data, which ultimately helps the organization make sound and efficient decisions.
It allows businesses to understand customer feedback and make suitable changes accordingly. Big data also supports the smooth functioning of an organization: it helps manage the supply chain, gives insight into data patterns, and enables more data-driven decisions.
Types of Big Data
As we know, big data is a huge amount of data. This data can come in different formats, which are discussed below.
Structured
Structured data is available in a well-defined format, so it can be stored, accessed, retrieved, and processed in a fixed, predictable way. It is also termed relational data, since it is stored in a table (rows and columns) format. This type of data is easy to handle and work with because it requires less preprocessing. To access the data, we can use SQL (Structured Query Language), a language for handling and managing structured data.
Example-
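As a minimal sketch of structured data, the snippet below uses Python's built-in sqlite3 module to create a small table and query it with SQL. The customers table, its columns, and the sample rows are hypothetical and used only for illustration.

```python
import sqlite3


def customers_in_city(city):
    """Build a tiny in-memory table and return the customers from one city."""
    conn = sqlite3.connect(":memory:")
    # Every row follows the same fixed schema: id, name, city.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
        [(1, "Asha", "Pune"), (2, "Ben", "London"), (3, "Chen", "Pune")],
    )
    # Because the structure is known in advance, a plain SQL query is enough.
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE city = ? ORDER BY id", (city,)
    ).fetchall()
    conn.close()
    return rows


pune_customers = customers_in_city("Pune")
print(pune_customers)
```

Since every row shares the same columns, filtering and sorting need no preprocessing beyond the query itself.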
Unstructured
As the name suggests, unstructured data does not have any well-defined format; it is the opposite of structured data. Its contents can be images, videos, audio, files, etc.
Semi-structured
This type of data has properties that are a combination of structured and unstructured data. The data has some structure but is not bound to a fixed schema. Unlike relational data, it is not stored in a table format; JSON and XML documents are common examples. Because such data is often kept in non-relational (NoSQL) databases rather than queried with SQL, it is sometimes loosely called NoSQL data.
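To illustrate, here is a small sketch that reads semi-structured records with Python's built-in json module. The records and their field names are made up for this example; the point is that each record carries its own structure, and not every record has the same fields.

```python
import json

# Hypothetical records: each one is structured, but the fields differ.
raw_records = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "tags": ["vip", "newsletter"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}}
]
"""

records = json.loads(raw_records)

# Collect every field name that appears in at least one record.
all_fields = sorted({key for record in records for key in record})

# A field that a record lacks is simply absent, so supply a default.
emails = [record.get("email", "unknown") for record in records]

print(all_fields)
print(emails)
```

Unlike a table, nothing forces the three records to share the same columns, so code that reads them has to handle missing or nested fields.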
5 V’s of Big Data
The 5 V’s are the characteristics of big data. These characteristics are listed below.
Volume
As the word suggests, this characteristic deals with the amount of data present in the dataset, which indicates whether the data can be termed big data or not. In simple words, the volume must be large enough for the analysis to produce accurate results.
Velocity
This refers to the speed at which data is gathered. In big data analysis, data flows in from various sources like mobiles, networks, servers, etc. Velocity is the rate at which this data is collected at a location. Companies use the gathered data to derive information.
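One simple way to put a number on velocity is to count how many events arrived in a recent time window and divide by the window length. The sketch below assumes arrival times are given in seconds; the function name arrival_rate and the sample timestamps are hypothetical.

```python
def arrival_rate(timestamps, window_seconds):
    """Return events per second within the most recent window."""
    if not timestamps or window_seconds <= 0:
        return 0.0
    latest = max(timestamps)
    # Keep only the events that fall inside the window ending at the latest one.
    recent = [t for t in timestamps if latest - t < window_seconds]
    return len(recent) / window_seconds


# Hypothetical arrival times (in seconds) of incoming records.
arrivals = [0.0, 0.5, 1.0, 9.2, 9.5, 9.9, 10.0]
rate = arrival_rate(arrivals, window_seconds=2)
print(rate)
```

Here four of the seven events fall in the last two seconds, so the measured rate is two events per second.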
Variety
This characteristic refers to the availability of heterogeneous data. That means big data typically includes data in structured, unstructured, and semi-structured formats. The dataset should consist of data fetched from different reliable sources.
Veracity
This characteristic deals with the quality of the data present in the dataset. Because big data means a huge amount of data, there is a good chance that it contains some errors or wrong values. It is important to ensure that the source of the data is trustworthy, which helps increase the accuracy of the results after the analysis.
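A basic veracity check can be sketched as a function that flags records with missing or implausible values before analysis. The fields (name, age), the accepted age range, and the sample records below are assumptions chosen for illustration, not rules from any particular tool.

```python
def validate(record):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []
    if not record.get("name"):
        problems.append("missing name")
    age = record.get("age")
    # Assumed rule: age must be a whole number between 0 and 120.
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("invalid age")
    return problems


# Hypothetical records, some with errors or wrong values.
records = [
    {"name": "Asha", "age": 31},
    {"name": "", "age": 45},
    {"name": "Chen", "age": -4},
    {"name": "Dee", "age": None},
]

clean = [r for r in records if not validate(r)]
clean_ratio = len(clean) / len(records)
print(clean, clean_ratio)
```

Tracking the share of clean records per source is one way to judge how trustworthy that source is.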
Value
This is the most important of the five V’s. This characteristic deals with the usefulness of the data present in the dataset. Whatever the quantity of the data, if it does not bring any value, then it is of no use. False data leads to errors in the output and noise in the data, which reduces the value that can be extracted.
Sources of Big Data
There is a need for big data in almost every domain and industry, and these sectors in turn become sources of big data. Used correctly, this technology can lead to profit for companies.
Some of the sectors which act as sources of big data are:
- Finance
- Healthcare
- Social Media
- Agriculture
- Transportation
Hope this helps!
Cheers,
Sumedha Zaware