How big MNCs like Google, Facebook and Instagram store, manage and manipulate thousands of terabytes of data with high speed and high efficiency
Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum is a single value of a single variable.
Handling a complex variety of data requires a dedicated mechanism or engineering approach, one that can derive insights from huge and complex volumes of data.
WHAT IS BIG DATA?
In the data world, the problem of having to keep data without having enough space to keep it is known as Big Data.
Big data is a large amount of data. The term describes data that is huge in volume and keeps growing with time. This data can be used to track and mine information for analysis or research purposes.
It covers large volumes of both structured and unstructured data; however, the main focus is on unstructured data.
WHERE IS BIG DATA USED?
Big data is used in industries that have a high volume of unstructured data.
Facebook, Amazon, Microsoft, IBM and other big companies all use Big Data.
It can also be used in smaller companies, as much of the Big Data software is open source and can be installed on commodity hardware.
Traditional data management systems face limits that they cannot overcome. As an everyday analogy, an email service may not let you attach a file larger than 25 MB. If you want to attach a 150 MB file, it exceeds what the system can handle, and that is the kind of problem Big Data describes, at a far larger scale.
5 V’s of Big Data:
Big Data is commonly characterized by five V’s: Volume, Velocity, Variety, Veracity and Value.
Below are some facts about Big Data for some of the companies:
“40,000 search queries are performed on Google per second, i.e. 3.46 billion searches a day.”
“Every minute, users send 31.25 million messages and watch 2.77 million videos on Facebook.”
“55 billion messages and 4.5 billion photos are sent each day on WhatsApp.”
“Walmart handles more than 1 million customer transactions every hour.”
“By 2025, the volume of digital data will increase to 163 zettabytes.”
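The per-second, per-minute and per-hour figures quoted above can be scaled to daily totals with simple arithmetic. The following is a minimal Python sketch that uses only the rates quoted above; the helper name `per_day` and the constant names are made up for illustration.

```python
# Number of seconds, minutes and hours in one day.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24


def per_day(rate, units_per_day):
    """Scale a rate given per second, minute or hour to a daily total."""
    return rate * units_per_day


# Rates taken from the facts quoted above.
google_searches = per_day(40_000, SECONDS_PER_DAY)            # per second
facebook_messages = per_day(31_250_000, MINUTES_PER_DAY)      # per minute
walmart_transactions = per_day(1_000_000, HOURS_PER_DAY)      # per hour

print(f"Google searches per day:       {google_searches:,}")
print(f"Facebook messages per day:     {facebook_messages:,}")
print(f"Walmart transactions per day:  {walmart_transactions:,}")
```

At 40,000 queries per second, a day of 86,400 seconds works out to about 3.46 billion searches, and 31.25 million messages per minute comes to 45 billion messages a day.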
Now the question arises: what do companies do with such huge volumes of data?
Well, these companies collect, store and analyze this data to draw business insights.
Example of the company “Google” and Hadoop:
The way Google has managed the millions of gigabytes that it stores has inspired a range of other software projects, such as the open-source Hadoop, which is specifically used to handle enormous (millions of GB) sets of data. Companies that need to process such volumes of data (such as pharma companies doing drug research) can use Amazon to store all that data and Hadoop to process it.
Hadoop is just one element of the continuing revolution in data management, just as Google Drive, Dropbox and others represent the consumer side.
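Hadoop's processing model, MapReduce, splits a job into a map step that runs on each chunk of data and a reduce step that combines the partial results. The following is a minimal single-machine sketch of that idea in plain Python. It does not use Hadoop itself, and the function names and sample chunks are made up for illustration.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(chunks):
    """Run the map step on every chunk, then reduce the partial results."""
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical sample data standing in for files spread across a cluster.
chunks = [
    "big data is big",
    "data keeps growing",
    "big clusters process data",
]
totals = word_count(chunks)
print(totals["big"], totals["data"], totals["growing"])
```

Because each chunk is counted independently, the map step can be spread over many low-cost machines, and only the small partial counts need to be brought together in the reduce step.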
Here is an example of how Facebook uses its big data:
Facebook collects large volumes of data in the form of images, videos, comments, likes, messages, audio, calls, etc.
- It then analyzes this data to give personalized Facebook Ads.
- Also, using this data, Facebook gives you personalized news feeds.
- And photo tag suggestions.
“Facebook revealed some big, big stats on big data to a few reporters at its HQ today, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. Plus it gave the first details on its new ‘Project Prism’.”
- Hadoop is used at Facebook to do a lot of things because of its processing power and its fault tolerance. Basically, Facebook runs the biggest Hadoop cluster, which spans more than 4,000 machines and stores hundreds of millions of gigabytes.
- They use HBase, a big data database that runs on Hadoop, for the Facebook messaging service. They have modified HBase to read data efficiently even after a region server goes down, to avoid latency.
- Cassandra is another Big Data database, which they use to drive user searches at Facebook.
- For storing and fetching images, they use a system called Haystack. Haystack is also periodically backed up to Hadoop so that, if it goes down, it can start back up from where it left off.
Facebook Inc. analytics chief Ken Rudin says, “Big Data is crucial to the company’s very being.” He goes on to say that, “Facebook relies on a massive installation of Hadoop, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. Facebook even designs its hardware for this purpose. Hadoop is just one of many Big Data technologies employed at Facebook.”
Here are a few examples that show how Facebook uses its Big Data.
- The Flashback
Honoring its 10th anniversary, Facebook offered its users the option of viewing and sharing a video that traces the course of their social network activity from the date of registration until the present. Called the “Flashback,” this video is a collection of the photos and posts that received the most comments and likes, set to nostalgic background music. Other videos have been created since then, including those you can view and share in celebrating a “Friendversary,” the anniversary of two people becoming friends on Facebook. You’ll also be able to see a special video on your birthday.
- I Voted
Facebook successfully tied political activity to user engagement when it ran a social experiment, creating a sticker that allowed users to declare “I Voted” on their profiles. This experiment ran during the 2010 midterm elections and seemed useful. Users who noticed the button were likely to vote and to be vocal about voting once they saw their friends were participating. Out of a total of 61 million users, 20% of those who saw their friends voting also clicked the sticker. The data science unit at Facebook has claimed that the stickers directly motivated close to 60,000 voters, and that social contagion prompted a further 280,000 connected users to vote, for a total of 340,000 additional voters in the midterm elections. For the 2016 elections, Facebook expanded its involvement in the voting process with reminders and directions to users’ polling places.
- Celebrate Pride
Following the Supreme Court’s judgment on same-sex marriage as a Constitutional right, Facebook turned into a rainbow-drenched spectacle called “Celebrate Pride,” a way of showing support for marriage equality. Facebook provided a simple way to transform profile pictures into rainbow-colored ones. Celebrations such as these hadn’t been seen since 2013, when 3 million people updated their profile pictures to the red equals sign (the logo of the Human Rights Campaign). Within the first few hours of availability, more than a million users had changed their profile pictures, according to Facebook spokesperson William Nevius. All this excitement also raised questions about what kind of research Facebook was conducting, after its tracking of user moods and citing of behavior research. In a paper the company published, The Diffusion of Support in an Online Social Movement, two data scientists at Facebook analyzed the factors that predicted support for marriage equality on Facebook, looking at what contributed to a user changing their profile picture to the red equals sign.
- Topic Data
Topic Data is a Facebook technology that shows marketers how audiences respond to brands, events, activities, and subjects in a way that keeps their personal information private. Marketers use the information from Topic Data to selectively change the way they market on the platform as well as on other channels. This data was previously available through third parties but was not as useful, because the sample size was too small to be significant and determining demographics was almost impossible. With Topic Data, Facebook has grouped the data and stripped personal information from user activity to help marketers by offering insights on all the possible activities related to a specific topic. This gives marketers an actionable and comprehensive view of their audience for the first time.
Two Common Big Data Mistakes:
Ken Rudin states that companies that rely on Big Data often owe their frustration to two mistakes:
- They rely too much on one technology, like Hadoop. Facebook relies on a massive installation of Hadoop software, which is a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. The company even designs its in-house hardware for this purpose. Mr. Rudin says, “The analytic process at Facebook begins with a 300 petabyte data analysis warehouse. To answer a specific query, data is often pulled out of the warehouse and placed into a table so that it can be studied. The team also built a search engine that indexes data in the warehouse. These are just some of the many technologies that Facebook uses to manage and analyze information.”
- They use Big Data to answer meaningless questions. Mr. Rudin also says, “At Facebook, a meaningful question is defined as one that leads to an answer that provides a basis for changing behavior. If you can’t imagine how the answer to a question would lead you to change your business practices, the question isn’t worth asking.”
“Another stat Facebook revealed was that over 100 petabytes of data are stored in a single Hadoop disk cluster.”
Example of a Company that uses Big Data for Customer Acquisition and Retention:
A real example of a company that uses big data analytics to drive customer retention is Coca-Cola. In 2015, Coca-Cola strengthened its data strategy by building a digital-led loyalty program. Coca-Cola’s director of data strategy was interviewed by ADMA’s managing editor, and the interview made it clear that big data analytics is strongly behind customer retention at Coca-Cola.
Example of a Brand that uses Big Data for Targeted Adverts:
Netflix is a good example of a big brand that uses big data analytics for targeted advertising. With over 100 million subscribers, the company collects huge amounts of data, which is the key to the industry status Netflix boasts. If you are a subscriber, you are familiar with how they send you suggestions for the next movie you should watch. Basically, this is done using your past search and watch data, which gives them insight into what interests each subscriber most. See the article linked below for how Netflix gathers and uses big data.
https://insidebigdata.com/2018/01/20/netflix-uses-big-data-drive-success/
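The idea of turning past watch data into suggestions can be illustrated with a toy example. The sketch below is not Netflix's actual method, which is far more sophisticated; it simply assumes a small made-up catalog and suggests unwatched titles from the genre a user has watched most. All titles, genres and the function name `suggest` are hypothetical.

```python
from collections import Counter

# Hypothetical catalog mapping title -> genre (made up for illustration).
CATALOG = {
    "Space Saga": "sci-fi",
    "Robot Dawn": "sci-fi",
    "Star Drift": "sci-fi",
    "Laugh Lines": "comedy",
    "Office Pranks": "comedy",
    "Harbor Mystery": "crime",
}


def suggest(watch_history, catalog=CATALOG, limit=2):
    """Suggest unwatched titles from the genre the user has watched most."""
    # Count how often each genre appears in the user's history.
    genre_counts = Counter(
        catalog[title] for title in watch_history if title in catalog
    )
    if not genre_counts:
        return []  # no usable history, so nothing to base a suggestion on
    favorite_genre, _ = genre_counts.most_common(1)[0]
    watched = set(watch_history)
    candidates = [
        title
        for title, genre in catalog.items()
        if genre == favorite_genre and title not in watched
    ]
    return candidates[:limit]


history = ["Space Saga", "Laugh Lines", "Robot Dawn"]
print(suggest(history))
```

Here the user has watched two sci-fi titles and one comedy, so the only suggestion is the remaining unwatched sci-fi title. A production system would weigh many more signals, such as searches, ratings and viewing time.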
Conclusion:
Big Data has turned out to be really important for businesses that want to manage their files and huge amounts of data. Companies have moved to Big Data technologies in order to maintain data for analysis or business development purposes.