What is meant by the quality of a big data model? An overview of the global big data market

16.07.18. Mail.ru launched Big Data as a Service

The Mail.ru cloud platform has added a big data analysis service, Cloud Big Data, based on the Apache Hadoop and Spark frameworks. The service will be useful to retailers and financial organizations that need to analyze big data but do not want to spend heavily on their own servers. Mail.ru charges only for the actual time the equipment is in use: a 10-node Hadoop cluster will cost the client 39 rubles per hour of operation. Recall that last year MTS launched a similar Big Data service on its #CloudMTS platform, priced from 5,000 rubles per month.

2017. MTS launched a cloud platform for processing Big Data


The MTS operator has launched a cloud big data processing service for business as part of its #CloudMTS cloud platform. Companies will be able to work with data in the Hadoop and Spark software environments. For example, the service can help businesses target advertising, collect and process open data, and conduct financial and business analytics. Online stores will be able to analyze customer behavior and then make targeted offers tied to various events and holidays. The service comes with pre-installed computation tools, but it is also possible to create your own data processing algorithms. Pricing starts at 5,000 rubles per month and varies with the amount of cloud space occupied. Recall that the #CloudMTS platform was created in 2016: at first it provided only cloud storage, and later it added cloud computing services.

2016. Big Data forecast for the Rio Olympics


Soon, Big Data services will tell you what decisions to make to grow your business and keep it secure. For now, they mostly train on sporting events. Remember how Microsoft's intelligent platform recently predicted the results of the European Football Championship? It got nothing right. This time the American company Gracenote, which specializes in big data processing, has calculated the most likely medal standings for the Rio Olympics. The picture shows a forecast compiled a month before the Games; it is constantly updated.

2016. Microsoft's intelligent platform predicted the results of the European Football Championship


Microsoft (like any self-respecting IT giant) already has an analytical platform based on big data processing and artificial intelligence: the Microsoft Cortana Intelligence Suite. Drawing on data from your business systems, it can predict customer churn, equipment breakdowns, revenue changes, and so on. Now Microsoft is giving us a chance to check how accurately the platform works. After analyzing football history, team statistics, player performance, injuries, and fan comments on social networks, it presented its forecast for the European Football Championship, which starts today. According to the forecast, Germany will defeat Spain in the final with a probability of 66%, and in the opening match France will beat Romania with a probability of 71%.

2016. SAP and Yandex create Big Data service for customer retention


Two years ago, Yandex launched a service providing big data processing for businesses. It has already helped companies such as Beeline and Wargaming (World of Tanks) reduce customer churn: it predicts churn periods from historical data, giving the business time to prepare and offer a new promotion. Now this Yandex technology has apparently attracted the interest of the world's largest player in the corporate IT market, SAP. The companies have joined forces to develop a service that predicts customer behavior. According to SAP and YDF, the service will be used in retail, e-commerce, banking, and telecommunications, and will be designed, pricing included, for medium-sized businesses.

2016. PROMT Analyzer - morphological Big Data analyzer


The PROMT company has released PROMT Analyzer, an artificial intelligence solution for working with big data in information and analytical systems. The tool is designed to search, extract, summarize, and structure information from almost any text content in different languages, both in corporate systems and in external sources. It analyzes texts or documents, identifies entities in them (persons, organizations, geographical names, geopolitical entities, etc.), determines the actions related to these entities along with the date and place of the action, and forms a holistic image of the document. PROMT Analyzer addresses a wide variety of tasks: analysis of a company's internal resources (document management systems), analysis of external resources (media, the blogosphere, etc.), analysis of data from closed sources to assess the criticality of situations, analysis of an object's activity with reference to geography, as well as optimization of search engines and support services.

2016. Mail.Ru will help companies analyze their data


Mail.Ru strives to keep up with its main competitor, Yandex. A year ago, Yandex launched a big data analysis service for business, and now Mail.ru has opened a Big Data practice for corporate clients. It will focus first on projects that improve the efficiency of marketing and sales, production optimization, logistics, risk management, planning, personnel management, and other business processes. For example, Mail.ru will be able to build models that predict customer churn, response to offers, and reaction to outreach through a specific communication channel, making interaction with a potential client more personalized. Mail.ru says it has been analyzing data virtually since its founding and has its own machine learning technologies.

2015. IBM will become the leading provider of business weather forecasts


Is weather important for business? Of course, especially if your business is an agricultural enterprise, a travel agency, a cafe, or a clothing store: weather affects the stability of supplies, assortment selection, and sales activity. In that case, every self-respecting business intelligence system should take the weather forecast into account. That is what IBM concluded when it bought the world's largest weather service, The Weather Company. IBM plans to feed data from three billion forecast reference points to its supercomputer Watson and revolutionize weather forecasting. It also plans to create a platform that will let third-party business applications use weather information for a fee.

2015. Video: How to use Big Data to attract talented employees


Do you still doubt that Big Data is useful for business? Then watch this video about how Beeline attracts talented new employees using Big Data. In early September, a "Big Data Taxi" in the form of a Tesla car drove around Moscow. According to a Beeline representative, besides helping to attract new talent, Big Data technologies let the company solve a variety of problems, from simple and trivial ones like "find everyone using a SIM bought with someone else's passport" to "determine a subscriber's age from a set of indicators."

2015. Microsoft introduced a talking Big Data platform


Big Data technologies promise companies magical optimization of business processes; for example, you will always have the right amount of goods in the right place at the right time. But companies that have already tried Big Data say that in practice it does not work that way: existing Big Data systems are designed for analysts and do not help the ordinary employee who must make a decision here and now. So Microsoft decided to release a Big Data platform with a human face (more precisely, a voice): the Cortana Analytics Suite. It is based on the Azure cloud platform and uses the Cortana voice assistant as its interface. The idea is that, with a visual designer, any department head will be able to create mini-applications that process large amounts of data, and any employee will be able to ask Cortana and receive the right information at the right time and in the right place.

2015. Video: What is Big Data and who needs it?


The Russian startup CleverData positions itself as a Big Data integrator, implementing projects that solve specific business problems using Big Data platforms and technologies. In the video, CleverData CEO Denis Afanasyev gives an engaging account of what Big Data is and where all this big data came from. It turns out that big data processing technologies have existed for decades, but the marketing term Big Data emerged because, thanks to cloud computing, their cost dropped and they became accessible to small and medium-sized companies. According to Denis, Big Data is most often used in marketing (customer base segmentation, online advertising), IT security (fraud detection, breakdown prediction), and risk management (assessing client creditworthiness).

2015. SAP introduced the Next Big Thing - the S/4HANA ERP system


SAP's first ERP system was called R/2 and ran on mainframes. Then came R/3, and in 2004 the SAP Business Suite appeared. The other day SAP presented what it calls the most important product in its history: the new S/4HANA. In creating it, the developers were thinking not about how to outdo the eternal competitor Oracle, but about how to avoid being outdone by the aggressive SaaS providers Salesforce and Workday. S/4HANA will therefore run both on premises and in the cloud. The system's main feature is speed: as the name suggests, it is built on SAP's flagship Big Data platform, SAP HANA, which can process very large data sets in seconds. Its second main feature is the interface. Forget the complex tables and menus you cannot figure out without help; SAP wants the new system to be controllable from a smartphone, through some 25 simple SAP Fiori applications.

2014. Yandex has opened a Big Data service for business


Yandex has launched the Yandex Data Factory project, which will provide big data processing services for businesses. It relies on the Matrixnet machine learning technology that Yandex developed to rank sites in its search engine. Yandex states that it plans to compete with companies such as SAP AG and Microsoft. Yandex Data Factory specialists have already completed several pilot projects with European companies: Yandex's artificial intelligence was used by a power line maintenance company to predict breakdowns, by a bank to target borrowers, and by a highway agency to predict traffic jams. Yandex also turns out to process data from the Large Hadron Collider at CERN.

2014. Microsoft will help Real Madrid win with Big Data


"If it is not broken, do not fix it," as the saying goes. Real Madrid have been playing quite well lately and achieving good results. Yet the laurels of the German national team, which won the World Cup with the help of Big Data, haunt the Madrid club's president, Florentino Perez. He has therefore signed a $30 million contract with Microsoft to build a modern IT infrastructure for the club. Real Madrid's coaching staff and players will receive Surface Pro 3 tablets with pre-installed Office 365 applications for closer collaboration, and with the analytical tools of Power BI for Office 365 the team's coaches will be able to study players' performance, identify long-term trends, and even predict injuries.

2014. 1C-Bitrix launched Big Data service


Big Data, the technology of processing very large volumes of data to obtain simple, business-useful results, is one of the main new trends in the IT market, and the 1C-Bitrix BigData service is perhaps the first domestic service based on it. Its first application will be the optimization (personalization) of online stores running on the Bitrix engine for each new visitor. By analyzing a large amount of data about past visitors, the service will be able to predict a new visitor's behavior on the site, identify similar customers, and make personalized offers based on other customers' purchase history. Big Data features can probably be expected soon in the Bitrix24 business management system as well.

2014. SAP: The German team won the World Cup thanks to Big Data


Just last year the Oracle yacht won the America's Cup, and Oracle attributed the victory in large part to a Big Data analysis system running in the Oracle cloud. Now it is the turn of Oracle's eternal competitor, the German company SAP, to answer this PR move: it turns out the German team won the World Cup thanks to Big Data too. SAP developed Match Insights, a system that converts a football match into a three-dimensional digital model and analyzes the actions of each player and the team as a whole. It analyzed not only the German team's own matches (to correct errors and improve efficiency) but also those of its rivals; the artificial intelligence found the opponents' weak spots and helped the team prepare for each match. The moral of the story: imagine what Big Data can do for your business.

2014. CROC launched a cloud-based Business Intelligence solution


System integrator CROC has launched a business intelligence service with the self-explanatory name "Business Intelligence as a Service", or BIaaS. The solution is aimed at large organizations that want to reduce capital costs and speed up management decision-making. The system is built on the EMC Greenplum product and is a Big Data-class solution. It lets you analyze and compare large volumes of information, build key indicators, and make business decisions while bypassing capital expenditures on software, licenses, and possible infrastructure modernization. The solution supports three data scenarios: retail analytics, analysis of contact center performance indicators, and evaluation of an organization's management performance against KPIs.

2013. SAP makes big businesses efficient with Big Data. Competitors are crying


In recent years SAP has shown itself to be the least innovative of the big IT companies (compared with competitors Oracle, Microsoft, and IBM). Most of SAP's own innovative projects failed, and the only thing SAP consistently succeeded at was buying other companies (SuccessFactors, Sybase, Ariba). But this time SAP seems determined to outshine its competitors, and it will do so using the fashionable new technology of Big Data. What is it?

Only the lazy are not talking about Big Data, yet few understand what it is and how it works. Let's start with the simplest thing: terminology. In plain language, Big Data is a set of tools, approaches, and methods for processing both structured and unstructured data in order to use it for specific tasks and goals.

Unstructured data is information that does not have a predetermined structure or is not organized in a particular order.

The term "big data" was introduced by Clifford Lynch, editor of the journal Nature, back in 2008 in a special issue devoted to the explosive growth of the world's volume of information, although big data itself, of course, existed before. According to experts, the Big Data category includes most data flows of over 100 GB per day.


Today, this simple term boils down to just two things: data storage and data processing.

Big data - in simple words

In the modern world, Big Data is a socio-economic phenomenon arising from the emergence of new technological capabilities for analyzing enormous amounts of data.


To make it easier to understand, imagine a supermarket in which the goods are not in the order you are used to: bread next to the fruit, tomato paste next to the frozen pizza, lighter fluid in front of the tampon rack, which also holds avocados, tofu, and shiitake mushrooms. Big data puts everything in its place and helps you find nut milk, learn its price and expiration date, and also see who besides you buys such milk and why it is better than cow's milk.

Kenneth Cukier: Big data is better data

Big data technology

Huge volumes of data are processed so that a person can obtain specific, useful results for effective further use.


In fact, Big data is a solution to problems and an alternative to traditional data management systems.

Techniques and methods of analysis applicable to Big Data, according to McKinsey:

  • Crowdsourcing;
  • Data mixing and integration;
  • Machine learning;
  • Artificial neural networks;
  • Pattern recognition;
  • Predictive analytics;
  • Simulation modeling;
  • Spatial analysis;
  • Statistical analysis;
  • Visualization of analytical data.

Horizontal scalability is the basic principle of big data processing: data is distributed across computing nodes and processed without performance degradation. McKinsey also noted that relational database management systems and Business Intelligence tools remain applicable in this context.
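
The horizontal model described above can be sketched in miniature. The following single-process Python sketch imitates the map, shuffle, and reduce phases that frameworks such as Hadoop run across many nodes; the word-count task and the two "nodes" are purely illustrative:

```python
from collections import defaultdict

def map_phase(chunk):
    # Each node independently emits (word, 1) pairs for its own chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs_per_node):
    # Intermediate pairs are grouped by key across all nodes.
    grouped = defaultdict(list)
    for pairs in pairs_per_node:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Each key is reduced independently, so this step also parallelises.
    return {key: sum(values) for key, values in grouped.items()}

# Two simulated "nodes", each holding part of the data set.
chunks = ["big data big", "data big insight"]
result = reduce_phase(shuffle([map_phase(c) for c in chunks]))
```

Adding more chunks (nodes) leaves the per-node work unchanged, which is exactly the horizontal-scaling property the text describes.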

Technologies:

  • NoSQL;
  • MapReduce;
  • Hadoop;
  • Hardware solutions.

Read also:

For big data, there are traditional defining characteristics, developed by Meta Group back in 2001 and known as the "Three Vs":

  1. Volume: the physical volume of data.
  2. Velocity: the rate of growth and the need for fast processing to obtain results.
  3. Variety: the ability to process different types of data simultaneously.

Big data: applications and opportunities

It is impossible to process such volumes of heterogeneous, rapidly arriving digital information with traditional tools. Data analysis reveals subtle patterns that a person cannot see, allowing us to optimize every area of life, from public administration to manufacturing and telecommunications.

For example, some companies began using it years ago to protect their clients from fraud; after all, taking care of the client's money means taking care of your own.

Susan Etlinger: What do we do with all this big data?

Solutions based on Big data: Sberbank, Beeline and other companies

Beeline has a huge amount of subscriber data, which it uses not only to work with subscribers but also to create analytical products such as external consulting and IPTV analytics. Beeline segmented its database and protected clients from financial fraud and viruses, using HDFS and Apache Spark for storage and RapidMiner and Python for data processing.


Or recall Sberbank and its older case, AS SAFI: a system that analyzes photographs to identify bank customers and prevent fraud. The system was introduced back in 2014. It compares photographs against a database that is fed by webcams on stands using computer vision, and it is built on a biometric platform. Thanks to it, cases of fraud have decreased tenfold.

Big data in the world

According to forecasts, by 2020 humanity will have generated 40-44 zettabytes of information, and by 2025 the total will have grown tenfold, according to The Data Age 2025, a report prepared by IDC analysts. The report notes that most of the data will be generated by businesses rather than by ordinary consumers.

The analysts believe that data will become a vital asset and security a critical foundation of life. They are also confident that the technology will change the economic landscape and that the average user will interact with connected devices about 4,800 times a day.

Big data market in Russia

Big data typically comes from three sources:

  • Internet (social networks, forums, blogs, media and other sites);
  • Corporate document archives;
  • Readings from sensors, instruments and other devices.

Big data in banks

In addition to the system described above, Sberbank's strategy for 2014-2018 speaks of the importance of analyzing very large data sets for quality customer service, risk management, and cost optimization. Today the bank uses Big Data to manage risks, combat fraud, segment customers and assess their creditworthiness, manage personnel, forecast queues in branches, calculate employee bonuses, and for other tasks.

VTB24 uses big data to segment customers and manage churn, generate financial reporting, and analyze reviews on social networks and forums. To do this, it uses solutions from Teradata, SAS Visual Analytics, and SAS Marketing Optimizer.

The total global volume of data created and replicated in 2011 was predicted to be about 1.8 zettabytes (1.8 trillion gigabytes), roughly nine times more than was created in 2006.

More complex definition

However, big data involves more than just analyzing huge amounts of information. The problem is not that organizations create huge volumes of data, but that most of it arrives in formats that fit poorly with the traditional structured database: web logs, videos, text documents, machine code, or, for example, geospatial data. All of it is stored in many different repositories, sometimes even outside the organization. As a result, corporations may have access to a huge amount of their data yet lack the tools to establish relationships within it and draw meaningful conclusions. Add the fact that data is now updated ever more frequently, and you get a situation in which traditional methods of analysis cannot keep up with vast volumes of constantly refreshed data, which opens the way for big data technologies.

Best definition

In essence, the concept of big data means working with information of huge volume and diverse composition, frequently updated and scattered across different sources, with the aim of increasing operational efficiency, creating new products, and improving competitiveness. The consulting company Forrester puts it briefly: "Big Data brings together techniques and technologies that extract meaning from data at the extreme limits of practicality."

How big is the difference between business analytics and big data?

Craig Baty, executive director of marketing and chief technology officer of Fujitsu Australia, has pointed out that business analysis is a descriptive process of examining the results a business achieved over a certain period, whereas the processing speed of big data makes analysis predictive, capable of offering the business recommendations for the future. Big data technologies also allow more types of data to be analyzed than business intelligence tools can, making it possible to look beyond structured repositories.

Matt Slocum of O'Reilly Radar believes that although big data and business analytics have the same goal (finding answers to a question), they differ from each other in three aspects.

  • Big data is designed to handle larger volumes of information than business analytics, which certainly fits the traditional definition of big data.
  • Big data is designed to handle faster, more rapidly changing information, which implies deep exploration and interactivity. In some cases, results are generated faster than a web page loads.
  • Big data is designed to process unstructured data whose uses we are only beginning to explore once we can collect and store it; algorithms and conversational interfaces are needed to make the trends inside these data sets easier to find.

According to the Oracle white paper "Oracle Information Architecture: An Architect's Guide to Big Data", when working with big data we approach information differently than when conducting business analysis.

Working with big data is unlike the usual business intelligence process, where simply adding up known values produces a result: for example, summing paid invoices yields annual sales. With big data, the result emerges from successive rounds of modeling: a hypothesis is put forward; a statistical, visual, or semantic model is built; the hypothesis is tested against it; and the next hypothesis is put forward. This process requires the researcher either to interpret visual meanings, to construct interactive queries based on knowledge, or to develop adaptive machine learning algorithms that can produce the desired result. Moreover, the lifetime of such an algorithm can be quite short.

Big data analysis techniques

There are many different methods for analyzing data sets, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not pretend to be complete, but it reflects the most popular approaches across industries. Researchers continue to develop new techniques and improve existing ones, and some of the techniques listed are not exclusive to big data and can be applied to smaller arrays (A/B testing and regression analysis, for example). Of course, the larger and more diversified the array analyzed, the more accurate and relevant the resulting data.

A/B testing. A technique in which a control sample is compared in turn against alternatives in order to identify the combination of indicators that achieves, for example, the best consumer response to a marketing offer. Big Data allows a huge number of iterations and thus a statistically reliable result.
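
As a rough illustration of the statistics behind such comparisons, the sketch below runs a two-proportion z-test on invented traffic numbers; the 1.96 threshold corresponds to a two-sided test at the 5% significance level:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Invented traffic: control A converts 5.0%, variant B converts 5.8%.
z = two_proportion_z(500, 10_000, 580, 10_000)
significant = abs(z) > 1.96  # two-sided test at the 5% level
```

With big data volumes the samples are large enough that even small differences in conversion become statistically detectable.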

Association rule learning. A set of techniques for identifying relationships, i.e. association rules between variables in large data sets. Used in data mining.
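
A minimal sketch of the idea on a toy basket data set (items invented): the confidence of a rule "bread → milk" is the share of bread-containing baskets that also contain milk:

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: P(consequent | antecedent)."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# Toy purchase baskets; real systems mine millions of them.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cheese"},
]
rule_conf = confidence(baskets, {"bread"}, {"milk"})
```

Production algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds, rather than scoring one rule at a time.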

Classification. A set of techniques that allows you to predict consumer behavior in a certain market segment (purchase decisions, churn, consumption volume, etc.). Used in data mining.
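
One simple classifier of this kind is k-nearest neighbours. The sketch below labels a new customer by the majority label of the k most similar known customers; the single feature and all numbers are invented:

```python
def knn_predict(samples, query, k=3):
    """Label a new point by majority vote of its k nearest known neighbours."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# (monthly spend, outcome) for known customers; all numbers are invented.
history = [(10, "churn"), (12, "churn"), (15, "churn"),
           (60, "stay"), (70, "stay"), (80, "stay")]
prediction = knn_predict(history, 14)
```

Real churn models use many features and more robust algorithms, but the principle of labeling new cases from similar past cases is the same.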

Cluster analysis. A statistical method for classifying objects into groups by identifying common features that are not known in advance. Used in data mining.
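
A minimal sketch of the idea using two-cluster k-means on one feature (hypothetical spend-per-visit values); the groups are discovered from the data, not labeled in advance:

```python
def kmeans_1d(points, iters=10):
    """Two-cluster k-means on a single numeric feature."""
    centroids = [min(points), max(points)]  # deterministic initialisation
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

# Hypothetical spend-per-visit values: two groups emerge on their own.
spend = [5, 6, 7, 5, 6, 95, 100, 98, 97, 99]
centroids, clusters = kmeans_1d(spend)
```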

Crowdsourcing. Methodology for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that allows you to analyze comments from social network users and compare them with sales results in real time.

Data mining. A set of techniques that allows you to determine the categories of consumers most susceptible to the promoted product or service, identify the characteristics of the most successful employees, and predict the behavioral model of consumers.

Ensemble learning. This method uses many predictive models, thereby improving the quality of the forecasts made.
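
The simplest ensemble is a majority vote over several models' predictions, as in this sketch (the three "models" and their votes are invented):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most of the individual models."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical churn models vote on one customer.
votes = ["churn", "stay", "churn"]
decision = majority_vote(votes)
```

When the individual models make partly independent errors, the combined vote is usually more accurate than any single model.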

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can combine and mutate. As in natural evolution, the fittest individual survives.
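
A toy sketch of the mechanism, assuming a contrived task of evolving the string "bigdata": chromosomes combine via single-point crossover and mutate at random, and because the fittest parents are carried over unchanged, the best fitness never decreases:

```python
import random

random.seed(42)

TARGET = "bigdata"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(chrom):
    # Number of positions that already match the target string.
    return sum(a == b for a, b in zip(chrom, TARGET))

def crossover(p1, p2):
    # Single-point crossover combines two parent "chromosomes".
    cut = random.randrange(1, len(TARGET))
    return p1[:cut] + p2[cut:]

def mutate(chrom, rate=0.1):
    # Each gene mutates to a random letter with a small probability.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in chrom)

population = ["".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
              for _ in range(50)]
initial_best = max(population, key=fitness)

for _ in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # selection: the fittest survive unchanged
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

best = max(population, key=fitness)
```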

Machine learning. A field of computer science (historically labeled "artificial intelligence") that aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of techniques, borrowed from computer science and linguistics, for recognizing human natural language.

Network analysis. A set of techniques for analyzing connections between nodes in networks. In relation to social networks, it allows you to analyze the relationships between individual users, companies, communities, etc.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more metrics. Helps in making strategic decisions, for example, the composition of the product line to be launched on the market, conducting investment analysis, etc.

Pattern recognition. A set of techniques with self-learning elements for predicting the behavioral model of consumers.

Predictive modeling. A set of techniques for creating a mathematical model of a predetermined, probable scenario. An example is analyzing a CRM database for conditions that might prompt subscribers to switch providers.

Regression. A set of statistical methods for identifying the relationship between changes in a dependent variable and one or more independent variables. Often used for forecasting and prediction. Used in data mining.
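
A minimal sketch of one-variable least-squares regression on invented numbers (ad spend versus daily orders), used here to extrapolate to an untried spend level:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Invented numbers: ad spend (thousand rubles) vs. orders per day.
spend = [1, 2, 3, 4, 5]
orders = [12, 15, 21, 24, 28]
a, b = linear_fit(spend, orders)
forecast = a + b * 6  # extrapolate to an untried spend level
```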

Sentiment analysis. Techniques for assessing consumer sentiment based on natural language recognition. They make it possible to isolate, from the general information flow, messages related to a subject of interest (for example, a consumer product) and then evaluate the polarity of the judgment (positive or negative), its degree of emotionality, and so on.

Signal processing. A set of techniques, borrowed from radio engineering, aimed at recognizing a signal against background noise and analyzing it further.

Spatial analysis. A set of methods, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographical coordinates, object geometry. Geographic information systems (GIS) often serve as the source of big data here.

Statistics. The science of collecting, organizing, and interpreting data, including developing questionnaires and conducting experiments. Statistical methods are often used to make value judgments about the relationships between certain events.

Supervised learning. A set of techniques based on machine learning technologies that allow you to identify functional relationships in analyzed data sets.

Simulation. Modeling the behavior of complex systems, often used to forecast and to work through various scenarios in planning.

Time series analysis. A set of methods, borrowed from statistics and digital signal processing, for analyzing data sequences that repeat over time. Obvious applications include tracking the stock market or patients' illnesses.
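
A basic time series tool is the moving average, sketched here on invented daily sales: averaging a sliding window smooths out noise and exposes the trend:

```python
def moving_average(series, window=3):
    """Smooth a series to expose the trend behind day-to-day noise."""
    return [round(sum(series[i:i + window]) / window, 2)
            for i in range(len(series) - window + 1)]

# Invented daily sales with noise around a rising trend.
sales = [10, 12, 11, 13, 15, 14, 16]
trend = moving_average(sales)
```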

Unsupervised learning. A set of machine learning techniques for identifying hidden functional relationships in the analyzed data sets. It shares common features with cluster analysis.

Visualization. Methods for graphically representing the results of big data analysis as charts or animated images, simplifying interpretation and making the results easier to understand.


Visual representation of the results of big data analysis is fundamental to their interpretation. Human perception is limited, and scientists continue to research ways to improve modern methods of presenting data as images, charts, or animations.

Analytical tools

As of 2011, some of the approaches listed above, or combinations of them, make it possible to implement practical analytical engines for big data. Among the free or relatively inexpensive open Big Data analysis systems, one can recommend:

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest on this list is Apache Hadoop, open source software that over the past five years has been proven as a data analyzer by most stock trackers. As soon as Yahoo opened the Hadoop code to the open source community, a whole movement of Hadoop-based products sprang up in the IT industry. Almost all modern big data analysis tools provide Hadoop integration; their developers range from startups to well-known global companies.

Markets for Big Data Management Solutions

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze big data is perceived as an unambiguous benefit. But is this really so? What could the rampant accumulation of data lead to? Most likely, to what psychologists call pathological hoarding or syllogomania, figuratively known as "Plyushkin syndrome." In English, the compulsive urge to collect everything is called hoarding (from "hoard"), and the classification of mental illnesses treats it as a mental disorder. In the digital era, digital hoarding is added to traditional material hoarding; it can affect both individuals and entire enterprises and organizations.

World and Russian market

Big Data landscape: main suppliers

Almost all leading IT companies have shown interest in tools for collecting, processing, managing, and analyzing big data, which is quite natural. First, they encounter the phenomenon directly in their own business; second, big data opens up excellent opportunities for developing new market niches and attracting new customers.

Many startups have appeared on the market that make business by processing huge amounts of data. Some of them use ready-made cloud infrastructure provided by large players like Amazon.

Theory and practice of Big Data in industries

History of development

2017

TmaxSoft forecast: the next “wave” of Big Data will require modernization of the DBMS

Enterprises know that the vast amounts of data they accumulate contain important information about their business and clients. If a company can successfully apply this information, it will have a significant advantage over its competitors and will be able to offer better products and services. However, many organizations still fail to use big data effectively because their legacy IT infrastructure cannot provide the necessary storage capacity, data-exchange processes, utilities, and applications required to process and analyze large amounts of unstructured data and extract valuable information from it, TmaxSoft noted.

Additionally, the increased processing power needed to analyze ever-increasing volumes of data may require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could be used to develop new applications and services.

On February 5, 2015, the White House released a report discussing how companies use "big data" to charge different prices to different customers, a practice known as "price discrimination" or "personalized pricing". The report describes the benefits of big data for both sellers and buyers, and its authors conclude that many of the issues raised by big data and differential pricing can be addressed through existing anti-discrimination and consumer-protection laws and regulations.

The report notes that at this time, there is only anecdotal evidence of how companies are using big data in the context of personalized marketing and differentiated pricing. This information shows that sellers use pricing methods that can be divided into three categories:

  • study of the demand curve;
  • steering and differentiated pricing based on demographic data; and
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: To determine demand and study consumer behavior, marketers often conduct experiments in which customers are randomly assigned to one of two possible price categories. "Technically, these experiments are a form of differential pricing because they result in different prices for customers, even if they are 'non-discriminatory' in the sense that all customers have the same probability of being 'sent' to a higher price."
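Such a randomized price experiment can be sketched in a few lines of Python. The group names, prices, and seed below are hypothetical; this is a minimal sketch of the random-assignment idea, not a description of any company's actual system.

```python
import random

def assign_price(customer_id, price_a=9.99, price_b=12.99, seed=42):
    """Reproducibly assign each customer to one of two price groups.

    Every customer has the same 50% chance of seeing the higher price,
    which is what makes the experiment "non-discriminatory" in the
    report's sense even though customers end up paying different prices.
    (Hypothetical prices; illustration only.)
    """
    rng = random.Random(f"{seed}-{customer_id}")  # stable per-customer draw
    return price_a if rng.random() < 0.5 else price_b

# Simulate assigning 10,000 customers and check that the split is balanced:
prices = [assign_price(cid) for cid in range(10_000)]
share_high = sum(p == 12.99 for p in prices) / len(prices)  # close to 0.5
```

Comparing conversion rates between the two groups then estimates a point on the demand curve; seeding per customer keeps the assignment stable if the same customer returns.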

Steering: The practice of presenting products to consumers based on their membership in a specific demographic group. Thus, a computer company's website may offer the same laptop to different types of buyers at different prices based on the information they provide about themselves (for example, whether the user represents a government agency, a scientific or commercial institution, or is an individual) or on their geographic location (for example, as determined by the computer's IP address).

Targeted behavioral marketing and customized pricing: In these cases, customers' personal information is used to target advertising and customize pricing for certain products. For example, online advertisers use data collected by advertising networks and through third-party cookies about online user activity to target their advertisements. This approach, on the one hand, allows consumers to receive advertising for goods and services of interest to them. It may, however, concern those consumers who do not want certain types of their personal data (such as information about visits to websites linked to medical and financial matters) to be collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of personalized pricing in the online environment. The report speculates that this may be because the methods are still being developed, or because companies are hesitant to use custom pricing (or prefer to keep quiet about it) - perhaps fearing a backlash from consumers.

The report's authors suggest that "for the individual consumer, the use of big data clearly presents both potential rewards and risks." While acknowledging that big data raises transparency and discrimination issues, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also highlights the need for “ongoing oversight” when companies use sensitive information in ways that are not transparent or in ways that are not covered by existing regulatory frameworks.

This report continues the White House's efforts to examine the use of big data and discriminatory pricing on the Internet and the resulting consequences for American consumers. It was previously reported that the White House Big Data Working Group published its report on this issue in May 2014. The Federal Trade Commission (FTC) also addressed these issues during its September 2014 workshop on big data discrimination.

2014

Gartner dispels myths about Big Data

A fall 2014 research note from Gartner lists a number of common Big Data myths among IT leaders and provides rebuttals to them.

  • Everyone is implementing Big Data processing systems faster than us

Interest in Big Data technologies is at an all-time high: 73% of organizations surveyed by Gartner analysts this year are already investing in Big Data technologies or planning to do so. But most such initiatives are still at a very early stage, and only 13% of respondents have actually deployed such solutions. The hardest part is determining how to extract revenue from Big Data and deciding where to start. Many organizations get stuck at the pilot stage because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it

Some IT managers believe that small data flaws do not affect the overall result of analyzing huge volumes. When there is a lot of data, each individual error does have less impact on the result, analysts note, but the errors themselves also become more numerous. In addition, most of the analyzed data is external, of unknown structure or origin, so the likelihood of errors increases. Thus, in the world of Big Data, data quality actually matters much more.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format, with automatic schema generation as it is read. It is believed that this will allow information from the same sources to be analyzed using multiple data models. Many believe that this will also enable end users to interpret any data set as they see fit. In reality, most users often want the traditional way with a ready-made schema, where the data is formatted appropriately and there are agreements on the level of integrity of the information and how it should relate to the use case.

  • There is no point in using data warehouses for complex analytics

Many information management system administrators believe that there is no point in spending time creating a data warehouse, given that complex analytical systems rely on new types of data. In fact, many complex analytics systems use information from a data warehouse. In other cases, new types of data need to be additionally prepared for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation and the required level of quality - such preparation may occur outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for storage or as critical elements of the analytical infrastructure. Underlying data lake technologies lack the maturity and breadth of functionality found in warehouses. Therefore, managers responsible for data management should wait until lakes reach the same level of development, according to Gartner.

Accenture: 92% of those who implemented big data systems are satisfied with the results

Among the main advantages of big data, respondents named:

  • “searching for new sources of income” (56%),
  • “improving customer experience” (51%),
  • “new products and services” (50%) and
  • “an influx of new customers and maintaining the loyalty of old ones” (47%).

When introducing new technologies, many companies face traditional problems. For 51%, the stumbling block was security, for 47% the budget, for 41% the lack of necessary personnel, and for 35% difficulties in integrating with existing systems. Almost all companies surveyed (about 91%) plan to solve the staff-shortage problem soon and hire big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the Internet. 79% of respondents noted that companies that do not engage in big data will lose their competitive advantage.

However, respondents disagreed about what exactly should be considered big data. 65% of respondents believe that these are “large data files”, 60% believe that this is “advanced analytics and analysis”, and 50% believe that this is “data visualization tools”.

Madrid spends €14.7 million on big data management

In July 2014, it became known that Madrid would use big data technologies to manage city infrastructure. The cost of the project is 14.7 million euros, and the implemented solutions will be based on technologies for analyzing and managing big data. With their help, the city administration will manage work with each service provider and pay accordingly, depending on the level of service delivered.

These are administration contractors who monitor the condition of streets, lighting, irrigation, and green spaces, clean the territory, and remove and recycle waste. During the project, 300 key performance indicators of city services were developed for specially designated inspectors, on the basis of which 1.5 thousand various checks and measurements will be carried out daily. In addition, the city will begin using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

2013

Experts: Big Data is in fashion

Without exception, all vendors in the data management market are currently developing technologies for Big Data management. This new technological trend is also actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As DataSift found out, by January 2013 the wave of discussion around "big data" had exceeded all imaginable dimensions. After analyzing the number of mentions of Big Data on social networks, DataSift calculated that in 2012 the term was used about 2 million times in posts created by about 1 million different authors around the world. This is equivalent to 260 posts per hour, with a peak of 3,070 mentions per hour.

Gartner: Every second CIO is ready to spend money on Big data

After several years of experimentation with Big Data technologies and the first implementations in 2013, adoption of such solutions will increase significantly, Gartner predicts. Researchers surveyed IT leaders around the world and found that 42% of respondents have already invested in Big Data technologies or plan to do so within the next year (data as of March 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is changing rapidly, demanding new approaches to information processing. Many companies have already realized that large amounts of data are critical, and working with them brings benefits unavailable through traditional information sources and processing methods. In addition, the constant discussion of "big data" in the media fuels interest in the relevant technologies.

Frank Buytendijk, a vice president at Gartner, even urged companies to temper their zeal, since some worry they are falling behind competitors in Big Data adoption.

“There is no need to worry; the possibilities for implementing ideas based on big data technologies are virtually endless,” he said.

Gartner predicts that by 2015, 20% of Global 1000 companies will have a strategic focus on “information infrastructure.”

In anticipation of the new opportunities that big data processing technologies will bring, many organizations are already organizing the process of collecting and storing various types of information.

For education, government, and industrial organizations, the greatest potential for business transformation lies in combining accumulated data with so-called "dark data": email messages, multimedia, and other similar content. According to Gartner, the winners in the data race will be those who learn to deal with the widest variety of information sources.

Cisco survey: Big Data will help increase IT budgets

The Spring 2013 Cisco Connected World Technology Report, conducted in 18 countries by the independent research firm InsightExpress, surveyed 1,800 college students and an equal number of young professionals aged 18 to 30. The survey was conducted to assess the readiness of IT departments to implement Big Data projects and to gain insight into the challenges involved, the technological shortcomings, and the strategic value of such projects.

Most companies collect, record and analyze data. However, the report says, many companies face a range of complex business and information technology challenges with Big Data. For example, 60 percent of respondents admit that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent said that they are already receiving real strategic benefits from the accumulated information.

More than half of the IT executives surveyed believe that Big Data projects will help increase IT budgets in their organizations, as there will be increased demands on technology, personnel, and professional skills. More than half of respondents had expected such projects to increase IT budgets in their companies as early as 2012, and 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies may affect the adoption rate of Big Data solutions and the business value of those solutions.

Companies collect and use many different types of data, both structured and unstructured. Here are the sources from which survey participants receive their data (Cisco Connected World Technology Report):

Nearly half (48 percent) of IT leaders predict the load on their networks will double over the next two years. (This is especially true in China, where 68 percent of respondents share this view, and in Germany – 60 percent). 23 percent of respondents expect network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for explosive growth in network traffic volumes.

27 percent of respondents admitted that they need better IT policies and information security measures.

21 percent need more bandwidth.

Big Data opens up new opportunities for IT departments to add value and build strong relationships with business units, allowing them to increase revenue and strengthen the company's financial position. Big Data projects make IT departments a strategic partner to business departments.

According to 73 percent of respondents, the IT department will become the main driver of the implementation of the Big Data strategy. At the same time, respondents believe that other departments will also be involved in the implementation of this strategy. First of all, this concerns the departments of finance (named by 24 percent of respondents), research and development (20 percent), operations (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Millions of new jobs needed to manage big data

Global IT spending will reach $3.7 trillion by 2013, which is 3.8% more than spending on information technology in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at a much faster pace, says a Gartner report.

By 2015, 4.4 million IT jobs will be created worldwide to service big data, of which 1.9 million will be in the United States. Moreover, each such job will entail the creation of three additional jobs outside the IT sector, so that in the United States alone 6 million people will work to support the information economy over the next four years.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both the private and public educational systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that the role of nurturing qualified IT personnel should be taken on directly by the companies that urgently need them, since such employees will be their ticket into the new information economy of the future.

2012

The first skepticism regarding "Big Data"

Analysts from Ovum and Gartner suggest that for big data, a fashionable topic of 2012, the time may have come to shed illusions.

At this time, the term "Big Data" usually referred to the constantly growing volume of information flowing in operational mode from social media, sensor networks, and other sources, as well as to the growing range of tools used to process the data and extract important business trends from it.

"Because of (or despite) the hype around the idea of big data, manufacturers in 2012 looked at this trend with great hope," said Tony Baer, an analyst at Ovum.

Baer reported that DataSift had conducted a retrospective analysis of big data mentions in social media.

Moscow_Exchange May 6, 2015 at 8:38 pm

Analytical overview of the Big Data market


"Big Data" is a topic actively discussed by technology companies. Some have become disillusioned with big data, while others, on the contrary, are making the most of it for business. A fresh analytical review of the domestic and global Big Data markets, prepared by the Moscow Exchange together with IPOboard analysts, shows which trends are most relevant in the market now. We hope the information will be interesting and useful.

WHAT IS BIG DATA?

Key Features
Big Data is currently one of the key drivers of information-technology development. The field, relatively new to Russian business, is widespread in Western countries. This is because in the era of information technology, especially after the boom of social networks, a significant amount of information began to accumulate about each Internet user, which ultimately gave rise to Big Data.

The term "Big Data" causes much controversy; many believe it refers only to the amount of accumulated information, but one should not forget the technical side: the field also covers storage technologies, computing, and services.

It should be noted that the field involves processing large amounts of information that is difficult to handle by traditional methods.

Below is a comparison table between traditional and Big Data databases.

The field of Big Data is characterized by the following features:
  • Volume – the accumulated data constitutes a large amount of information that is labor-intensive to process and store by traditional means; it requires a new approach and improved tools.
  • Velocity – this attribute indicates both the increasing speed of data accumulation (90% of the information was collected over the last 2 years) and the speed of data processing; real-time data-processing technologies have recently become more in demand.
  • Variety – the ability to simultaneously process structured and unstructured information of various formats. The main distinguishing feature of structured information is that it can be classified; an example is information about customer transactions. Unstructured information includes video, audio files, free text, and information coming from social networks. Today, 80% of information is unstructured, and it needs complex analysis to become useful for further processing.
  • Veracity – the reliability of the data. Users attach increasing importance to the reliability of available data; for example, Internet companies struggle to separate actions performed on their websites by robots from those of humans, which leads to difficulties in data analysis.
  • Value – the value of the accumulated information. Big Data must be useful to a company and bring some value, for example by helping to improve business processes, reporting, or cost optimization.

If the above 5 conditions are met, the accumulated volumes of data can be classified as large.

Areas of application of Big Data

The scope of use of Big Data technologies is extensive. Thus, with the help of Big Data, you can learn about customer preferences, the effectiveness of marketing campaigns, or conduct risk analysis. Below are the results of a survey by the IBM Institute on the areas of use of Big Data in companies.

As can be seen from the diagram, most companies use Big Data in the field of customer service, the second most popular area is operational efficiency; in the field of risk management, Big Data is less common at the moment.

It should also be noted that Big Data is one of the fastest growing areas of information technology; according to statistics, the total amount of data received and stored doubles every 1.2 years.
Between 2012 and 2014, the amount of data transferred monthly by mobile networks increased by 81%. According to Cisco estimates, in 2014 the volume of mobile traffic was 2.5 exabytes (a unit of measurement of the amount of information equal to 10^18 standard bytes) per month, and in 2019 it will be equal to 24.3 exabytes.
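As a sanity check on Cisco's figures, the implied average annual growth rate between the two reported volumes can be computed directly:

```python
# Implied compound annual growth rate (CAGR) behind Cisco's estimates:
# 2.5 exabytes/month of mobile traffic in 2014, 24.3 exabytes/month in 2019.
start_eb, end_eb, years = 2.5, 24.3, 5
cagr = (end_eb / start_eb) ** (1 / years) - 1
print(f"Implied annual traffic growth: {cagr:.1%}")  # about 58% per year
```

So the forecast corresponds to mobile traffic growing by roughly 58% every year, consistent with the overall claim that stored data volumes double roughly every 1.2 years.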
Thus, despite its relatively young age, Big Data is an established field of technology that is widespread in many areas of business and plays an important role in companies' development.

Big Data Technologies
Technologies used for collecting and processing Big Data can be divided into 3 groups:
  • Software;
  • Equipment;
  • Services.

The most common data-processing approaches include:
SQL – a structured query language that allows you to work with databases. With SQL you can create and modify data, while the data array itself is managed by the corresponding database management system.
NoSQL – the term stands for Not Only SQL. It covers a number of approaches to implementing databases that differ from the models used in traditional relational DBMSs. They are convenient when the data structure changes constantly, for example for collecting and storing information from social networks.
MapReduce – a model for distributing computation. It is used for parallel computing over very large data sets (petabytes or more). In this programming interface, it is not the data that is transferred to the program for processing, but the program to the data; thus a query is itself a program. The principle of operation is to process data sequentially with two methods: Map, which selects preliminary data, and Reduce, which aggregates it.
Hadoop – used to implement search and contextual mechanisms for high-load sites such as Facebook, eBay, and Amazon. A distinctive feature is that the system is protected against the failure of any cluster node, since each data block has at least one copy on another node.
SAP HANA – a high-performance NewSQL platform for data storage and processing. It provides high-speed query processing. Another distinctive feature is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytical systems.
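The Map/Reduce data flow described above can be illustrated with a minimal word-count sketch in plain Python. This is a single-process toy; real Hadoop distributes both phases across a cluster, but the two-phase structure is the same.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit an intermediate (key, value) pair for every word."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: group pairs by key and aggregate the values (shuffle + sum)."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

# Toy input standing in for a distributed file split across cluster nodes:
lines = ["big data", "Big Data tools", "data lakes"]
counts = reduce_phase(map_phase(lines))
# counts == {'big': 2, 'data': 3, 'tools': 1, 'lakes': 1}
```

Because each Map call touches only its own input split and Reduce only aggregates by key, both phases parallelize naturally, which is exactly why the model moves "the program to the data" rather than the reverse.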

Technological equipment includes:

  • servers;
  • infrastructure equipment.
Servers here also include data storage systems.
Infrastructure equipment includes platform acceleration tools, uninterruptible power supplies, server console sets, etc.

Services.
Services include building the architecture of a database system, arranging and optimizing the infrastructure, and ensuring the security of data storage.

Software, hardware, and services together form comprehensive platforms for data storage and analysis. Companies such as Microsoft, HP, EMC offer services for the development, deployment and management of Big Data solutions.

Applications in industries
Big Data has become widespread in many business sectors. It is used in healthcare, telecommunications, trade, logistics, and financial companies, as well as in government administration.
Below are some examples of Big Data applications in some of the industries.

Retail
The databases of retail stores can accumulate a lot of information about customers, inventory management systems, and supplies of commercial products. This information can be useful in all areas of store activity.

Thus, with the help of accumulated information, you can manage the supply of goods, their storage and sale. Based on the accumulated information, it is possible to predict the demand and supply of goods. Also, a data processing and analysis system can solve other problems of a retailer, for example, optimizing costs or preparing reporting.

Financial services
Big Data makes it possible to analyze a borrower's creditworthiness and is also useful for credit scoring and underwriting. The introduction of Big Data technologies shortens the time needed to review loan applications. Big Data can also be used to analyze a specific client's transactions and offer banking services suited to them.

Telecom
In the telecommunications industry, Big Data has become widespread among mobile operators.
Cellular operators, along with financial institutions, have some of the most voluminous databases, which allows them to conduct the most in-depth analysis of accumulated information.
The main purpose of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber.

In addition to using Big Data for marketing purposes, technologies are used to prevent fraudulent financial transactions.

Mining and petroleum industries
Big Data is used both in the extraction of minerals and in their processing and marketing. Based on the information received, enterprises can draw conclusions about the efficiency of field development, track overhaul schedules and equipment condition, and forecast product demand and prices.

According to a survey by Tech Pro Research, Big Data is most widespread in the telecommunications industry, as well as in engineering, IT, financial and government enterprises. According to the results of this survey, Big Data is less popular in education and healthcare. The survey results are presented below:

Examples of using Big Data in companies
Today, Big Data is being actively implemented in foreign companies. Companies such as Nasdaq, Facebook, Google, IBM, VISA, Master Card, Bank of America, HSBC, AT&T, Coca Cola, Starbucks and Netflix are already using Big Data resources.

The applications of processed information are diverse and depend on the industry and the tasks to be performed.
Next, examples of the application of Big Data technologies in practice will be presented.

HSBC uses Big Data technologies to combat fraudulent transactions with plastic cards. With the help of Big Data, the company increased the efficiency of its security service threefold and the detection of fraudulent incidents tenfold. The economic effect of introducing these technologies exceeded $10 million.

VISA's antifraud system automatically identifies fraudulent transactions; it currently helps prevent $2 billion in fraudulent payments annually.

IBM's Watson supercomputer analyzes the flow of monetary-transaction data in real time. According to IBM, Watson increased the number of detected fraudulent transactions by 15%, reduced false positives by 50%, and increased the amount of money protected from such transactions by 60%.

Procter & Gamble uses Big Data to design new products and create global marketing campaigns. P&G has created dedicated Business Sphere offices where information can be viewed in real time. This gives the company's management the opportunity to instantly test hypotheses and conduct experiments. P&G believes Big Data helps in forecasting company performance.

The office-supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis made it possible to increase B2B revenue by 13% and reduce costs by $400,000 per year.

According to Caterpillar, its distributors miss out on $9 to $18 billion in profits each year simply because they have not implemented Big Data processing technologies. Big Data would allow customers to manage their fleets more efficiently by analyzing information from sensors installed on the machines.

Today it is already possible to analyze the condition of key components, their degree of wear, and manage fuel and maintenance costs.

The Luxottica Group manufactures sports eyewear under such brands as Ray-Ban, Persol, and Oakley. The company uses Big Data technologies to analyze the behavior of potential clients and for "smart" SMS marketing. As a result, Luxottica Group identified more than 100 million of its most valuable customers and increased the effectiveness of its marketing campaign by 10%.

With the help of Yandex Data Factory, the developers of World of Tanks analyze player behavior. Big Data technologies made it possible to analyze the behavior of 100 thousand World of Tanks players across more than 100 parameters (information about purchases, games, experience, etc.). The analysis produced a forecast of player churn, information that makes it possible to reduce player departure and work with game participants in a targeted manner. The resulting model proved 20-30% more effective than standard gaming-industry analysis tools.

The German Ministry of Labor uses Big Data to analyze incoming applications for unemployment benefits. Analysis of this information showed that 20% of benefits were being paid undeservedly. With the help of Big Data, the Ministry reduced costs by 10 billion euros.

Toronto Children's Hospital implemented Project Artemis, an information system that collects and analyzes data on babies in real time. The system monitors 1,260 indicators of each child's condition every second. Project Artemis makes it possible to predict instability in a child's condition and begin disease prevention early.

OVERVIEW OF THE WORLD BIG DATA MARKET

Current state of the world market
In 2014, according to Data Collective, Big Data became one of the priority areas for investment in the venture capital industry. According to the information portal Computerra, this is because developments in this area have begun to produce significant results for their users. Over the past year, the number of companies with implemented Big Data management projects increased by 125%, and the market volume grew by 45% compared to 2013.

The majority of Big Data market revenue in 2014, according to Wikibon, came from services, whose share equaled 40% of total revenue (see chart below):

If we consider Big Data for 2014 by subtype, the market will look like this:

According to Wikibon, 36% of 2014 Big Data revenue came from applications and analytics, 17% from computing equipment and 15% from data storage technologies. The least revenue was generated by NoSQL technologies, infrastructure equipment and corporate networking.

The most popular Big Data technologies are in-memory platforms from vendors such as SAP (HANA) and Oracle: the results of a T-Systems survey showed that they were chosen by 30% of the companies surveyed. NoSQL platforms came second (18% of users); companies also used analytical platforms from Splunk and Dell, chosen by 15% of companies. According to the survey results, Hadoop/MapReduce products turned out to be the least useful for solving Big Data problems.

According to an Accenture survey, in more than 50% of companies using Big Data technologies, Big Data accounts for between 21% and 30% of costs.
According to the same Accenture analysis, 76% of companies believe these costs will increase in 2015, while 24% will leave their Big Data budget unchanged. This suggests that in these companies Big Data has become an established area of IT and an integral part of company development.

The results of the Economist Intelligence Unit survey confirm the positive effect of implementing Big Data. 46% of companies say that using Big Data technologies they have improved customer service by more than 10%, 33% of companies have optimized inventory and improved the productivity of fixed assets, and 32% of companies have improved planning processes.

Big Data in different countries of the world
Today, Big Data technologies are most often implemented in US companies, but other countries around the world have already begun to show interest. In 2014, according to IDC, countries in Europe, the Middle East, Asia (excluding Japan) and Africa accounted for 45% of the market for software, services and equipment in the field of Big Data.

Also, according to the CIO survey, companies from the Asia-Pacific region are rapidly adopting new solutions in the field of Big Data analysis, secure storage and cloud technologies. Latin America is in second place in terms of the number of investments in the development of Big Data technologies, ahead of European countries and the USA.
Next, a description and forecasts for the development of the Big Data market in several countries will be presented.

China
The volume of information in China is 909 exabytes, or 10% of the world's total. By 2020 the volume will reach 8,060 exabytes, and China's share of global data will grow to 18% within 5 years. China's Big Data market has one of the fastest growth trajectories in the world.

Brazil
At the end of 2014, Brazil had accumulated 212 exabytes of information, or 3% of the global volume. By 2020, the volume will grow to 1,600 exabytes, or 4% of the world's information.

India
According to EMC, the volume of accumulated data in India at the end of 2014 was 326 exabytes, or 5% of the world's total. By 2020, the volume will grow to 2,800 exabytes, or 6% of the world's information.

Japan
The volume of accumulated data in Japan at the end of 2014 was 495 exabytes, or 8% of the world's total. By 2020, the volume will grow to 2,200 exabytes, but Japan's share will decrease to 5% of the total volume of information in the world.
Thus, Japan's share of the global market will shrink by more than 30%.

Germany
According to EMC, the volume of accumulated data in Germany at the end of 2014 is 230 exabytes, which is 4% of the total volume of information in the world. By 2020, the volume of information will grow to 1100 exabytes and amount to 2%.
In the German market, according to Experton Group forecasts, a large share of revenue will come from the services segment, whose share will be 54% in 2015 and grow to 59% in 2019; the shares of software and hardware, on the contrary, will decrease.

Overall, the market size will grow from 1.345 billion euros in 2015 to 3.198 billion euros in 2019, an average growth rate of 24%.
Thus, based on the analytics of CIO and EMC, we can conclude that the developing countries of the world in the coming years will become markets for the active development of Big Data technologies.

Main market trends
According to IDG Enterprise, in 2015 companies' spending on Big Data will average $7.4 million per company; large companies intend to spend approximately $13.8 million, and small and medium-sized companies about $1.6 million.
Most of the investment will be in areas such as data analysis, visualization and data collection.
Based on current trends and market demand, investments in 2015 will be used to improve data quality, improve planning and forecasting, and increase data processing speed.
Companies in the financial sector, according to Bain & Company's Insights Analysis, will make significant investments: in 2015 they plan to spend $6.4 billion on Big Data technologies, with investment growing at an average of 22% per year until 2020. Internet companies plan to spend $2.8 billion, with an average Big Data spending growth rate of 26%.
An Economist Intelligence Unit survey identified priority areas for Big Data development in 2014 and over the next 3 years; the distribution of responses is as follows:

According to IDC forecasts, market development trends are as follows:

  • In the next 5 years, costs for cloud solutions in the field of Big Data technologies will grow 3 times faster than costs for local solutions. Hybrid platforms for data storage will become in demand.
  • The growth of applications using sophisticated and predictive analytics, including machine learning, will accelerate in 2015, with the market for such applications growing 65% faster than applications that do not use predictive analytics.
  • Media analytics will triple in 2015 and will become a key driver of growth in the Big Data technology market.
  • The trend of introducing solutions for analyzing the constant flow of information that is applicable to the Internet of Things will accelerate.
  • By 2018, 50% of users will interact with services based on cognitive computing.
Market Drivers and Limiters
IDC experts identified 3 drivers of the Big Data market in 2015:

According to an Accenture survey, data security issues are now the main barrier to the implementation of Big Data technologies, with more than 51% of respondents confirming that they are worried about ensuring data protection and confidentiality. 47% of companies reported the impossibility of implementing Big Data due to limited budgets, 41% of companies indicated a lack of qualified personnel as a problem.

Wikibon predicts that the Big Data market will grow to $38.4 billion in 2015, up 36% year-on-year. In the coming years, there will be a decline in growth rates to 10% in 2017. Taking into account these forecasts, the market size in 2020 will be equal to 68.7 billion US dollars.

The distribution of the global Big Data market by business category will look like this:

As can be seen from the diagram, the majority of the market will be occupied by technologies in the field of improving customer service. Targeted marketing will be the second priority for companies until 2019; in 2020, according to Heavy Reading, it will give way to solutions to improve operational efficiency.
The segment “improving customer service” will also have the highest growth rate, with an increase of 49% annually.
The market forecast for Big Data subtypes will look like this:

The predominant market share, as can be seen from the diagram, is occupied by professional services. The highest growth rate will be in applications with analytics: their share will increase from the current 12% to 18% in 2020, when the segment will be worth $12.3 billion. The share of computing equipment, on the contrary, will fall from 20% to 14%, amounting to about $9.3 billion in 2020. The cloud technology market will gradually grow, reaching $6.3 billion in 2020, while the share of data storage solutions will decrease from 15% in 2014 to 13% in 2020, or $8.9 billion in monetary terms.
According to Bain & Company’s Insights Analysis forecast, the distribution of the Big Data market by industry in 2020 will be as follows:

  • The financial industry will spend $6.4 billion on Big Data with an average growth rate of 22% per year;
  • Internet companies will spend $2.8 billion and the average cost growth rate will be 26% over the next 5 years;
  • Public sector costs will be commensurate with the costs of Internet companies, but the growth rate will be lower - 22%;
  • The telecommunications sector will grow at a CAGR of 40% to reach US$1.2 billion in 2020;

Energy companies will invest a relatively small amount in these technologies - $800 million, but the growth rate will be one of the highest - 54% annually.
Thus, the largest share of the Big Data market in 2020 will be occupied by companies in the financial industry, and the fastest growing sector will be energy.
Following analysts' forecasts, the total market size will increase in the coming years. Market growth will be achieved through the implementation of Big Data technologies in developing countries of the world, as can be seen from the graph below.

The projected market size will depend on how developing countries perceive Big Data technologies and whether they will be as popular as in developed countries. In 2014, developing countries of the world accounted for 40% of the volume of accumulated information. According to EMC's forecast, the current market structure, with a predominance of developed countries, will change in 2017. According to EMC analytics, in 2020 the share of developing countries will be more than 60%.
According to Cisco and EMC, developing countries around the world will work quite actively with Big Data, largely due to the availability of technology and the accumulation of a sufficient amount of information to the Big Data level. The world map presented on the next page will show the forecast for the increase in volume and growth rate of Big Data by region.

ANALYSIS OF THE RUSSIAN MARKET

Current state of the Russian market

According to the results of a study by CNews Analytics and Oracle, the level of maturity of the Russian Big Data market has increased over the past year. Respondents, representing 108 large enterprises from various industries, demonstrated a higher degree of awareness of these technologies, as well as an established understanding of the potential of such solutions for their business.
As of 2014, according to IDC, Russia has accumulated 155 exabytes of information, which is only 1.8% of the world's data. The volume of information by 2020 will reach 980 exabytes and occupy 2.2%. Thus, the average growth rate of information volume will be 36% per year.
IDC estimates the Russian market at $340 million, of which $100 million are SAP solutions, approximately $240 million are similar solutions from Oracle, IBM, SAS, Microsoft, etc.
The growth rate of the Russian Big Data market is no less than 50% per year.
It is predicted that positive dynamics will continue in this sector of the Russian IT market, even in conditions of general economic stagnation. This is due to the fact that businesses continue to demand solutions that improve operational efficiency, as well as optimize costs, improve forecasting accuracy and minimize possible company risks.
The main service providers in the field of Big Data on the Russian market are:
  • Oracle
  • Microsoft
  • Cloudera
  • Hortonworks
  • Teradata.
Market overview by industry and experience in using Big Data in companies
According to CNews, only 10% of companies in Russia have begun to use Big Data technologies, while worldwide the share of such companies is about 30%. Readiness for Big Data projects is growing in many sectors of the Russian economy, according to a report from CNews Analytics and Oracle. More than a third of the surveyed companies (37%) have started working with Big Data technologies: 20% are already using such solutions, and 17% are starting to experiment with them. Another third of respondents are currently considering this possibility.

In Russia, Big Data technologies are most popular in the banking and telecom sectors, but they are also in demand in the mining industry, energy, retail, logistics companies and the public sector.
Next, examples of the use of Big Data in Russian realities will be considered.

Telecom
Telecom operators have some of the most voluminous databases, which allows them to conduct the most in-depth analysis of accumulated information.
One of the areas of application of Big Data technology is subscriber loyalty management.
The main purpose of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine subscribers' social profiles. Beyond marketing, telecom operators use these technologies to prevent fraudulent financial transactions.
One of the brightest examples in this industry is VimpelCom. The company uses Big Data to improve quality of service at the level of each subscriber, compile reports, analyze data for network development, combat spam and personalize services.

Banks
A significant proportion of Big Data users are specialists from the financial industry. One of the successful experiments was carried out at the Ural Bank for Reconstruction and Development, where the information base began to be used to analyze clients, the bank began to offer specialized loan offers, deposits and other services. Within a year of using these technologies, the company's retail loan portfolio grew by 55%.
Alfa-Bank analyzes information from social networks, processes loan applications, and analyzes the behavior of users of the company’s website.
Sberbank also began processing a massive amount of data to segment clients, prevent fraudulent activities, cross-sell, and manage risks. In the future, it is planned to improve the service and analyze customer actions in real time.
The All-Russian Regional Development Bank analyzes the behavior of plastic card holders. This makes it possible to identify transactions that are atypical for a particular client, increasing the likelihood of detecting theft of funds from cards.

Retail
In Russia, Big Data technologies have been implemented by both online and offline trading companies. Today, according to CNews Analytics, Big Data is used by 20% of retailers. 75% of retail professionals consider Big Data necessary for the development of a competitive company promotion strategy. According to Hadoop statistics, after the implementation of Big Data technology, profits in trading organizations increase by 7-10%.
M.Video specialists report improved logistics planning after the implementation of SAP HANA; as a result, preparation of annual reports was cut from 10 days to 3, and daily data loading from 3 hours to 30 minutes.
Wikimart uses these technologies to generate recommendations for site visitors.
One of the first offline stores to introduce Big Data analysis in Russia was Lenta. With the help of Big Data, the retailer began to study information about customers from cash register receipts, collecting it to build behavioral models, which makes it possible to make more informed decisions at the operational and commercial level.

Oil and gas industry
In this industry, the scope of Big Data is quite wide. Big Data technologies can be used in the extraction of minerals from the subsoil: to analyze the extraction process itself and determine the most effective extraction methods, to monitor drilling, to analyze the quality of raw materials, and to track the processing and marketing of final products. In Russia, Transneft and Rosneft have already begun to use these technologies.

Government bodies
In countries such as Germany, Australia, Spain, Japan, Brazil and Pakistan, Big Data technologies are used to solve national issues. These technologies help government authorities more effectively provide services to the population and provide targeted social support.
In Russia, these technologies began to be mastered by such government bodies as the Pension Fund, the Federal Tax Service and the Compulsory Medical Insurance Fund. The potential for implementing projects using Big Data is great; these technologies could help improve the quality of services, and, as a result, the standard of living of the population.

Logistics and transport
Big Data can also be used by transport companies: to track vehicle fleets, account for fuel costs, and monitor customer requests.
Russian Railways implemented Big Data technologies together with SAP. These technologies helped reduce the reporting preparation time by 43.5 times (from 14.5 hours to 20 minutes), and increase the accuracy of cost distribution by 40 times. Big Data was also introduced into planning and tariff regulation processes. In total, the companies use more than 300 systems based on SAP solutions, 4 data centers are involved, and the number of users is 220,000.

Main drivers and limiters of the market
The drivers for the development of Big Data technologies in the Russian market are:
  • Increased interest on the part of users in the capabilities of Big Data as a way to increase the competitiveness of a company;
  • Development of methods for processing media files at a global level;
  • Transfer of servers processing personal information to the territory of Russia, in accordance with the adopted law on the storage and processing of personal data;
  • Implementation of the industry plan for import substitution of software. This plan includes government support for domestic software manufacturers, as well as the provision of preferences for domestic IT products when purchasing at public expense.
  • In the new economic situation, when the dollar exchange rate has almost doubled, there will be a trend towards an increasing use of the services of Russian cloud service providers rather than foreign ones.
  • Creation of technology parks that contribute to the development of the information technology market, including the Big Data market;
  • State program for the implementation of grid systems based on Big Data technologies.

The main barriers to the development of Big Data in the Russian market are:

  • Ensuring data security and confidentiality;
  • Lack of qualified personnel;
  • Most Russian companies have not yet accumulated information resources at the Big Data level;
  • Difficulties in introducing new technologies into established information systems of companies;
  • The high cost of Big Data technologies, which leads to a limited number of enterprises that have the opportunity to implement these technologies;
  • Political and economic uncertainty, which led to the outflow of capital and the freezing of investment projects in Russia;
  • Rising prices for imported products and a surge in inflation, according to IDC, are slowing down the development of the entire IT market.
Russian market forecast
As of today, the Russian Big Data market is not as developed as in advanced economies. Most Russian companies show interest in Big Data, but do not yet dare to take advantage of its opportunities.
Examples of large companies that have already benefited from the use of Big Data technologies are increasing awareness of the capabilities of these technologies.
Analysts also have quite optimistic forecasts regarding the Russian market. IDC believes that the Russian market share will increase over the next 5 years, unlike the German and Japanese markets.
By 2020, the volume of Big Data in Russia will grow from the current 1.8% to 2.2% of the global data volume. The amount of information will grow, according to EMC, from the current 155 exabytes to 980 exabytes in 2020.
At the moment, Russia continues to accumulate the volume of information to the level of Big Data.
According to a CNews Analytics survey, 44% of surveyed companies work with data volumes of no more than 100 terabytes, and only 13% work with volumes above 500 terabytes.

Nevertheless, the Russian market, following global trends, will increase. As of 2014, IDC estimates the market size at $340 million.
The market growth rate in previous years was 50% per year, if it remains at the same level, then in 2018 the market volume will reach 1.7 billion US dollars. The share of the Russian market in the world market will be about 3%, increasing from the current 1.2%.

The most receptive industries to the use of Big Data in Russia include:

  • Retail and banks, for which analysis of the customer base and assessment of the effect of marketing campaigns are most important;
  • Telecom – customer base segmentation and traffic monetization;
  • Public sector – reporting, analysis of applications from the public, etc.;
  • Oil companies – monitoring of work and planning of production and sales;
  • Energy companies – creation of intelligent electric power systems, operational monitoring and forecasting.
In developed countries, Big Data has become widespread in the fields of healthcare, insurance, metallurgy, Internet companies and manufacturing enterprises; most likely, in the near future, Russian companies from these areas will also appreciate the effect of introducing Big Data and will adapt these technologies in their industries.
In Russia, as well as in the world, in the near future there will be a trend towards data visualization, analysis of media files and the development of the Internet of things.
Despite the general stagnation of the economy, in the coming years, analysts predict further growth of the Big Data market, primarily due to the fact that the use of Big Data technologies gives its users a competitive advantage in terms of increasing the operational efficiency of the business, attracting additional flow of customers, minimizing risks and implementation of data forecasting technologies.
Thus, we can conclude that the Big Data segment in Russia is at the formation stage, but the demand for these technologies is increasing every year.

Main results of the market analysis

World market
At the end of 2014, the Big Data market is characterized by the following parameters:
  • market volume amounted to 28.5 billion US dollars, an increase of 45% compared to the previous year;
  • the majority of Big Data market revenue came from services, their share was equal to 40% of total revenue;
  • 36% of revenue came from Big Data applications and analytics, 17% from computing equipment and 15% from data storage technologies;
  • the most popular technologies for solving Big Data problems were in-memory platforms from vendors such as SAP (HANA) and Oracle;
  • the number of companies with implemented projects in the field of Big Data management increased by 125%;
The market forecast for the next years is as follows:
  • in 2015 the market volume will reach 38.4 billion US dollars, in 2020 – 68.7 billion US dollars;
  • the average growth rate will be 16% annually;
  • average company costs for Big Data technologies will be $13.8 million for large companies and $1.6 million for small and medium-sized businesses;
  • technologies will be most widespread in the areas of customer service and targeted marketing;
  • In 2017, the global market structure will change towards the predominance of user companies from developing countries.
Russian market
The Russian Big Data market is at the stage of formation, the results of 2014 are as follows:
  • market volume reached USD 340 million;
  • the average market growth rate in previous years was 50% annually;
  • the total volume of accumulated information was 155 exabytes;
  • 10% of Russian companies began to use Big Data technologies;
  • Big Data technologies were more popular in the banking sector, telecoms, Internet companies and retail.
The Russian market forecast for the coming years is as follows:
  • the volume of the Russian market in 2015 will reach 500 million US dollars, and in 2018 – 1.7 billion US dollars;
  • the share of the Russian market in the global market will be about 3% in 2018;
  • the amount of accumulated data in 2020 will be 980 exabytes;
  • data volume will grow to 2.2% of global data volume in 2020;
  • Technologies for data visualization, media file analysis and the Internet of Things will become most popular.
Based on the results of the analysis, we can conclude that the Big Data market is still in the early stages of development, and in the near future we will see its growth and expansion of the capabilities of these technologies.

Thank you for taking the time to read this voluminous work, subscribe to our blog - we promise many new interesting publications!

What is Big Data (literally, "big data")? Let's look first at the Oxford Dictionary:

Data: quantities, characters or symbols on which a computer performs operations and which can be stored and transmitted in the form of electrical signals, recorded on magnetic, optical or mechanical media.

The term Big Data is used to describe a data set that is large and grows exponentially over time. Processing such a quantity of data is impossible without specialized tools.

The benefits that Big Data provides:

  1. Collecting data from various sources.
  2. Improving business processes through real-time analytics.
  3. Storing huge amounts of data.
  4. Insights. Big Data helps uncover hidden insights in structured and semi-structured data.
  5. Big data helps you reduce risk and make smart decisions with appropriate risk analytics.

Big Data Examples

The New York Stock Exchange generates about 1 terabyte of trading data per day for each session.

Social media: statistics show that about 500 terabytes of new data are uploaded to Facebook's databases every day, generated mainly by photo and video uploads to the social network's servers, messaging, comments under posts, and so on.

A jet engine generates 10 terabytes of data every 30 minutes of flight. Since thousands of flights are made every day, the total volume of data reaches petabytes.
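The arithmetic behind the jet-engine claim can be sketched in a few lines. The average flight length and daily flight count below are illustrative assumptions, not figures from the text:

```python
# Back-of-the-envelope estimate of daily jet-engine data volume.
# Assumed (illustrative): an average flight of 2 hours, 25,000 flights per day.
tb_per_30_min = 10
flight_hours = 2
flights_per_day = 25_000

tb_per_flight = tb_per_30_min * (flight_hours * 60 // 30)  # 40 TB per flight
tb_per_day = tb_per_flight * flights_per_day               # 1,000,000 TB
pb_per_day = tb_per_day / 1_000                            # decimal units: 1 PB = 1,000 TB

print(f"{tb_per_flight} TB per flight, about {pb_per_day:,.0f} PB per day")
```

Even with conservative assumptions, the total comfortably reaches the petabyte scale mentioned above.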

Big Data classification

Big data forms:

  • Structured
  • Unstructured
  • Semi-structured

Structured form

Data that can be stored, accessed and processed in a fixed format is called structured. Over a long period, computer science has made great strides in techniques for working with this type of data (where the format is known in advance) and has learned to extract value from it. However, problems are already emerging as volumes grow to sizes measured in zettabytes.

1 zettabyte equals a billion terabytes

Looking at these numbers, it is easy to see why the term Big Data arose and to appreciate the difficulties associated with processing and storing such data.

Data stored in a relational database is structured: a table of company employees is a typical example.
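A minimal sketch of such a table (the schema and the rows below are invented purely for illustration):

```python
import sqlite3

# A structured-data example: an employees table with a fixed, known schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Anna", "Finance"), ("Boris", "IT"), ("Clara", "Marketing")],
)

# Because the format is fixed and known in advance, querying is straightforward.
rows = conn.execute("SELECT name FROM employees WHERE dept = 'IT'").fetchall()
print(rows)  # [('Boris',)]
```

This fixed, queryable format is exactly what "structured" means in the classification above.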

Unstructured form

Data with an unknown structure is classified as unstructured. In addition to its large size, this form is characterized by a number of difficulties in processing and extracting useful information. A typical example of unstructured data is a heterogeneous source containing a combination of simple text files, images and videos. Today, organizations have access to large amounts of raw, unstructured data but do not know how to extract value from it.

Semi-structured form

This category combines features of both of those described above: semi-structured data has some form, but it is not actually defined by tables in relational databases. An example of this category is personal data provided in an XML file:

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
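A short sketch of how such semi-structured records can be pulled into a fixed structure in practice (the tag names follow the example data; the parsing code is illustrative):

```python
import xml.etree.ElementTree as ET

# Semi-structured data: tags give it some form, but there is no rigid relational schema.
xml_data = """<people>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
  <rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</people>"""

root = ET.fromstring(xml_data)
# Flatten each record into a dict, i.e. turn semi-structured data into structured rows.
rows = [
    {"name": r.findtext("name"), "sex": r.findtext("sex"), "age": int(r.findtext("age"))}
    for r in root.findall("rec")
]
print(rows[0])  # {'name': 'Prashant Rao', 'sex': 'Male', 'age': 35}
```

Once flattened this way, the records could be loaded into a relational table like any structured data.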

Characteristics of Big Data

Big Data growth over time:

Blue color represents structured data (Enterprise data), which is stored in relational databases. Other colors indicate unstructured data from various sources (IP telephony, devices and sensors, social networks and web applications).

According to Gartner, big data varies in volume, rate of generation, variety, and variability. Let's take a closer look at these characteristics.

  1. Volume. The term Big Data itself is associated with large size. Data size is a critical metric in determining the potential value to be extracted. Every day, 6 million people use digital media, generating an estimated 2.5 quintillion bytes of data. Therefore, volume is the first characteristic to consider.
  2. Variety is the next aspect. It refers to the heterogeneous sources and nature of data, which can be either structured or unstructured. Previously, spreadsheets and databases were the only sources of information considered in most applications; today, data in the form of emails, photos, videos, PDF files and audio is also fed into analytical applications. This variety of unstructured data creates problems in storage, mining and analysis: 27% of companies are not confident that they are working with the right data.
  3. Velocity. How quickly data is accumulated and processed to meet requirements determines its potential. Velocity describes the speed of information flow from sources: business processes, application logs, social networks and media sites, sensors, mobile devices. The flow of data is huge and continuous over time.
  4. Variability describes how data changes at different points in time, which complicates processing and management; for example, most data is unstructured in nature.

Big Data analytics: what are the benefits of big data

Promotion of goods and services: Access to data from search engines and sites like Facebook and Twitter allows businesses to more accurately develop marketing strategies.

Improving customer service: traditional customer feedback systems are being replaced by new ones that use Big Data and natural language processing to read and evaluate customer feedback.

Calculating risks associated with the release of a new product or service.

Operational efficiency: big data is structured so that the necessary information can be extracted quickly and accurate results produced promptly. This combination of Big Data and storage technologies helps organizations optimize their work with rarely used information.