Data Analysis in Business Analytics: Business Intelligence in the Age of Big Data

Small businesses in the CIS countries rarely use data analysis for business development, finding correlations or uncovering hidden patterns: entrepreneurs make do with reports from marketers and accountants. The leaders of small and many medium-sized enterprises rely more on intuition than on analysis. Yet analytics has huge potential: it helps reduce costs and increase profits, make decisions faster and more objectively, optimize processes, understand customers better and improve the product.

An accountant will not replace an analyst

Small business executives often assume that marketing and accounting reports give a fairly accurate picture of a company's performance. But it is very difficult to make a decision on the basis of dry statistics alone, and without specialized training errors in the calculations are almost inevitable.

Case 1. Post-analysis of a promotional campaign. Before the New Year, an entrepreneur ran a promotion offering certain goods at a discount. After looking at the revenue for the New Year period, he saw that sales had grown and was pleased with his resourcefulness. But let's take all the factors into account (a sketch of such a check in code follows the list):

  • Sales are especially strong on Fridays, the day when revenue peaks; this is a weekly trend.
  • Compared to the increase in sales that usually happens around the New Year anyway, the gain is not that large.
  • If you filter down to the promotional items themselves, it turns out that their sales figures actually deteriorated.
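
Here is a minimal sketch of such a post-analysis in Python with pandas. The file name, date ranges and column names (date, is_promo_item, revenue) are hypothetical placeholders, not a prescription:

```python
import pandas as pd

# Hypothetical columns: date, sku, is_promo_item, revenue.
sales = pd.read_csv("sales.csv", parse_dates=["date"])

# 1. Weekly trend: is the spike just the usual Friday peak?
weekday_avg = sales.groupby(sales["date"].dt.day_name())["revenue"].mean()

# 2. Seasonal baseline: compare the promo period with the same dates last year.
promo_period = sales[sales["date"].between("2023-12-20", "2023-12-31")]
last_year = sales[sales["date"].between("2022-12-20", "2022-12-31")]
uplift = promo_period["revenue"].sum() / last_year["revenue"].sum() - 1

# 3. Isolate the promotional items: did they actually sell better?
promo_now = promo_period.loc[promo_period["is_promo_item"], "revenue"].sum()
promo_then = last_year.loc[last_year["is_promo_item"], "revenue"].sum()

print(weekday_avg)
print(f"Total uplift vs. last December: {uplift:.1%}")
print(f"Promo items: {promo_then:.0f} -> {promo_now:.0f}")
```

Only after all three checks can you say whether the discount, and not the calendar, drove the growth.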

Case 2. Study of turnover. A women's clothing store has difficulties with logistics: in some warehouses goods are in short supply, while in others they sit for months. Without analyzing sales, how do you determine how many trousers to bring to one region and how many coats to send to another while getting the maximum profit? To do this you need to calculate the turnover: the ratio of the rate of sales to the average inventory for a certain period. Put simply, turnover shows in how many days the store will sell its stock, how quickly the average inventory is sold, how quickly the goods pay for themselves. Storing large reserves is economically unprofitable, since it freezes capital and slows down development; if the stock is cut too far, shortages appear and the company again loses profit. Where is the golden mean, the ratio at which the product does not stagnate in the warehouse while the customer can still be reasonably sure of finding the right item in the store? To find it, an analyst should help you determine:

  • desired turnover,
  • turnover dynamics.

When settling with suppliers on deferred payment, you must also calculate the ratio of the credit line to turnover. The formula: Turnover in days = Average inventory * Number of days in the period / Sales (turnover) for the period.

Calculating assortment balances and total turnover across stores helps you understand where part of the goods should be moved. It is also worth calculating the turnover of each assortment unit in order to decide: mark it down if demand falls, reorder if demand grows, or relocate it to another warehouse. A turnover report by category typically shows, for example, that T-shirts and jumpers sell quickly while coats take a long time to sell. Can an ordinary accountant do this job? We doubt it. Meanwhile, regularly calculating turnover and acting on the results can increase profits by 8-10%.
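
A minimal sketch of this calculation in Python with pandas; the categories and figures below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical inputs: average inventory value and sales per category over a 30-day period.
data = pd.DataFrame({
    "category": ["T-shirts", "Jumpers", "Trousers", "Coats"],
    "avg_inventory": [120_000, 150_000, 200_000, 400_000],    # valued at cost
    "sales_for_period": [360_000, 300_000, 250_000, 160_000],  # same 30 days
})
days_in_period = 30

# Turnover in days = Average inventory * number of days / Sales for the period
data["turnover_days"] = data["avg_inventory"] * days_in_period / data["sales_for_period"]

print(data.sort_values("turnover_days"))
```

Categories at the bottom of the sorted table (the longest turnover in days) are the first candidates for markdowns or relocation.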

In what areas is data analysis applicable?

  1. Sales. It is important to understand why sales are going well (or badly) and what the dynamics are. To answer this, you need to study the factors that influence profit and revenue, for example receipt length (the number of items per purchase) and revenue per customer. Such factors can be examined by product group, season or store. Sales peaks and troughs can be identified by analyzing returns, cancellations and other transactions.
  2. Finance. Monitoring key indicators is necessary for any financier in order to track cash flow and distribute assets across different business areas. It also helps evaluate the effectiveness of taxation and other parameters.
  3. Marketing. Any marketing campaign needs forecasts and post-analysis of promotions. At the idea stage, you need to define the product groups (control and target) for which the offer is created. This is also a job for a data analyst, since an ordinary marketer lacks the tools and skills for rigorous analysis. For example, if revenue and the number of customers in the control group are the same as in the target group, the promotion did not work; determining this requires interval analysis.
  4. Management. Leadership qualities alone are not enough to run a company. Competent management of an enterprise requires quantitative assessment of staff performance: the effectiveness of payroll management, the ratio of wages to sales, and the efficiency of processes such as checkout workload or the utilization of loaders during the day. This helps distribute working hours properly.
  5. Web analytics. For a site to become a sales channel it has to be promoted properly, and that requires the right promotion strategy; this is where web analytics helps. How is it applied? By studying customer behavior, age, gender and other characteristics, activity on particular pages, clicks, traffic channels, mailing performance and so on. This helps improve both the business and the website.
  6. Assortment management. ABC analysis is essential here: the analyst distributes products by their characteristics (most often their contribution to revenue) in order to understand which products are the most profitable, which form the core of the assortment, and which should be dropped. To understand the stability of sales, it is also worth conducting an XYZ analysis (a sketch of an ABC calculation follows this list).
  7. Logistics. Studying logistics indicators gives a better understanding of procurement, goods, their storage and availability. Losses, product requirements and inventory are also important to understand for successful business management.
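
To make the ABC idea concrete, here is a minimal sketch in Python with pandas, using invented revenue figures and the conventional 80/15/5 thresholds (the actual thresholds are a business decision):

```python
import pandas as pd

# Hypothetical revenue per SKU; in practice this comes from the sales database.
revenue = pd.Series(
    {"SKU-01": 900_000, "SKU-02": 450_000, "SKU-03": 300_000,
     "SKU-04": 120_000, "SKU-05": 80_000, "SKU-06": 30_000}
).sort_values(ascending=False)

share_cum = revenue.cumsum() / revenue.sum()

# Classic split: A covers roughly the top 80% of revenue, B the next 15%, C the rest.
abc = pd.cut(share_cum, bins=[0, 0.80, 0.95, 1.0], labels=["A", "B", "C"])

print(pd.DataFrame({"revenue": revenue,
                    "cumulative_share": share_cum.round(2),
                    "class": abc}))
```

An XYZ analysis is built the same way, only the products are ranked by the variability of their sales (for example, the coefficient of variation) rather than by revenue.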

These examples show how powerful data analysis can be, even for a small business. Used correctly, it lets an experienced director increase the company's profit and extract value from even the most insignificant data, while visual reports greatly simplify a manager's work.

The main goal of any data analysis is to find and uncover patterns in a volume of data. In business analysis this goal becomes even broader: it is important for a leader not only to identify a pattern but also to find its cause. Knowing the cause makes it possible to influence the business and to predict the results of a particular action.

Goals of data analysis for the company

If we talk about business, the goal of every company is to beat the competition, and data analysis is your main advantage. It will help you:

  • Reduce company costs
  • Increase revenue
  • Reduce the time to complete business processes (find out the weak point and optimize it)
  • Increase the effectiveness of the company's business processes
  • Fulfill any other goals aimed at improving the efficiency and effectiveness of the company.

So, victory over competitors is in your hands. Don't rely on intuition. Analyze!

Data analysis goals for departments, divisions, products

Oddly enough, the goals listed above apply equally well to analyzing the work of a department, a product or an advertising campaign.

The goal of any data analysis at any level is to identify a pattern and use this knowledge to improve the quality of a product or the work of a company or department.

Who needs data analysis?

Everyone: any company from any field of activity, any department and any product!

In what areas can data analysis be applied?

  • Manufacturing (construction, oil and gas, metallurgy, etc.)
  • Retail
  • E-commerce
  • Services
  • And many others

Which departments can be analyzed within the company?

  • Accounting and finance
  • Marketing
  • Advertising
  • Administration
  • And others.

Indeed, companies from any field, any department within the company and any area of activity can and should be analyzed.

How BI analysis systems can help

BI systems, automated analytics systems and big data platforms are software solutions that already include built-in functionality for processing data, preparing it for analysis, performing the analysis itself and, most importantly, visualizing the results.

Not every company has an analytics department, or even a developer to maintain an analytical system and its databases. In such cases, BI systems come to the rescue.

There are more than 300 solutions on the market today. Our company settled on the Tableau solution:

  • In 2018, Tableau was named a leader in Gartner's research on BI solutions for the sixth time.
  • Tableau is easy to learn (and our workshops prove it)
  • No developer knowledge or statistics required to get started with Tableau

At the same time, companies already working with Tableau say that reports which used to take 6-8 hours to assemble in Excel now take no more than 15 minutes.

Don't believe it? Try it yourself: download the trial version of Tableau and get tutorials on working with the program:

Download Tableau

Download the full version of Tableau Desktop for FREE for 14 days and get Tableau business intelligence training materials as a GIFT

Affordable work with Big Data using visual analytics

Improve business intelligence and solve routine tasks using the information hidden in Big Data with the TIBCO Spotfire platform. It is the only platform that gives business users an intuitive, user-friendly interface for the full range of Big Data analytics technologies without the need for IT professionals or special training.

The Spotfire interface makes it equally convenient to work with small data sets and with multi-terabyte clusters of big data: sensor readings, information from social networks, points of sale or geolocation sources. Users of all skill levels easily work with rich dashboards and analytical workflows simply through visualizations: graphical representations that aggregate billions of data points.

Predictive analytics is learning by doing, based on the company's shared experience, in order to make better-informed decisions. Using Spotfire Predictive Analytics, you can discover new market trends from your business intelligence insights and act on them to mitigate risk and improve management decisions.

Overview

Connecting to Big Data for High-Performance Analytics

Spotfire offers three main types of analytics with seamless integration with Hadoop and other large data sources:

  1. On-demand analytics (On-Demand Analytics): built-in, user-configurable data connectors that enable very fast, interactive data visualization.
  2. In-database analytics (In-Database Analytics): integration with a distributed computing platform, which allows calculations of any complexity to be performed on big data.
  3. In-memory analytics (In-Memory Analytics): integration with a statistical analysis platform that pulls data directly from any data source, including traditional and new ones.

Together, these integration methods combine visual exploration with advanced analytics. They allow business users to access, combine and analyze data from any data source through powerful, easy-to-use dashboards and workflows.

Big data connectors

Spotfire Big Data Connectors support all types of data access: In-datasource, In-memory and On-demand. Built-in Spotfire data connectors include:

  • Certified Hadoop Data Connectors for Apache Hive, Apache Spark SQL, Cloudera Hive, Cloudera Impala, Databricks Cloud, Hortonworks, MapR Drill and Pivotal HAWQ
  • Other certified big data connectors include Teradata, Teradata Aster and Netezza
  • Connectors for historical and real-time sensor data from sources such as OSI PI

In-datasource distributed computing

In addition to Spotfire's convenient visual selection of operations for SQL queries that access data distributed across data sources, Spotfire can run statistical and machine learning algorithms inside the data sources and return only the results needed to build visualizations in Spotfire.

  • Users work with dashboards that offer visual selection and invoke scripts written in the built-in TERR language.
  • TERR scripts call distributed computing functionality in conjunction with Map/Reduce, H2O, SparkR or Fuzzy Logix.
  • These applications in turn access high-performance systems such as Hadoop or other data sources.
  • TERR can be deployed as an advanced analytics engine on Hadoop nodes managed with MapReduce or Spark; it can also be used on Teradata data nodes.
  • The results are visualized in Spotfire.

TERR for advanced analytics

TIBCO Enterprise Runtime for R (TERR) is an enterprise-grade statistical package developed by TIBCO to be fully compatible with the R language, building on the company's years of experience with the related S+ analytics system. This allows customers not only to continue developing applications and models in open-source R, but also to integrate and deploy their R code on a commercially supported platform without having to rewrite it. TERR is more efficient, manages memory better and processes large data volumes faster than open-source R.

Combining all functionality

The combination of the aforementioned powerful functionality means that even for the most complex tasks that require high-level analytics, users interact with simple and easy-to-use interactive workflows. This allows business users to visualize and analyze data, and share analytics results, without having to know the details of the data architecture that underpins business intelligence.

Example: Spotfire interface for configuring, running and visualizing the results of a model that characterizes lost cargo. Through this interface, business users can perform calculations using TERR and H2O (a distributed computing framework) on transaction and shipment data stored in Hadoop clusters.

Analytical space for big data


Advanced and predictive analytics

Users use Spotfire's visual selection dashboards to launch a rich set of advanced features that make it easy to make predictions, build models, and optimize them on the fly. Using big data, analysis can be done inside the data source (In-Datasource), returning only the aggregated information and results needed to create visualizations on the Spotfire platform.


Machine learning

A wide range of machine learning tools is available among Spotfire's built-in functions and can be used with a single click. Statisticians have access to the underlying R code and can extend the functionality. Machine learning functionality can be shared with other users for easy reuse.

The following machine learning methods are available for continuous and categorical variables in Spotfire and TERR (a generic illustration in code follows the list):

  • Linear and logistic regression
  • Decision trees, Random forest algorithm, Gradient boosting machines (GBM)
  • Generalized linear and generalized additive models (GLM / GAM)
  • Neural networks
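
These methods are not unique to Spotfire or TERR. As a purely generic illustration (not Spotfire code), here is a minimal Python/scikit-learn sketch that fits two of the listed model families, logistic regression and a random forest, on a synthetic mix of continuous and categorical variables:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic example: predict churn from one continuous and one categorical feature.
df = pd.DataFrame({
    "monthly_spend": [120, 80, 300, 45, 210, 95, 400, 60],
    "region": ["north", "south", "north", "east", "south", "east", "north", "south"],
    "churned": [0, 1, 0, 1, 0, 1, 0, 1],
})
X, y = df[["monthly_spend", "region"]], df["churned"]

# Encode the categorical column; continuous columns pass through unchanged.
prep = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["region"])],
    remainder="passthrough",
)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    pipe = Pipeline([("prep", prep), ("model", model)])
    pipe.fit(X_tr, y_tr)
    print(type(model).__name__, "accuracy:", pipe.score(X_te, y_te))
```

The toy data set exists only to make the sketch runnable; on eight rows the accuracy numbers mean nothing.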


Content analysis

Spotfire provides analytics and visualization for data much of which has never been used before: unstructured text stored in sources such as documents, reports, CRM system notes, site logs, social media posts and much more.


Location analytics

High resolution layered maps are a great way to visualize big data. Spotfire's rich map functionality allows you to create maps with as many reference and functional layers as you need. Spotfire also gives you the ability to use sophisticated analytics while working with maps. In addition to geographical maps, the system creates maps to visualize user behavior, warehouses, production, raw materials and many other indicators.

So much has been said lately about data analysis that it is easy to get lost in the subject. It is good that so many people pay attention to such a hot topic; the bad part is that everyone understands the term in their own way, often without a general picture of the problem. This fragmented approach is the reason people misunderstand what is happening and what to do: everything consists of pieces that are loosely interconnected and lack a common core. You have surely heard the phrase "patchwork automation". Many people have run into it and can confirm that the main problem with such an approach is that it is almost never possible to see the big picture. The situation with analysis is similar.

To understand the place and purpose of each analysis mechanism, let's look at the picture as a whole. It will be based on how a person makes decisions; since we cannot explain how a thought is born, we will concentrate on how information technology can be used in the process. The first option: the decision maker (DM) uses the computer only as a means of extracting data and draws conclusions on his own. To solve such problems, reporting systems, multidimensional data analysis, charts and other visualization methods are used. The second option: the program not only extracts data but also performs various kinds of preprocessing, for example cleaning and smoothing, and then applies mathematical methods of analysis to the prepared data: clustering, classification, regression and so on. In this case the decision maker receives not raw but heavily processed data; in other words, the person already works with models prepared by the computer.

Because in the first case almost everything related to decision-making is left to the person, the problem of choosing an adequate model and suitable processing methods lies outside the analysis mechanisms: the basis for the decision is either an instruction (for example, how to respond to deviations) or intuition. In some cases this is quite enough, but if the decision maker needs deeper knowledge, mere data-extraction mechanisms will not help; more serious processing is needed. This is the second case: the preprocessing and analysis mechanisms allow decision makers to work at a higher level. The first option is suitable for solving tactical and operational problems; the second for replicating knowledge and solving strategic problems.

The ideal case is to be able to apply both approaches. Together they cover almost all of an organization's needs in business information analysis; by varying the methods depending on the task, we can squeeze the maximum out of the available information in any situation.

The general scheme of work is shown below.

Often, when describing a product that analyzes business information, terms such as risk management, forecasting or market segmentation are used... But in reality, solving each of these problems comes down to one of the analysis methods described below. For example, forecasting is a regression problem, market segmentation is clustering, and risk management is a combination of clustering and classification (other methods are possible). This set of technologies therefore allows you to solve most business problems; in effect, they are atomic (basic) elements from which the solution of a particular problem is assembled.
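
To make the "segmentation is clustering" point concrete, here is a minimal sketch in Python with scikit-learn. The customer table and its two features are invented; a real segmentation would use more features and a justified choice of the number of clusters:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer features: average receipt and purchase frequency per month.
customers = pd.DataFrame({
    "avg_receipt": [500, 520, 2500, 2600, 150, 170, 2400, 480],
    "visits_per_month": [4, 5, 1, 2, 12, 10, 1, 6],
})

X = StandardScaler().fit_transform(customers)      # put features on one scale
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

customers["segment"] = segments
print(customers.groupby("segment").mean())          # profile of each segment
```

The group profiles printed at the end are what a marketer would then name "bargain hunters", "big spenders" and so on.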

Now let's describe each fragment of the scheme separately.

The primary sources of data should be the databases of enterprise management systems, office documents and the Internet, because it is necessary to use all information that may be useful for making a decision. Moreover, this means not only information internal to the organization but also external data (macroeconomic indicators, the competitive environment, demographic data, etc.).

Although the data warehouse itself does not implement analysis technologies, it is the base on which an analytical system should be built. Without a data warehouse, collecting and systematizing the information needed for analysis will take most of the time, which largely negates the advantages of analysis; after all, one of the key requirements for any analytical system is the ability to get results quickly.

The next element of the scheme is the semantic layer. However the information is analyzed, it must be understandable to the decision maker. In most cases the analyzed data sits in different databases, and the decision maker should not have to delve into the nuances of working with a DBMS, so a mechanism is needed that translates the terms of the subject area into calls to the database access mechanisms. This task is performed by the semantic layer. It is desirable that it be the same for all analysis applications; that makes it easier to apply different approaches to a problem.
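
A toy sketch of the idea in Python (all table and column names are hypothetical): business terms are mapped to SQL once, and every analysis application then asks for "revenue by region" instead of writing joins against the DBMS itself.

```python
import sqlite3

# Semantic layer: business terms -> SQL over the underlying tables (names are hypothetical).
SEMANTIC_LAYER = {
    "revenue_by_region": """
        SELECT s.region, SUM(o.amount) AS revenue
        FROM orders o JOIN stores s ON s.id = o.store_id
        GROUP BY s.region
    """,
}

def query(term: str, conn: sqlite3.Connection):
    """Resolve a business term to SQL and run it, hiding DBMS details from the analyst."""
    return conn.execute(SEMANTIC_LAYER[term]).fetchall()

# Tiny in-memory database just to make the sketch executable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (id INTEGER, region TEXT);
    CREATE TABLE orders (store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1, 'North'), (2, 'South');
    INSERT INTO orders VALUES (1, 100), (1, 250), (2, 400);
""")

print(query("revenue_by_region", conn))   # e.g. [('North', 350.0), ('South', 400.0)]
```

Real BI platforms implement this layer far more richly, but the design intent is the same: one shared dictionary of business terms for every analysis tool.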

Reporting systems are designed to answer the question "what is going on". The first way to use them: regular reports for monitoring the operational situation and analyzing deviations. For example, the system prepares daily reports on the stock balance, and when the balance falls below the average weekly sales volume, the response is to prepare a purchase order; in most cases these are standardized business operations. Some elements of this approach are implemented in one form or another in most companies (even if only on paper), but it should not be the only available approach to data analysis. The second way to use reporting systems: processing ad hoc requests. When a decision maker wants to test a thought (hypothesis), he needs data that confirms or refutes the idea. Since such thoughts come spontaneously and there is no exact idea of what information will be required, a tool is needed that can quickly retrieve it in a convenient form. The extracted data is usually presented either as tables or as graphs and charts, although other representations are possible.

Although various approaches can be used to build reporting systems, the most common today is the OLAP mechanism. The main idea is to represent information as multidimensional cubes, where the axes are dimensions (for example, time, products, customers) and the cells contain measures (for example, the sales amount or the average purchase price). The user manipulates the dimensions and receives the information in the desired context.
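
The same idea can be sketched in Python with pandas on a made-up transaction log: the pivot table plays the role of a cube slice, with time and product as dimensions and the sales amount as the measure.

```python
import pandas as pd

# Made-up transaction log: each row is one sale.
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "product": ["Coat", "Jumper", "Coat", "Jumper", "Coat", "Jumper"],
    "customer": ["A", "B", "C", "A", "B", "C"],
    "amount": [5000, 1500, 4500, 1700, 5200, 1600],
})

# "Cube" slice: dimensions = product x month, measure = total sales amount.
cube = sales.pivot_table(index="product", columns="month", values="amount", aggfunc="sum")
print(cube)

# Another context: revenue per customer.
print(sales.groupby("customer")["amount"].sum())
```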

Because of its ease of understanding, OLAP has become widely accepted as a data analysis engine, but its capabilities for deeper analysis, such as forecasting, are extremely limited. The main problem in forecasting is not extracting the data of interest as tables and charts but building an adequate model. Once a model exists, everything is simple: new information is fed into it and the output is the forecast. Building the model, however, is a completely non-trivial task. Of course, you can ship a few ready-made simple models with the system, for example linear regression or something similar (quite often that is exactly what is done), but this does not solve the problem: real problems almost always go beyond such simple models. A simple model will either detect only explicit dependencies, whose value is insignificant because they are already well known, or make predictions too rough to be interesting. For example, if you forecast stock prices on the simple assumption that tomorrow a stock will cost the same as today, you will be right in about 90% of cases. But how valuable is such knowledge? Only the remaining 10% is of interest to brokers. Primitive models mostly give results of about the same level.
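
A small sketch of that argument in Python on synthetic data (the random walk and the 2% "unchanged" threshold are arbitrary illustrations, not figures from the text): the naive "tomorrow equals today" forecast is right most of the time, yet it never predicts the moves that actually matter.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily returns: mostly small moves, with occasional large jumps.
small = rng.normal(0.0, 0.005, size=1000)
jumps = np.where(rng.random(1000) < 0.05, rng.choice([-0.06, 0.06], size=1000), 0.0)
returns = small + jumps

# Naive forecast: "tomorrow's price equals today's", i.e. the predicted return is zero.
tolerance = 0.02                                   # call the forecast "right" if the move is under 2%
hit_rate = np.mean(np.abs(returns) < tolerance)
print(f"Naive forecast is 'right' on {hit_rate:.0%} of days")

# ...but it is wrong on exactly the large moves a broker cares about.
big_moves = np.mean(np.abs(returns) >= tolerance)
print(f"Large moves it never predicts: {big_moves:.0%} of days")
```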

The correct approach to building models is to improve them step by step: start with a first, relatively crude model and refine it as new data accumulates and the model is applied in practice. The task of building forecasts and the like is beyond the scope of reporting systems, so you should not expect positive results in that direction from OLAP. To solve deeper analysis problems, a completely different set of technologies is used, united under the name Knowledge Discovery in Databases.

Knowledge Discovery in Databases (KDD) is the process of transforming data into knowledge. It covers data preparation, selection of informative features, data cleaning, application of Data Mining (DM) methods, post-processing of the data and interpretation of the results. Data Mining is the process of discovering in raw data previously unknown, non-trivial, practically useful and interpretable knowledge that is needed for making decisions in various areas of human activity.

The beauty of this approach is that, regardless of the subject area, we use the same operations (sketched in code after the list):

  1. Extract data. In our case, this is done through the semantic layer.
  2. Clean data. Using "dirty" data for analysis can completely undermine the analysis mechanisms applied later.
  3. Transform data. Different analysis methods require data prepared in a particular way; for example, some methods accept only numeric inputs.
  4. Perform the analysis itself: Data Mining.
  5. Interpret the results.

This process is repeated iteratively.
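
A minimal end-to-end sketch of these five steps in Python with pandas and scikit-learn; the customer table, its columns and the choice of clustering as the mining step are all assumptions made for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 1. Extract: in a real system this comes through the semantic layer / data warehouse.
raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "revenue": [1200, None, 90, 15000, 1100, 80],   # contains a gap
    "orders": [10, 4, 2, 60, 9, 1],
})

# 2. Clean: drop or impute dirty values so they do not poison the analysis.
clean = raw.dropna(subset=["revenue"]).copy()

# 3. Transform: the chosen method needs numeric features on a comparable scale.
features = StandardScaler().fit_transform(clean[["revenue", "orders"]])

# 4. Data Mining: here, clustering the customers into groups.
clean["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# 5. Interpret: a human looks at the group profiles and decides what they mean.
print(clean.groupby("cluster")[["revenue", "orders"]].mean())
```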

Data Mining, in turn, provides solutions to just six classes of tasks: classification, clustering, regression, association, sequence analysis and deviation analysis.

This is all that needs to be done to automate the knowledge-extraction process; the remaining steps are taken by the expert, who is also the decision maker.

The interpretation of the results of computer processing rests with the person; different methods simply provide different food for thought. In the simplest case these are tables and diagrams, in more complex cases models and rules. Human participation cannot be excluded entirely, because a result has no meaning until it is applied to a specific subject area. However, knowledge can be replicated. For example, the decision maker used some method to determine which indicators affect the creditworthiness of buyers and expressed this as a rule. The rule can be built into the loan-issuing system and thus significantly reduce credit risks by putting the assessments on stream; the person who actually issues the documents does not need a deep understanding of the reasons behind each conclusion. In effect, this transfers methods long used in industry to the field of knowledge management: a transition from one-off, ad hoc methods to conveyor-style ones.
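
A toy sketch of such a replicated rule in Python; the thresholds and field names are invented for illustration, since in practice the rule would come out of the analysis itself:

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    monthly_income: float
    debt_to_income: float   # share of income already going to debt payments
    late_payments: int      # late payments over the last 12 months

def credit_rule(a: Applicant) -> str:
    """A rule distilled from analysis, now applied 'on stream' by front-office staff."""
    if a.late_payments > 2 or a.debt_to_income > 0.5:
        return "decline"
    if a.monthly_income >= 1000 and a.debt_to_income <= 0.3:
        return "approve"
    return "manual review"

print(credit_rule(Applicant(monthly_income=1500, debt_to_income=0.2, late_payments=0)))  # approve
```

The analyst revisits and re-derives the rule as new data accumulates; the clerk applying it never needs to.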

Everything mentioned above is just a list of task names. Each of them can be solved by a range of methods, from classical statistics to self-learning algorithms, and real business problems are almost always solved by one of these methods or a combination of them. Almost all tasks - forecasting, market segmentation, risk assessment, evaluating the effectiveness of advertising campaigns, assessing competitive advantages and many others - reduce to those described above. Therefore, with a tool that covers this list of tasks at your disposal, you are ready to solve any business-analysis problem.

Note that we have not mentioned anywhere which tool or technologies should be used for the analysis, because the tasks themselves and the methods of solving them do not depend on the tools. This is simply a description of a competent approach to the problem. You can use anything; it is only important that the entire list of tasks is covered. Only then can the solution be called truly full-featured. Very often, mechanisms that cover only a small part of the tasks are offered as a "full-featured solution to business analysis problems". Most often, a business information analysis system is taken to mean only OLAP, which is completely insufficient for full-fledged analysis. Under a thick layer of advertising slogans there is just a reporting system. Spectacular descriptions of this or that analysis tool hide the essence, but it is enough to check it against the proposed scheme, and you will understand the actual state of things.