Posted by Thang Le Toan on 09 September 2019 11:09 PM
Artificial intelligence for IT operations (AIOps) is an umbrella term for the use of big data analytics, machine learning (ML) and other artificial intelligence (AI) technologies to automate the identification and resolution of common information technology (IT) issues. The systems, services and applications in a large enterprise produce immense volumes of log and performance data. AIOps uses this data to monitor assets and gain visibility into dependencies within and outside of IT systems.
An AIOps platform should bring three capabilities to the enterprise:
Automate routine practices.
Routine practices include user requests as well as non-critical IT system alerts. For example, AIOps can enable a help desk system to process and fulfill a user request to provision a resource automatically. AIOps platforms can also evaluate an alert and determine that it does not require action because the relevant metrics and supporting data available are within normal parameters.
Recognize serious issues faster and with greater accuracy than humans.
IT professionals might address a known malware event on a noncritical system but ignore an unusual download or process starting on a critical server because they are not watching for that threat. AIOps handles this scenario differently: it prioritizes the event on the critical system as a possible attack or infection because the behavior is out of the norm, and deprioritizes the known malware event, which it can remediate automatically with an antimalware function.
Streamline the interactions between data center groups and teams.
AIOps provides each functional IT group with relevant data and perspectives. Without AI-enabled operations, teams must share, parse and process information by meeting or manually sending around data. AIOps should learn what analysis and monitoring data to show each group or team from the large pool of resource metrics.
AIOps is generally used in companies that employ DevOps or cloud computing, as well as in large, complex enterprises. AIOps aids teams that use a DevOps model by giving development teams additional insight into their IT environment and giving operations teams more visibility into changes in production. It also reduces the risks involved in hybrid cloud platforms by supporting operators across the IT infrastructure. More broadly, any large company with an extensive or complicated IT environment stands to benefit from automated processes, earlier recognition of problems and smoother communication between teams.
Learn what AIOps is, the various technologies that underpin it, and the benefits and challenges enterprise IT teams can expect when they implement AIOps platforms.
Learn the basics of AIOps
AIOps uses a combination of AI strategies and technologies, including data collection and aggregation, analytics, algorithms, automation and orchestration, machine learning and visualization. Most of these technologies are reasonably well defined and mature.
AIOps data comes from log files, metrics and monitoring tools, help desk ticketing systems and other sources. Big data technologies aggregate and organize all of the systems' output into a useful form. Analytics techniques can interpret the raw information to create new data and metadata. Analytics reduces noise (unneeded or spurious data) and spots trends and patterns that enable the tool to identify and isolate problems, predict capacity demand and handle other events.
Analytics also requires algorithms to codify the organization's IT expertise, business policies and goals. Algorithms allow an AIOps platform to deliver the most desirable actions or outcomes; they are how IT personnel prioritize security-related events and teach application performance decisions to the platform. The algorithms form the foundation for machine learning, wherein the platform establishes a baseline of normal behaviors and activities, and can then evolve or create new algorithms as data from the environment changes over time.
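The baseline idea can be sketched in a few lines: collect a history of a metric, derive normal bounds from it, and flag readings that fall outside those bounds. This is a simplified illustration, not any particular AIOps product's algorithm; the metric values and the three-sigma threshold are invented for the example.

```python
from statistics import mean, stdev

def build_baseline(history, k=3.0):
    """Derive normal bounds as mean +/- k standard deviations."""
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)

def is_anomalous(value, baseline):
    """Flag a reading that falls outside the normal bounds."""
    low, high = baseline
    return value < low or value > high

# Hypothetical CPU-utilization history for one server (percent).
history = [41, 43, 40, 44, 42, 45, 43, 41, 42, 44]
baseline = build_baseline(history)

print(is_anomalous(43, baseline))   # within normal parameters -> False
print(is_anomalous(95, baseline))   # out of the norm -> True, escalate
```

A real platform would maintain such baselines per metric and per asset, and refine them continuously as new data arrives.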
Automation is the key underlying technology that enables AIOps tools to take action. Automated functions are triggered by the outcomes of analytics and machine learning. For example, when a tool's predictive analytics and ML determine that an application needs more storage, the tool initiates an automated process to add storage in increments consistent with algorithmic rules.
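The storage example can be sketched as below. The names, thresholds and the naive linear forecast are all hypothetical, and the returned increment count stands in for whatever provisioning API a real platform would call.

```python
def forecast_usage(samples, horizon=3):
    """Naive linear projection of storage use (GB) a few periods ahead."""
    growth = (samples[-1] - samples[0]) / (len(samples) - 1)
    return samples[-1] + growth * horizon

def plan_expansion(samples, capacity_gb, increment_gb=100, threshold=0.9):
    """Return how many fixed-size increments to add if the forecast
    crosses the utilization threshold -- the 'algorithmic rule' here."""
    projected = forecast_usage(samples)
    increments = 0
    while projected > threshold * (capacity_gb + increments * increment_gb):
        increments += 1  # grow in fixed increments, per the rule
    return increments

# Hypothetical daily usage samples for one application volume.
usage = [700, 740, 780, 820, 860]
print(plan_expansion(usage, capacity_gb=1000))  # prints 1
```

The decision logic stays declarative (threshold plus increment size), which is what lets operators encode business policy without scripting each action by hand.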
Finally, visualization tools deliver human-readable dashboards, reports, graphics and other output so users can follow changes and events in the environment. With these visualizations, humans can act on information that requires decision-making capabilities beyond those of the AIOps software.
AIOps benefits and drawbacks
When properly implemented and trained, an AIOps platform reduces the time and attention IT staff spend on mundane, routine, everyday alerts. IT staff teach AIOps platforms, which then evolve with the help of algorithms and machine learning, recycling knowledge gained over time to further improve the software's behavior and effectiveness. AIOps tools also monitor continuously, without a need for sleep. Humans in the IT department can then focus on serious, complex issues and on initiatives that increase business performance and stability.
AIOps software can observe causal relationships over multiple systems, services and resources, clustering and correlating disparate data sources. Those analytics and machine learning capabilities enable software to perform powerful root cause analysis, which accelerates the troubleshooting and remediation of difficult and unusual issues.
AIOps can improve collaboration and workflow activities between IT groups and between IT and other business units. With tailored reports and dashboards, teams can understand their tasks and requirements quickly, and interface with others without learning everything the other team needs to know.
Although the underlying technologies for AIOps are relatively mature, it is still an early field in terms of combining the technologies for practical use. AIOps is only as good as the data it receives and the algorithms that it is taught. The amount of time and effort needed to implement, maintain and manage an AIOps platform can be substantial. The diversity of available data sources as well as proper data storage, protection and retention are all important factors in AIOps results.
AIOps demands trust in tooling, which can be a gating factor for some businesses. For an AIOps tool to act autonomously, it must follow changes within its target environment accurately, gather and secure data, form correct conclusions based on the available algorithms and machine learning, prioritize actions properly and take the appropriate automated actions to match business priorities and objectives.
Implementing AIOps and AIOps vendors
To demonstrate value and mitigate risk from AIOps deployment, introduce the technology in small, carefully orchestrated phases. Decide on the appropriate hosting model for the tool, such as on-site or as a service. IT staff must understand and then train the system to suit needs, and to do so must have ample data from the systems under its watch.
ZFS on Linux lets admins error correct in real time and use solid-state disks for data caching. With the command-line interface, they can install it for these benefits.
ZFS is a file system that provides a way to store and manage large volumes of data, but you must manually install it.
ZFS on Linux does more than file organization, so its terminology differs from standard disk-related vocabulary. The file system collects data in pools. Vdevs, or virtual devices, make up each pool and provide redundancy if a physical device fails. You can store these pools on a single storage disk -- which is not a good idea if you encounter file corruption or if the drive fails -- or many disks.
Benefits of ZFS
It is free to install ZFS on Linux, and it provides robust storage with features such as:
on-the-fly error correction;
disk-level, enterprise-strength encryption;
transactional writes -- writing all or none of the data to ensure integrity;
use of solid-state disks to cache data; and
use of high-performance software rather than proprietary RAID hardware.
ZFS on Linux offers significant advantages over more traditional file systems such as ext4, JFS (the Journaled File System) and Btrfs. With ZFS, it is easy to create a crash-consistent point-in-time snapshot that you can easily back up. ZFS can also support massive file sizes of up to 16 exabytes if the hardware meets performance requirements.
How to install ZFS
To install ZFS on Linux, type sudo apt-get install zfsutils-linux -y into the command-line interface (CLI). This example shows how to create a new ZFS data volume that spans two disks, but other ZFS disk configurations are also available. This tutorial uses the zfs-utils setup package.
Next, create the vdev disk container. This example adds two 20 GB disks. To identify the disks, use the sudo fdisk -l command. In this case, the two disks are /dev/sdb and /dev/sdc.
Now you can create the mirror setup with sudo zpool create mypool mirror /dev/sdb /dev/sdc.
Depending on how the disks were previously set up, when you install ZFS you might get an error that states "/dev/sdb does not contain an extensible firmware interface label but it may contain partition information in the master boot record."
To fix it, use the -f switch so the full command is sudo zpool create -f mypool mirror /dev/sdb /dev/sdc. If you are successful, you won't receive an output or error message.
To reduce root folder clutter, group ZFS mount points in a subfolder instead of placing them directly under the root of the drive.
At this point, the system creates a pool. To check the pool's status, use the sudo zpool status command; the CLI shows the pool's health and the included volumes.
Your pools should automatically mount and be available within the system. The pools' default location is in a directory off the root folder with the pool name. For example, mypool will mount on the /mypool folder, and you can use the pool just like any other mount point.
If you're not sure of a pool location, use sudo zfs get all | grep mountpoint to show which mount point the program uses and identify the mount point needed to bring the pool online.
With your data pools online, you can set most ZFS options via the CLI with sudo zfs. To set up more advanced ZFS functions, such as how to snapshot a read-only version of a file system, define storage pool thresholds or check data integrity with the checksum function, search the in-system ZFS resources with man zfs and reference the Ubuntu ZFS wiki.
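For instance, the advanced features named above map onto short commands. These assume the mypool mirror created in the earlier steps; the dataset, snapshot and quota names below are examples, so adapt them to your environment.

```shell
# Assumes the "mypool" mirror from the earlier steps; names are examples.
sudo zfs create mypool/data                    # carve a dataset out of the pool
sudo zfs snapshot mypool/data@before-upgrade   # read-only point-in-time copy
sudo zfs set quota=10G mypool/data             # storage threshold for this dataset
sudo zpool scrub mypool                        # verify data integrity via checksums
sudo zfs list -t snapshot                      # confirm the snapshot exists
```

Snapshots are cheap in ZFS because they only record blocks that change afterward, which is why taking one before risky maintenance is a common habit.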
If you're new to ZFS, double-check commands before you run them and ensure you understand how they move data within pools, address storage limits and sync data.
Posted by Thang Le Toan on 28 January 2019 01:34 AM
The term "backup," which has become synonymous with data protection, may be accomplished via several methods. Here's how to choose the best way to safeguard your data.
Protecting data against loss, corruption, disasters (manmade or natural) and other problems is one of the top priorities for IT organizations. In concept, the ideas are simple, although implementing an efficient and effective set of backup operations can be difficult.
The term backup has become synonymous with data protection over the past several decades and may be accomplished via several methods. Backup software applications have been developed to reduce the complexity of performing backup and recovery operations. Backing up data is only one part of a disaster protection plan, and may not provide the level of data and disaster recovery capabilities desired without careful design and testing.
The purpose of most backups is to create a copy of data so that a particular file or application may be restored after data is lost, corrupted, deleted or a disaster strikes. Thus, backup is not the goal, but rather it is one means to accomplish the goal of protecting data. Testing backups is just as important as backing up and restoring data. Again, the point of backing up data is to enable restoration of data at a later point in time. Without periodic testing, it is impossible to guarantee that the goal of protecting data is being met.
Backing up data is sometimes confused with archiving data, although these operations are different. A backup is a secondary copy of data used for data protection. In contrast, an archive is the primary data, which is moved to a less-expensive type of media (such as tape) for long-term, low-cost storage.
The most basic and complete type of backup operation is a full backup. As the name implies, this type of backup makes a copy of all data to another set of media, which can be tape, disk or a DVD or CD. The primary advantage to performing a full backup during every operation is that a complete copy of all data is available with a single set of media. This results in a minimal time to restore data, a metric known as a recovery time objective (RTO). However, the disadvantages are that it takes longer to perform a full backup than other types (sometimes by a factor of 10 or more), and it requires more storage space.
Thus, full backups are typically run only periodically. Data centers that have a small amount of data (or critical applications) may choose to run a full backup daily, or even more often in some cases. Typically, backup operations employ a full backup in combination with either incremental or differential backups.
An incremental backup operation will result in copying only the data that has changed since the last backup operation of any type. The modified time stamp on files is typically used and compared to the time stamp of the last backup. Backup applications track and record the date and time that backup operations occur in order to track files modified since these operations.
Because an incremental backup will only copy data changed since the last backup of any type, it may be run as often as desired, with only the most recent changes stored. The benefit of an incremental backup is that it copies a smaller amount of data than a full backup. Thus, these operations complete faster and require less media to store the backup.
A differential backup operation is similar to an incremental the first time it is performed, in that it will copy all data changed from the previous backup. However, each time it is run afterwards, it will continue to copy all data changed since the previous full backup. Thus, it will store more data than an incremental on subsequent operations, although typically far less than a full backup. Moreover, differential backups require more space and time to complete than incremental backups, although less than full backups.
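The difference is easy to see in a small simulation. The sketch below models each day's changed files as a set and reports what each backup type would copy after a full backup; the file names and three-day window are arbitrary.

```python
def incremental_copies(daily_changes):
    """Each incremental copies only the changes since the previous backup."""
    return [set(day) for day in daily_changes]

def differential_copies(daily_changes):
    """Each differential copies everything changed since the last full."""
    copies, since_full = [], set()
    for day in daily_changes:
        since_full |= set(day)          # changes keep accumulating
        copies.append(set(since_full))
    return copies

# Files changed on the three days after a Sunday full backup.
changes = [{"a"}, {"b"}, {"c"}]
print(incremental_copies(changes))   # each day: only that day's changes
print(differential_copies(changes))  # each day: all changes since the full
```

The accumulation in differential_copies is exactly why a differential grows each day until the next full backup resets it.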
Table 1: A comparison of different backup operations

Operation   Incremental backup        Differential backup
Backup 1    Full backup (all data)    Full backup (all data)
Backup 2    Changes from backup 1     Changes from backup 1
Backup 3    Changes from backup 2     Changes from backup 1
Backup 4    Changes from backup 3     Changes from backup 1
As shown in "Table 1: A comparison of different backup operations," each type of backup works differently. A full backup must be performed at least once. Afterward, it is possible to run another full, an incremental or a differential backup. The first partial backup performed, whether differential or incremental, will back up the same data. By the third backup operation, the data backed up with an incremental is limited to the changes since the last incremental. In comparison, the third differential backup will back up all changes since the first full backup, which was backup 1.
From these three primary backup types, it is possible to develop an approach to protecting data. Typically, one of the following approaches is used:
Full weekly + Differential daily
Full weekly + Incremental daily
Many considerations will affect the choice of the optimal backup strategy. Typically, each alternative and strategy choice involves making tradeoffs between performance, data protection levels, total amount of data retained and cost. In "Table 2: A backup strategy's impact on space" below, the media capacity requirements and media required for recovery are shown for three typical backup strategies. These calculations presume 20 TB of total data, with 5% of the data changing daily, and no increase in total storage during the period. The calculations are based on 22 working days in a month and a one month retention period for data.
Table 2: A backup strategy's impact on space
(Media space required for one month: 20 TB of data @ 5% daily rate of change)

Full daily (weekdays)
  Space required: 22 daily fulls (22 * 20 TB) = 440.00 TB
  Media required for recovery: most recent backup only

Full (weekly) + Differential (weekdays)
  Space required: fulls, plus most recent differential since full (5 * 20 TB) + (22 * 5% * 20 TB) = 124.23 TB
  Media required for recovery: most recent full + most recent differential

Full (weekly) + Incremental (weekdays)
  Space required: fulls, plus all incrementals since weekly full (5 * 20 TB) + (22 * 5% * 20 TB) = 122.00 TB
  Media required for recovery: most recent full + all incrementals since full
As shown above, performing a full backup daily requires the most space and takes the most time. However, more total copies of data are available, and fewer pieces of media are required to perform a restore operation. As a result, this backup policy has a higher tolerance for disasters and provides the shortest time to restore, since any piece of data required will be located on at most one backup set.
As an alternative, performing a full backup weekly, coupled with running incremental backups daily, delivers the shortest backup time during weekdays and uses the least storage space. However, there are fewer copies of data available, and restore time is the longest, since up to six sets of media may be needed to recover the information. If data backed up on Wednesday is needed, the Sunday full backup plus the Monday, Tuesday and Wednesday incremental media sets are required. This can dramatically increase recovery times and requires that each media set work properly; a failure in one backup set can impact the entire restoration.
Running a weekly full backup plus daily differential backups delivers results between the other alternatives: more backup media sets are required for a restore than with a daily full policy, but fewer than with a daily incremental policy. Restore time is also less than with daily incrementals, though more than with daily fulls. To restore data from a particular day, at most two media sets are required, diminishing the time needed to recover and the potential for problems with an unreadable backup set.
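Table 2's daily-full and weekly-full-plus-incremental rows can be reproduced with straightforward arithmetic. The sketch below uses the same assumptions (20 TB of data, 5% daily change, 22 working days, roughly 5 weekly fulls retained per month); the differential row accumulates growing differentials, so it is not reproduced here.

```python
TOTAL_TB = 20       # total data protected
CHANGE_RATE = 0.05  # fraction of data that changes daily
WORKDAYS = 22       # working days in the one-month retention period
WEEKLY_FULLS = 5    # weekly full backups retained in a month

def space_full_daily():
    """A full copy of everything, every working day."""
    return WORKDAYS * TOTAL_TB

def space_weekly_full_plus_incremental():
    """Weekly fulls, plus one day's worth of changes per working day."""
    return WEEKLY_FULLS * TOTAL_TB + WORKDAYS * CHANGE_RATE * TOTAL_TB

print(space_full_daily())                    # 440 TB
print(space_weekly_full_plus_incremental())  # 122.0 TB
```

Plugging in your own totals and change rates is a quick way to compare strategies before committing to one.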
For organizations with small data sets, running a daily full backup provides a high level of protection without much additional storage cost. Larger organizations, or those with more data, find that running a weekly full backup, coupled with either daily incrementals or differentials, provides a better option. Using differentials provides a higher level of data protection with less restore time for most scenarios and a small increase in storage capacity. For this reason, a strategy of weekly full backups with daily differential backups is a good option for many organizations.
Most of the advanced backup options, such as synthetic full, mirror, reverse incremental and continuous data protection (CDP), require disk storage as the backup target. A synthetic full reconstructs the full backup image on disk from all required incrementals or the differential. This synthetic full may then be stored to tape for offsite storage, with the advantage of reduced restoration time. Mirroring copies disk storage to another set of disk storage, with reverse incrementals used to add incremental-style backup support. Finally, CDP allows a greater number of restoration points than traditional backup options.
When designing a backup strategy, the question is not which type of backup to use, but when to use each and how to combine these options with testing to meet the overall business cost, performance and availability goals.
Posted by Thang Le Toan on 05 October 2018 06:46 AM
Kubernetes TLS bootstrap improvements in version 1.12 tackle container management complexity, and users hope there's more where that came from.
IT pros have said Kubernetes TLS bootstrap is a step in the right direction, and they have professed hope that it's the first of many more to come.
Automated Transport Layer Security (TLS) bootstrap is now a stable, production-ready feature as of last week's release of Kubernetes 1.12 to the open source community. Previously, IT pros had to set up secure communication separately, and often manually, for each new node as it was added to a Kubernetes cluster. The Kubernetes TLS bootstrap feature automates the way Kubernetes nodes launch themselves into TLS-secured clusters at startup.
"The previous process was more complicated and error-prone. [TLS bootstrap] enables simpler pairing similar to Bluetooth or Wi-Fi push-button pairing," said Tim Pepper, a senior staff engineer at VMware and release lead for Kubernetes 1.12 at the Cloud Native Computing Foundation.
Kubernetes maintainers predict this automation will discourage sys admins' previous workarounds to ease management, such as the use of a single TLS credential for an entire cluster. This workaround prevented the use of Kubernetes security measures that require each node to have a separate credential, such as node authorization and admission controls.
Kubernetes 1.12 pushed to beta a similarly automated process for TLS certificate requests and rotation once clusters are set up. Stable support for such long-term manageability tops Kubernetes users' wish list.
"TLS bootstrap helps, but doesn't completely automate the process of TLS handshakes between nodes and the Kubernetes master," said Arun Velagapalli, principal security engineer at Kabbage Inc., a fintech startup in Atlanta. "It's still a lot of manual work within the [command-line interface] right now."
Kubernetes TLS bootstrap automates TLS communication between Kubernetes nodes, but security in depth also requires Kubernetes TLS management between pods and even individual containers. This has prompted Kabbage engineers to explore the Istio service mesh and HashiCorp Vault for automated container security management.
Kubernetes management challenges linger
Industry analysts overwhelmingly agreed that Kubernetes is the industry standard for container orchestration. A 451 Research survey of 200 enterprise decision-makers and developers in North America conducted in March 2018 found 84% of respondents plan to adopt Kubernetes, rather than use multiple container orchestration tools.
"It will take one to three years for most enterprises to standardize on Kubernetes, and we still see some use of Mesos, which has staying power for data-rich applications," said Jay Lyman, analyst at 451 Research. "But Kubernetes is well-timed as a strong distributed application framework for use in hybrid clouds."
Still, while many enterprises plan to deploy Kubernetes, IT experts questioned the extent of its widespread production use.
"A lot of people say they're using Kubernetes, but they're just playing around with it," said Jeremy Pullen, CEO and principal consultant at Polodis Inc., a DevSecOps and Lean management advisory firm in Tucker, Ga., which works with large enterprise clients. "The jury's still out on how many companies have actually adopted it, as far as I'm concerned."
The Kubernetes community still must make the container orchestration technology accessible to enterprise customers. Vendors such as Red Hat, Rancher and Google Cloud Platform offer Kubernetes distributions that automate cluster setup, but IT pros would like to see such features enter the standard Kubernetes upstream distribution, particularly for on-premises use.
"Manually creating on-premises Kubernetes is not a simple proposition, and the automation features for load balancers, storage, etc., are really public-cloud-centric," said Chris Riley, director of solutions architecture at cPrime Inc., an Agile software development consulting firm in Foster City, Calif. "If that same ease of use [came to] the default distro, I think that would help clients who are still sensitive about public cloud consider Kubernetes."
Kubernetes community leaders don't rule out this possibility as they consider the future of the project. Features in the works include the Kubernetes Cluster API and a standardized container storage interface slated for stable release by the end of 2018. Standardized and accessible cluster management is the top priority for the Kubernetes architecture special interest group.
"The question is, how many and which variations on [Kubernetes cluster management automation] does the community test there, and how do we curate the list we focus on?" Pepper said. "It becomes complicated to balance that. So, for now, we rely on service providers to do opinionated infrastructure integrations."
Posted by Thang Le Toan on 25 September 2018 12:12 AM
Storage at the edge is the collective methods and technologies that capture and retain digital information at the periphery of the network, as close to the originating source as possible. In the early days of the internet, managing storage at the edge was primarily the concern of network administrators who had employees at remote offices and branch offices (ROBOs). By the turn of the century, the term was also being used to describe direct-attached storage (DAS) in notebook computers and personal digital assistants (PDAs) used by field workers. Because employees did not always remember to back up storage at the edge manually, a primary concern was how to automate backups and keep the data secure.
Today, as more data is being generated by networked internet of things (IoT) devices, administrators who deal with storage at the edge are more concerned with establishing workarounds for limited or intermittent connectivity and dealing with raw data that might need to be archived indefinitely. The prodigious volume of data coming from highway video surveillance cameras, for example, can easily overwhelm a traditional centralized storage model. This has led to experiments with pre-processing data at its source and centralizing storage for only a small part of the data.
In the case of automotive data, for example, log files stored in the vehicle might simply be tagged so that, should the need arise, the data in question can be sent to the cloud for deeper analysis. In such a scenario, intermediary micro data centers or high-performance fog computing servers could be installed at remote locations to replicate cloud services locally. This not only improves performance but also allows connected devices to act upon perishable data in fractions of a second. Depending on the vendor and technical implementation, the intermediary storage location may be referred to by one of several names, including IoT gateway, base station or hub.
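A minimal sketch of that pre-processing pattern: raw readings stay at the edge, and only a compact summary plus tagged anomalies travel to central storage. The reading values, the threshold and the payload shape are invented for illustration.

```python
def summarize_at_edge(readings, anomaly_threshold=100.0):
    """Keep raw data local; forward a summary and tagged anomalies only."""
    anomalies = [(i, r) for i, r in enumerate(readings)
                 if r > anomaly_threshold]
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "anomalies": anomalies,  # tagged for deeper analysis in the cloud
    }

# Hypothetical sensor samples captured at a roadside camera gateway.
payload = summarize_at_edge([42.0, 44.0, 43.0, 250.0, 41.0])
print(payload["count"], len(payload["anomalies"]))  # prints: 5 1
```

Only the small payload dictionary crosses the network, which is the point: the centralized store holds a fraction of the raw volume while anomalous readings remain retrievable on demand.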
Unstructured data is information, in many different forms, that doesn't hew to conventional data models and thus typically isn't a good fit for a mainstream relational database. Thanks to the emergence of alternative platforms for storing and managing such data, it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications.
Traditional structured data, such as the transaction data in financial systems and other business applications, conforms to a rigid format to ensure consistency in processing and analyzing it. Sets of unstructured data, on the other hand, can be maintained in formats that aren't uniform, freeing analytics teams to work with all of the available data without necessarily having to consolidate and standardize it first. That enables more comprehensive analyses than would otherwise be possible.
Types of unstructured data
One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites.
Other types of unstructured data include images, audio and video files. Machine data is another category, one that's growing quickly in many organizations. For example, log files from websites, servers, networks and applications -- particularly mobile ones -- yield a trove of activity and performance data. In addition, companies increasingly capture and analyze data from sensors on manufacturing equipment and other internet of things (IoT) connected devices.
In some cases, such data may be considered to be semi-structured -- for example, if metadata tags are added to provide information and context about the content of the data. The line between unstructured and semi-structured data isn't absolute, though; some data management consultants contend that all data, even the unstructured kind, has some level of structure.
Unstructured data analytics
Because of its nature, unstructured data isn't suited to transaction processing applications, which are the province of structured data. Instead, it's primarily used for BI and analytics. One popular application is customer analytics. Retailers, manufacturers and other companies analyze unstructured data to improve customer relationship management processes and enable more-targeted marketing; they also do sentiment analysis to identify both positive and negative views of products, customer service and corporate entities, as expressed by customers on social networks and in other forums.
Predictive maintenance is an emerging analytics use case for unstructured data. For example, manufacturers can analyze sensor data to try to detect equipment failures before they occur in plant-floor systems or finished products in the field. Energy pipelines can also be monitored and checked for potential problems using unstructured data collected from IoT sensors.
Analyzing log data from IT systems highlights usage trends, identifies capacity limitations and pinpoints the cause of application errors, system crashes, performance bottlenecks and other issues. Unstructured data analytics also aids regulatory compliance efforts, particularly in helping organizations understand what corporate documents and records contain.
Unstructured data techniques and platforms
Analyst firms report that the vast majority of new data being generated is unstructured. In the past, that type of information often was locked away in siloed document management systems, individual manufacturing devices and the like -- making it what's known as dark data, unavailable for analysis.
But things changed with the development of big data platforms, primarily Hadoop clusters, NoSQL databases and the Amazon Simple Storage Service (S3). They provide the required infrastructure for processing, storing and managing large volumes of unstructured data without the imposition of a common data model and a single database schema, as in relational databases and data warehouses.
A variety of analytics techniques and tools are used to analyze unstructured data in big data environments. Text analytics tools look for patterns, keywords and sentiment in textual data; at a more advanced level, natural language processing technology is a form of artificial intelligence that seeks to understand meaning and context in text and human speech, increasingly with the aid of deep learning algorithms that use neural networks to analyze data. Other techniques that play roles in unstructured data analytics include data mining, machine learning and predictive analytics.
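As a toy illustration of the keyword side of text analytics (the sample feedback text and stop-word list are invented, and real tools go far beyond frequency counts):

```python
from collections import Counter
import re

# A deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "and", "is", "to", "of", "was"}

def top_keywords(text, n=3):
    """Count word frequencies in free text, ignoring common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

feedback = ("The battery is great and the battery lasts. "
            "Shipping was slow and shipping support was slow to reply.")
print(top_keywords(feedback))
```

Even this crude pass surfaces the recurring themes (battery, shipping, slow) that sentiment analysis and natural language processing would then classify as positive or negative.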
Data science teams need the right skills and solid processes
For data scientists, big data systems and AI-enabled advanced analytics technologies open up new possibilities to help drive better business decision-making. "Like never before, we have access to data, computing power and rapidly evolving tools," Forrester Research analyst Kjell Carlsson wrote in a July 2017 blog post.
The downside, Carlsson added, is that many organizations "are only just beginning to crack the code on how to unleash this potential." Often, that isn't due to a lack of internal data science skills, he said in a June 2018 blog; it's because companies treat data science as "an artisanal craft" instead of a well-coordinated process that involves analytics teams, IT and business units.
Of course, possessing the right data science skills is a prerequisite for making such processes work. The list of skills that LinkedIn's analytics and data science team wants in job candidates includes the ability to manipulate data, design experiments with it and build statistical and machine learning models, according to Michael Li, who heads the team.
But softer skills are equally important, Li said in an April 2018 blog. He cited communication, project management, critical thinking and problem-solving skills as key attributes. Being able to influence decision-makers is also an important part of "the art of being a data scientist," he wrote.
The problem is that such skills requirements are often "completely out of reach for a single person," Miriam Friedel wrote in a September 2017 blog when she was director and senior scientist at consulting services provider Elder Research. Friedel, who has since moved on to software vendor Metis Machine as data science director, suggested in the blog that instead of looking for the proverbial individual unicorn, companies should build "a team unicorn."
This handbook more closely examines that team-building approach as well as critical data science skills for the big data and AI era.
Reskilling the analytics team: Math, science and creativity
Technical skills are a must for data scientists. But to make analytics teams successful, they also need to think creatively, work in harmony and be good communicators.
In a 2009 study of its employee data, Google discovered that the top seven characteristics of a successful manager at the company didn't involve technical expertise. For example, they included being a good coach and an effective communicator, having a clear vision and strategy, and empowering teams without micromanaging them. Technical skills were No. 8.
Google's list, which was updated this year to add collaboration and strong decision-making capabilities as two more key traits, applies specifically to its managers, not to technical workers. But the findings from the study, known as Project Oxygen, are also relevant to building an effective analytics team.
Obviously, STEM skills are incredibly important in analytics. But as Google's initial and subsequent studies have shown, they aren't the whole or even the most important part of the story. As an analytics leader, I'm very glad that someone has put numbers to all this, but I've always known that the best data scientists are also empathetic and creative storytellers.
According to the latest employment projections report by the U.S. Bureau of Labor Statistics, statisticians are in high demand. Among occupations that currently employ at least 25,000 people, statistician ranks fifth in projected growth rate; it's expected to grow by 33.8% from 2016 to 2026. For context, the average rate of growth that the statistics bureau forecasts for all occupations is 7.4%. And with application software developer the only other exception, every other occupation in the top 10 is in healthcare or senior care, which is consistent with an aging U.S. population.
Statistician is fifth among occupations with at least 25,000 workers projected to grow at the fastest rates.
Thanks to groundbreaking innovations in technology and computing power, the world is producing more data than ever before. Businesses are using actionable analytics to improve their day-to-day processes and drive diverse functions like sales, marketing, capital investment, HR and operations. Statisticians and data scientists are making that possible, using not only their mathematical and scientific skills, but also creativity and effective communication to extract and convey insights from the new data resources.
In 2017, IBM partnered with job market analytics software vendor Burning Glass Technologies and the Business-Higher Education Forum on a study that showed how the democratization of data is forcing change in the workforce. Without diving into the minutiae, my takeaway from the study was that, with more and more data now available to more and more people, it's the insights garnered from that data that set you apart as an employee -- or as a company.
Developing and encouraging our analytics team
The need to find and communicate these insights influences how we hire and train our up-and-coming analytics employees at Dun & Bradstreet. Our focus is still primarily on mathematics, but we also consider other characteristics like critical- and innovative-thinking abilities as well as personality traits, so our statisticians and data scientists are effective in their roles.
Our employees have the advantage of working for a business-to-business company that has incredibly large and varied data sets -- containing more than 300 million business records -- and a wide variety of customers that are interested in our analytics services and applications. They get to work on a very diverse set of business challenges, share cutting-edge concepts with data scientists in other companies and develop creative solutions to unique problems.
Our associates are encouraged to pursue new analytical models and data analyses, and we have special five-day sprints where we augment and enhance some of the team's more creative suggestions. These sprints not only challenge the creativity of our data analysts, but also require them to work on their interpersonal and communication skills while developing these applications as a group.
Socializing the new, creative data analyst
It's very important to realize that some business users aren't yet completely comfortable with a well-rounded analytics team. For the most part, when bringing in an analyst, they're looking for confirmation of a hypothesis rather than a full analysis of the data at hand.
If that's the case in your organization, then be persistent. As your team continues to present valuable insights and creative solutions, your peers and business leaders across the company will start to seek guidance from data analysts as partners in problem-solving much more frequently and much earlier in their decision-making processes.
As companies and other institutions continue to amass data exponentially and rapid technological changes continue to affect the landscape of our businesses and lives, growing pains will inevitably follow. Exceptional employees who have creativity and empathy, in addition to mathematical skills, will help your company thrive through innovation. Hopefully, you have more than a few analysts who possess those capabilities. Identify and encourage them -- and give permission to the rest of your analytics team to think outside the box and rise to the occasion.
Data scientist vs. business analyst: What's the difference?
Data science and business analyst roles differ in that data scientists must deep dive into data and come up with unique business solutions -- but the distinctions don't end there.
What is the difference between data science and business analyst jobs? And what kind of training or education is required to become a data scientist?
There are a number of differences between data scientists and business analysts, the two most common business analytics roles, but at a high level, you can think about the distinction as similar to a medical researcher and a lab technician. One uses experimentation and the scientific method to search out new, potentially groundbreaking discoveries, while the other applies existing knowledge in an operational context.
Data scientist vs. business analyst comes down to the realms they inhabit. Data scientists delve into big data sets and use experimentation to discover new insights in data. Business analysts, on the other hand, typically use self-service analytics tools to review curated data sets, build reports and data visualizations, and report targeted findings -- things like revenue by quarter or sales needed to hit targets.
What does a data scientist do?
A data scientist takes analytics and data warehousing programs to the next level: What does the data really say about the company, and can the company separate relevant data from irrelevant data?
A data scientist should be able to leverage the enterprise data warehouse to dive deeper into the data that comes out or to analyze new types of data stored in Hadoop clusters and other big data systems. A data scientist doesn't just report on data the way a classic business analyst does; they also deliver business insights based on it.
A data scientist job also requires a strong business sense and the ability to communicate data-driven conclusions to business stakeholders. Strong data scientists don't just address business problems; they also pinpoint the problems whose solutions deliver the most value to the organization. A data scientist plays a more strategic role within an organization.
Data scientist education, skills and personality traits
Data scientists look through all the available data with the goal of discovering a previously hidden insight that, in turn, can provide a competitive advantage or address a pressing business problem. Data scientists do not simply collect and report on data -- they also look at it from many angles, determine what it means and then recommend ways to apply the data. These insights could lead to a new product or even an entirely new business model.
Data scientists apply advanced machine learning models to automate processes that previously took too long or were inefficient. They use data processing and programming tools -- often open source, like Python, R and TensorFlow -- to develop new applications that take advantage of advances in artificial intelligence. These applications may perform a task such as transcribing calls to a customer service line using natural language processing or automatically generating text for email campaigns.
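As a toy stand-in for that kind of application, the sketch below "generates" campaign email text by matching a customer's activity tags to the closest template. A real system would use a trained language model rather than keyword overlap, and all names and templates here are hypothetical:

```python
# Hypothetical campaign templates: tags that trigger them, plus body text.
# A real email-generation system would use a trained language model.
TEMPLATES = {
    "win_back": ({"inactive", "lapsed"}, "We miss you! Here's 20% off."),
    "upsell": ({"purchase", "premium"}, "Enjoying the product? Try Pro."),
}

def pick_email(activity_tags):
    """Choose the template whose trigger tags best overlap a
    customer's recent activity tags."""
    best = max(TEMPLATES.values(), key=lambda t: len(t[0] & activity_tags))
    return best[1]

print(pick_email({"purchase"}))
```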
What does a business analyst do?
A business analyst -- a title often used interchangeably with data analyst -- focuses more on delivering operational insights to lines of business using smaller, more targeted data sets. For example, a business analyst tied to a sales team will work primarily with sales data to see how individual team members are performing, to identify members who might need extra coaching and to search for other areas where the team can improve on its performance.
Business analysts typically use self-service analytics and data visualization tools. Using these tools, business analysts can build reports and dashboards that team members can use to track their performance. Typically, the information contained in these reports is retrospective rather than predictive.
Data scientist vs. business analyst training, tools and trends
To become a business analyst, you need a familiarity with statistics and the basic fundamentals of data analysis, but there are many self-service analytics tools that do the mathematical heavy lifting for you. Of course, you have to know if it's statistically meaningful to join two separate data sets, and you have to understand the distinction between correlation and causation. But, on the whole, a deep background in mathematics is unnecessary.
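For instance, a Pearson correlation coefficient quantifies the linear association between two data sets, but even a value near 1 says nothing about causation; the figures below are invented for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient -- measures linear association,
    which by itself says nothing about causation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures: both rise in summer, but neither
# causes the other -- a lurking variable (warm weather) drives both.
ice_cream = [10, 20, 30, 40, 50]
sunglasses = [12, 22, 29, 41, 52]
print(round(pearson(ice_cream, sunglasses), 3))
```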
To become a data scientist, on the other hand, you need a strong background in math. This is one of the primary differences in the question of data scientists vs. business analysts.
Many data scientists have doctorates in some field of math. Many have backgrounds in physics or other advanced sciences that lean heavily on statistical inference.
Business analysts can generally pick up the technical skills they need on the job. Whether an enterprise uses Tableau, Qlik or Power BI -- the three most common self-service analytics platforms -- or another tool, most use graphical user interfaces that are designed to be intuitive and easy to pick up.
Data science jobs require more specific technical training. In addition to advanced mathematical education, data scientists need deep technical skills. They must be proficient in several common coding languages -- including Python, SQL and Java -- which enable them to run complex machine learning models against big data stored in Hadoop or other distributed data management platforms. Most often, data scientists pick up these skills from a college-level computer science curriculum.
However, trends in data analytics are beginning to collapse the line between data science and data analysis. Increasingly, software companies are introducing platforms that can automate complex tasks using machine learning. At the same time, self-service software supports deeper analytical functionality, meaning data scientists are increasingly using tools that were once solely for business analysts.
Companies often report the greatest analytics success when they blend teams, with data scientists working alongside business analysts to produce operational benefits. This means the data scientist vs. business analyst distinction could become less important as time goes on -- a trend that may pay off for enterprises.
Hiring vs. training data scientists: The case for each approach
Hiring data scientists is easier said than done -- so should you try to train current employees in data science skills? That depends on your company's needs, writes one analytics expert.
Companies are faced with a dilemma on big data analytics initiatives: whether to hire data scientists from outside or train current employees to meet new demands. In many cases, realizing big data's enormous untapped potential brings the accompanying need to increase data science skills -- but building up your capacity can be tricky, especially in a crowded market of businesses looking for analytics talent.
Even with a shortage of available data scientists, screening and interviewing for quality hires is time- and resource-intensive. Alternatively, training data scientists from within may be futile if internal candidates don't have the fundamental aptitude.
At The Data Incubator, we've helped hundreds of companies train employees on data science and hire new talent -- and, often, we've aided organizations in handling the tradeoffs between the two approaches. Based on the experiences we've had with our corporate clients, you should consider the following factors when deciding which way to go.
New hires bring in new thinking
The main benefit of hiring rather than training data scientists comes from introducing new ideas and capabilities into your organization. What you add may be technical in nature: For example, are you looking to adopt advanced machine learning techniques, such as neural networks, or to develop real-time customer insights by using Spark Streaming? It may be cultural, too: Do you want an agile data science team that can iterate rapidly -- even at the expense of "breaking things," in Facebook's famous parlance? Or one that can think about data creatively and find novel approaches to using both internal and external information?
At other times, it's about having a fresh set of eyes looking at the same problems. Many quant hedge funds intentionally hire newly minted STEM Ph.D. holders -- people with degrees in science, technology, engineering or math -- instead of industry veterans precisely to get a fresh take on financial markets. And it isn't just Wall Street; in other highly competitive industries, too, new ideas are the most important currency, and companies fight for them to remain competitive.
How a company sources new talent can also require some innovation, given the scarcity of skilled data scientists. Kaggle and other competition platforms can be great places to find burgeoning data science talent. The public competitions on Kaggle are famous for bringing unconventional stars and unknown whiz kids into the spotlight and demonstrating that the best analytics performance may come from out of left field.
Similarly, we've found that economists and other social scientists often possess the same strong quantitative skill sets as their traditional STEM peers, but are overlooked by HR departments and hiring managers alike.
Training adds to existing expertise
In other cases, employers may value industry experience first and foremost. Domain expertise is complex, intricate and difficult to acquire in some industries. Such industries often already have another science at their core. Rocketry, mining, chemicals, oil and gas -- these are all businesses in which knowledge of the underlying science is more important than data science know-how.
Highly regulated industries are another case in point. Companies facing complex regulatory burdens must often meet very specific, and frequently longstanding, requirements. Banks must comply with financial risk testing and with statutes that were often written decades ago. Similarly, the drug approval process in healthcare is governed by a complex set of immutable rules. While there is certainly room for innovation via data science and big data in these fields, it is constrained by regulations.
Companies in this position often find training data scientists internally to be a better option for developing big data analytics capabilities than hiring new talent. For example, at The Data Incubator, we work with a large consumer finance institution that was looking for data science capabilities to help enhance its credit modeling. But its ideal candidate profile for that job was very different from the ones sought by organizations looking for new ideas on business operations or products and services.
The relevant credit data comes in slowly: Borrowers who are initially reliable could become insolvent months or years after the initial credit decision, which makes it difficult to predict defaults without a strong credit model. And wrong decisions are very expensive: Loan defaults result in direct hits to the company's profitability. In this case, we worked with the company to train existing statisticians and underwriters on complementary data science skills around big data.
Of course, companies must be targeted in selecting training candidates. They often start by identifying employees who possess strong foundational skills for data science -- things like programming and statistics experience. Suitable candidates go by many titles, including statisticians, actuaries and quantitative analysts, more popularly known as quants.
Find the right balance for your needs
For many companies, weighing the options for hiring or training data scientists comes down to understanding their specific business needs, which can vary even in different parts of an organization. It's worth noting that the same financial institution that trained its staffers to do analytics for credit modeling also hired data scientists for its digital marketing team.
Without the complex regulatory requirements imposed on the underwriting side, the digital marketing team felt it could more freely innovate -- and hence decided to bring in new blood with new ideas. These new hires are now building analytical models that leverage hundreds of data signals and use advanced AI and machine learning techniques to more precisely target marketing campaigns at customers and better understand the purchase journeys people take.
Ultimately, the decision of whether to hire or train data scientists must make sense for an organization. Companies must balance the desire to innovate with the need to incorporate existing expertise and satisfy regulatory requirements. Getting that balance right is a key step in a successful data science talent strategy.
Self-service business intelligence (SSBI) is an approach to data analytics that enables business users to access and work with corporate data even though they have no background in statistical analysis, business intelligence (BI) or data mining. Allowing end users to make decisions based on their own queries and analyses frees the organization's BI and information technology (IT) teams from creating the majority of reports and lets those teams focus on other tasks that help the organization reach its goals.
Because self-service BI software is used by people who may not be tech-savvy, it is imperative that the user interface (UI) be intuitive, with a user-friendly dashboard and navigation. Ideally, training should be provided to help users understand what data is available and how to query it to answer business questions. But once the IT department has set up the data warehouse and data marts that support the business intelligence system, business users should be able to query the data and create personalized reports with very little effort.
While self-service BI encourages users to base decisions on data instead of intuition, the flexibility it provides can cause unnecessary confusion if there is not a data governance policy in place. Among other things, the policy should define what the key metrics for determining success are, what processes should be followed to create and share reports, what privileges are necessary for accessing confidential data and how data quality, security and privacy will be maintained.
Explore the data discovery software market, including the products and vendors helping enterprises glean insights using data visualization and self-service BI.
Turning data into business insight is the ultimate goal. It's not about gathering as much data as possible, it's about applying tools and making discoveries that help a business succeed. The data discovery software market includes a range of software and cloud-based services that can help organizations gain value from their constantly growing information resources.
These products fall within the broad BI category, and at their most basic, they search for patterns within data and data sets. Many of these tools use visual presentation mechanisms, such as maps and models, to highlight patterns or specific items of relevance. The tools deliver visualizations to users, including nontechnical workers, such as business analysts, via dashboards, reports, charts and tables.
The big benefit here: data discovery tools provide detailed insights gleaned from data to better inform business decisions. In many cases, the tools accomplish this with limited IT involvement because the products offer self-service features.
Using extensive research into the data discovery software market, TechTarget editors focused on the data discovery software vendors that lead in market share, plus those that offer traditional and advanced functionality. Our research included data from TechTarget surveys, as well as reports from respected research firms, including Gartner and Forrester.
Alteryx Inc. markets its Connect platform as a collaborative data exploration and data cataloging tool for the enterprise, one that changes how information workers discover, prioritize and analyze all the relevant information within an organization.
Alteryx Connect key features include:
Data Asset Catalog, which collects metadata from information systems, enabling better relevant data organization;
Business Glossary, which defines standard business terms in a data dictionary and links them to assets in the catalog; and
Data Discovery, which lets users discover the information they need through search capabilities.
Other features include:
Data Enrichment and Collaboration, which allows users to annotate, discuss and rate information to offer business context and provide an organization with relevant data; and
Certification and Trust, which provides insights into information asset trustworthiness through certification, lineage and versioning.
Alteryx touts these features as decreasing the time necessary to gain insight and supporting faster, data-driven decisions by improving collaboration, enhancing analytic productivity and ensuring data governance.
Domo Inc. provides a single-source system for end-to-end data integration and preparation, data discovery, and sharing in the cloud. It's mobile-focused and doesn't require integrating desktop software, third-party tools or on-premises servers.
With more than 500 native connectors, Domo designed the platform for quick and easy access to data from across the business, according to the company. It contains a central repository that ingests the data and aids version and access control.
Domo also provides one workspace from which people can choose and explore all the data sets available to them in the platform.
Data discovery capabilities include Data Lineage, a path-based view that clarifies data sources. This feature also enables simultaneous display of data tables alongside visualizations, aiding insight discovery, as well as card-based publishing and sharing.
GoodData Enterprise Insights Platform
The GoodData Corp.'s cloud-based Enterprise Insights Platform is an end-to-end data discovery software platform that gathers data and user decisions, transforming them into actionable insights for line-of-business users.
The platform provides insights in the form of recommendations and predictive analytics with the goal of delivering the analytics that matter most for real-time decision-making. Customers, partners and employees see information that is relevant to the decision at hand, presented in what GoodData claims is a personalized, contextual, intuitive and actionable form. Users can also integrate these insights directly into applications.
IBM Watson Explorer
IBM has a host of data discovery products, and one of the key offerings is IBM Watson Explorer. It's a cognitive exploration and content analysis platform that enables business users to easily explore and analyze structured, unstructured, internal, external and public data for trends and patterns.
Organizations have used Watson Explorer to understand 100% of incoming calls and emails, to improve the quality of information, and to enhance their ability to use that information, according to IBM.
Machine learning models, natural language processing and next-generation APIs combine to help organizations unlock value from all of their data and gain a secure, 360-degree view of their customers, in context, according to the company.
The platform also enables users to classify and score structured and unstructured data with machine learning to reach the most relevant information. And a new mining application gives users deep insights into structured and unstructured data.
Informatica LLC offers multiple data management products powered by its Claire engine as part of its Intelligent Data Platform. The Claire engine is a metadata-driven AI technology that automatically scans enterprise data sets and exploits machine learning algorithms to infer relationships about the data structure and provide recommendations and insights. By augmenting end users' individual knowledge with AI, organizations can discover more data from more users in the enterprise, according to the company.
Another component, Informatica Enterprise Data Catalog, scans and catalogs data assets across the enterprise to deliver recommendations, suggestions and data management task automation. Semantic search and dynamic facet capabilities allow users to filter search results and get data lineage, profiling statistics and holistic relationship views.
Informatica Enterprise Data Lake enables data analysts to quickly find data using semantic and faceted search and to collaborate with one another in shared project workspaces. Machine learning algorithms recommend alternative data sets. Analysts can sample and prepare data sets in an Excel-like data preparation interface and operationalize their preparation steps as reusable workflows.
Information Builders WebFocus
Information Builders claims its WebFocus data discovery software platform helps companies use BI and analytics strategically across and beyond the enterprise.
The platform includes a self-service visual discovery tool that enables nontechnical business users to conduct data preparation; visually analyze complex data sets; generate sophisticated data visualizations, dashboards, and reports; and share content with other users. Its extensive visualization and charting capabilities provide an approach to self-service discovery that supports any type of user, Information Builders claims.
Information Builders offers a number of tools related to the WebFocus BI and analytics platform that provide enterprise-grade analytics and data discovery. One is WebFocus InfoApps, which can take advantage of custom information applications designed to enable nontechnical users to rapidly gather insights and explore specific business contexts. InfoApps can include parameterized dashboards, reports, charts and visualizations.
Another tool, WebFocus InfoAssist, enables governed self-service reporting, analysis and discovery capabilities to nontechnical users. The product offers a self-service BI capability for immediate data access and analysis.
Microsoft Power BI
Microsoft Power BI is a cloud-based business analytics service that enables users to visualize and analyze data, then distribute data insights anytime, anywhere, on any device in just a few clicks, according to the company.
As a BI and analytics SaaS tool, Power BI equips users across an organization to build reports with colleagues and share insights. It connects to a broad range of live data through dashboards, provides interactive reports and delivers visualizations that include KPIs from data on premises and in the cloud.
Organizations can use machine learning to automatically scan data and gain insights, ask questions of the data using natural language queries, and take advantage of more than 140 free custom visuals created by the user community.
Power BI applications include dashboards with prebuilt content for cloud services, including Salesforce, Google Analytics and Dynamics 365. It also integrates seamlessly with Microsoft products, such as Office 365, SharePoint, Excel and Teams.
Organizations can start by downloading Power BI Desktop for free, while Power BI Pro and Premium offer several licensing options for companies that want to deploy Power BI across their organization.
MicroStrategy Desktop Client
MicroStrategy Ltd. designed its Desktop client to deliver self-service BI and help business users or departmental analysts analyze data with out-of-the-box visualizations. Data discovery capabilities are available via Mac or Windows PC web browsers and native mobile apps for iOS and Android.
All the interfaces are consistent, and users can promote content between them. With the MicroStrategy Desktop client, business users can visualize data on any chart or graph, including natural language generation narratives, Google Charts, geospatial maps and Data-Driven Documents (D3) visualizations.
They can access data from more than 100 data sources, including spreadsheets, RDBMS, cloud systems, and more; prepare, blend, and profile data with graphical interfaces; share data as a static PDF or as an interactive dashboard file; and promote offline content to a server and publish governed and certified dashboards.
OpenText EnCase Risk Manager
OpenText EnCase Risk Manager enables organizations to understand what sensitive data exists in their environment, where that data resides and what it is worth.
The data discovery software platform helps organizations identify, categorize and remediate sensitive information across the enterprise, whether that information exists in the form of personally identifiable customer data, financial records or intellectual property. EnCase Risk Manager provides the ability to search for standard patterns, such as national identification numbers and credit card data, with the ability to discover entirely unique or proprietary information specific to a business or industry.
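Pattern searches of this kind are typically built from a regular-expression scan plus a checksum filter to weed out false positives. The sketch below is a generic illustration of that technique for card-number-like data, not EnCase's actual implementation; the regex and function names are invented for the example.

```python
import re

# Match 13- to 16-digit runs, optionally separated by spaces or hyphens.
# This is a simplified illustrative pattern, not a production rule set.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum, commonly used to filter out random digit runs."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:   # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_card_numbers(text: str) -> list:
    """Return substrings that look like card numbers and pass Luhn."""
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
```

A plain 16-digit order number fails the Luhn check and is skipped, which is why the checksum step matters as much as the pattern itself.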
Risk Manager is platform-agnostic and able to identify this information throughout the enterprise wherever structured or unstructured data is stored, be that on endpoints, servers, cloud repositories, SharePoint or Exchange. Pricing starts at $60,000.
Oracle Big Data Discovery
Oracle Big Data Discovery enables users to find, explore and analyze big data. They can use the platform to discover new insights from data and share results with other tools and resources in a big data ecosystem, according to the company.
The platform uses Apache Spark, and Oracle claims it's designed to speed time to completion, make big data more accessible to business users across an organization and decrease the risks associated with big data projects.
Big Data Discovery provides rapid visual access to data through an interactive catalog of the data; loads local data from Excel and CSV files through self-service wizards; provides data set summaries, annotations from other users, and recommendations for related data sets; and enables search and guided navigation.
Together with statistics about each individual attribute in any data set, these capabilities expose the shape of the data, according to Oracle, enabling users to understand data quality, detect anomalies, uncover outliers and ultimately determine potential. Organizations can use the platform to visualize attributes by data type; glean which are the most relevant; sort attributes by potential, so the most meaningful information displays first; and use a scratchpad to uncover potential patterns and correlations between attributes.
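The per-attribute statistics described above can be sketched in a few lines: compute summary figures for one column, then flag values far from the mean as outlier candidates. This is a minimal generic illustration of attribute profiling, not Oracle's implementation; the 3-standard-deviation threshold is a common convention, not a product setting.

```python
import statistics

def profile(values: list) -> dict:
    """Summary statistics and outlier candidates for one numeric attribute."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "stdev": stdev,
        # Flag values more than 3 standard deviations from the mean.
        "outliers": [v for v in values if stdev and abs(v - mean) / stdev > 3],
    }
```

Run across every attribute in a data set, summaries like this expose its shape and surface anomalies before deeper analysis begins.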
Qlik Sense
Qlik Sense is Qlik's next-generation data discovery software platform for self-service BI. It supports a full range of analytics use cases including self-service visualization and exploration, guided analytics applications and dashboards, custom and embedded analytics, mobile analytics, and reporting, all within a governed, multi-cloud architecture.
It offers analytics capabilities for all types of users, including associative exploration and search, smart visualizations, self-service creation and data preparation, geographic analysis, collaboration, storytelling, and reporting. The platform also offers fully interactive online and offline mobility and an insight advisor that generates relevant charts and insights using AI.
For the enterprise, Qlik Sense provides a platform that includes open and standard APIs for customization and extension, data integration scripting, broad data connectivity and data-as-a-service, centralized management and governance, and a multi-cloud architecture for scalability across on-premises environments, as well as private and public cloud environments.
Qlik Sense runs on the patented Qlik Associative Engine, which lets users explore information without query-based tools. The newer Qlik cognitive engine works with the associative engine to augment users, offering insight suggestions and automation in context with user behavior.
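The associative idea is that selecting a value in one field instantly splits every other field's values into those that co-occur with the selection and those that are excluded. The toy model below is a greatly simplified sketch of that behavior, not Qlik's engine; the sample rows and field names are invented.

```python
# Toy data set: each row is one record with three fields.
rows = [
    {"region": "EU", "product": "Bikes", "year": 2018},
    {"region": "EU", "product": "Cars", "year": 2019},
    {"region": "US", "product": "Bikes", "year": 2020},
]

def associate(rows: list, field: str, value) -> dict:
    """For a selection on one field, split every other field's values
    into associated (co-occurring) and excluded sets."""
    selected = [r for r in rows if r[field] == value]
    result = {}
    for other in rows[0]:
        if other == field:
            continue
        all_vals = {r[other] for r in rows}
        assoc = {r[other] for r in selected}
        result[other] = {"associated": assoc, "excluded": all_vals - assoc}
    return result
```

Selecting `region = "EU"` here associates both products but excludes 2020, since no EU row carries that year; a real associative engine maintains this bidirectional state across all fields at once.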
Qlik Sense is available in cloud and enterprise editions.
Salesforce Einstein Discovery
Salesforce's Einstein Discovery, an AI-powered feature within the Einstein Analytics portfolio, allows business users to automatically analyze millions of data points to understand their current business, explore historical trends, and automatically receive guided recommendations on what they can do to expand deals or resolve customer service cases faster.
Einstein Discovery for Analysts lets users analyze data in Salesforce CRM, CSV files or data from external data sources. In addition, users can take advantage of smart data preparation capabilities to make data improvements, run analyses to create stories, further explore these stories in Einstein Analytics for advanced visualization capabilities, and push insights into Salesforce objects for all business users.
Einstein Discovery for Business Users provides access to insights in natural language directly within Salesforce -- in Sales Cloud or Service Cloud, for example. Einstein Discovery for Analysts is available for $2,000 per user, per month; Einstein Discovery for Business Users is $75 per user, per month.
SAS Visual Analytics
SAS Institute Inc.'s Visual Analytics on SAS Viya provides interactive data visualizations to help users explore and better understand data.
The product provides a scalable, in-memory engine along with a user-friendly interface, SAS claims. The combination of interactive data exploration, dashboards, reporting and analytics is designed to help business users find valuable insights without coding. Any user can assess probable outcomes and make more informed, data-driven decisions.
SAS Visual Analytics capabilities include:
automated forecasting, so users can select the most appropriate forecasting method to suit the data;
scenario analysis, which identifies important variables and how changes to them can influence forecasts;
goal-seeking to determine the values of underlying factors that would be required to achieve the target forecast; and
decision trees, allowing users to create a hierarchical segmentation of the data based on a series of rules applied to each observation.
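Goal-seeking of the kind listed above can be illustrated with a simple search: given a model that maps an input to a forecast, work backward to the input value that hits a target. The sketch below uses bisection on an invented revenue model; both the model and the numbers are hypothetical, and this is not SAS's algorithm.

```python
def revenue(ad_spend: float) -> float:
    """Toy forecast model with diminishing returns on ad spend (invented)."""
    return 50_000 + 300 * ad_spend ** 0.5

def goal_seek(target: float, lo: float = 0.0, hi: float = 1e6,
              tol: float = 1e-6) -> float:
    """Bisection search for the input that achieves a target forecast.
    Assumes the model is monotonically increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if revenue(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a target of $80,000 the search recovers an ad spend of about $10,000, since `revenue(10_000)` equals exactly that figure in this toy model.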
Other features include network diagrams so users can see how complex data is interconnected; path analysis, which displays the flow of data from one event to another as a series of paths; and text analysis, which applies sentiment analysis to video, social media streams or customer comments to provide quick insights into what's being discussed online.
SAP Analytics Cloud
SAP's Analytics Cloud service brings together analytics capabilities for all users -- including discovery, analysis, planning, prediction and collaboration -- in one integrated cloud-based data discovery product, according to SAP.
The service gives users business insights based on its ability to turn embedded data analytics into business applications, the company claims.
Among the potential benefits:
enhanced user experience with the service's visualization and role-based personalization features;
better business results from deep collaboration and informed decisions due to SAP's ability to integrate with existing on-premises applications; and
simplified data across an organization to ensure faster, fact-based decision-making.
In addition, Analytics Cloud is free from operating system constraints, download requirements and setup tasks. It provides real-time analytics and extensibility using SAP Cloud Platform, which can reduce the total cost of ownership because all the features are offered in one SaaS product for all users.
Sisense
Sisense Ltd. offers an end-to-end platform that ingests data from a variety of sources before analyzing, mashing and visualizing it. Its open API framework also enables a high degree of customization without the input of designers, data scientists or IT specialists, according to Sisense.
The Sisense analytics engine runs 10 to 100 times faster than in-memory platforms, according to the company, dealing with terabytes of data and potentially eliminating onerous data preparation work. The platform provides business insights augmented by machine learning and anomaly detection. In addition, the analytics tool offers the delivery of insights beyond the dashboard, offering new forms of BI access, including chatbots and autonomous alerts.
Tableau Desktop
Tableau Software Inc.'s Desktop is a visual analytics and data discovery software platform that lets users see and understand their data with drag-and-drop simplicity, according to the company. Users can create interactive visualizations and dashboards to gain immediate insights without the need for any programming. They can then share their findings with colleagues.
Tableau Desktop can connect to an organization's data in the cloud, on premises or both using one of 75 native data connectors or Tableau's Web Data Connector. These include connectors to cloud databases such as Amazon Redshift and Google BigQuery, enterprise databases such as SQL Server, SAP and Oracle, plus applications such as Salesforce and ServiceNow.
Tibco Spotfire
Tibco Software Inc.'s Spotfire is an enterprise analytics platform that connects to and blends data from files, relational and NoSQL databases, OLAP, Hadoop and web services, as well as to cloud applications such as Google Analytics and Salesforce. The product can readily integrate streaming data sources from IoT, social media and messaging with at-rest data for real-time contextual analysis, and freely distributed accelerators include product templates to help users get to production quickly.
Tibco's Insight Platform combines live streaming data with queries on large at-rest volumes. Historical patterns are interactively identified with Spotfire, running directly against Hadoop and Spark. The Insight Platform can then apply these patterns to streaming data for predictive and operational insights.
Operational intelligence (OI) is an approach to data analysis that enables decisions and actions in business operations to be based on real-time data as it's generated or collected by companies. Typically, the data analysis process is automated, and the resulting information is integrated into operational systems for immediate use by business managers and workers.
OI applications are primarily targeted at front-line workers who can make better-informed business decisions, or take faster action on issues, when they have access to timely business intelligence (BI) and analytics data. Examples include call-center agents, sales representatives, online marketing teams, logistics planners, manufacturing managers and medical professionals. In addition, operational intelligence can be used to automatically trigger responses to specified events or conditions.
What is now known as OI evolved from operational business intelligence, an initial step focused more on applying traditional BI querying and reporting. OI takes the concept to a higher analytics level, but operational BI is sometimes still used interchangeably with operational intelligence as a term.
How operational intelligence works
In most OI initiatives, data analysis is done in tandem with data processing or shortly thereafter, so workers can quickly identify and act on problems and opportunities in business operations. Deployments often include real-time business intelligence systems set up to analyze incoming data, plus real-time data integration tools to pull together different sets of relevant data for analysis.
Stream processing systems and big data platforms, such as Hadoop and Spark, can also be part of the OI picture, particularly in applications that involve large amounts of data and require advanced analytics capabilities. In addition, various IT vendors have combined data streaming, real-time monitoring and data analytics tools to create specialized operational intelligence platforms.
As data is analyzed, organizations often present operational metrics, key performance indicators (KPIs) and business insights to managers and other workers in interactive dashboards that are embedded in the systems they use as part of their jobs; data visualizations are usually included to help make the information easy to understand. Alerts can also be sent to notify users of developments and data points that require their attention, and automated processes can be kicked off if predefined thresholds or other metrics are exceeded, such as stock trades being spurred by prices hitting particular levels.
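The threshold-driven automation described above reduces to a simple pattern: watch a metric and invoke an action the moment it crosses a limit. The sketch below is a minimal generic illustration of that pattern; the trigger factory and the stock-price scenario are invented for the example, not drawn from any particular OI product.

```python
def make_threshold_trigger(limit: float, action):
    """Return a handler that invokes `action` once, the first time
    the monitored metric exceeds `limit`."""
    fired = {"done": False}

    def handle(value: float):
        if not fired["done"] and value > limit:
            fired["done"] = True   # fire once, not on every breach
            action(value)

    return handle
```

In practice the action might place a trade, open an incident ticket or push an alert to a dashboard; firing once per breach episode (rather than on every sample) is what keeps such automation from flooding downstream systems.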
Operational intelligence uses and examples
Stock trading and other types of investment management are prime candidates for operational intelligence initiatives because of the need to monitor huge volumes of data in real time and respond rapidly to events and market trends. Customer analytics is another area that's ripe for OI. For example, online marketers use real-time tools to analyze internet clickstream data, so they can better target marketing campaigns to consumers. And cable TV companies track data from set-top boxes in real time to analyze the viewing activities of customers and how the boxes are functioning.
The growth of the internet of things has sparked operational intelligence applications for analyzing sensor data being captured from manufacturing machines, pipelines, elevators and other equipment; that enables predictive maintenance efforts designed to detect potential equipment failures before they occur. Other types of machine data also fuel OI applications, including server, network and website logs that are analyzed in real time to look for security threats and IT operations issues.
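A common building block for the sensor-data analysis described above is a rolling-baseline check: compare each new reading against the recent window and flag sharp deviations. The sketch below is one simple way to do that, not a production predictive-maintenance system; the window size and z-score limit are arbitrary illustrative choices.

```python
from collections import deque
import statistics

def detect_anomalies(readings, window: int = 10, z_limit: float = 3.0):
    """Yield (index, value) for readings far outside the rolling window."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = statistics.mean(history)
            stdev = statistics.stdev(history)
            # Flag readings more than z_limit standard deviations
            # away from the rolling baseline.
            if stdev and abs(value - mean) / stdev > z_limit:
                yield i, value
        history.append(value)
```

Fed a stream of vibration or temperature readings, a detector like this surfaces the sudden spike that may precede an equipment failure, which is the core of the predictive maintenance use case.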
There are less grandiose operational intelligence use cases, as well. These include call-center applications that provide operators with up-to-date customer records and recommend promotional offers while they're on the phone with customers, as well as logistics applications that help calculate the most efficient driving routes for fleets of delivery vehicles.
OI benefits and challenges
The primary benefit of OI implementations is the ability to address operational issues and opportunities as they arise -- or even before they do, as in the case of predictive maintenance. Operational intelligence also empowers business managers and workers to make more informed -- and hopefully better -- decisions on a day-to-day basis. Ultimately, if managed successfully, the increased visibility and insight into business operations can lead to higher revenue and competitive advantages.
But there are challenges. Building an operational intelligence architecture typically involves piecing together different technologies, and there are numerous data processing platforms and analytics tools to choose from, some of which may require new skills in organizations. High performance and sufficient scalability are also needed to handle the real-time workloads and large volumes of data common in OI applications without choking the system.
Also, most business processes at a typical company don't require real-time data analysis. With that in mind, a key part of operational intelligence projects involves determining which end users need up-to-the-minute data and then training them to handle the information once it starts being delivered to them in that fashion.
Operational intelligence vs. business intelligence
Conventional BI systems support the analysis of historical data that has been cleansed and consolidated in a data warehouse or data mart before being made available for business analytics uses. BI applications generally aim to tell corporate executives and business managers what happened in the past on revenues, profits and other KPIs to aid in budgeting and strategic planning.
Early on, BI data was primarily distributed to users in static operational reports. That's still the case in some organizations, although many have shifted to dashboards with the ability to drill down into data for further analysis. In addition, self-service BI tools let users run their own queries and create data visualizations on their own, but the focus is still mostly on analyzing data from the past.
Operational intelligence systems let business managers and front-line workers see what's currently happening in operational processes and then immediately act upon the findings, either on their own or through automated means. The purpose is not to facilitate planning, but to drive operational decisions and actions in the moment.