Latest Updates
New Quantum backup appliance brings Veeam to tape
Posted by Thang Le Toan on 17 August 2018 12:16 AM

Integrated disk backup appliances are now common, but Quantum and Veeam have taken the converged architecture to tape backups to eliminate the need for a dedicated external server.

Quantum and Veeam today launched what they call a converged tape appliance, which integrates Veeam data protection software with Quantum backup.

Until now, Veeam Backup & Replication users could only create tape backups by connecting a dedicated external server to host the Veeam software. The new Quantum backup appliance removes that layer of complexity by installing a blade server into its Scalar i3 tape library. The server is preconfigured to be Veeam-ready, with Windows Server 2016 -- Veeam's server OS of choice -- preinstalled.

By installing a server in a tape device, Quantum has created the industry's first tape appliance with built-in compute power.

"In some ways, this is a new category of product," said Eric Bassier, senior director of product management and marketing at Quantum, based in Colorado Springs, Colo. "In Veeam environments, every other tape vendor's tape requires a dedicated external server that runs the Veeam tape server software ... We took that server, we built it into the tape library so that we eliminate that physical server, and we make it that much simpler and faster to create tape for offline protection."

Customers can buy a device with one IBM SAS LTO-8 tape drive for $17,000 or a two-drive version for $23,000.

Tape storage has been around for a long time and "still remains one of the lowest-cost, long-term ways to store your data," said Ken Ringdahl, vice president of global alliance architecture at Veeam, based in Baar, Switzerland. But Bassier stressed the new Quantum backup appliance's true role in the modern backup and recovery system is it protects against ransomware.

Quantum's Scalar i3 converged tape appliance for Veeam

"It's offline. Data stored on tape is not connected to the network in any way. Because it's offline, it is the best and most effective protection against ransomware," Bassier said.

The ransomware threat has brought on a renewed interest in tape for backup.

Edwin Yuen, senior analyst at Enterprise Strategy Group in Milford, Mass., said ransomware has gotten more sophisticated over the past 12 to 18 months, and tape provides an offline backup method.

"Ransomware is not an acute event," Yuen said. "You're getting infected, and it's sitting there, waiting. Oftentimes, it's mapping out or finding other backups or other restores."

Storing data offline in a tape cartridge, as the new Quantum backup option does, provides an air gap between live production systems and backed-up data that is not possible to achieve with disk. That air gap can prevent ransomware from reaching the backup data.


"If you really think about tape, it's one of those technologies that got dismissed, but never actually went away. It was consistently used; it just wasn't in vogue, so to speak. But there's certainly been a renewed interest in new uses for tape," Yuen said. "This integration by Quantum and Veeam really makes it a lot easier to bring tape into this configuration, so you can take advantage of that air gap."

According to Yuen, thanks to market maturity and the age of magnetic tape technology, there are now only a few major companies that manufacture tape libraries. This is why Yuen said he finds the partnership between Quantum and Veeam especially noteworthy, as it demonstrates a relatively young company showing interest in tape.

"The fact that these two companies came together shows interest across the board," Yuen said. "It's not a 20-year standard industry company, but one that's been an up-and-comer now getting into the tape market through this appliance."


Network Management Software: Become a Master (ezMaster 0.13.13)
Posted by Thang Le Toan on 17 August 2018 12:12 AM


Save Time, Build Revenue and Improve Efficiencies

Complete Scalability

Start small and grow; control an office network or multiple networks in separate buildings. Expand and easily see your devices across town, states or the country.

Unlimited Flexibility

EnGenius hardware and ezMaster network management software combine to add flexibility and management simplicity when you need it.

Unmatched Affordability

Affordable, predictable costs and a lower TCO per deployment. No per-AP licensing or annual subscription fees.


Rich Reporting & Analytics

Pinpoint & address potential problems before they affect users with invaluable reporting, analytics & real-time monitoring tools.


New ezMaster Users

Manage, monitor and troubleshoot Neutron and EnTurbo hardware. Download ezMaster today:

  1. If you have no virtual machine currently running on your PC or server
  2. If you are currently running on a virtualized platform of VMware Player 7 or Oracle VirtualBox
  3. If you are currently running on a virtualized platform of VMware ESXi vSphere
  4. If you are currently running on a virtualized platform of Windows Server 2012 R2 Hyper-V
  5. If you are currently running on a virtualized platform of Windows 10 64-bit Hyper-V


Existing ezMaster Users

Make sure you have the newest features & update to the latest version of ezMaster



non-volatile storage (NVS)
Posted by Thang Le Toan on 16 August 2018 05:26 AM

Non-volatile storage (NVS) is a broad collection of technologies and devices that do not require a continuous power supply to retain data or program code persistently on a short- or long-term basis.


Three common examples of NVS devices that persistently store data are tape, a hard disk drive (HDD) and a solid-state drive (SSD). The term non-volatile storage also applies to the semiconductor chips that store the data or controller program code within devices such as SSDs, HDDs, tape drives and memory modules.

Many types of non-volatile memory chips are in use today. For instance, NAND flash memory chips commonly store data in SSDs in enterprise and personal computer systems, USB sticks, and memory cards in consumer devices such as mobile telephones and digital cameras. NOR flash memory chips commonly store controller code in storage drives and personal electronic devices.

Non-volatile storage technologies and devices vary widely in the manner and speed in which they transfer data to and retrieve data or program code from a chip or device. Other differentiating factors that have a significant impact on the type of non-volatile storage a system manufacturer or user chooses include cost, capacity, endurance and latency.

For example, an SSD equipped with NAND flash memory chips can program, or write, and read data faster and at lower latency through electrical mechanisms than a mechanically addressed HDD or tape drive that uses a head to write and read data to magnetic storage media. However, the per-bit price to store data in a flash-based SSD is generally higher than the per-bit cost of an HDD or tape drive, and flash SSDs can sustain a limited number of write cycles before they wear out.

Volatile vs. non-volatile storage devices

The key difference between volatile and non-volatile storage devices is whether or not they are able to retain data in the absence of a power supply. Volatile storage devices lose data when power is interrupted or turned off. By contrast, non-volatile devices are able to keep data regardless of the status of the power source.

Common types of volatile storage include static random access memory (SRAM) and dynamic random access memory (DRAM). Manufacturers may add battery power to a volatile memory device to enable it to persistently store data or controller code.

Enterprise and consumer computing systems often use a mix of volatile and non-volatile memory technologies, and each memory type has advantages and disadvantages. For instance, SRAM is faster than DRAM and well suited to high-speed caching. DRAM is less expensive to produce and requires less power than SRAM, and manufacturers often use it to store program code that a computer needs to operate.

Comparison of non-volatile memory types

By contrast, non-volatile NAND flash is slower than SRAM and DRAM, but it is cheaper to produce. Manufacturers commonly use NAND flash memory to store data persistently in business systems and consumer devices. Storage devices such as flash-based SSDs access data at a block level, whereas SRAM and DRAM support random data access at a byte level.

Like NAND, NOR flash is less expensive to produce than volatile SRAM and DRAM. NOR flash costs more than NAND flash, but it can read data faster than NAND, making it a common choice to boot consumer and embedded devices and to store controller code in SSDs, HDDs and tape drives. NOR flash is generally not used for long-term data storage due to its poor endurance.

Trends and future directions

Manufacturers are working on additional types of non-volatile storage to try to lower the per-bit cost to store data and program code, improve performance, increase endurance levels and reduce power consumption.

For instance, manufacturers developed 3D NAND flash technology in response to physical scaling limitations of two-dimensional, or planar, NAND flash. They are able to reach higher densities at a lower cost per bit by vertically stacking memory cells with 3D NAND technology than they can by using a single layer of memory cells with planar NAND.

NVM use cases

Emerging 3D XPoint technology, co-developed by Intel Corp. and Micron Technology Inc., offers higher throughput, lower latency, greater density and improved endurance over more commonly used NAND flash technology. Intel ships 3D XPoint technology under the brand name Optane in SSDs and in persistent memory modules intended for data center use. Persistent memory modules are also known as storage class memory.

An image of a 3D XPoint technology die (source: Micron Technology Inc.)

Using non-volatile memory express (NVMe) technology over a computer's PCI Express (PCIe) bus in conjunction with flash storage and newer options such as 3D XPoint can further accelerate performance, and reduce latency and power consumption. NVMe offers a more streamlined command set to process input/output (I/O) requests with PCIe-based SSDs than the Small Computer System Interface (SCSI) command set does with Serial Attached SCSI (SAS) storage drives and the Advanced Technology Attachment (ATA) command set does with Serial ATA (SATA) drives.

Everspin's EMD3D064M 64 Mb DDR3 ST-MRAM in a Ball Grid Array package (source: Everspin Technologies Inc.)

Emerging non-volatile storage technologies currently in development or in limited use include ferroelectric RAM (FRAM or FeRAM), magnetoresistive RAM (MRAM), phase-change memory (PCM), resistive RAM (RRAM or ReRAM) and spin-transfer torque magnetoresistive RAM (STT-MRAM or STT-RAM).


SSD write cycle
Posted by Thang Le Toan on 16 August 2018 05:24 AM

An SSD write cycle is the process of programming data to a NAND flash memory chip in a solid-state storage device.

A block of data stored on a flash memory chip must be electrically erased before new data can be written, or programmed, to the solid-state drive (SSD). The SSD write cycle is also known as the program/erase (P/E) cycle.

When an SSD is new, all of the blocks are erased and new, incoming data is directly written to the flash media. Once the SSD has filled all of the free blocks on the flash storage media, it must erase previously programmed blocks to make room for new data to be written. Valid data is copied out of blocks that also contain invalid or unnecessary data, freeing the old blocks to be erased. The SSD controller periodically erases the invalidated blocks and returns them to the free block pool.

The background process an SSD uses to clean out the unnecessary blocks and make room for new data is called garbage collection. The garbage collection process is generally invisible to the user, and the programming process is often identified simply as a write cycle, rather than a write/erase or P/E cycle.
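The garbage collection process described above can be sketched in a few lines. This is a toy model, not a real controller: block and page counts are illustrative, and the `FlashBlock` and `garbage_collect` names are invented for the example.

```python
class FlashBlock:
    """Toy flash block: a handful of pages plus a wear counter."""
    def __init__(self, pages=4):
        self.pages = [None] * pages      # None = erased page
        self.valid = [False] * pages     # per-page validity flag
        self.erase_count = 0             # P/E cycles consumed by this block

    def erase(self):
        self.pages = [None] * len(self.pages)
        self.valid = [False] * len(self.pages)
        self.erase_count += 1            # one more program/erase cycle


def invalid_pages(block):
    # Pages that hold data but were marked stale by a later overwrite.
    return sum(1 for p, ok in zip(block.pages, block.valid)
               if p is not None and not ok)


def garbage_collect(blocks):
    """Copy valid pages out of the dirtiest block, then erase it."""
    victim = max(blocks, key=invalid_pages)
    survivors = [p for p, ok in zip(victim.pages, victim.valid) if ok]
    victim.erase()
    return survivors   # valid data to be rewritten into a free block
```

The key points the sketch captures: erasure happens per block rather than per page, valid data must be relocated first, and every collection cycle consumes one of the block's limited P/E cycles.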

Why write cycles are important

A NAND flash SSD is able to endure only a limited number of write cycles. The program/erase process causes a deterioration of the oxide layer that traps electrons in a NAND flash memory cell, and the SSD will eventually become unreliable, wear out and lose its ability to store data.

The number of write cycles, or endurance, varies based on the type of NAND flash memory cell. An SSD that stores a single data bit per cell, known as single-level cell (SLC) NAND flash, can typically support up to 100,000 write cycles. An SSD that stores two bits of data per cell, commonly referred to as multi-level cell (MLC) flash, generally sustains up to 10,000 write cycles with planar NAND and up to 35,000 write cycles with 3D NAND. The endurance of SSDs that store three bits of data per cell, called triple-level cell (TLC) flash, can be as low as 300 write cycles with planar NAND and as high as 3,000 write cycles with 3D NAND. The latest quadruple-level cell (QLC) NAND will likely support a maximum of 1,000 write cycles.

Comparison of NAND flash memory

As the number of bits per NAND flash memory cell increases, the cost per gigabyte (GB) of the SSD declines. However, the endurance and the reliability of the SSD are also lower.

NAND flash writes

Common write cycle problems

Challenges that SSD manufacturers have had to address to use NAND flash memory to store data reliably over an extended period of time include cell-to-cell interference as the dies get smaller, bit failures and errors, slow data erases and write amplification.

Manufacturers have enhanced the endurance and reliability of all types of SSDs through controller software-based mechanisms such as wear-leveling algorithms, external data buffering, improved error correction code (ECC) and error management, data compression, overprovisioning, better internal NAND management and block wear-out feedback. As a result, flash-based SSDs have not worn out as quickly as users once feared they would.

Vendors commonly offer SSD warranties that specify a maximum number of device drive writes per day (DWPD) or terabytes written (TBW). DWPD is the number of times the entire capacity of the SSD can be overwritten on a daily basis during the warranty period. TBW is the total amount of data that an SSD can write before it is likely to fail. Vendors of flash-based systems and SSDs often offer guarantees of five years or more on their enterprise drives.
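The two endurance ratings are directly convertible, since DWPD assumes the drive's full capacity is written every day of the warranty. A back-of-the-envelope sketch, using a hypothetical 3.84 TB drive with a five-year warranty as the example figures:

```python
def dwpd_to_tbw(dwpd, capacity_tb, warranty_years):
    """Terabytes written over the warranty implied by a DWPD rating."""
    return dwpd * capacity_tb * 365 * warranty_years


def tbw_to_dwpd(tbw, capacity_tb, warranty_years):
    """DWPD implied by a total terabytes-written rating."""
    return tbw / (capacity_tb * 365 * warranty_years)


# A hypothetical 3.84 TB drive rated at 1 DWPD over a 5-year warranty:
print(dwpd_to_tbw(1, 3.84, 5))   # 7008.0 TBW
```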

Manufacturers sometimes specify the type of application workload for which an SSD is designed, such as write-intensive, read-intensive or mixed-use. Some vendors allow the customer to select the optimal level of endurance and capacity for a particular SSD. For instance, an enterprise user with a high-transaction database might opt for a greater DWPD number at the expense of capacity. Or a user operating a database that does infrequent writes might choose a lower DWPD and a higher capacity.


cache memory
Posted by Thang Le Toan on 16 August 2018 05:11 AM

Cache memory, also called CPU memory, is high-speed static random access memory (SRAM) that a computer microprocessor can access more quickly than it can access regular random access memory (RAM). This memory is typically integrated directly into the CPU chip or placed on a separate chip that has a separate bus interconnect with the CPU. The purpose of cache memory is to store program instructions and data that are used repeatedly in the operation of programs, or information that the CPU is likely to need next. The computer processor can access this information quickly from the cache rather than having to get it from the computer's main memory. Fast access to these instructions increases the overall speed of the program.

As the microprocessor processes data, it looks first in the cache memory. If it finds the instructions or data it's looking for there from a previous reading of data, it does not have to perform a more time-consuming reading of data from larger main memory or other data storage devices. Cache memory is responsible for speeding up computer operations and processing.

Once they have been opened and operated for a time, most programs use few of a computer's resources. That's because frequently re-referenced instructions tend to be cached. This is why system performance measurements for computers with slower processors but larger caches can be faster than those for computers with faster processors but less cache space.


Multi-tier or multilevel caching has become popular in server and desktop architectures, with different levels providing greater efficiency through managed tiering. Simply put, the less frequently certain data or instructions are accessed, the lower down the cache level the data or instructions are written.

Implementation and history

Mainframes used an early version of cache memory, but the technology as it is known today began to be developed with the advent of microcomputers. With early PCs, processor performance increased much faster than memory performance, and memory became a bottleneck, slowing systems.

In the 1980s, the idea took hold that a small amount of more expensive, faster SRAM could be used to improve the performance of the less expensive, slower main memory. Initially, the memory cache was separate from the system processor and not always included in the chipset. Early PCs typically had from 16 KB to 128 KB of cache memory.

With 486 processors, Intel added 8 KB of memory to the CPU as Level 1 (L1) memory. As much as 256 KB of external Level 2 (L2) cache memory was used in these systems. Pentium processors saw the external cache memory double again to 512 KB on the high end. They also split the internal cache memory into two caches: one for instructions and the other for data.

Processors based on Intel's P6 microarchitecture, introduced in 1995, were the first to incorporate L2 cache memory into the CPU and enable all of a system's cache memory to run at the same clock speed as the processor. Prior to the P6, L2 memory external to the CPU was accessed at a much slower clock speed than the rate at which the processor ran, and slowed system performance considerably.

Early memory cache controllers used a write-through cache architecture, where data written into cache was also immediately updated in RAM. This approach minimized data loss, but also slowed operations. With later 486-based PCs, the write-back cache architecture was developed, where RAM isn't updated immediately. Instead, data is stored in cache, and RAM is updated only at specific intervals or under certain circumstances where data is missing or old.
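The difference between the two policies can be sketched in a few lines of code. This is an illustrative model only; `ram` stands in for main memory, and both caches are plain dictionaries rather than real hardware structures.

```python
class WriteThroughCache:
    """Every write goes to cache and main memory at the same time."""
    def __init__(self, ram):
        self.ram, self.cache = ram, {}

    def write(self, addr, value):
        self.cache[addr] = value
        self.ram[addr] = value          # RAM updated immediately


class WriteBackCache:
    """Writes land in cache only; RAM is updated later, in bulk."""
    def __init__(self, ram):
        self.ram, self.cache, self.dirty = ram, {}, set()

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)            # RAM update deferred

    def flush(self):
        for addr in self.dirty:         # e.g. on eviction or at intervals
            self.ram[addr] = self.cache[addr]
        self.dirty.clear()
```

The write-back version is faster per write but risks losing the dirty entries if power fails before `flush` runs, which mirrors the tradeoff described above.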

Cache memory mapping

Caching configurations continue to evolve, but cache memory traditionally works under three different configurations:

  • Direct mapped cache has each block mapped to exactly one cache memory location. Conceptually, direct mapped cache is like rows in a table with three columns: the data block or cache line that contains the actual data fetched and stored, a tag with all or part of the address of the data that was fetched, and a flag bit that shows the presence in the row entry of a valid bit of data.
  • Fully associative cache mapping is similar to direct mapping in structure but allows a block to be mapped to any cache location rather than to a prespecified cache memory location as is the case with direct mapping.
  • Set associative cache mapping can be viewed as a compromise between direct mapping and fully associative mapping in which each block is mapped to a subset of cache locations. It is sometimes called N-way set associative mapping, which provides for a location in main memory to be cached to any of "N" locations in the L1 cache.
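A toy model of the address-to-line rule for the three mappings above, using an illustrative 8-line cache (the line count and function names are invented for the example):

```python
NUM_LINES = 8   # illustrative cache size


def direct_mapped_line(block_addr):
    """Direct mapped: each memory block maps to exactly one cache line."""
    return block_addr % NUM_LINES


def set_associative_lines(block_addr, n_ways=2):
    """N-way set associative: the block may occupy any of N lines in one set."""
    n_sets = NUM_LINES // n_ways
    s = block_addr % n_sets
    return [s * n_ways + w for w in range(n_ways)]


def fully_associative_lines(block_addr):
    """Fully associative: the block may occupy any line at all."""
    return list(range(NUM_LINES))


print(direct_mapped_line(13))          # 5
print(set_associative_lines(13, 2))    # [2, 3]
```

Note how set associativity interpolates between the extremes: with `n_ways=1` it degenerates to direct mapping, and with `n_ways=NUM_LINES` to fully associative mapping.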

Format of the cache hierarchy

Cache memory is fast and expensive. Traditionally, it is categorized as "levels" that describe its closeness and accessibility to the microprocessor.

cache memory diagram

L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in the processor chip as CPU cache.

L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on the CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative system bus connecting the cache and CPU. That way it doesn't get slowed by traffic on the main system bus.

Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and L2. L1 or L2 can be significantly faster than L3, though L3 is usually double the speed of RAM. With multicore processors, each core can have dedicated L1 and L2 cache, but they can share an L3 cache. If an L3 cache references an instruction, it is usually elevated to a higher level of cache.

In the past, L1, L2 and L3 caches have been created using combined processor and motherboard components. Recently, the trend has been toward consolidating all three levels of memory caching on the CPU itself. That's why the primary means for increasing cache size has begun to shift from the acquisition of a specific motherboard with different chipsets and bus architectures to buying a CPU with the right amount of integrated L1, L2 and L3 cache.

Contrary to popular belief, implementing flash or more dynamic RAM (DRAM) on a system won't increase cache memory. This can be confusing since the terms memory caching (hard disk buffering) and cache memory are often used interchangeably. Memory caching, using DRAM or flash to buffer disk reads, is meant to improve storage I/O by caching data that is frequently referenced in a buffer ahead of slower magnetic disk or tape. Cache memory, on the other hand, provides read buffering for the CPU.

Specialization and functionality

In addition to instruction and data caches, other caches are designed to provide specialized system functions. According to some definitions, the L3 cache's shared design makes it a specialized cache. Other definitions keep instruction caching and data caching separate, and refer to each as a specialized cache.

Translation lookaside buffers (TLBs) are also specialized memory caches whose function is to record virtual address to physical address translations.

Still other caches are not, technically speaking, memory caches at all. Disk caches, for instance, can use RAM or flash memory to provide data caching similar to what memory caches do with CPU instructions. If data is frequently accessed from disk, it is cached into DRAM or flash-based silicon storage technology for faster access time and response.


Dennis Martin, founder and president of Demartek LLC, explains the pros and cons of using solid-state drives as cache and as primary storage.

Specialized caches are also available for applications such as web browsers, databases, network address binding and client-side Network File System protocol support. These types of caches might be distributed across multiple networked hosts to provide greater scalability or performance to an application that uses them.


The ability of cache memory to improve a computer's performance relies on the concept of locality of reference. Locality describes various situations that make a system more predictable, such as where the same storage location is repeatedly accessed, creating a pattern of memory access that the cache memory relies upon.

There are several types of locality. Two key ones for cache are temporal and spatial. Temporal locality is when the same resources are accessed repeatedly in a short amount of time. Spatial locality refers to accessing various data or resources that are in close proximity to each other.
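Temporal locality is exactly what lets a small cache achieve a high hit rate. A tiny least-recently-used (LRU) cache makes this concrete; the cache capacity and traces below are illustrative, and pure Python is of course only simulating the bookkeeping, not real hardware timing.

```python
from collections import OrderedDict


def hit_rate(trace, capacity=4):
    """Fraction of accesses served from a small LRU cache."""
    cache, hits = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return hits / len(trace)


# Temporal locality: a few addresses re-used repeatedly in a short window.
local = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
# No locality: every access touches a brand-new address.
scattered = list(range(12))

print(hit_rate(local))       # 0.75
print(hit_rate(scattered))   # 0.0
```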

Cache vs. main memory

DRAM serves as a computer's main memory, holding the data retrieved from storage that the processor performs calculations on. Both DRAM and cache memory are volatile memories that lose their contents when the power is turned off. DRAM is installed on the motherboard, and the CPU accesses it through a bus connection.

An example of dynamic RAM.

DRAM is usually about half as fast as L1, L2 or L3 cache memory, and much less expensive. It provides faster data access than flash storage, hard disk drives (HDDs) and tape storage. It came into use in the last few decades to provide a place to store frequently accessed disk data to improve I/O performance.

DRAM must be refreshed every few milliseconds. Cache memory, which also is a type of random access memory, does not need to be refreshed. It is built directly into the CPU to give the processor the fastest possible access to memory locations, and provides nanosecond speed access time to frequently referenced instructions and data. SRAM is faster than DRAM, but because it's a more complex chip, it's also more expensive to make.

Comparison of memory types

Cache vs. virtual memory

A computer has a limited amount of RAM and even less cache memory. When a large program or multiple programs are running, it's possible for memory to be fully used. To compensate for a shortage of physical memory, the computer's operating system (OS) can create virtual memory.

To do this, the OS temporarily transfers inactive data from RAM to disk storage. This approach increases virtual address space by using active memory in RAM and inactive memory in HDDs to form contiguous addresses that hold both an application and its data. Virtual memory lets a computer run larger programs or multiple programs simultaneously, and each program operates as though it has unlimited memory.

Virtual memory in the memory hierarchy
Where virtual memory fits in the memory hierarchy.

In order to copy virtual memory into physical memory, the OS divides memory into pages that each contain a fixed number of addresses. Those pages are stored on disk in a pagefile or swap file, and when they're needed, the OS copies them from the disk to main memory and translates the virtual addresses into real addresses.
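The translation step splits a virtual address into a page number and an offset, then looks the page number up in a page table. A minimal sketch, assuming a common 4 KB page size; the page table contents here are hypothetical:

```python
PAGE_SIZE = 4096   # 4 KB pages, a common OS default

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}


def translate(virtual_addr):
    """Map a virtual address to a physical one via the page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        # In a real OS this triggers a page fault and a copy from disk.
        raise LookupError("page fault: page %d not resident" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset


print(translate(4100))   # page 1, offset 4 -> frame 2 -> 8196
```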


all-flash array (AFA)
Posted by Thang Le Toan on 16 August 2018 05:09 AM

An all-flash array (AFA), also known as a solid-state storage disk system, is an external storage array that uses only flash media for persistent storage. Flash memory is used in place of the spinning hard disk drives (HDDs) that have long been associated with networked storage systems.

Vendors that sell all-flash arrays usually allow customers to mix flash and disk drives in the same chassis, a configuration known as a hybrid array. However, those products often represent the vendor's attempt to retrofit an existing disk array by replacing some of the media with flash.

All-flash array design: Retrofit or purpose-built

Other vendors sell purpose-built systems designed natively from the ground up to only support flash. These models also embed a broad range of software-defined storage features to manage data on the array.

A defining characteristic of an AFA is the inclusion of native software services that enable users to perform data management and data protection directly on the array hardware. This is different from server-side flash installed on a standard x86 server. Inserting flash storage into a server is much cheaper than buying an all-flash array, but it also requires the purchase and installation of third-party management software to supply the needed data services.

Leading all-flash vendors have written algorithms for array-based services for data management, including clones, compression and deduplication -- either an inline or post-process operation -- snapshots, replication, and thin provisioning.
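Inline deduplication of the kind named above generally works by fingerprinting each incoming block and storing only a reference when the fingerprint has been seen before. A minimal sketch of that idea (the `DedupStore` class and the tiny blocks are invented for illustration; real arrays use fixed block sizes and far more sophisticated metadata):

```python
import hashlib


class DedupStore:
    """Toy inline block-level deduplication: unique data is stored once."""
    def __init__(self):
        self.blocks = {}   # fingerprint -> data (unique blocks only)
        self.refs = []     # logical write order, as fingerprints

    def write(self, block: bytes):
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.blocks:   # store unique data exactly once
            self.blocks[fp] = block
        self.refs.append(fp)        # always record the logical reference


store = DedupStore()
for b in [b"aaaa", b"bbbb", b"aaaa", b"aaaa"]:
    store.write(b)
print(len(store.refs), len(store.blocks))   # 4 logical writes, 2 unique blocks
```

This also shows why dedup pays off so well for virtualized workloads, where many guests write near-identical blocks, and so poorly for unique, incompressible media files.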

As with its disk-based counterpart, an all-flash array provides shared storage in a storage area network (SAN) or network-attached storage (NAS) environment.

How an all-flash array differs from disk

Flash memory, which has no moving parts, is a type of nonvolatile memory that can be erased and reprogrammed in units of memory called blocks. It is a variation of electrically erasable programmable read-only memory (EEPROM), which got its name because the memory blocks can be erased with a single action, or flash. A flash array can transfer data to and from solid-state drives (SSDs) much faster than electromechanical disk drives.

The advantage of an all-flash array, relative to disk-based storage, is full bandwidth performance and lower latency when an application makes a query to read the data. The flash memory in an AFA typically comes in the form of SSDs, which are similar in design to an integrated circuit.

A Pure Storage FlashBlade enterprise storage array (source: Pure Storage)

Flash is more expensive than spinning disk, but the development of multi-level cell (MLC) flash, triple-level cell (TLC) NAND flash and 3D NAND flash has lowered the cost. These technologies enable greater flash density without the cost involved in shrinking NAND cells.

MLC flash is slower and less durable than single-level cell (SLC) flash, but companies have developed software that improves its wear level to make MLC acceptable for enterprise applications. SLC flash remains the choice for applications with the highest I/O requirements, however. TLC flash reduces the price more than MLC, although it also comes with performance and durability tradeoffs that can be mitigated with software. Vendor products that support TLC SSDs include the Dell EMC SC Series and Kaminario K2 arrays.

Considerations for buying an all-flash array

Deciding to buy an AFA involves more than simple comparisons of vendor products. An all-flash array that delivers massive performance increases to a specific set of applications may not provide equivalent benefits to other workloads. For example, running virtualized applications in flash with inline data deduplication and compression tends to be more cost-effective than flash that supports streaming media, in which unique files are incompressible.

An all-SSD system produces smaller variations in maximum, minimum and average latencies than an HDD array. This makes flash a good fit for most read-intensive applications.

The tradeoff comes in write amplification, which occurs because an SSD must erase an entire block before it can rewrite data. Write-intensive workloads require a special algorithm to collect writes destined for the same block of the SSD, ensuring the software writes multiple changes to the same block at once.

Garbage collection can present a similar issue with SSDs. A flash cell can only withstand a limited number of writes, so wear leveling can be used to increase flash endurance. Most vendors design their all-flash systems to minimize the impact of garbage collection and wear leveling, although users with write-intensive workloads may wish to independently test a vendor's array to determine the best configuration.
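Write amplification is commonly expressed as a factor: the amount of data the flash media actually writes (including relocations done by garbage collection and wear leveling) divided by the amount the host asked to write. A trivial sketch, with made-up example figures:

```python
def write_amplification(host_bytes_written, nand_bytes_written):
    """Write amplification factor (WAF): NAND writes / host writes."""
    return nand_bytes_written / host_bytes_written


# Hypothetical example: garbage collection relocating valid pages caused
# 1.5 GB of NAND writes for every 1 GB the host requested.
print(write_amplification(1.0, 1.5))   # 1.5
```

A WAF of 1.0 is the ideal; the further above 1.0 a workload drives it, the faster the drive consumes its limited P/E cycles, which is why write-intensive users are advised to test a vendor's array independently.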

Despite the higher upfront price, users who buy an AFA may see their total cost of storage decline over time. Because flash reduces I/O wait times, application servers run at higher CPU utilization, so an organization needs to buy fewer of them.

The physical size of an AFA is smaller than that of a disk array, which lowers the rack count. Having fewer racks in a system also reduces the heat generated and the cooling power consumed in the data center.

All-flash array vendors, products and markets

Flash first appeared as a handful of SSDs in otherwise all-HDD systems, creating a small flash tier to accelerate a few critical applications. Thus was born the hybrid flash array.

The next phase of evolution arrived with the advent of software that enabled an SSD to serve as a front-end cache for disk storage, extending the benefit of faster performance across all the applications running on the array.

The now-defunct vendor Fusion-io was an early pioneer of fast flash. Launched in 2005, Fusion-io sold Peripheral Component Interconnect Express (PCIe) cards packed with flash chips. Inserting the PCIe flash cards in server slots enabled a data center to boost the performance of traditional server hardware. SanDisk acquired Fusion-io in 2014; SanDisk itself was subsequently acquired by Western Digital Corp.

Also breaking ground early was Violin, whose systems -- designed with custom-built silicon -- gained customers quickly, fueling its rise in public markets in 2013. By 2017, Violin was surpassed by all-flash competitors whose arrays integrated sophisticated software data services. After filing for bankruptcy, the vendor was relaunched by private investors as Violin Systems in 2018, with a focus on selling all-flash storage to managed service providers.

[Chart: Independent analyst Logan G. Harbaugh compares various all-flash arrays. Created in August 2017.]

All-flash array vendors such as Pure Storage and XtremIO -- now part of Dell EMC -- were among the earliest to incorporate inline compression and data deduplication, which most other vendors now include as standard features. Deduplication helped AFAs approach price parity with storage based on cheaper rotating media.

A sampling of leading all-flash array products includes the following:

  • Dell EMC VMAX
  • Dell EMC Unity
  • Dell EMC XtremIO
  • Dell EMC Isilon NAS
  • Fujitsu Eternus AF
  • Hewlett Packard Enterprise (HPE) 3PAR StoreServ
  • HPE Nimble Storage AF series
  • Hitachi Vantara Virtual Storage Platform
  • Huawei OceanStor
  • IBM FlashSystem V9000
  • IBM Storwize 5000 and Storwize V7000F
  • Kaminario K2
  • NetApp All-Flash Fabric-Attached Array (NetApp AFF)
  • NetApp SolidFire family -- including NetApp HCI
  • Pure Storage FlashArray
  • Pure FlashBlade NAS/object storage array
  • Tegile Systems T4600 -- bought in 2017 by Western Digital
  • Tintri EC Series

Impact on hybrid array use cases

Falling flash prices, data growth and integrated data services have increased the appeal of all-flash arrays for many enterprises. This has led to industry speculation that all-flash storage can supplant hybrid arrays, although there remain good reasons to consider using a hybrid storage infrastructure.

HDDs offer predictable performance at a fairly low cost per gigabyte, but they use more power and are slower than flash, resulting in a high cost per IOPS. All-flash arrays, by contrast, have a lower cost per IOPS, coupled with the advantages of speed and lower power consumption, but they carry a higher upfront acquisition price and per-gigabyte cost.
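The cost tradeoff can be made concrete with back-of-the-envelope arithmetic. The prices, capacities and IOPS figures below are hypothetical, chosen only to illustrate the per-gigabyte versus per-IOPS split:

```python
# Illustrative drive economics (hypothetical prices, not vendor quotes):
# HDDs are cheap per gigabyte but expensive per IOPS; flash is the reverse.

hdd = {"price": 300.0, "capacity_gb": 8000, "iops": 200}
ssd = {"price": 900.0, "capacity_gb": 4000, "iops": 100000}

for name, drive in (("HDD", hdd), ("SSD", ssd)):
    per_gb = drive["price"] / drive["capacity_gb"]
    per_iops = drive["price"] / drive["iops"]
    print(f"{name}: ${per_gb:.4f}/GB, ${per_iops:.4f}/IOPS")
```

With these toy numbers the HDD costs a fraction of a cent per gigabyte but $1.50 per IOPS, while the SSD costs several times more per gigabyte but well under a cent per IOPS, which is why the right choice depends on whether capacity or performance dominates the workload.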

AFA vs. hybrid array

A hybrid flash array enables enterprises to strike a balance between relatively low cost and solid performance. Because a hybrid array supports high-capacity disk drives, it offers greater total capacity than an AFA.

All-flash NVMe and NVMe over Fabrics

All-flash arrays based on nonvolatile memory express (NVMe) flash technologies represent the next phase of maturation. The NVMe host controller interface speeds data transfer by enabling an application to communicate directly with back-end storage.

NVMe is meant to be a faster alternative to the Small Computer System Interface (SCSI) standard that transfers data between a host and a target device. Development of the NVMe standard is under the auspices of NVM Express Inc., a nonprofit organization comprising more than 100 member technology companies.

The NVMe standard is widely considered to be the eventual successor to the SAS and SATA protocols. NVMe form factors include add-in cards, U.2 2.5-inch and M.2 SSD devices.

Some of the NVMe-based products available include:

  • DataDirect Networks Flashscale
  • Datrium DVX hybrid system
  • HPE Persistent Memory
  • Kaminario K2.N
  • Micron Accelerated Solutions NVMe reference architecture
  • Micron SolidScale NVMe over Fabrics appliances
  • Pure Storage FlashArray//X
  • Tegile IntelliFlash

A handful of NVMe-flash startups are bringing products to market, as well, including:

  • Apeiron Data Systems combines NVMe drives with data services housed in field-programmable gate arrays instead of servers attached to storage arrays.
  • E8 Storage E8-D24 NVMe flash arrays replicate snapshots to attached compute servers to reduce management overhead on the array.
  • Excelero software-defined storage runs on any x86 server.
  • Mangstor MX6300 NVMe over Fabrics (NVMe-oF) storage pairs branded PCIe NVMe add-in cards with Dell PowerEdge servers.
  • Pavilion Data Systems-branded Pavilion Memory Array.
  • Vexata VX-100 is based on the software-defined Vexata Active Data Fabric.

Industry experts expect 2018 to usher in more end-to-end, rack-scale flash storage systems based on NVMe-oF. These systems integrate custom NVMe flash modules connected as a fabric in place of individual NVMe SSDs.

The NVMe-oF transport mechanism enables a long-distance connection between host devices and NVMe storage devices. IBM, Kaminario and Pure Storage have publicly disclosed products to support NVMe-oF, although most storage vendors have pledged support.

All-flash storage arrays in hyper-converged infrastructure

Hyper-converged infrastructure (HCI) systems combine computing, networking, storage and virtualization resources in an integrated appliance. Most hyper-convergence products were designed to use disk as primary storage, relying on a modest flash cache layer to accelerate applications. For performance reasons, most HCI arrays were not traditionally built primarily for flash, although that started to change in 2017.

Now the leading HCI vendors sell all-flash versions. Among these vendors are Cisco, Dell EMC, HPE, Nutanix, Pivot3 and Scale Computing. NetApp launched an HCI product in October 2017 built around its SolidFire all-flash storage platform.


zero-day (computer)
Posted by Thang Le Toan on 03 August 2018 01:03 AM

Zero-day is a flaw in software, hardware or firmware that is unknown to the party or parties responsible for patching or otherwise fixing the flaw. The term zero day may refer to the vulnerability itself, or an attack that has zero days between the time the vulnerability is discovered and the first attack. Once a zero-day vulnerability has been made public, it is known as an n-day or one-day vulnerability.

Ordinarily, when someone detects that a software program contains a potential security issue, that person or company will notify the software company (and sometimes the world at large) so that action can be taken. Given time, the software company can fix the code and distribute a patch or software update. Even if potential attackers hear about the vulnerability, it may take them some time to exploit it; meanwhile, the fix will hopefully become available first. Sometimes, however, a hacker may be the first to discover the vulnerability. Since the vulnerability isn't known in advance, there is no way to guard against the exploit before it happens. Companies exposed to such exploits can, however, institute procedures for early detection.

Security researchers cooperate with vendors and usually agree to withhold all details of zero-day vulnerabilities for a reasonable period before publishing those details. Google Project Zero, for example, follows industry guidelines that give vendors up to 90 days to patch a vulnerability before the finder of the vulnerability publicly discloses the flaw. For vulnerabilities deemed "critical," Project Zero allows only seven days for the vendor to patch before publishing the vulnerability; if the vulnerability is being actively exploited, Project Zero may reduce the response time to less than seven days.

Zero-day exploit detection

Zero-day exploits tend to be very difficult to detect. Antimalware software and some intrusion detection systems (IDSes) and intrusion prevention systems (IPSes) are often ineffective because no attack signature yet exists. For this reason, user behavior analytics is often the best way to detect a zero-day attack. Most entities authorized to access networks exhibit usage and behavior patterns that are considered normal, and activity falling outside the normal scope of operations can be an indicator of a zero-day attack.

For example, a web application server normally responds to requests in specific ways. If outbound packets are detected exiting the port assigned to that web application, and those packets do not match anything that would ordinarily be generated by the application, it is a good indication that an attack is going on.
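The idea can be reduced to a toy behavioral check: track which clients the server has legitimately been talking to, and flag outbound packets that match no known session. Addresses, ports and the session-tracking scheme below are all illustrative:

```python
# Toy behavioral check: a web server normally answers on its service port
# to clients that contacted it first. Outbound packets with no matching
# inbound session are flagged for investigation. Values are illustrative.

inbound_sessions = {("10.0.0.5", 443), ("10.0.0.7", 443)}

outbound_packets = [
    {"dst": "10.0.0.5", "src_port": 443},    # reply to a known client
    {"dst": "203.0.113.9", "src_port": 443}  # no matching session: suspicious
]

alerts = [p for p in outbound_packets
          if (p["dst"], p["src_port"]) not in inbound_sessions]
print(alerts)  # packets worth investigating
```

Production behavioral-analytics tools model far richer baselines (timing, volume, payload shape), but the principle is the same: alert on deviation from normal, not on a known signature.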

Zero-day exploit period

Some zero-day attacks have been attributed to advanced persistent threat (APT) actors, hacking or cybercrime groups affiliated with or a part of national governments. Attackers, especially APTs or organized cybercrime groups, are believed to reserve their zero-day exploits for high-value targets.

N-day vulnerabilities continue to live on and are subject to exploits long after the vulnerabilities have been patched or otherwise fixed by vendors. For example, the credit bureau Equifax was breached in 2017 by attackers using an exploit against the Apache Struts web framework. The attackers exploited a vulnerability in Apache Struts that was reported, and patched, earlier in the year; Equifax failed to patch the vulnerability and was breached by attackers exploiting the unpatched vulnerability.

Likewise, researchers continue to find zero-day vulnerabilities in the Server Message Block protocol, implemented in the Windows OS for many years. Once the zero-day vulnerability is made public, users should patch their systems, but attackers continue to exploit the vulnerabilities for as long as unpatched systems remain exposed on the internet.

Defending against zero-day attacks

Zero-day exploits are difficult to defend against because they are so difficult to detect. Vulnerability scanning software relies on malware signature checkers to compare suspicious code with signatures of known malware; when the malware uses a zero-day exploit that has not been previously encountered, such vulnerability scanners will fail to block the malware.
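Why signature checking misses zero days can be shown in a minimal sketch, using toy byte strings in place of real malware samples:

```python
import hashlib

# Sketch of signature-based scanning: hash a sample and look it up in a
# known-malware signature set. A zero-day payload has no entry in the set,
# so the scanner passes it. Hashes are computed from toy byte strings.

known_signatures = {hashlib.sha256(b"old-malware").hexdigest()}

def is_flagged(payload: bytes) -> bool:
    return hashlib.sha256(payload).hexdigest() in known_signatures

print(is_flagged(b"old-malware"))       # True: signature match, blocked
print(is_flagged(b"zero-day-payload"))  # False: scanner is blind to it
```

Real scanners use fuzzy and heuristic matching as well as exact hashes, but the structural weakness is the same: a signature must exist before it can match.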

Since a zero-day vulnerability can't be known in advance, there is no way to guard against a specific exploit before it happens. However, there are some things that companies can do to reduce their level of risk exposure.

  • Use virtual local area networks to segregate some areas of the network or use dedicated physical or virtual network segments to isolate sensitive traffic flowing between servers.
  • Implement IPsec, the IP security protocol, to apply encryption and authentication to network traffic.
  • Deploy an IDS or IPS. Although signature-based IDS and IPS security products may not be able to identify the attack, they may be able to alert defenders to suspicious activity that occurs as a side effect to the attack.
  • Use network access control to prevent rogue machines from gaining access to crucial parts of the enterprise environment.
  • Lock down wireless access points and use a security scheme such as Wi-Fi Protected Access 2 for maximum protection against wireless-based attacks.
  • Keep all systems patched and up to date. Although patches will not stop a zero-day attack, keeping network resources fully patched may make it more difficult for an attack to succeed. When a zero-day patch does become available, apply it as soon as possible.
  • Perform regular vulnerability scanning against enterprise networks and lock down any vulnerabilities that are discovered.

While maintaining a high standard for information security may not prevent all zero-day exploits, it can help defeat attacks that use zero-day exploits after the vulnerabilities have been patched.

Examples of zero-day attacks

Multiple zero-day attacks commonly occur each year. In 2016, for example, there was a zero-day attack (CVE-2016-4117) that exploited a previously undiscovered flaw in Adobe Flash Player. Also in 2016, more than 100 organizations succumbed to a zero-day bug (CVE-2016-0167) that was exploited in elevation-of-privilege attacks targeting Microsoft Windows.


In 2017, a zero-day vulnerability (CVE-2017-0199) was discovered in which a Microsoft Office document in rich text format was shown to be able to trigger the execution of a Visual Basic script containing PowerShell commands upon being opened. Another 2017 exploit (CVE-2017-0261) used encapsulated PostScript as a platform for initiating malware infections.

The Stuxnet worm was a devastating zero-day exploit that targeted supervisory control and data acquisition (SCADA) systems by first attacking computers running the Windows operating system. Stuxnet exploited four different Windows zero-day vulnerabilities and spread through infected USB drives, making it possible to infect Windows and SCADA systems that were not reachable over a network. The Stuxnet worm has been widely reported to be the result of a joint effort by U.S. and Israeli intelligence agencies to disrupt Iran's nuclear program.

Learn more about zero-day attacks from the CompTIA security course.


FBI admits to using zero-day exploits, not disclosing them

The FBI has admitted to using zero-day exploits rather than disclosing them, and experts say this should not be a surprise considering the history of federal agency actions.

In a surprise bout of openness, Amy Hess, executive assistant director for science and technology with the FBI, admitted that the FBI uses zero-day exploits, but said the agency does struggle with the decision.


In an interview with The Washington Post, Hess called it a "constant challenge" to decide whether it is better to use a zero-day exploit "to be able to identify a person who is threatening public safety" or to disclose the vulnerability in order to allow developers to secure products being used by the public. Hess also noted the FBI prefers not to rely on zero-day exploits because the fact that they can be patched at any moment makes them unreliable.

Jeff Schilling, CSO for Armor, said the surprise might come from the fact that many people don't know that the FBI has a foreign intelligence collection mission.

"Any agency that has a foreign intelligence collection mission in cyberspace has to make decisions every day on the value gained in leveraging a zero day to collect intelligence data, especially with the impact of not letting people who are at risk know of the potential vulnerability which could be compromised," Schilling said, adding that the need for the government to find a balance between security and intelligence is not a new phenomenon. "This country experienced the same intelligence gained versus operational impact during World War II (WWII) when the intelligence community did not disclose that we had broken both the Japanese and German codes. Lots of sailors, soldiers and airmen lost their lives to keep those secrets. I think the FBI and the rest of the intelligence community have the same dilemmas as the intelligence community in WWII, however, at this point, data, not lives are at risk."

Robert Hansen, vice president for WhiteHat Security Labs, said it boils down to whether the public trusts the government to not abuse its power in this area, and whether the government should assume that only it knows about these exploits.

"In general, I think that although the net truth is that most people in government have good intentions, they can't all be relied upon to hold such belief systems," Hansen said. "And, given that in most cases exploits are found much later, it stands to reason that it's more dangerous to keep vulnerabilities in place. That's not to diminish their value, however, it's very dangerous to presume that an agency is the only one [that] can and will find and leverage that vulnerability."

Adam Kujawa, head of malware intelligence at Malwarebytes Labs, said the draw of zero-day exploits may be too strong for government agencies to resist.

"The 'benefit' of this method [is] simply having access to a weapon that theoretically can't be protected against," Kujawa said. "This is like being able to shoot someone with a nuke when they are only wearing a bullet proof vest -- completely unstoppable, theoretically. Law enforcement, when they have a target in mind, be it a cybercriminal, terrorist, et cetera, are able to breach the security of the suspect and gather intelligence or collect information on them to identify any criminal activity that might happen or will happen."

Daren Glenister, field CTO at Intralinks Inc., noted that while leaving vulnerabilities unpatched leads to risk, there is also some benefit to not publishing vulnerabilities too soon.

"Patching a threat may take a vendor days or weeks. Every hour lost in providing a patch introduces additional risk to data and access to systems," Glenister said. "[However], by not publishing zero-day threats, it minimizes the widespread underground threat from hackers that occurs every time a new threat is disclosed."

The NSA recently detailed its vulnerability disclosure policy, but while doing so never mentioned whether or not the agency used zero-day exploits. Multiple experts said this admission by the FBI makes it safe to assume the NSA is also leveraging zero days in its efforts.

Adam Meyer, chief security strategist at SurfWatch Labs Inc., said it is not only reasonable to expect the NSA is actively exploiting zero days, but many others are as well.

"I believe it is safe to assume that any U.S. agency with a Defense or Homeland Security mission area are using exploits to achieve a presence against their targets," Meyer said. "Unfortunately, I also think it is safe to assume that every developed country in the world is doing the exact same thing. The reality is a zero day can be used against us just as much as for us."

Schilling said using zero days may not be the only option, but noted that human intelligence gathering carries much greater risks.

"At the end of the day, if we are leveraging zero days to stay ahead of our national threats, I am ok with us accepting the risk of data loss and compromises," Schilling said. "History has shown that we have accepted higher costs to protect our intelligence collection, and I think we are still OK today in the risk we are accepting as it is to save lives."

Kujawa said that while there are viable alternatives to using zero days to gather intelligence, it is hard to ignore the ease and relative safety of this method.

"There are plenty of viable methods of extracting information from a suspect; however the zero-day method is incredibly effective, very quiet and very fast. Law enforcement could attack systems using known exploits, social engineering tactics or gaining physical access to the system and installing malware manually, however none of these methods are guaranteed and they all can be protected against if the suspect is practicing common security procedures. The zero-day method will fall into the same bucket as the other attacks soon enough, however, so we will have to wait and see what the future holds for law enforcement in trying to gather evidence and intelligence on criminal suspects."


what's a spear phishing mail ?
Posted by Thang Le Toan on 02 August 2018 01:38 AM

Spear phishing is an email-spoofing attack that targets a specific organization or individual, seeking unauthorized access to sensitive information. Spear-phishing attempts are not typically initiated by random hackers, but are more likely to be conducted by perpetrators out for financial gain, trade secrets or military information.

As with emails used in regular phishing expeditions, spear phishing messages appear to come from a trusted source. Phishing messages usually appear to come from a large and well-known company or website with a broad membership base, such as Google or PayPal. In the case of spear phishing, however, the apparent source of the email is likely to be an individual within the recipient's own company -- generally, someone in a position of authority -- or from someone the target knows personally.

Visiting United States Military Academy professor and National Security Agency official Aaron Ferguson called it the "colonel effect."  To illustrate his point, Ferguson sent out a message to 500 cadets, asking them to click a link to verify grades. Ferguson's message appeared to come from a Col. Robert Melville of West Point. Over 80% of recipients clicked the link in the message. In response, they received a notification that they'd been duped and a warning that their behavior could have resulted in downloads of spyware, Trojan horses and/or other malware.

Many enterprise employees have learned to be suspicious of unexpected requests for confidential information and will not divulge personal data in response to emails or click on links in messages unless they are positive about the source. The success of spear phishing depends on three things: the apparent source appears to be a known and trusted individual; information within the message supports its validity; and the request seems to have a logical basis.

Spear-phishing email

Spear phishing vs. phishing vs. whaling

This familiarity is what sets spear phishing apart from regular phishing attacks. Regular phishing emails are sent to large numbers of recipients and appear to come from a well-known company or organization. They include a malicious link or attachment that installs malware on the target's device, or direct the target to a malicious website set up to trick them into divulging sensitive information such as passwords, account numbers or credit card details.

Spear phishing has the same goal as normal phishing, but the attacker first gathers information about the intended target. This information is used to personalize the spear-phishing attack. Instead of sending the phishing emails to a large group of people, the attacker targets a select group or an individual. By limiting the targets, it's easier to include personal information -- like the target's first name or job title -- and make the malicious emails seem more trustworthy.

The same personalized technique is used in whaling attacks, as well. A whaling attack is a spear-phishing attack directed specifically at high-profile targets like C-level executives, politicians and celebrities. Whaling attacks are also customized to the target and use the same social-engineering, email-spoofing and content-spoofing methods to access sensitive data.

Examples of successful attacks

In one version of a successful spear-phishing attack, the perpetrator finds a webpage for their target organization that supplies contact information for the company. Using available details to make the message seem authentic, the perpetrator drafts an email to an employee on the contact page that appears to come from an individual who might reasonably request confidential information, such as a network administrator. The email asks the employee to log into a bogus page that requests the employee's username and password, or click on a link that will download spyware or other malicious programming. If a single employee falls for the spear phisher's ploy, the attacker can masquerade as that individual and use social-engineering techniques to gain further access to sensitive data.

In 2015, independent security researcher and journalist Brian Krebs reported that Ubiquiti Networks Inc. lost $46.7 million to hackers who started the attack with a spear-phishing campaign. The hackers were able to impersonate communications from executive management at the networking firm and performed unauthorized international wire transfers.

Spear phishing defense

Spear-phishing attacks -- and whaling attacks -- are often harder to detect than regular phishing attacks because they are so focused.

In an enterprise, security-awareness training for employees and executives alike will help reduce the likelihood of a user falling for spear-phishing emails. This training typically educates enterprise users on how to spot phishing emails based on suspicious email domains or links enclosed in the message, as well as the wording of the messages and the information that may be requested in the email.
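One of the checks such training teaches -- does the sender's domain really match the organization's? -- can be sketched in a few lines. Here example.com stands in for the company's real domain, and the lookalike address is a fabricated illustration:

```python
# Simplified training aid: flag senders whose domain doesn't exactly match
# the organization's real domain. Attackers often register lookalike
# domains (e.g., swapping "l" for "1") that pass a casual glance.

TRUSTED_DOMAIN = "example.com"

def looks_suspicious(sender: str) -> bool:
    domain = sender.rsplit("@", 1)[-1].lower()
    return domain != TRUSTED_DOMAIN

print(looks_suspicious("it-support@example.com"))  # False: legitimate
print(looks_suspicious("it-support@examp1e.com"))  # True: lookalike domain
```

Real email gateways apply far stronger checks (SPF, DKIM, DMARC and fuzzy domain matching), but exact-domain comparison is the habit awareness training tries to instill in users themselves.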


How to prevent a spear phishing attack from infiltrating an enterprise

While spear phishing emails are becoming harder to detect, there are still ways to prevent them. Threats expert Nick Lewis gives advice.

Spear phishing and social engineering are becoming more popular as attackers target humans as a particularly dependable point of ingress (HBGary, RSA, etc.). Considering that a well-crafted spear phishing email is almost indistinguishable from a legitimate email, what is the best way to prevent users from clicking on spear phishing links?

Phishing, social engineering and spear phishing have been growing in popularity over the last 10 or more years. The introduction of spear phishing and other newer forms of phishing are an evolution of social engineering or fraud. Attackers have found ways to exploit weaknesses in technologies like VoIP, IM and SMS messages, among others,  to commit fraud, and will continue to adapt as new technologies develop. Humans will always be an integral part of information security for an organization, but can always be targeted, regardless of the technologies in use. Humans are sometimes the weakest link.

To help minimize the chance of a spear phishing attack successfully infiltrating the enterprise, you can follow the advice from US-CERT on phishing or the guidance from the Anti-Phishing Working Group. Both have technical steps you can put in place, but both also include a security awareness and education component. Potentially the most effective method to combat phishing and its variants is to make sure users know to question suspicious communications and to verify the communication (email, IM, SMS, etc.) out-of-band with the requesting party. For example, if an employee gets an email from a colleague that doesn’t sound like it came from the sender or seems in some way suspicious, he or she should contact the sender using a different means of communication -- such as the phone -- to confirm the email. If the email can’t be verified, then it should be reported to your information security group, the Anti-Phishing Working Group or the FTC at

Enterprises with high security needs could choose not to connect their systems to the Internet, not allow Internet email inbound except for approved domains, or only allow inbound email from approved email addresses. This will not stop all phishing attacks and will significantly decrease usability, but may be necessary for high-security environments.


XMPP (Extensible Messaging and Presence Protocol)
Posted by Thang Le Toan on 20 July 2018 12:14 AM

XMPP (Extensible Messaging and Presence Protocol) is a protocol based on Extensible Markup Language (XML) and intended for instant messaging (IM) and online presence detection. It functions between or among servers, and facilitates near-real-time operation. The protocol may eventually allow Internet users to send instant messages to anyone else on the Internet, regardless of differences in operating systems and browsers.

XMPP is sometimes called the Jabber protocol, but this is a technical misnomer. Jabber, an IM application similar to ICQ (I Seek You) and others, is based on XMPP, but there are many applications besides Jabber that are supported by XMPP. The IETF XMPP working group, a consortium of engineers and programmers, is adapting XMPP for use as an Internet Engineering Task Force (IETF) technology. In addition, the Messaging and Presence Interoperability Consortium (MPIC) is considering XMPP as an important interoperability technology. Eventually, XMPP is expected to support IM applications with authentication, access control, a high measure of privacy, hop-by-hop encryption, end-to-end encryption, and compatibility with other protocols.
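Because XMPP is XML-based, the units it exchanges are XML "stanzas." A minimal chat-message stanza of the kind the protocol defines can be built with Python's standard library; the addresses below are placeholders:

```python
import xml.etree.ElementTree as ET

# Build a minimal XMPP <message> stanza. On a real connection this XML
# would be sent inside an authenticated, long-lived XML stream; here we
# only construct and print it. Addresses are placeholders.

msg = ET.Element("message", {
    "from": "alice@example.com",
    "to": "bob@example.net",
    "type": "chat",
})
ET.SubElement(msg, "body").text = "Hello via XMPP"

print(ET.tostring(msg, encoding="unicode"))
```

The same pattern extends to the protocol's other stanza types, such as presence notifications and info/query (iq) exchanges.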

IBM and Microsoft are working on a similar standard called SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) based on Session Initiation Protocol (SIP).



unstructured data
Posted by Thang Le Toan on 19 July 2018 11:58 PM

Unstructured data is information, in many different forms, that doesn't hew to conventional data models and thus typically isn't a good fit for a mainstream relational database. Thanks to the emergence of alternative platforms for storing and managing such data, it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications.

Traditional structured data, such as the transaction data in financial systems and other business applications, conforms to a rigid format to ensure consistency in processing and analyzing it. Sets of unstructured data, on the other hand, can be maintained in formats that aren't uniform, freeing analytics teams to work with all of the available data without necessarily having to consolidate and standardize it first. That enables more comprehensive analyses than would otherwise be possible.

Types of unstructured data

One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites.

Other types of unstructured data include images, audio and video files. Machine data is another category, one that's growing quickly in many organizations. For example, log files from websites, servers, networks and applications -- particularly mobile ones -- yield a trove of activity and performance data. In addition, companies increasingly capture and analyze data from sensors on manufacturing equipment and other internet of things (IoT) connected devices.

In some cases, such data may be considered to be semi-structured -- for example, if metadata tags are added to provide information and context about the content of the data. The line between unstructured and semi-structured data isn't absolute, though; some data management consultants contend that all data, even the unstructured kind, has some level of structure.
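The distinction can be shown concretely: wrapping free text in metadata tags turns it into semi-structured data while the text itself stays unstructured. The field names below are illustrative:

```python
import json

# Unstructured text becomes semi-structured once metadata tags wrap it:
# the envelope has a predictable shape, the body does not.

raw_text = "The new backup appliance shipped on time and works great."

tagged = {
    "source": "customer-survey",   # illustrative metadata fields
    "language": "en",
    "captured": "2018-08-17",
    "body": raw_text,              # the free text itself stays unstructured
}

print(json.dumps(tagged, indent=2))
```

Systems can now filter or route records by source, language or date without ever parsing the body, which is precisely the value metadata tagging adds.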

Unstructured data analytics

Because of its nature, unstructured data isn't suited to transaction processing applications, which are the province of structured data. Instead, it's primarily used for BI and analytics. One popular application is customer analytics. Retailers, manufacturers and other companies analyze unstructured data to improve customer relationship management processes and enable more-targeted marketing; they also do sentiment analysis to identify both positive and negative views of products, customer service and corporate entities, as expressed by customers on social networks and in other forums.

Predictive maintenance is an emerging analytics use case for unstructured data. For example, manufacturers can analyze sensor data to try to detect impending failures in plant-floor equipment or in finished products in the field before they occur. Energy pipelines can also be monitored and checked for potential problems using unstructured data collected from IoT sensors.
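A crude stand-in for such predictive-maintenance models is to flag sensor readings that deviate sharply from the recent trend. The sketch below uses a rolling mean and standard deviation on hypothetical vibration readings from a plant-floor motor; real systems apply trained models to far larger streams.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Flag readings more than `threshold` standard deviations away
    from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings; the spike at index 9 might indicate
# a bearing starting to fail.
vibration = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 0.9, 1.1, 5.5]
print(flag_anomalies(vibration))  # → [9]
```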

Analyzing log data from IT systems highlights usage trends, identifies capacity limitations and pinpoints the cause of application errors, system crashes, performance bottlenecks and other issues. Unstructured data analytics also aids regulatory compliance efforts, particularly in helping organizations understand what corporate documents and records contain.
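The log analysis described above often starts with exactly this kind of aggregation: parse semi-structured log lines, then tally errors by endpoint to pinpoint where failures cluster. The log format and paths below are invented for illustration.

```python
import re
from collections import Counter

# Match the request path and HTTP status in web-server-style log lines.
LOG_LINE = re.compile(r'"\w+ (?P<path>\S+) \S+" (?P<status>\d{3})')

logs = [
    '10.0.0.1 - - [17/Aug/2018] "GET /api/orders HTTP/1.1" 200',
    '10.0.0.2 - - [17/Aug/2018] "GET /api/orders HTTP/1.1" 500',
    '10.0.0.3 - - [17/Aug/2018] "POST /api/login HTTP/1.1" 500',
    '10.0.0.4 - - [17/Aug/2018] "GET /api/orders HTTP/1.1" 500',
]

# Count server errors (5xx) per request path.
matches = (LOG_LINE.search(line) for line in logs)
errors = Counter(
    m.group("path")
    for m in matches
    if m and m.group("status").startswith("5")
)
print(errors.most_common(1))  # → [('/api/orders', 2)]
```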

Unstructured data techniques and platforms

Analyst firms report that the vast majority of new data being generated is unstructured. In the past, that type of information often was locked away in siloed document management systems, individual manufacturing devices and the like -- making it what's known as dark data, unavailable for analysis.

But things changed with the development of big data platforms, primarily Hadoop clusters, NoSQL databases and the Amazon Simple Storage Service (S3). They provide the required infrastructure for processing, storing and managing large volumes of unstructured data without the imposition of a common data model and a single database schema, as in relational databases and data warehouses.

A variety of analytics techniques and tools are used to analyze unstructured data in big data environments. Text analytics tools look for patterns, keywords and sentiment in textual data; at a more advanced level, natural language processing technology is a form of artificial intelligence that seeks to understand meaning and context in text and human speech, increasingly with the aid of deep learning algorithms that use neural networks to analyze data. Other techniques that play roles in unstructured data analytics include data mining, machine learning and predictive analytics.
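At the simplest end of the text analytics spectrum sits keyword extraction: strip out common stopwords and count what remains. The sketch below illustrates the idea with a tiny, made-up stopword list; real tools use linguistic models well beyond word counts.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; real tools use much larger ones.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "it"}

def top_keywords(text, n=3):
    """Naive keyword extraction: frequency of non-stopword terms."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

doc = "Unstructured data includes text, images and audio. Text data grows quickly."
print(top_keywords(doc))
```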
