March 2018 Server StorageIO Data Infrastructure Update Newsletter
Posted by Thang Le Toan on 30 August 2018 11:02 PM

Volume 18, Issue 3 (March 2018)

Hello and welcome to the March 2018 Server StorageIO Data Infrastructure Update Newsletter.

If you are wondering where the January and February 2018 update newsletters are, they are rolled into this combined edition. In addition to the short email version (free signup here), you can access full versions (html here and PDF here) along with previous editions here.

In this issue:

Enjoy this edition of the Server StorageIO Data Infrastructure update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

Data Infrastructure Data Protection and Backup BC BR DR HA Security

World Backup day is coming up on March 31 which is a good time to remember to verify and validate that your data protection is working as intended. On one hand I think it is a good idea to call out the importance of making sure your data is protected including backed up.

On the other hand, data protection is not a once-a-year event but a year-round, 7 x 24 x 365 focus. The focus also needs to be on more than just backup: it should cover all aspects of data protection, from archiving to business continuance (BC), business resiliency (BR), disaster recovery (DR), always on, always accessible, along with security and recovery.

Data Infrastructure Data Protection Backup 4 3 2 1 rule
Data Infrastructure 4 3 2 1 Data Protection and Backup

Some data spring thoughts, perspectives and reminders. Data lakes may swell beyond their banks causing rivers of data to flood as they flow into larger reservoirs, great data lakes, gulfs of data, seas and oceans of data. Granted, some of that data will be inactive cold parked like glaciers while others semi-active floating around like icebergs. Hopefully your data is stored on durable storage solutions or services and does not melt.

Data Infrastructure Server Storage I/O flash SSD NVMe
Various NAND Flash SSD devices and SAS, SATA, NVMe, M.2 interfaces

Non-Volatile Memory (NVM) includes various solid state device (SSD) mediums (e.g. NAND flash, 3D XPoint, MRAM among others) and packaging (drives, PCIe Add in Cards [AiC], along with entire systems, appliances or arrays). Also part of the continued evolution of NVM, SSD and other persistent memories (PM), including storage class memories (SCM), are different access protocol interfaces.

Keep in mind that there is a difference between NVM (medium) and NVMe (access). NVM is the generic category of mediums or media and devices such as NAND flash, NVRAM, 3D XPoint among other SCM (and PMs). In other words, NVM is what devices use for storing data, while NVMe is how devices and systems are accessed. NVMe and its variations are how NVM, SSD, PM, SCM media and devices get accessed locally, as well as over network fabrics (e.g. NVMe-oF and FC-NVMe).

NVMe continues to evolve including with networked fabric variations such as RDMA based NVMe over Fabric (NVMe-oF), along with Fibre Channel based (FC-NVMe). The Fibre Channel Industry Association trade group recently held its second multi-vendor plugfest in support of NVMe over Fibre Channel.
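To make the medium vs. access distinction concrete, here is a minimal Python sketch that lists NVMe controllers on a Linux host along with the transport each one is accessed over. It assumes the Linux sysfs layout where each controller under /sys/class/nvme exposes "model" and "transport" attribute files; the function name list_nvme_controllers is made up for illustration and is not part of any tool mentioned in this newsletter.

```python
from pathlib import Path


def _read_attr(ctrl_dir, attr):
    """Read one sysfs attribute file, or return 'unknown' if it is absent."""
    path = ctrl_dir / attr
    return path.read_text().strip() if path.is_file() else "unknown"


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return a list of (name, model, transport) tuples, one per controller.

    Assumption: Linux sysfs layout with 'model' and 'transport' files
    in each controller directory. Returns [] if the root does not exist.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or no NVMe driver loaded
    return [
        (ctrl.name, _read_attr(ctrl, "model"), _read_attr(ctrl, "transport"))
        for ctrl in sorted(root.iterdir())
    ]


if __name__ == "__main__":
    for name, model, transport in list_nvme_controllers():
        # model describes the device (the NVM side); transport is the access path
        print(f"{name}: device={model!r}, accessed via transport={transport}")
```

On a typical host a locally attached drive would be expected to report a PCIe transport, while a fabric-attached namespace would report its fabric transport instead; the device behind it is NVM either way.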

Read more about NVM, NVMe, SSD, SCM, flash and related technologies, tools, trends, tips via the following resources:

Has Object Storage failed to live up to its industry hype lacking traction? Or, is object storage (also known as blobs) progressing with customer adoption and deployment on normal realistic timelines? Recently I have seen some industry comments about object storage not catching on with customers or failing to live up to its hyped expectation. IMHO object storage is very much alive along with block, file, table (e.g. database SQL and NoSQL repositories), message/queue among others, as well as emerging blockchain aka data exchanges.

Various Industry and Customer Adoption Deployment Timeline (Via:

An issue with object storage is that it is still new and still evolving, and many IT environments' applications do not yet speak or access objects and blobs natively. Likewise, as is often the case, industry adoption and deployment is usually early and short term around the hype, vs. the longer cycle of customer adoption and deployment. The downside for those who only focus on object storage (or blobs) is that they may be under pressure to do things short term instead of adjusting to customer cycles, which take longer; however, real adoption and deployment also last longer.

While the hype and industry buzz around object storage (and blobs) may have faded, customer adoption continues and is here to stay, along with block, file among others. Also keep in mind that there is a difference between industry and customer adoption along with deployment.

Some recent Industry Activities, Trends, News and Announcements include:

In case you missed it, Amazon Web Services (AWS) announced EKS (Elastic Kubernetes Service) which, as its name implies, is an easy to use and manage Kubernetes (containers, serverless data infrastructure) service running on AWS. AWS joins others including Microsoft Azure Kubernetes Services (AKS), Google Kubernetes Engine (GKE), EasyStack (ESContainer for OpenStack and Kubernetes) and VMware Pivotal Container Service (PKS) among others. What this means is that in the container serverless data infrastructure ecosystem, Kubernetes container management (orchestration platform) is gaining in both industry and customer adoption along with deployment.

Check out other industry news, comments, trends perspectives here.

Data Infrastructure Server StorageIO Comments Content

Server StorageIO Commentary in the news, tips and articles

Recent Server StorageIO industry trends perspectives commentary in the news.

Via BizTech: Why Hybrid (SSD and HDD) Storage Might Be Fit for SMB environments
Via Excelero: Server StorageIO white paper enabling database DBaaS productivity
Via Cloudian: YouTube video interview file services on object storage with HyperFile
Via CDW Solutions: Comments on Software Defined Access
Via SearchStorage: Comments on Cloudian HyperStore on demand cloud like pricing
Via EnterpriseStorageForum: Comments and tips on Software Defined Storage Best Practices
Via PRNewsWire: Comments on Excelero NVMe NVMesh Database and DBaaS solutions
Via SearchStorage: Comments on NooBaa multi-cloud storage management
Via CDW: Comments on New IT Strategies Improve Your Bottom Line 
Via EnterpriseStorageForum: Comments on Software Defined Storage: Pros and Cons
Via DataCenterKnowledge: Comments on The Great Data Center Headache IoT
Via SearchStorage: Comments on Dell and VMware merger scenario options
Via PRNewswire: Comments on Chelsio Microsoft Validation of iWARP/RDMA
Via SearchStorage: Comments on Server Storage Industry trends and Dell EMC
Via ChannelProSMB: Comments on Hybrid HDD and SSD storage solutions
Via ChannelProNetwork: Comments on What the Future Holds for HDDs
Via HealthcareITnews: Comments on MOUNTAINS OF MOBILE DATA
Via SearchStorage: Comments on Cloudian HyperStore 7 targets multi-cloud complexities
Via GlobeNewsWire: Comments on Cloudian HyperStore 7
Via GizModo: Comments on Intel Optane 800P NVMe M.2 SSD
Via DataCenterKnowledge: Comments on getting data centers ready for IoT
Via DataCenterKnowledge: Comments on Beyond the Hype: AI in the Data Center
Via DataCenterKnowledge: Comments on Data Center and Cloud Disaster Recovery
Via SearchStorage: Comments on Cloudian HyperFile marries NAS and object storage
Via SearchStorage: Comments on Top 10 Tips on Solid State Storage Adoption Strategy
Via SearchStorage: Comments on 8 Top Tips for Beating the Big Data Deluge

View more Server, Storage and I/O trends and perspectives comments here.

Data Infrastructure Server StorageIOblog posts

Server StorageIOblog Data Infrastructure Posts

Recent and popular Server StorageIOblog posts include:

Application Data Value Characteristics Everything Is Not The Same
Application Data Availability 4 3 2 1 Data Protection
AWS Cloud Application Data Protection Webinar
Microsoft Windows Server 2019 Insiders Preview
Application Data Characteristics Types Everything Is Not The Same
Application Data Volume Velocity Variety Everything Is Not The Same
Application Data Access Lifecycle Patterns Everything Is Not The Same
Veeam GDPR preparedness experiences Webinar walking the talk
VMware continues cloud construction with March announcements
Benefits of Moving Hyper-V Disaster Recovery to the Cloud Webinar
World Backup Day 2018 Data Protection Readiness Reminder
Use Intel Optane NVMe U.2 SFF 8639 SSD drive in PCIe slot
Data Infrastructure Resource Links cloud data protection tradecraft trends
How to Achieve Flexible Data Protection Availability with All Flash Storage Solutions
November 2017 Server StorageIO Data Infrastructure Update Newsletter
IT transformation Serverless Life Beyond DevOps Podcast
Data Protection Diaries Fundamental Topics Tools Techniques Technologies Tips
HPE Announces AMD Powered Gen 10 ProLiant DL385 For Software Defined Workloads
AWS Announces New S3 Cloud Storage Security Encryption Features
Introducing Windows Subsystem for Linux WSL Overview #blogtober
Hot Popular New Trending Data Infrastructure Vendors To Watch

View other recent as well as past StorageIOblog posts here

Server StorageIO Recommended Reading (Watching and Listening) List

Software-Defined Data Infrastructure Essentials SDDI SDDC

In addition to my own books including Software Defined Data Infrastructure Essentials (CRC Press 2017; check out the special sale price), the following are Server StorageIO data infrastructure recommended reading, watching and listening list items. The list covers various IT, data infrastructure and related topics. The Intel Recommended Reading List (IRRL) for developers is also a good resource to check out. Speaking of my books, Didier Van Hoye (@WorkingHardInIt) has a good review over on his site you can view here; also check out the rest of his great content while there.

In case you may have missed it, here is a good presentation from AWS re:invent 2017 by Brendan Gregg (@brendangregg) about how Netflix does EC2 and other AWS tuning along with plenty of great resource links. Keith Tenzer (@keithtenzer) provides a good perspective piece about containers in a large IT enterprise environment here including various options.

Speaking of IT data centers and data infrastructure environments, check out the list of some of the world's most extreme habitats for technology here. Mark Betz (@markbetz) has a series of Docker and Kubernetes networking fundamentals posts on his site here, as well as over at Medium, including mention of Google Cloud (@googlecloud). The posts in Mark's series are good refreshers or intros to how Docker and Kubernetes handle basic networking between containers, pods, nodes and hosts in clusters. Check out part I here and part II here.

Blockchain elements
Image via

Steve Todd (@Stevetodd) has some good perspectives about Trusted Data Exchanges e.g. life beyond blockchain and bitcoin here along with core element considerations (beyond the product pitch) here, along with associated data infrastructure and storage evolution vs. revolution here.

Watch for more items to be added to the recommended reading list book shelf soon.

Data Infrastructure Server StorageIO event activities

Events and Activities

Recent and upcoming event activities.

March 27, 2018 – Webinar – Veeams Road to GDPR Compliancy The 5 Lessons Learned

Feb 28, 2018 – Webinar – Benefits of Moving Hyper-V Disaster Recovery to the Cloud

Jan 30, 2018 – Webinar – Achieve Flexible Data Protection and Availability with All Flash Storage

Nov. 9, 2017 – Webinar – All You Need To Know about ROBO Data Protection Backup

See more webinars and activities on the Server StorageIO Events page here.

Data Infrastructure Server StorageIO Industry Resources and Links

Various useful links and resources:

Data Infrastructure Recommend Reading and watching list
Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
– Various industry links (over 1,000 with more to be added soon)
– Cloud and object storage topics, tips and news items
– Various OpenStack related items
– Various presentations and other download material
– Various data protection items and topics
– Focus on NVMe trends and technologies
– NVM and Solid State Disk topics, tips and techniques
– Various CI, HCI and related SDS topics
– Various server, storage and I/O benchmark and tools
VMware Technical Network – Various VMware related items


Data Protection Recovery Life Post World Backup Day Pre GDPR
Posted by Thang Le Toan on 30 August 2018 03:46 PM

It’s time for Data Protection Recovery Life Post World Backup Day Pre GDPR Start Date.

The annual March 31 world backup day focus has come and gone once again.

However, that does not mean data protection including backup as well as recovery along with security gets a 364-day vacation until March 31, 2019 (or the days leading up to it).

Granted, some environments, public relations people, editors, influencers and other industry folks will take some time off from backup day topics, while others jump on the ramp up to GDPR, which goes into effect May 25, 2018.

Expanding Focus Data Protection and GDPR

As I mentioned in this post here, world backup day should be expanded to include increased focus not just on backup, but also on recovery as well as other forms of data protection. Likewise, May 25, 2018 is not the deadline, finish line or destination for GDPR (the General Data Protection Regulation); rather, it is the starting point for an evolving journey, one that has global impact as well as applicability. Recently I participated in a fireside chat discussion with Danny Allan of Veeam, who shared his GDPR expertise as well as the experiences, lessons learned and tips of Veeam as they started their journey; check it out here.

Expanding Focus Data Protection Recovery and other Things that start with R

As part of expanding the focus on Data Protection Recovery Life Post World Backup Day Pre GDPR, that also means looking at, discussing things that start with R (like Recovery). Some examples besides recovery include restoration, reassess, review, rethink protection, recovery point, RPO, RTO, reconstruction, resiliency, ransomware, RAID, repair, remediation, restart, resume, rollback, and regulations among others.

Data Protection Tips, Reminders and Recommendations

    • There are no blue participation ribbons for failed recovery. However, there can be pink slips.
    • Only you can prevent on-premises or cloud data loss. However, it is also a shared responsibility with vendors and service providers.
    • You can’t go forward in the future when there is a disaster or loss of data if you can’t go back in time for recovery
    • GDPR applies to organizations around the world of all sizes and across all sectors, including nonprofits
    • Keep new school 4 3 2 1 data protection in mind while evolving from old school 3 2 1 backup rules

4 3 2 1 backup data protection rule
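As a rough illustration of how the 4 3 2 1 rule differs from old school 3 2 1, the Python sketch below checks an inventory of copies against one common reading of the rule: at least 4 copies, at least 3 versions, at least 2 different systems or media, and at least 1 copy off-site. That reading is an assumption, since the post itself does not spell the rule out, and the Copy class and meets_4321 function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    version: str   # point-in-time label, e.g. "2018-03-30"
    system: str    # system or medium holding this copy
    offsite: bool  # True if stored away from the primary site


def meets_4321(copies):
    """Check copies against the ASSUMED reading of the 4 3 2 1 rule.

    Returns a dict mapping each criterion to True or False.
    """
    return {
        "4 copies": len(copies) >= 4,
        "3 versions": len({c.version for c in copies}) >= 3,
        "2 systems": len({c.system for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
    }


if __name__ == "__main__":
    inventory = [
        Copy("2018-03-29", "primary-nas", False),
        Copy("2018-03-30", "primary-nas", False),
        Copy("2018-03-31", "backup-server", False),
        Copy("2018-03-31", "cloud-bucket", True),
    ]
    for criterion, ok in meets_4321(inventory).items():
        print(f"{criterion}: {'ok' if ok else 'NOT met'}")
```

Reporting each criterion separately, rather than a single pass or fail, shows which part of the protection scheme needs attention.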

  • A fundamental premise of data infrastructures is to enable applications and their data: protect, preserve, secure and serve
  • Remember to protect your applications, as well as data, including metadata, settings and configurations
  • Test your restores, including whether you can actually use the restored data along with its security settings
  • Don’t cause a disaster in the course of testing your data protection, backups or recovery
  • Expand (or refresh) your data protection and data infrastructure education tradecraft skills experiences
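One simple way to act on the "test your restores" tip is to restore into a scratch location (not over production, so the test itself does not cause a disaster) and compare checksums against the source. The Python sketch below is a minimal example under those assumptions; the function names sha256_of and verify_restore are made up for illustration, and it checks file content only, not permissions, ownership or application-level usability.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return sorted relative paths that are missing or differ after a restore.

    An empty list means every source file was found in restored_dir
    with identical content.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

A call such as verify_restore("/data/app", "/scratch/restore-test") (both paths hypothetical) returns the files that did not come back intact. A content match is necessary but not sufficient; you still need to confirm the application can open and use the restored data.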

Where to learn more

Learn more about data protection, world backup day, recovery, restoration, GDPR along with related data infrastructure topics for cloud, legacy and other software defined environments via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Data protection including business continuance (BC), business resiliency (BR), disaster recovery (DR), availability, accessibility, backup, snapshots, encryption, security and privacy among others is a 7 x 24 x 365 focus. The focus of data protection also needs to evolve from an after-the-fact cost overhead to a proactive business enabler. Meanwhile, welcome to Data Protection Recovery Post World Backup Day Pre GDPR Start Date.

Ok, nuff said, for now.


Have you heard about the new CLOUD Act data regulation?
Posted by Thang Le Toan on 30 August 2018 03:41 PM

The new CLOUD Act data regulation became law as part of the recent $1.3 Trillion (USD) omnibus U.S. government budget spending bill passed by Congress on March 23, 2018 and signed by President of the U.S. (POTUS) Donald Trump in March.

CLOUD Act is the acronym for Clarifying Lawful Overseas Use of Data, not to be confused with initiatives such as the U.S. federal government's CLOUD First among others, which are focused on using cloud, securing and complying (e.g. FedRAMP among others). In other words, the new CLOUD Act data regulation pertains to how data stored by cloud or other service providers can be accessed by law enforcement officials (LEO).

U.S. Supreme court
Supreme Court of the U.S. (SCOTUS) Image via

CLOUD Act background and Stored Communications Act

After the signing into law of the CLOUD Act, the US Department of Justice (DOJ) asked the Supreme Court of the U.S. (SCOTUS) to dismiss the pending case against Microsoft (e.g., Azure Cloud). The case or question in front of SCOTUS pertained to whether LEO can search as well as seize information or data that is stored overseas or in foreign countries.

As a refresher, or if you had not heard, SCOTUS was asked to resolve whether a service provider responding to a warrant based on probable cause under the 1986 era Stored Communications Act is required to provide data in its custody, control or possession, regardless of whether it is stored inside or outside the US.

Microsoft Azure Regions and software defined data infrastructures
Microsoft Azure Regions via

This particular case in front of SCOTUS centered on whether Microsoft (a U.S. Technology firm) had to comply with a court order to produce emails (as part of an LEO drug investigation) even if those were stored outside of the US. In this particular situation, the emails were alleged to have been stored in a Microsoft Azure Cloud Dublin Ireland data center.

For its part, Microsoft senior attorney Hasan Ali said via FCW, “This bill is a significant step forward in the larger global debate on what our privacy laws should look like, even if it does not go to the highest threshold”. Here are some additional perspectives via Microsoft's Brad Smith on his blog along with a video.

What is CLOUD Act

Clarifying Lawful Overseas Use of Data is the new CLOUD Act data regulation approved by Congress (House and Senate). Details can be read here and here respectively, with additional perspectives here.

The new CLOUD Act law allows for POTUS to enter into executive agreements with foreign governments about data on criminal suspects. Granted what is or is not a crime in a given country will likely open Pandora’s box of issues. For example, in the case of Microsoft, if an agreement between the U.S. and Ireland were in place, and, Ireland agreed to release the data, it could then be accessed.

Now, for some who might be hyperventilating after reading the last sentence, keep in mind that if you are overseas, it is up to your government to protect your privacy. The foreign government must have an agreement in place with the U.S., and a crime must have been committed, a crime that both parties concur with.

Also, keep in mind that there are appeal processes for providers, including on the grounds that the customer is not a U.S. person and does not reside in the U.S., and that the disclosure would put the provider at risk of violating foreign law. Likewise, various provisions must be met before a cloud or service provider has to hand over your data, regardless of what country you reside in or where the data resides.

Where to learn more

Learn more about CLOUD Act, cloud, data protection, world backup day, recovery, restoration, GDPR along with related data infrastructure topics for cloud, legacy and other software defined environments via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Is the new CLOUD Act data regulation unique to Microsoft Azure Cloud?

No, it also applies to Amazon Web Services (AWS), Google, IBM Softlayer Cloud, Facebook, LinkedIn, Twitter and the long list of other service providers.

What about GDPR?

Keep in mind that the new General Data Protection Regulation (GDPR) goes into effect May 25, 2018 and, while based out of the European Union (EU), has global applicability across organizations of all size, scope, and type. Learn more about GDPR, Data Protection and its global impact here.

Thus, if you have not heard about the new CLOUD Act data regulation, now is the time to become aware of it.

Ok, nuff said, for now.


Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2018 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.


2018 Hot Popular New Trending Data Infrastructure Vendors to Watch
Posted by Thang Le Toan on 30 August 2018 03:36 PM

Here is the 2018 Hot Popular New Trending Data Infrastructure Vendors To Watch which includes startups as well as established vendors doing new things. This piece follows last year’s hot favorite trending data infrastructure vendors to watch list (here), as well as who will be top of storage world in a decade piece here.

Data Infrastructures Support Information Systems Applications and Their Data

Data Infrastructures are what exist inside physical data centers and cloud availability zones (AZ) that are defined to provide traditional, as well as cloud services. Cloud and legacy data infrastructures combine hardware (server, storage, I/O network) and software along with management tools, policies, tradecraft techniques (skills) and best practices to support applications and their data. There are different types of data infrastructures to meet the needs of various environments that range in size, scope, focus, application workloads, along with performance and capacity.

Another important aspect of data infrastructures is that they exist to protect, preserve, secure and serve applications that transform data into information. This means that availability and Data Protection including archive, backup, business continuance (BC), business resiliency (BR), disaster recovery (DR), privacy and security among other related topics, technology, techniques, and trends are essential data infrastructure topics.

Different timelines of adoption and deployment for various audiences

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch

Some of those on this year’s list are focused on different technology areas, while others are here because of the size or type of vendor, supplier or service provider. Who counts as new, startup, evolving, or established varies depending on whether you are an industry insider or an IT customer environment. Some on the list are new, and some are established vendors doing new things, including some you may not have heard of. The mix is for those who want or need the most current list of startups to rattle off for industry adoption (and deployment), as well as for those interested in what some established players are doing that might lead to customer deployment (and adoption).

AMD – The AMD EPYC family of processors is opening up new opportunities for AMD to challenge Intel among others for a more significant share of the general-purpose compute market in support of data center and data infrastructure markets. An advantage that AMD has and is playing to in the industry speeds, feeds, slots and watts price performance game is the ability to support more memory and PCIe lanes per socket than others including Intel. Keep in mind that PCIe lanes will become even more critical as NVMe deployment increases, as well as the use of GPUs and faster Ethernet among other devices. Name brand vendors including Dell and HPE among others have announced or are shipping AMD EPYC based servers.
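To see why PCIe lanes per socket matter as NVMe, GPU and faster Ethernet use grows, here is a small Python sketch that adds up the lanes a server build would consume and compares the total with what a platform offers. The device mix, the lane widths (x4 per NVMe SSD, x16 per GPU or fast NIC) and the available lane totals of 128 and 48 are illustrative assumptions, not vendor specifications; lanes_needed and fits are hypothetical helper names.

```python
def lanes_needed(devices):
    """Sum PCIe lanes for a dict of {name: (count, lanes_per_device)}."""
    return sum(count * lanes for count, lanes in devices.values())


def fits(devices, available_lanes):
    """Return (needed, available, fits?) for a given lane budget."""
    needed = lanes_needed(devices)
    return needed, available_lanes, needed <= available_lanes


# Hypothetical server build; lane widths are typical values, not vendor specs.
build = {
    "NVMe SSD (x4)": (16, 4),
    "GPU (x16)": (2, 16),
    "100GbE NIC (x16)": (1, 16),
}

if __name__ == "__main__":
    # Compare the same build against two illustrative lane budgets.
    for label, budget in (("platform A", 128), ("platform B", 48)):
        needed, available, ok = fits(build, budget)
        verdict = "fits" if ok else "needs switches, fewer devices or another socket"
        print(f"{label}: needs {needed} of {available} lanes -> {verdict}")
```

In practice PCIe switches and lane bifurcation change the picture, so treat this as back-of-the-envelope arithmetic and not a sizing tool.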

Aperion – Cloud and managed service provider with diverse capabilities.

Amazon Web Services (AWS) – Continues to expand its footprint regarding regions, availability zones (AZ) also known as data centers in regions, as well as some services along with the breadth of those capabilities. AWS has recently announced a new Snowball Edge (SBE), which in the past has been a data migration appliance and is now enhanced with on-prem Elastic Compute Cloud (EC2) capabilities. What this means is that AWS can put on-prem compute capabilities as part of a storage appliance for short-term data movement, migration, conversion, importing of virtual machines and other items.

On the other hand, AWS can also be seen as using SBE as a first entry to placing equipment on-prem for hybrid clouds, or, converged infrastructure (CI), hyper-converged infrastructure (HCI), cloud in a box similar to Microsoft Azure Stack, as well as CI/HCI solutions from others.

My prediction near term, however, is that CI/HCI vendors will either ignore SBE, downplay it, create some new marketing on why it is not CI/HCI, or spread FUD about vendor lock-in. In other words, make some popcorn and sit back, watch the show.

Backblaze – Low-cost, high-capacity cloud storage for backup and archiving provider known for their quarterly disk drive reliability ratings (or failure) reports. They have been around for a while, have a good reputation among those who use their services for being a low-cost alternative to the larger providers.

Barefoot Networks – Some of you may already be aware of or following Barefoot Networks, while others may not have heard of them outside of the networking space. They are relatively new and have some impressive capabilities, thus an excellent addition to this list.

Cloudian – Continues to evolve and is no longer just another object storage solution. Cloudian has been expanding via organic technology development, as well as acquisitions, giving them a broad portfolio of software-defined storage and tiering from on-prem to the cloud, with block, file and object access.

Cloudflare – Not exactly a startup, some of you may know or are using Cloudflare, while to others, their role as a web cache, DNS, and other service is transparent. I have been using Cloudflare on my various sites for over a year, and like the security, DNS, cache and analytics tools they provide as a customer.

Cobalt Iron – For some, they might be new. Software-defined data protection and management is the name of the game over at Cobalt Iron, which has been around a few years under the radar compared to more popular players. If you have or are involved with IBM Tivoli aka TSM based backup and data protection among others, check out the exciting capabilities that Cobalt can bring to the table.

CTERA – Having been around for a while, to some they might not be a startup; to others they may be new, offering new data and file management options.

DataCore – You might know of DataCore for their software-defined storage and past storage hypervisor activity. However, they have a new piece of software, MaxParallel, that boosts server storage I/O performance. The software installs on your Windows Server instance (bare metal, VM, or cloud instance) and shows you performance with and without acceleration, which you can dynamically turn on and off.

DataDirect Networks (DDN) – Recently acquired Lustre assets from Intel, now picking up the storage startup Tintri pieces after it ceased operations. What this means is that while beefing up their traditional High-Performance Compute (HPC) and Super Compute (SC) focus, DDN is also expanding into broader markets.

Dell Technologies – At its recent Dell Technologies World event in Las Vegas during late April and early May 2018, several announcements were made, including some tied to emerging Gen-Z along with composability. More recently, Dell Technologies along with VMware announced business structure and finance changes. Changes include VMware declaring a dividend; Dell Technologies, being its largest shareholder, will use the proceeds to fund restructuring and debt service. Read more about VMware and Dell Technologies business and financial changes here.

Densify – With a name like Densify no surprise they propose to drive densification and automation with AI-powered deep learning to optimize application resource use across on-prem software-defined virtual as well as cloud instances and containers.

FlureeDB – If you are into databases (SQL or NoSQL), as well as Blockchain or distributed ledgers, check out FlureeDB.

Innovium – When it comes to data infrastructure and data center networking, Innovium is probably not on your radar, however, keep an eye on these folks and their TERALYNX switching silicon to see where it ends up given their performance claims.

Komprise – File, and data management solutions including tiering along with partners such as IBM.

Kubernetes – A few years ago OpenStack, then Docker containers, were the favorite and trending discussion topics, then Mesos, and along comes Kubernetes. It’s safe to say, at least for now, Kubernetes is settling in as a preferred open source industry and customer de facto choice (I want to say standard, however, will hold off on that for now) for container and related orchestration management. Besides do it yourself (DiY) leveraging open source, there are also managed AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Services (AKS), Google Kubernetes Engine (GKE), and VMware Pivotal Container Service (PKS) among others. Besides Azure, Microsoft also includes Kubernetes support (along with Docker and Windows containers) as part of Windows Server.

ManageEngine (part of Zoho) – Has data infrastructure monitoring technology called OpManager for keeping an eye on networking.

Marvell – Marvell may not be a familiar name (don’t confuse it with the comics), however, it has been a critical component supplier to partners whose server or storage technology you may be familiar with or have yourself. The server, storage, and I/O networking chip maker has closed on its acquisition of Cavium (which previously bought QLogic among others). The combined company is well positioned as a key data infrastructure component supplier to various partners spanning servers, storage, and I/O networking including Fibre Channel (FC), Ethernet, InfiniBand, NVMe (and NVMeoF) among others.

Mellanox – Known for their InfiniBand adapters, switches, and associated software, along with a growing presence in RDMA over Converged Ethernet (RoCE), they are also well positioned for NVMe over Fabrics among other growth opportunities following recent boardroom updates, along with technology roadmaps.

Microsoft – Azure public cloud continues to evolve similarly to AWS with more region locations, availability zone (AZ) data centers, as well as features and extensions. About a year ago Microsoft also introduced Azure Stack, its hybrid on-prem CI/HCI cloud-in-a-box platform appliance (read about my test drive here). However, there is more to Microsoft than just their current cloud-first focus, which means Windows (desktop), as well as Server, are also evolving. Currently in public preview, Windows Server 2019 Insider builds are available for trying out many new capabilities, some of which were covered in the recent free Microsoft Virtual Summit held in June. Key themes of Windows Server 2019 include security, performance, hybrid cloud, containers, software-defined storage and much more.

Microsemi – Has been around for a while and is the combination of some vendors you may not have heard of, or heard about in some time, including PMC-Sierra (which acquired Adaptec) and Vitesse among others. One reason I have Microsemi on this list is their string of acquisitions, which might be an indicator of whom they pick up next. Another reason is that their components span data infrastructure topics from servers, storage, I/O and networking to PCIe and many more.

NVIDIA – GPU high-performance compute and related compute offload technologies have been accessible for over a decade. More recently, with new graphics and computational demands, GPUs such as those from NVIDIA are in demand. That demand includes traditional graphics acceleration for physical and virtual, augmented and virtual reality, as well as cloud, along with compute-intensive analytics, AI, ML, DL and other cognitive workloads.

NGDSystems (NGD) – Similar to what NVIDIA and other GPU vendors do for enabling compute offload for specific applications and workloads, NGD is working on a variation. That variation is to move offload compute capabilities for server I/O and storage-intensive workloads closer to the data, in fact into storage system components such as SSDs and emerging SCMs and PMEMs. Unlike GPU-based applications or workloads that tend to be more memory and compute intensive, NGD is positioned for applications that are server I/O and storage intensive.

The premise of NGD is that they move the compute and application closer to where the data is, eliminating extra I/O, as well as reducing the amount of main server memory and compute cycles needed. If you are familiar with other server storage I/O offload engines and systems such as the Oracle Exadata database appliance, NGD is working at a tighter integration granularity. How it works is your application gets ported to run on the NGD storage platform, which is SSD based and has a general-purpose processor. Your application is initiated from a host server and then runs on the NGD device, meaning I/Os are kept local to the storage system. Keep in mind that the best I/O is the one that you do not have to do; the second best is the one with the least resource or user impact.

Opvisor – Performance activity and capacity monitoring tools including for VMware environments.

Pavilion – Startup with an interesting NVMe-based hardware appliance.

Quest – Having gained their independence as a free-standing company since divestiture from Dell Technologies (Dell had previously acquired Quest before EMC acquisition), Quest continues to make their data infrastructure related management tools available. Besides now being a standalone company again, keep an eye on Quest to see how they evolve their existing data protection and data infrastructure resource management tools portfolio via growth, acquisition, or, perhaps Quest will be on somebody else’s future growth list.

Retrospect – Far from being a startup, after regaining their independence several years after EMC bought them, they have continued to enhance their data protection technology. Disclosure, I have been a Retrospect customer since 2001, using it for on-site as well as cloud data protection backups.

Rubrik – Becoming more of a data infrastructure household name given their expanding technology portfolio and marketing efforts. More commonly known in smaller customer environments, as well as broadly within industry insider circles, Rubrik has potential with continued technology evolution to move further upmarket similar to how Commvault did back in the late 90s, just saying.

SkyScale – Cloud service provider that offers dedicated bare metal, as well as private and hybrid cloud instances, along with GPUs to support AI, ML, DL and other high-performance compute workloads.

Snowflake – The name does not describe well what they do or who they are. However, they have interesting cloud data warehouse (old school) and large-scale data lake (new school) technologies.

Strongbox – Not to be confused with technology such as that from ioSafe (e.g., waterproof, fireproof), Strongbox is a data protection storage solution for storing archives, backups, BC/BR/DR data, as well as cloud tiering. For those who are into buzzword bingo, think cloud tiering, object, cold storage among others. The technology evolved out of Crossroads and, with David Cerf at the helm, has branched out into a private company worth keeping an eye on.

Storbyte – With longtime industry insider sales and marketing pro Diamond Lauffin (formerly Nexsan) involved as Chief Evangelist, this is worth keeping an eye on and could be entertaining as well as exciting. In some ways it could be seen as a bit of a Nexsan meets NVMe meets NAND flash meets cost-effective value storage déjà vu play.

Talon – Enterprise storage and management solutions for file sharing across organizations, ROBO and cloud environments.

Ubiquiti – Also known as UBNT, this is a data infrastructure networking vendor whose technologies span from WiFi access points (AP), high-performance antennas, routing, switching and related hardware, along with software solutions. UBNT is not as well known in larger environments as Cisco or others. However, they are making a name for themselves moving from the edge to the core. That is, having started at the edge with APs, routers, firewalls and gateways for the SMB, ROBO, SOHO as well as consumer markets (I have several of their APs, switches, routers and high-performance antennas along with management software), these technologies are also finding their way into larger environments.

My first use of UBNT was several years ago when I needed to get an IP network connection to a remote building separated by several hundred yards of forest. The solution I found was to get a pair of UBNT NANO APs and put them in secure bridge mode; now I have a high-performance WiFi service through a forest of trees. Since then I have replaced an older Cisco router and several Cisco and other APs, and have started a phased migration of switches.

UpdraftPlus – If you have a WordPress web or blog site, you should also have the UpdraftPlus plugin (go premium btw) for data protection. I have been using Updraft for several years on my various sites to back up and protect the MySQL databases and all other content. For those of you who are familiar with Spanning (e.g., acquired by EMC, then divested by Dell) and what they do for cloud applications, UpdraftPlus does something similar for lower-end, smaller cloud-based applications.

Vexata – Startup scale out NVMe storage solution.

VMware – Expanding their cloud foundation from on-prem to in and on clouds including AWS among others. Data Infrastructure focus continues to expand from core to edge, server, storage, I/O, networking. With the recent Dell Technologies business changes and VMware declaring a dividend, it should be interesting to see what lies ahead for both entities.

What About Those Not Mentioned?

By the way, if you were wondering why others are not in the above list, simple, check out last year’s list, which includes Apcera, Blue Medora, Broadcom, Chelsio, Commvault, Compuverde, Datadog, Datrium, Docker, E8 Storage, Elastifile, Enmotus, Everspin, Excelero, Hedvig, Huawei, Intel, Kubernetes, Liqid, Maxta, Micron, Minio, NetApp, Neuvector, Noobaa, NVIDIA, Pivot3, Pluribus Networks, Portworx, Rozo Systems, ScaleMP, Storpool, Stratoscale, SUSE Technology, Tidalscale, Turbonomic, Ubuntu, Veeam, Virtuozzo and WekaIO. Note that many of the above have expanded their capabilities in the past year and remain, or have become even more, interesting to watch, while some might be on the future “where are they now” list sometime down the road. View additional vendors and service providers via our industry links and resources page here.

What About New, Emerging, Trending and Trendy Technologies

Bitcoin and Blockchain storage startups, some of which claim or would like to replace cloud storage taking on giants such as AWS S3 in the not so distant future have been popping up lately. Some of these have good and exciting stories if they can deliver on the hype along with the premise. A couple of names to drop include among others Filecoin, Maidsafe, Sia, Storj along with services from AWS, Azure, Google and a long list of others.

Besides Blockchain distributed ledgers, other technologies and trends to keep an eye on include compute processes from ARM to SoC, GPU, FPGA, ASIC for offload and specialized processing. GPU, ASIC, and FPGA are appearing in new deployments across cloud providers as they look to offload processing from their general servers to derive total effective productivity out of them. In other words, innovating by offloading to boost their effective return on investment (old ROI), as well as increase their return on innovation (the new ROI).

Other data infrastructure server I/O which also ties into storage and network trends to watch include Gen-Z that some may claim as the successor to PCIe, Ethernet, InfiniBand among others (hint, get ready for a new round of “something is dead” hype). Near-term the objective of Gen-Z is to coexist, complement PCIe, Ethernet, CPU to memory interconnect, while enabling more granular allocation of data infrastructure resources (e.g., composability). Besides watching who is part of the Gen-Z movement, keep an eye on who is not part of it yet, specifically Intel.

NVMe and its many variations from a server internal to networked NVMe over Fabrics (NVMeoF) along with its derivatives continue to gain both industry adoption, as well as customer deployment. There are some early NVMeoF based server storage deployments (along with marketing dollars). However, the server side NVMe customer adoption is where the dollars are moving to the vendors. In other words, it’s still early in the bigger broader NVMe and NVMeoF game.

Where to learn more

Learn more about data infrastructures and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


What this all means

Let’s see how those mentioned last year as well as this year, along with some new and emerging vendors and service providers who did not get mentioned, end up next year, as well as in the years after that.

2018 Hot Popular New Trending Data Infrastructure Vendors to Watch
Different timelines of adoption and deployment for various audiences

Keep in mind that there is a difference between industry adoption and customer deployment, granted they are related. Likewise let’s see who will be at the top in three, five and ten years, which means some of the current top or favorite vendors may or may not be on the list, same with some of the established vendors. Meanwhile, check out the 2018 Hot Popular New Trending Data Infrastructure Vendors to Watch.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2018 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.


non-volatile storage (NVS)
Posted by Thang Le Toan on 16 August 2018 05:26 AM

Non-volatile storage (NVS) is a broad collection of technologies and devices that do not require a continuous power supply to retain data or program code persistently on a short- or long-term basis.


Three common examples of NVS devices that persistently store data are tape, a hard disk drive (HDD) and a solid-state drive (SSD). The term non-volatile storage also applies to the semiconductor chips that store the data or controller program code within devices such as SSDs, HDDs, tape drives and memory modules.

Many types of non-volatile memory chips are in use today. For instance, NAND flash memory chips commonly store data in SSDs in enterprise and personal computer systems, USB sticks, and memory cards in consumer devices such as mobile telephones and digital cameras. NOR flash memory chips commonly store controller code in storage drives and personal electronic devices.

Non-volatile storage technologies and devices vary widely in the manner and speed in which they transfer data to and retrieve data or program code from a chip or device. Other differentiating factors that have a significant impact on the type of non-volatile storage a system manufacturer or user chooses include cost, capacity, endurance and latency.

For example, an SSD equipped with NAND flash memory chips can program, or write, and read data faster and at lower latency through electrical mechanisms than a mechanically addressed HDD or tape drive that uses a head to write and read data to magnetic storage media. However, the per-bit price to store data in a flash-based SSD is generally higher than the per-bit cost of an HDD or tape drive, and flash SSDs can sustain a limited number of write cycles before they wear out.

Volatile vs. non-volatile storage devices

The key difference between volatile and non-volatile storage devices is whether or not they are able to retain data in the absence of a power supply. Volatile storage devices lose data when power is interrupted or turned off. By contrast, non-volatile devices are able to keep data regardless of the status of the power source.

Common types of volatile storage include static random access memory (SRAM) and dynamic random access memory (DRAM). Manufacturers may add battery power to a volatile memory device to enable it to persistently store data or controller code.

Enterprise and consumer computing systems often use a mix of volatile and non-volatile memory technologies, and each memory type has advantages and disadvantages. For instance, SRAM is faster than DRAM and well suited to high-speed caching. DRAM is less expensive to produce and requires less power than SRAM, and manufacturers often use it to store program code that a computer needs to operate.

Comparison of non-volatile memory types

By contrast, non-volatile NAND flash is slower than SRAM and DRAM, but it is cheaper to produce. Manufacturers commonly use NAND flash memory to store data persistently in business systems and consumer devices. Storage devices such as flash-based SSDs access data at a block level, whereas SRAM and DRAM support random data access at a byte level.

Like NAND, NOR flash is less expensive to produce than volatile SRAM and DRAM. NOR flash costs more than NAND flash, but it can read data faster than NAND, making it a common choice to boot consumer and embedded devices and to store controller code in SSDs, HDDs and tape drives. NOR flash is generally not used for long-term data storage due to its poor endurance.

Trends and future directions

Manufacturers are working on additional types of non-volatile storage to try to lower the per-bit cost to store data and program code, improve performance, increase endurance levels and reduce power consumption.

For instance, manufacturers developed 3D NAND flash technology in response to physical scaling limitations of two-dimensional, or planar, NAND flash. They are able to reach higher densities at a lower cost per bit by vertically stacking memory cells with 3D NAND technology than they can by using a single layer of memory cells with planar NAND.

NVM use cases

Emerging 3D XPoint technology, co-developed by Intel Corp. and Micron Technology Inc., offers higher throughput, lower latency, greater density and improved endurance over more commonly used NAND flash technology. Intel ships 3D XPoint technology under the brand name Optane in SSDs and in persistent memory modules intended for data center use. Persistent memory modules are also known as storage class memory.

3D XPoint non-volatile technology: an image of a 3D XPoint technology die (Micron Technology Inc.).

Using non-volatile memory express (NVMe) technology over a computer's PCI Express (PCIe) bus in conjunction with flash storage and newer options such as 3D XPoint can further accelerate performance and reduce latency and power consumption. NVMe offers a more streamlined command set to process input/output (I/O) requests with PCIe-based SSDs than the Small Computer System Interface (SCSI) command set does with Serial Attached SCSI (SAS) storage drives and the Advanced Technology Attachment (ATA) command set does with Serial ATA (SATA) drives.

Everspin Technologies DDR3 ST-MRAM storage: Everspin's EMD3D064M 64 Mb DDR3 ST-MRAM in a Ball Grid Array package (Everspin Technologies Inc.).

Emerging non-volatile storage technologies currently in development or in limited use include ferroelectric RAM (FRAM or FeRAM), magnetoresistive RAM (MRAM), phase-change memory (PCM), resistive RAM (RRAM or ReRAM) and spin-transfer torque magnetoresistive RAM (STT-MRAM or STT-RAM).


SSD write cycle
Posted by Thang Le Toan on 16 August 2018 05:24 AM

An SSD write cycle is the process of programming data to a NAND flash memory chip in a solid-state storage device.

A block of data stored on a flash memory chip must be electrically erased before new data can be written, or programmed, to the solid-state drive (SSD). The SSD write cycle is also known as the program/erase (P/E) cycle.

When an SSD is new, all of the blocks are erased and new, incoming data is written directly to the flash media. Once the SSD has filled all of the free blocks on the flash storage media, it must erase previously programmed blocks to make room for new data to be written. The valid data in blocks that also hold invalid or unnecessary data is copied to different blocks, freeing the old blocks to be erased. The SSD controller periodically erases the invalidated blocks and returns them to the free block pool.

The background process an SSD uses to clean out the unnecessary blocks and make room for new data is called garbage collection. The garbage collection process is generally invisible to the user, and the programming process is often identified simply as a write cycle, rather than a write/erase or P/E cycle.
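
The garbage collection steps described above can be sketched as a toy model. This is an illustration only, not any vendor's firmware logic: the page states, the `garbage_collect` name and the "most invalid pages first" victim choice are assumptions made for the example.

```python
def garbage_collect(blocks):
    """Reclaim the block that holds the most invalid pages.

    `blocks` is a list of blocks; each block is a list of page states,
    one of "valid", "invalid" or "free" (hypothetical model).
    Returns (victim_index, pages_copied), or None if no block has
    invalid pages to reclaim.
    """
    # Choose the victim: the block with the most invalid pages.
    victim = max(range(len(blocks)), key=lambda i: blocks[i].count("invalid"))
    if blocks[victim].count("invalid") == 0:
        return None

    # Make sure the still-valid pages can be relocated before touching anything.
    to_copy = blocks[victim].count("valid")
    free_elsewhere = sum(
        b.count("free") for i, b in enumerate(blocks) if i != victim
    )
    if free_elsewhere < to_copy:
        raise RuntimeError("not enough free pages to relocate valid data")

    # Copy the valid pages into free pages of other blocks.
    copied = 0
    for i, block in enumerate(blocks):
        if i == victim:
            continue
        for p in range(len(block)):
            if copied == to_copy:
                break
            if block[p] == "free":
                block[p] = "valid"
                copied += 1

    # Erase the whole victim block and return it to the free pool.
    blocks[victim] = ["free"] * len(blocks[victim])
    return victim, copied
```

Every page copied during this process is an extra write that the host never asked for, which is why garbage collection contributes to write amplification.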

Why write cycles are important

A NAND flash SSD is able to endure only a limited number of write cycles. The program/erase process causes a deterioration of the oxide layer that traps electrons in a NAND flash memory cell, and the SSD will eventually become unreliable, wear out and lose its ability to store data.

The number of write cycles, or endurance, varies based on the type of NAND flash memory cell. An SSD that stores a single data bit per cell, known as single-level cell (SLC) NAND flash, can typically support up to 100,000 write cycles. An SSD that stores two bits of data per cell, commonly referred to as multi-level cell (MLC) flash, generally sustains up to 10,000 write cycles with planar NAND and up to 35,000 write cycles with 3D NAND. The endurance of SSDs that store three bits of data per cell, called triple-level cell (TLC) flash, can be as low as 300 write cycles with planar NAND and as high as 3,000 write cycles with 3D NAND. The latest quadruple-level cell (QLC) NAND will likely support a maximum of 1,000 write cycles.
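
As a rough illustration of what these cycle counts mean in practice, the sketch below estimates how long a drive would take to use up its rated write cycles. It assumes perfect wear leveling and a constant daily workload; the function name, the example capacity and workload, and the `write_amplification` parameter are assumptions for the example, not figures from a vendor data sheet.

```python
# Upper-end write cycle figures quoted in the text above.
WRITE_CYCLES = {
    "SLC": 100_000,
    "MLC (planar)": 10_000,
    "MLC (3D)": 35_000,
    "TLC (planar)": 300,
    "TLC (3D)": 3_000,
    "QLC": 1_000,
}


def estimated_lifetime_years(capacity_gb, write_cycles, host_writes_gb_per_day,
                             write_amplification=1.0):
    """Rough years until the rated write cycles are consumed.

    Assumes every cell wears evenly (ideal wear leveling) and that the
    drive writes `write_amplification` times more data to NAND than the
    host sends it.
    """
    total_nand_writes_gb = capacity_gb * write_cycles
    nand_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_writes_gb_per_day / 365


if __name__ == "__main__":
    # Hypothetical 1,000 GB drive receiving 500 GB of host writes per day.
    for cell_type, cycles in WRITE_CYCLES.items():
        years = estimated_lifetime_years(1000, cycles, 500, write_amplification=2.0)
        print(f"{cell_type}: about {years:.1f} years")
```

The estimate ignores retention, controller failures and overprovisioning, so it is best read as a way to compare cell types rather than as a lifetime prediction.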

Comparison of NAND flash memory

As the number of bits per NAND flash memory cell increases, the cost per gigabyte (GB) of the SSD declines. However, the endurance and the reliability of the SSD are also lower.

NAND flash writes



Common write cycle problems

Challenges that SSD manufacturers have had to address to use NAND flash memory to store data reliably over an extended period of time include cell-to-cell interference as the dies get smaller, bit failures and errors, slow data erases and write amplification.

Manufacturers have enhanced the endurance and reliability of all types of SSDs through controller software-based mechanisms such as wear-leveling algorithms, external data buffering, improved error correction code (ECC) and error management, data compression, overprovisioning, better internal NAND management and block wear-out feedback. As a result, flash-based SSDs have not worn out as quickly as users once feared they would.

Vendors commonly offer SSD warranties that specify a maximum number of drive writes per day (DWPD) or terabytes written (TBW). DWPD is the number of times the entire capacity of the SSD can be overwritten on a daily basis during the warranty period. TBW is the total amount of data that an SSD can write before it is likely to fail. Vendors of flash-based systems and SSDs often offer guarantees of five years or more on their enterprise drives.
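
The two ratings describe the same endurance budget, so one can be converted to the other using the drive capacity and the warranty period. The sketch below shows that arithmetic; the function names and the example drive are hypothetical.

```python
def tbw_from_dwpd(dwpd, capacity_tb, warranty_years):
    """Total terabytes written implied by a DWPD rating."""
    return dwpd * capacity_tb * 365 * warranty_years


def dwpd_from_tbw(tbw, capacity_tb, warranty_years):
    """Drive writes per day implied by a TBW rating."""
    return tbw / (capacity_tb * 365 * warranty_years)


if __name__ == "__main__":
    # Hypothetical 1.92 TB drive rated at 1 DWPD with a five-year warranty.
    print(tbw_from_dwpd(1, 1.92, 5))
```

A write-intensive drive with a higher DWPD at the same capacity simply has a proportionally larger TBW over the same warranty period.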

Manufacturers sometimes specify the type of application workload for which an SSD is designed, such as write-intensive, read-intensive or mixed-use. Some vendors allow the customer to select the optimal level of endurance and capacity for a particular SSD. For instance, an enterprise user with a high-transaction database might opt for a greater DWPD number at the expense of capacity. Or a user operating a database that does infrequent writes might choose a lower DWPD and a higher capacity.


cache memory
Posted by Thang Le Toan on 16 August 2018 05:11 AM

Cache memory, also called CPU memory, is high-speed static random access memory (SRAM) that a computer microprocessor can access more quickly than it can access regular random access memory (RAM). This memory is typically integrated directly into the CPU chip or placed on a separate chip that has a separate bus interconnect with the CPU. The purpose of cache memory is to store program instructions and data that are used repeatedly in the operation of programs or information that the CPU is likely to need next. The computer processor can access this information quickly from the cache rather than having to get it from the computer's main memory. Fast access to these instructions increases the overall speed of the program.

As the microprocessor processes data, it looks first in the cache memory. If it finds the instructions or data it's looking for there from a previous reading of data, it does not have to perform a more time-consuming reading of data from larger main memory or other data storage devices. Cache memory is responsible for speeding up computer operations and processing.

Once they have been opened and operated for a time, most programs use few of a computer's resources. That's because frequently re-referenced instructions tend to be cached. This is why system performance measurements for computers with slower processors but larger caches can be faster than those for computers with faster processors but less cache space.


Multi-tier or multilevel caching has become popular in server and desktop architectures, with different levels providing greater efficiency through managed tiering. Simply put, the less frequently certain data or instructions are accessed, the lower down the cache level the data or instructions are written.

Implementation and history

Mainframes used an early version of cache memory, but the technology as it is known today began to be developed with the advent of microcomputers. With early PCs, processor performance increased much faster than memory performance, and memory became a bottleneck, slowing systems.

In the 1980s, the idea took hold that a small amount of more expensive, faster SRAM could be used to improve the performance of the less expensive, slower main memory. Initially, the memory cache was separate from the system processor and not always included in the chipset. Early PCs typically had from 16 KB to 128 KB of cache memory.

With 486 processors, Intel added 8 KB of memory to the CPU as Level 1 (L1) memory. As much as 256 KB of external Level 2 (L2) cache memory was used in these systems. Pentium processors saw the external cache memory double again to 512 KB on the high end. They also split the internal cache memory into two caches: one for instructions and the other for data.

Processors based on Intel's P6 microarchitecture, introduced in 1995, were the first to incorporate L2 cache memory into the CPU and enable all of a system's cache memory to run at the same clock speed as the processor. Prior to the P6, L2 memory external to the CPU was accessed at a much slower clock speed than the rate at which the processor ran, and slowed system performance considerably.

Early memory cache controllers used a write-through cache architecture, where data written into cache was also immediately updated in RAM. This approach minimized data loss, but also slowed operations. With later 486-based PCs, the write-back cache architecture was developed, where RAM isn't updated immediately. Instead, data is stored in cache and RAM is updated only at specific intervals or under certain circumstances where data is missing or old.
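
The difference between the two write policies can be shown with a small simulation. This is a simplified sketch, not a model of any real cache controller: RAM is a plain dictionary, the class names are made up for the example, and `flush()` stands in for whatever event triggers the deferred update.

```python
class WriteThroughCache:
    """Every write goes to the cache and immediately to RAM."""

    def __init__(self, ram):
        self.ram = ram          # backing store: dict of address -> value
        self.lines = {}         # cached copies
        self.ram_writes = 0     # how many times RAM was updated

    def write(self, addr, value):
        self.lines[addr] = value
        self.ram[addr] = value
        self.ram_writes += 1


class WriteBackCache:
    """Writes stay in the cache; RAM is updated only on flush."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}
        self.dirty = set()      # addresses changed since the last flush
        self.ram_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Copy only the modified (dirty) entries back to RAM.
        for addr in sorted(self.dirty):
            self.ram[addr] = self.lines[addr]
            self.ram_writes += 1
        self.dirty.clear()
```

Writing the same address three times costs three RAM updates with the write-through class, but only one (at flush time) with the write-back class. The trade-off is that the write-back data exists only in cache until it is flushed.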

Cache memory mapping

Caching configurations continue to evolve, but cache memory traditionally works under three different configurations:

  • Direct mapped cache has each block mapped to exactly one cache memory location. Conceptually, direct mapped cache is like rows in a table with three columns: the data block or cache line that contains the actual data fetched and stored, a tag with all or part of the address of the data that was fetched, and a flag bit that shows whether the row entry holds valid data.
  • Fully associative cache mapping is similar to direct mapping in structure but allows a block to be mapped to any cache location rather than to a prespecified cache memory location as is the case with direct mapping.
  • Set associative cache mapping can be viewed as a compromise between direct mapping and fully associative mapping in which each block is mapped to a subset of cache locations. It is sometimes called N-way set associative mapping, which provides for a location in main memory to be cached to any of "N" locations in the L1 cache.
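
The three configurations differ only in how many cache locations a given memory block may occupy. The sketch below splits a byte address into a tag, a set index and an offset for a hypothetical cache; the function name and the cache geometry in the example are assumptions, and real hardware does this with bit fields rather than division.

```python
def split_address(addr, line_size, num_lines, ways):
    """Return (tag, set_index, offset) for a byte address.

    ways == 1          -> direct mapped (one possible location per block)
    ways == num_lines  -> fully associative (a single set, any location)
    otherwise          -> N-way set associative
    """
    num_sets = num_lines // ways
    offset = addr % line_size        # byte position inside the cache line
    block = addr // line_size        # which memory block the address is in
    set_index = block % num_sets     # which set the block must go to
    tag = block // num_sets          # identifies the block within that set
    return tag, set_index, offset


if __name__ == "__main__":
    # Hypothetical cache: 256 lines of 64 bytes each.
    for ways in (1, 4, 256):
        print(ways, split_address(0x1234, line_size=64, num_lines=256, ways=ways))
```

With more ways there are fewer sets, so fewer address bits select the set and more go into the tag that must be compared on each lookup.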

Format of the cache hierarchy

Cache memory is fast and expensive. Traditionally, it is categorized as "levels" that describe its closeness and accessibility to the microprocessor.


L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in the processor chip as CPU cache.

L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on the CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative system bus connecting the cache and CPU. That way it doesn't get slowed by traffic on the main system bus.

Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and L2. L1 or L2 can be significantly faster than L3, though L3 is usually double the speed of RAM. With multicore processors, each core can have dedicated L1 and L2 cache, but they can share an L3 cache. If an L3 cache references an instruction, it is usually elevated to a higher level of cache.

In the past, L1, L2 and L3 caches have been created using combined processor and motherboard components. Recently, the trend has been toward consolidating all three levels of memory caching on the CPU itself. That's why the primary means for increasing cache size has begun to shift from the acquisition of a specific motherboard with different chipsets and bus architectures to buying a CPU with the right amount of integrated L1, L2 and L3 cache.

Contrary to popular belief, implementing flash or more dynamic RAM (DRAM) on a system won't increase cache memory. This can be confusing since the terms memory caching (hard disk buffering) and cache memory are often used interchangeably. Memory caching, using DRAM or flash to buffer disk reads, is meant to improve storage I/O by caching data that is frequently referenced in a buffer ahead of slower magnetic disk or tape. Cache memory, on the other hand, provides read buffering for the CPU.

Specialization and functionality

In addition to instruction and data caches, other caches are designed to provide specialized system functions. According to some definitions, the L3 cache's shared design makes it a specialized cache. Other definitions keep instruction caching and data caching separate, and refer to each as a specialized cache.

Translation lookaside buffers (TLBs) are also specialized memory caches whose function is to record virtual address to physical address translations.

Still other caches are not, technically speaking, memory caches at all. Disk caches, for instance, can use RAM or flash memory to provide data caching similar to what memory caches do with CPU instructions. If data is frequently accessed from disk, it is cached into DRAM or flash-based silicon storage technology for faster access time and response.

SSD caching vs. primary storage

Dennis Martin, founder and president of Demartek LLC, explains the pros and cons of using solid-state drives as cache and as primary storage.

Specialized caches are also available for applications such as web browsers, databases, network address binding and client-side Network File System protocol support. These types of caches might be distributed across multiple networked hosts to provide greater scalability or performance to an application that uses them.


The ability of cache memory to improve a computer's performance relies on the concept of locality of reference. Locality describes various situations that make a system more predictable, such as where the same storage location is repeatedly accessed, creating a pattern of memory access that the cache memory relies upon.

There are several types of locality. Two key ones for cache are temporal and spatial. Temporal locality is when the same resources are accessed repeatedly in a short amount of time. Spatial locality refers to accessing various data or resources that are in close proximity to each other.
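
To see why locality matters, the sketch below measures the hit rate of a small simulated cache under different access patterns. It is a rough model under stated assumptions: the cache is fully associative with least-recently-used replacement, and the 64-line, 64-byte geometry and the `hit_rate` name are chosen only for the example.

```python
import random
from collections import OrderedDict


def hit_rate(addresses, cache_lines=64, line_size=64):
    """Fraction of accesses served from a small LRU cache."""
    cache = OrderedDict()            # cached line numbers, oldest first
    hits = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


if __name__ == "__main__":
    # Spatial locality: walk through memory one byte at a time.
    sequential = list(range(64 * 1024))
    # Temporal locality: keep revisiting the same 32 lines.
    repeated = [i * 64 for i in range(32)] * 100
    # No locality: random addresses across a 64 MiB range.
    rng = random.Random(0)
    scattered = [rng.randrange(64 * 2**20) for _ in range(10_000)]

    for name, pattern in [("sequential", sequential),
                          ("repeated", repeated),
                          ("scattered", scattered)]:
        print(name, hit_rate(pattern))
```

The sequential and repeated patterns hit the cache on almost every access, while the scattered pattern almost never does, even though the cache itself is identical in all three runs.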

Cache vs. main memory

DRAM serves as a computer's main memory, holding the data retrieved from storage on which the CPU performs calculations. Both DRAM and cache memory are volatile memories that lose their contents when the power is turned off. DRAM is installed on the motherboard, and the CPU accesses it through a bus connection.

An example of dynamic RAM.

DRAM is usually about half as fast as L1, L2 or L3 cache memory, and much less expensive. It provides faster data access than flash storage, hard disk drives (HDDs) and tape storage. It came into use in the last few decades to provide a place to store frequently accessed disk data to improve I/O performance.

DRAM must be refreshed every few milliseconds. Cache memory, which is also a type of random access memory, is built from static RAM (SRAM) and does not need to be refreshed. It is built directly into the CPU to give the processor the fastest possible access to memory locations, and provides nanosecond speed access time to frequently referenced instructions and data. SRAM is faster than DRAM, but because it's a more complex chip, it's also more expensive to make.

Comparison of memory types

Cache vs. virtual memory

A computer has a limited amount of RAM and even less cache memory. When a large program or multiple programs are running, it's possible for memory to be fully used. To compensate for a shortage of physical memory, the computer's operating system (OS) can create virtual memory.

To do this, the OS temporarily transfers inactive data from RAM to disk storage. This approach increases virtual address space by using active memory in RAM and inactive memory in HDDs to form contiguous addresses that hold both an application and its data. Virtual memory lets a computer run larger programs or multiple programs simultaneously, and each program operates as though it has unlimited memory.

Virtual memory in the memory hierarchy
Where virtual memory fits in the memory hierarchy.

In order to copy virtual memory into physical memory, the OS divides memory into pages that each contain a fixed number of addresses. Those pages are stored on disk in a pagefile or swap file, and when they're needed, the OS copies them from the disk to main memory and translates the virtual addresses into real addresses.
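The address translation step can be sketched in a few lines. This is a simplified illustration that assumes 4 KiB pages and a single flat page table held in a dictionary. The names `PAGE_SIZE`, `translate` and `page_table` are hypothetical, and a real OS and memory management unit use multi-level tables and hardware support.

```python
PAGE_SIZE = 4096  # bytes per page (assumed; a common default)


def translate(virtual_addr, page_table):
    """Return the physical address, or None if the page is not in RAM."""
    page_number, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        # Page fault: the OS would copy the page in from disk and retry.
        return None
    return frame * PAGE_SIZE + offset


# Hypothetical mapping of virtual page number -> physical frame number.
page_table = {0: 5, 1: 2}

print(translate(10, page_table))        # page 0, offset 10
print(translate(4100, page_table))      # page 1, offset 4
print(translate(3 * 4096, page_table))  # page 3 is not resident
```

The offset within the page is carried over unchanged. Only the page number is swapped for a physical frame number.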


all-flash array (AFA)
Posted by Thang Le Toan on 16 August 2018 05:09 AM

An all-flash array (AFA), also known as a solid-state storage disk system, is an external storage array that uses only flash media for persistent storage. Flash memory is used in place of the spinning hard disk drives (HDDs) that have long been associated with networked storage systems.

Vendors that sell all-flash arrays usually allow customers to mix flash and disk drives in the same chassis, a configuration known as a hybrid array. However, those products often represent the vendor's attempt to retrofit an existing disk array by replacing some of the media with flash.

All-flash array design: Retrofit or purpose-built

Other vendors sell purpose-built systems designed natively from the ground up to only support flash. These models also embed a broad range of software-defined storage features to manage data on the array.

A defining characteristic of an AFA is the inclusion of native software services that enable users to perform data management and data protection directly on the array hardware. This is different from server-side flash installed on a standard x86 server. Inserting flash storage into a server is much cheaper than buying an all-flash array, but it also requires the purchase and installation of third-party management software to supply the needed data services.

Leading all-flash vendors have written algorithms for array-based services for data management, including clones, compression and deduplication -- either an inline or post-process operation -- snapshots, replication, and thin provisioning.

As with its disk-based counterpart, an all-flash array provides shared storage in a storage area network (SAN) or network-attached storage (NAS) environment.

How an all-flash array differs from disk

Flash memory, which has no moving parts, is a type of nonvolatile memory that can be erased and reprogrammed in units of memory called blocks. It is a variation of electrically erasable programmable read-only memory (EEPROM), and it got its name because the memory blocks can be erased with a single action, or flash. A flash array can transfer data to and from solid-state drives (SSDs) much faster than electromechanical disk drives.

The advantage of an all-flash array, relative to disk-based storage, is full bandwidth performance and lower latency when an application makes a query to read the data. The flash memory in an AFA typically comes in the form of SSDs, which are similar in design to an integrated circuit.

Image of a Pure Storage FlashBlade enterprise storage array

Flash is more expensive than spinning disk, but the development of multi-level cell (MLC) flash, triple-level cell (TLC) NAND flash and 3D NAND flash has lowered the cost. These technologies enable greater flash density without the cost involved in shrinking NAND cells.

MLC flash is slower and less durable than single-level cell (SLC) flash, but companies have developed software that improves its wear level to make MLC acceptable for enterprise applications. SLC flash remains the choice for applications with the highest I/O requirements, however. TLC flash reduces the price more than MLC, although it also comes with performance and durability tradeoffs that can be mitigated with software. Vendor products that support TLC SSDs include the Dell EMC SC Series and Kaminario K2 arrays.

Considerations for buying an all-flash array

Deciding to buy an AFA involves more than simple comparisons of vendor products. An all-flash array that delivers massive performance increases to a specific set of applications may not provide equivalent benefits to other workloads. For example, running virtualized applications in flash with inline data deduplication and compression tends to be more cost-effective than flash that supports streaming media in which unique files are uncompressible.

An all-SSD system will produce smaller variations in maximum, minimum and average latencies than an HDD array. This makes flash a good fit for most read-intensive applications.

The tradeoff comes in write amplification: because flash can only be erased a whole block at a time, changing a small amount of data can force the SSD to rewrite an entire block. Write-intensive workloads require a special algorithm that collects the writes destined for the same block of the SSD, so that the software always writes multiple changes to that block together.
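Write amplification is commonly expressed as a ratio of the data physically written to flash to the data the host asked to write. The sketch below is a simplified model under stated assumptions: the 4 KiB page and 256 KiB erase block sizes are illustrative values, and real controllers rarely hit either extreme shown here.

```python
def write_amplification(host_bytes, flash_bytes):
    """Write amplification factor: flash writes divided by host writes."""
    return flash_bytes / host_bytes


PAGE = 4 * 1024      # assumed flash page size
BLOCK = 256 * 1024   # assumed erase block size (64 pages)

# Worst case: one 4 KiB host update forces a whole block to be rewritten.
scattered = write_amplification(host_bytes=PAGE, flash_bytes=BLOCK)

# Coalesced: 64 updates that fall in the same block are written together.
coalesced = write_amplification(host_bytes=64 * PAGE, flash_bytes=BLOCK)

print(scattered, coalesced)
```

Collecting changes for the same block before writing is what brings the ratio back toward 1 in this model.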

Garbage collection can present a similar issue with SSDs. A flash cell can only withstand a limited number of writes, so wear leveling can be used to increase flash endurance. Most vendors design their all-flash systems to minimize the impact of garbage collection and wear leveling, although users with write-intensive workloads may wish to independently test a vendor's array to determine the best configuration.

Despite paying a higher upfront price for the system, users who buy an AFA may see the cost of storage decline over time. This is tied to an all-flash array's increased CPU utilization, which means an organization will need to buy fewer application servers.

The physical size of an AFA is smaller than that of a disk array, which lowers the rack count. Having fewer racks in a system also reduces the heat generated and the cooling power consumed in the data center.

All-flash array vendors, products and markets

Flash was first introduced as a handful of SSDs in otherwise all-HDD systems, to create a small flash tier that accelerated a few critical applications. Thus was born the hybrid flash array.

The next phase of evolution arrived with the advent of software that enabled an SSD to serve as a front-end cache for disk storage, extending the benefit of faster performance across all the applications running on the array.

The now-defunct vendor Fusion-io was an early pioneer of fast flash. Launched in 2005, Fusion-io sold Peripheral Component Interconnect Express (PCIe) cards packed with flash chips. Inserting the PCIe flash cards in server slots enabled a data center to boost the performance of traditional server hardware. Fusion-io was acquired by SanDisk in 2014, which itself was subsequently acquired by Western Digital Corp.

Also breaking ground early was Violin, whose systems -- designed with custom-built silicon -- gained customers quickly, fueling its rise in public markets in 2013. By 2017, Violin was surpassed by all-flash competitors whose arrays integrated sophisticated software data services. After filing for bankruptcy, the vendor was relaunched by private investors as Violin Systems in 2018, with a focus on selling all-flash storage to managed service providers.

comparison of all-flash storage arrays
Independent analyst Logan G. Harbaugh compares various all-flash arrays. This chart was created in August 2017.

All-flash array vendors, such as Pure Storage and XtremIO -- part of Dell EMC -- were among the earliest to incorporate inline compression and data deduplication, which most other vendors now include as a standard feature. Adding deduplication helped give AFAs the opportunity for price parity with storage based on cheaper rotating media.

A sampling of leading all-flash array products includes the following:

  • Dell EMC VMAX
  • Dell EMC Unity
  • Dell EMC XtremIO
  • Dell EMC Isilon NAS
  • Fujitsu Eternus AF
  • Hewlett Packard Enterprise (HPE) 3PAR StoreServ
  • HPE Nimble Storage AF series
  • Hitachi Vantara Virtual Storage Platform
  • Huawei OceanStor
  • IBM FlashSystem V9000
  • IBM Storwize 5000 and Storwize V7000F
  • Kaminario K2
  • NetApp All-Flash Fabric-Attached Array (NetApp AFF)
  • NetApp SolidFire family -- including NetApp HCI
  • Pure Storage FlashArray
  • Pure FlashBlade NAS/object storage array
  • Tegile Systems T4600 -- bought in 2017 by Western Digital
  • Tintri EC Series

Impact on hybrid arrays use cases

Falling flash prices, data growth and integrated data services have increased the appeal of all-flash arrays for many enterprises. This has led to industry speculation that all-flash storage can supplant hybrid arrays, although there remain good reasons to consider using a hybrid storage infrastructure.

HDDs offer predictable performance at a fairly low cost per gigabyte, although they use more power and are slower than flash, resulting in a high cost per IOPS. All-flash arrays have a lower cost per IOPS, coupled with the advantages of speed and lower power consumption, but they carry a higher upfront acquisition price and per-gigabyte cost.

AFA vs. hybrid array

A hybrid flash array enables enterprises to strike a balance between relatively low cost and performance. Since a hybrid array supports high-capacity disk drives, it offers greater total storage than an AFA.

All-flash NVMe and NVMe over Fabrics

All-flash arrays based on nonvolatile memory express (NVMe) flash technologies represent the next phase of maturation. The NVMe host controller interface speeds data transfer by enabling an application to communicate directly with back-end storage.

NVMe is meant to be a faster alternative to the Small Computer System Interface (SCSI) standard that transfers data between a host and a target device. Development of the NVMe standard is under the auspices of NVM Express Inc., a nonprofit organization comprising more than 100 member technology companies.

The NVMe standard is widely considered to be the eventual successor to the SAS and SATA protocols. NVMe form factors include add-in cards, U.2 2.5-inch and M.2 SSD devices.

Some of the NVMe-based products available include:

  • DataDirect Networks Flashscale
  • Datrium DVX hybrid system
  • HPE Persistent Memory
  • Kaminario K2.N
  • Micron Accelerated Solutions NVMe reference architecture
  • Micron SolidScale NVMe over Fabrics appliances
  • Pure Storage FlashArray//X
  • Tegile IntelliFlash

A handful of NVMe-flash startups are bringing products to market, as well, including:

  • Apeiron Data Systems combines NVMe drives with data services housed in field-programmable gate arrays instead of servers attached to storage arrays.
  • E8 Storage E8-D24 NVMe flash arrays replicate snapshots to attached compute servers to reduce management overhead on the array.
  • Excelero software-defined storage runs on any x86 server.
  • Mangstor MX6300 NVMe over Fabrics (NVMe-oF) storage is branded PCIe NVMe add-in cards on Dell PowerEdge servers.
  • Pavilion Data Systems-branded Pavilion Memory Array.
  • Vexata VX-100 is based on the software-defined Vexata Active Data Fabric.

Industry experts expect 2018 to usher in more end-to-end, rack-scale flash storage systems based on NVMe-oF. These systems integrate custom NVMe flash modules as a fabric in place of a bunch of NVMe SSDs.

The NVMe-oF transport mechanism enables a long-distance connection between host devices and NVMe storage devices. IBM, Kaminario and Pure Storage have publicly disclosed products to support NVMe-oF, although most storage vendors have pledged support.

All-flash storage arrays in hyper-converged infrastructure

Hyper-converged infrastructure (HCI) systems combine computing, networking, storage and virtualization resources as an integrated appliance. Most hyper-convergence products are designed to use disk as front-end storage, relying on a moderate flash cache layer to accelerate applications or to use as cold storage. For reasons related to performance, most HCI arrays were not traditionally built primarily for flash storage, although that started to change in 2017.

Now the leading HCI vendors sell all-flash versions. Among these vendors are Cisco, Dell EMC, HPE, Nutanix, Pivot3 and Scale Computing. NetApp launched an HCI product in October 2017 built around its SolidFire all-flash storage platform.


How to use Iometer to Simulate a Desktop Workload
Posted by Thang Le Toan on 06 September 2015 10:54 PM

To deliver VDI performance, it is key to understand I/O performance when creating your VDI architecture. Iometer is treated as the industry-standard tool for load-testing a storage subsystem. While there are many tools available, Iometer’s balance between usability and function sets it apart. However, Iometer has its quirks, and I’ll attempt to show exactly how you should use Iometer to get the best results, especially when testing for VDI environments. I’ll also show you how to stop people using Iometer to fool you.

As Iometer requires almost no infrastructure, you can use it to very quickly determine storage subsystem performance. In steady state, a desktop (VDI or RDS) I/O profile will be approximately 80/20 write/read and 80/20 random/sequential, and the reads and writes will be in 4K blocks. The block size in a real Windows workload does vary between 512B and 1MB, but the vast majority will be at 4K. As Iometer does not allow a mixed block size during testing, we will use a constant 4K.

That said, while Iometer is great for analysing storage subsystem performance, if you need to simulate a real-world workload for your VDI environment I would recommend using tools from the likes of Login VSI or DeNamik.

Bottlenecks for Performance in VDI

Iometer is usually run from within a Windows guest which is sitting upon the storage subsystem. This means that there are many layers between it and the storage as we see below:

If we are to test the performance of the storage, the storage must be the bottleneck. This means there must be sufficient resource in all the other layers to handle the traffic.

Impact of Provisioning Technologies on Storage Performance

If your VM is provisioned using Citrix Provisioning Services (PVS), Citrix Machine Creation Services (MCS) or VMware View Linked Clones, you will be bottlenecked by the provisioning technology. If you test with Iometer against the C: drive of a provisioned VM, you will not get full insight into the storage performance, as these three technologies fundamentally change the way I/O is treated.

You cannot drive maximum IOPS from a single VM; it is therefore not recommended to run Iometer against these VMs when attempting to stress-test storage.

I would always add a second drive to the VM and test Iometer against a second hard drive as this by-passes the issue with PVS/MCS/Linked Clones.

In 99% of cases I would actually rather test against a ‘vanilla’ Windows 7 VM. By this I mean a new VM installed from scratch, not joined to the domain and with only the appropriate hypervisor tools installed. Remember, Iometer is designed to test storage. By testing with a ‘vanilla’ VM environment you baseline core performance delivery. From there you can go on to test a fully configured VM, and then you can understand the impact that AV filter drivers, linked-clone provisioning, or other software and agents have on storage performance.

Using Iometer for VDI testing: advantages and disadvantages

Before we move on to the actual configuration settings within Iometer, I want to talk a little bit about the test file that Iometer creates to throw I/O against. This file is called iobw.tst and is why I both love and hate Iometer. It’s the source of Iometer’s biggest bugs and also its biggest advantage.

First, the advantage: Iometer can create any size of test file you like in order to represent the test scenario that you need. When we talk about a single host with 100 Win 7 VMs, or 8 RDS VMs, the size of the I/O ‘working set’ must be, at a minimum, the aggregate size of the pagefiles, as this will be a set of unique data that will consistently be used. So for the 100 Win 7 VMs, with 1GB RAM, this test file will be at least 100GB and for the 8 RDS VMs, with 10GB RAM, it would be at least 80GB. The actual working set of data will probably be much higher than this, but I’m happy to recommend this as a minimum. This means that it would be very hard for a storage array or RAID card to hold the working set in cache. Iometer allows us to set the test file to a size that will mimic such a working set. In practice, I’ve found that a 20GB test file is sufficient to accurately mimic a single host VDI load. If you are still getting unexpected results from your storage, I’d try and increase the size of this test file.
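The minimum working-set figures above are simple multiplication, sketched here for convenience. The function name is hypothetical, and the calculation assumes each VM's pagefile is about the size of its RAM, which is what the 100GB and 80GB figures imply.

```python
def min_test_file_gb(vm_count, ram_gb_per_vm):
    """Aggregate pagefile size, assuming one pagefile of RAM size per VM."""
    return vm_count * ram_gb_per_vm


print(min_test_file_gb(100, 1))  # 100 Win 7 VMs with 1GB RAM each
print(min_test_file_gb(8, 10))   # 8 RDS VMs with 10GB RAM each
```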

Second, the disadvantage: iobw.tst is buggy. If you resize the file without deleting it first, the resize fails (without error), and if you delete the file without closing Iometer, Iometer crashes. In addition, if you do not run Iometer as administrator, Windows 7 will put the iobw.tst file in the profile instead of the root of C:. OK, that’s not technically Iometer’s fault, but it’s still annoying.

Recommended Configuration of Iometer for VDI workloads

First tab (Disk Targets)

The number of workers is essentially the number of threads used to create the I/O requests. Adding workers will add latency; it will also add a small amount of total I/O. I consider 4 workers to be the best balance between latency and IOPS.

Highlighting the computer icon means that all workers are configured simultaneously. You can check that the workers are configured correctly by highlighting the individual workers.

The second drive should be used to avoid issues with filter drivers/provisioning etc on C: (although Iometer should always be run in a ‘vanilla’ installation).

The number of sectors gives you the size of the test file; this is extremely important, as mentioned above. You can use the following website to determine the sectors/GB:

The size used in the example to get 20GB is 41943040 sectors.
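If the conversion site is not to hand, the sector count can be worked out directly. This sketch assumes 512-byte sectors and binary gigabytes (1GB = 1024^3 bytes), which is the combination that reproduces the 41943040 figure. The function name is hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size


def sectors_for_gb(size_gb):
    """Number of 512-byte sectors in a test file of the given size."""
    return size_gb * 1024**3 // SECTOR_BYTES


print(sectors_for_gb(20))   # the 20GB example file
print(sectors_for_gb(100))  # a 100GB working set
```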

The reason for configuring 16 outstanding I/Os is similar to the reasoning for the number of workers: increasing outstanding I/Os will increase latency while slightly increasing IOPS. As with workers, I think 16 is a good compromise. You can also refer to the following article regarding outstanding I/Os:

Second tab (Network Targets)

No changes are needed on the Network Targets tab.

Third tab (Access Specifications)

To configure a workload that mimics a desktop, we need to create a new specification.

The new access specification should have the following settings, so that the tests model a VDI workload as closely as possible:

  • 80% Write
  • 80% Random
  • 4K blocks
  • Sector Boundaries at 128K. This is probably overkill and 4K would be fine, but it should eliminate any disk alignment issues.

The reasons for choosing these values are too detailed to go into here, but you can refer to the following document on Windows 7 I/O:

You should then Add the access specification to the manager.

Fifth tab (Test Setup)

I’d advise only configuring the test to run for 30 seconds; the results should be representative after that amount of time. More importantly, if you are testing your production SAN, Iometer, once configured correctly, will eat all of your SAN performance. Therefore, if you have other workloads on your SAN, running Iometer for a long time will severely impact them.

Fourth tab (Results Display)

Set the Update Frequency (seconds) slider to the left so you can see the results as they happen.

Set the ‘Results Since’ to ‘Start of Test’ which will give you a reliable average.

Both Read and Write avg. response times (Latency) are essential.

It should be noted that the csv file Iometer creates will capture all metrics while the GUI will only show six.

Save Configuration

It is recommended that you save the configuration for later use by clicking the disk icon. This will save you having to reconfigure Iometer for each test run. The file is saved as *.icf in a location of your choosing. Alternatively, to save some time, download a preconfigured Iometer Desktop Virtualization Configuration file and load it into Iometer.

Start the test using the green flag.

Interpreting Results

Generally, the higher the IOPS the better, as indicated by the ‘Total I/Os per Second’ counter above, but this must be delivered at a reasonable latency; anything under 5ms will provide a good user experience.

Given that the maximum possible IOPS for a single spindle is 200, you should sanity check your results against predicted values. For an SSD you can get 3,000-15,000 IOPS depending on how empty it is and how expensive it is, so again you can sanity check your results.
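A sanity check along these lines can be sketched as follows. It is a rough upper-bound test only: it assumes the ceiling scales linearly with the number of drives and ignores RAID overhead and array cache, and the function names are hypothetical.

```python
MAX_IOPS_PER_SPINDLE = 200        # single-spindle ceiling quoted in the text
SSD_IOPS_RANGE = (3_000, 15_000)  # single-SSD range quoted in the text


def hdd_result_is_plausible(measured_iops, spindle_count):
    """True if the result does not exceed what the spindles could deliver."""
    return measured_iops <= spindle_count * MAX_IOPS_PER_SPINDLE


def ssd_result_is_plausible(measured_iops, ssd_count=1):
    """True if the result does not exceed the top of the quoted SSD range."""
    return measured_iops <= ssd_count * SSD_IOPS_RANGE[1]


print(hdd_result_is_plausible(1500, spindle_count=8))  # within 8 x 200
print(hdd_result_is_plausible(5000, spindle_count=8))  # too good to be true
print(ssd_result_is_plausible(12000))
```

A result that fails the check usually means something other than the disks, such as a cache, is serving the I/O.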

You don’t need to split IOPS or throughput out into read and write because we know Iometer will be working at 80/20, as we configured in the access specification.
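If you do want the individual figures, they follow directly from the configured mix. A minimal sketch, assuming the 80% write setting from the access specification (the function name is hypothetical):

```python
def split_iops(total_iops, write_percent=80):
    """Split a total IOPS figure into (write, read) using the configured mix."""
    write_iops = total_iops * write_percent / 100
    return write_iops, total_iops - write_iops


write_iops, read_iops = split_iops(5000)
print(write_iops, read_iops)
```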

How can I check someone isn’t using Iometer to trick me?

To the untrained eye Iometer can be used to show very unrepresentative results.  Here is a list of things to check when someone is showing you an Iometer result.

  • What size is the test file in Explorer? It needs to be very large (minimum 20GB). Don’t check in the Iometer GUI.
  • How sequential is the workload? The more sequential, the easier it is to show better IOPS and throughput. (It should be set to a minimum of 75% random.)
  • What’s the block size? Windows has a block size of 4K; anything else is not a relevant test and probably helps out the vendor.
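The checklist above can be written down as a small helper for reviewing someone else's test parameters. This is only a sketch of the three checks listed; the function and constant names are hypothetical, and the values have to be read off by hand (the file size from Explorer, the rest from the access specification).

```python
MIN_FILE_GB = 20             # minimum test file size from the checklist
MIN_RANDOM_PERCENT = 75      # minimum random share from the checklist
EXPECTED_BLOCK_BYTES = 4096  # 4K block size


def review_iometer_result(file_size_gb, random_percent, block_bytes):
    """Return a list of concerns; an empty list means the basics look sane."""
    concerns = []
    if file_size_gb < MIN_FILE_GB:
        concerns.append("test file is smaller than 20GB")
    if random_percent < MIN_RANDOM_PERCENT:
        concerns.append("workload is less than 75% random")
    if block_bytes != EXPECTED_BLOCK_BYTES:
        concerns.append("block size is not 4K")
    return concerns


print(review_iometer_result(20, 80, 4096))   # settings recommended here
print(review_iometer_result(1, 0, 65536))    # small file, sequential, 64K blocks
```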
