Big Data’s Missing Component: Data Suction Appliance
Posted by Thang Le Toan on 16 September 2015 03:33 PM
Hadoop Data

Fari Payandeh

May I say at the outset that I know the phrase “Data Suction Appliance” sounds awkward at its best and downright awful at its worst. Honestly, I don’t feel that bad! These are some of the words used in Big Data products or company names: story, genie, curve, disco, rhythm, deep, gravity, yard, rain, hero, opera, karma… I won’t be surprised if I come across a start-up named WeddingDB next week.

Although there is so much hype surrounding social media data, the real goldmine is in existing RDBMS Databases and, to a lesser degree, in Mainframes. The reason is obvious: generally speaking, data capture has been driven by business requirements, not by random tweets about where to meet for dinner. In short, the Database vendors are sitting on top of the most valuable data.

Oracle, IBM, and Microsoft “own” most of the data in the world. By that I mean that if you run a query in any part of the world, it’s very likely that you are reading the data from a Database owned by them. The larger the volume of data, the greater the degree of ownership; just ask anyone who has attempted to migrate 20 TB of data from Oracle to DB2. In short, they own the data because the customers are locked in. Moreover, the real value of the data is much greater than the revenues generated from Database licenses. In all likelihood the customer will buy other software/applications from the same vendor, since it’s a safe choice. From the Database vendors’ standpoint, the Database is a gift that keeps on giving. Although they have competed for new customers, in the absence of external threats (non-RDBMS technology) they have enjoyed a growing market that has kept them happy. Teradata, MySql (non-Oracle flavors), Postgres, and Sybase have a small share of the overall Database market.

The birth of Hadoop and NoSql technology represented a seismic shift that shook the RDBMS market, not in terms of revenue loss/gain, but in offering businesses an alternative. The Database vendors moved quickly to jockey for position, and contrary to what some believe, I don’t think they were afraid of a meltdown. After all, who was going to take their data? They responded to the market lest they be deprived of the Big Data windfall.

IBM spent $16 billion on its Big Data portfolio and launched PureData for Hadoop, a hardware/software system composed of the IBM Big Data stack. It introduced SmartCloud and recently backed Pivotal’s Cloud Foundry. Cloud Foundry is “like an operating system for the cloud,” says Andy Piper, developer advocate for Cloud Foundry at Pivotal.

 Microsoft HDInsight products integrate with Sql Server 2012, System Center, and other Microsoft products; the Azure cloud-based version integrates with Azure cloud storage and Azure Database.

Oracle introduced a Big Data appliance bundle comprising Oracle NoSql Database, Oracle Linux, Cloudera Hadoop, and the HotSpot Java Virtual Machine. It also offers Oracle Cloud Computing.

What is a Data Suction Appliance? There is a huge market for a high-performance data migration tool that can copy the data stored in RDBMS Databases to Hadoop. Currently there are no fast ways of transferring data to Hadoop; performance is sluggish. What I envision is data transfer at the storage layer, not the Database layer. Storage vendors such as EMC and NetApp have an advantage in finding a solution while working with Data Integration vendors like Informatica. Informatica recently partnered with VelociData, a provider of hyper-scale/hyper-speed engineered solutions.

Is it possible? I would think so. I know that I am simplifying the process, but this is a high-level view of what I see as a possible solution. Database objects are stored at specific disk addresses. It starts with the address of an instance, within which the information about the root Tablespace or Dbspace is kept. Once the root Tablespace is identified, the information about the rest of the objects (non-root Tablespaces, tables, indexes, …) is available in Data Dictionary tables and views. This information includes the addresses of the data files. Data file headers store the addresses of free/used extents, and we continue on that path until the data blocks containing the target rows are identified. Next, the Data Suction Appliance bypasses the Database and bulk-copies the data blocks from storage to Hadoop. Some transformations may be needed during the transfer in order to bring in the data in a form that NoSql Databases can understand, but that can be achieved through an interface which allows Administrators to specify the data transfer options. The future will tell if I am dreaming or, as cousin Vinny said, “The argument holds water”.
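The traversal described above can be sketched in a few lines. This is purely illustrative: every structure and field name below is hypothetical, standing in for the vendor-specific on-disk metadata a real appliance would have to parse.

```python
# Toy on-disk layout: instance header -> root tablespace -> data dictionary
# -> data files -> extents -> blocks. All names here are made up.
storage = {
    "instance_header": {"root_tablespace": "SYSTEM"},
    "tablespaces": {
        "SYSTEM": {"dictionary": {"orders": "users_ts"}},  # table -> tablespace
        "users_ts": {"data_files": ["f1"]},
    },
    "data_files": {
        "f1": {"used_extents": [["blk1", "blk2"]]},  # extents list block ids
    },
    "blocks": {"blk1": ["row1", "row2"], "blk2": ["row3"]},
}

def suction_copy(storage, table):
    """Walk the metadata chain, then bulk-copy the data blocks,
    bypassing the Database engine entirely."""
    root = storage["instance_header"]["root_tablespace"]
    ts_name = storage["tablespaces"][root]["dictionary"][table]
    rows = []
    for fname in storage["tablespaces"][ts_name]["data_files"]:
        for extent in storage["data_files"][fname]["used_extents"]:
            for block_id in extent:
                rows.extend(storage["blocks"][block_id])  # raw block copy
    return rows  # in the real appliance, these blocks would stream to Hadoop

print(suction_copy(storage, "orders"))  # ['row1', 'row2', 'row3']
```

The point of the sketch is the order of the lookups: no SQL is issued against the Database; only metadata and raw blocks are read from storage.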


Hadoop vs. NoSql vs. Sql vs. NewSql By Example
Posted by Thang Le Toan on 16 September 2015 03:30 PM



Fari Payandeh

Sept 8, 2013

Although Mainframe Hierarchical Databases are very much alive today, Relational Databases (RDBMS) (SQL) have dominated the Database market, and they have done a lot of good. They are the reason the money we deposit doesn’t go to someone else’s account, our airline reservation ensures that we have a seat on the plane, and we are not blamed for something we didn’t do. RDBMS’ data integrity is due to its adherence to the ACID principles (atomicity, consistency, isolation, and durability). RDBMS technology dates back to the 70’s.
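Atomicity, the “A” in ACID, is easy to see in miniature. Here is a small sketch using Python’s built-in sqlite3 module (an illustrative stand-in, not one of the vendors discussed): a money transfer fails halfway through, and the rollback undoes the debit too, so no funds vanish.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: either both updates apply or neither does."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Simulate a mid-transaction failure before the matching credit.
        if amount > 100:
            raise RuntimeError("transfer rejected mid-flight")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # atomicity: the debit above is undone as well

transfer(conn, "alice", "bob", 200)  # fails and rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- unchanged
```

Without the transaction, alice would have been debited 200 with no matching credit; this guarantee is what relaxing ACID puts at stake.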

So what changed? Web technology started the revolution. Today, many people shop on Amazon. RDBMS was not designed to handle the number of transactions that take place on Amazon every second. The primary constraining factor was RDBMS’ schema.

NoSql Databases offered an alternative by eliminating schemas at the expense of relaxing ACID principles. Some NoSql vendors have made great strides towards resolving the issue; the solution is called eventual consistency. As for NewSql: why not create a new RDBMS, minus RDBMS’ shortcomings, utilizing modern programming languages and technology? That is how some of the NewSql vendors came to life. Other NewSql companies created augmented solutions for MySql.
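Eventual consistency can be modeled in a few lines. This toy class (entirely my own construction, not any vendor’s API) accepts a write on one replica immediately and propagates it to the others later; a read from a lagging replica returns stale data until a background sync runs.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on one replica and propagate lazily."""
    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # replication log not yet applied everywhere

    def write(self, key, value):
        self.replicas[0][key] = value      # primary acknowledges immediately
        self.pending.append((key, value))  # the other replicas catch up later

    def read(self, replica_id, key):
        return self.replicas[replica_id].get(key)  # may be stale

    def anti_entropy(self):
        """Background sync: once this runs, all replicas converge."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("score", 42)
print(store.read(1, "score"))  # None -- stale read before the sync
store.anti_entropy()
print(store.read(1, "score"))  # 42 -- eventually consistent
```

The trade-off the article describes is exactly this window between the write and the sync: higher availability and write throughput, at the cost of briefly stale reads.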

Hadoop is a different animal altogether. It’s a file system, not a database. Hadoop’s roots are in internet search engines. Although Hadoop and its associates (HBase, Mapreduce, Hive, Pig, Zookeeper) have turned it into a mighty database, at its core Hadoop is an inexpensive, scalable, distributed filesystem with fault tolerance. Hadoop’s specialty at this point in time is batch processing, which makes it suitable for Data Analytics.

Now let’s start with our example: My imaginary video game company recently put our most popular game online after ten years of being in business, shipping our games to retailers around the globe. Our customer information is currently stored in a Sql Server Database and we have been happy with it. However, since the players started playing the game online, the database is not able to keep up and the users are experiencing delays. As our user base grows rapidly, we spend money buying more and more hardware/software, but to no avail. Losing customers is our primary concern. Where do we go from here?

We decide to run our online game application in NoSql and NewSql simultaneously by segmenting our online user base. Our objective is to find the optimal solution. The IT department selects NoSql Couchbase (document oriented like MongoDB) and NewSql VoltDB.

Couchbase is open source, has an integrated caching mechanism, and can automatically spread data across multiple nodes. VoltDB is an ACID-compliant RDBMS that is fault tolerant, scales horizontally, and possesses a shared-nothing, in-memory architecture. In the end, both systems are able to deliver. I won’t go into the intricacies of each solution, because this is an example, and comparing these technologies in the real world will require testing, benchmarking, and in-depth analyses.

Now that the online operations are running smoothly, we want to analyze our data to find out where we should expand our territory. Which are the most suitable countries for marketing our products? In doing so, we need to merge the Sql Server customer Data Warehouse with the data from the online gaming database and run analytical reports. That’s where Hadoop comes in. We configure a Hadoop system and merge the data from the two data sources. Next, we use Hadoop’s Mapreduce in conjunction with the open source R programming language to generate the analytics reports.
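The Mapreduce step above can be sketched in plain Python. This is a single-machine illustration of the map/shuffle/reduce pattern (the sample records and totals are invented for the example, not real data): map each merged record to a (country, hours) pair, group by country, and reduce each group with a sum.

```python
from collections import defaultdict

# Sample records merged from the Sql Server warehouse and the gaming database.
records = [
    {"player": "p1", "country": "Germany", "hours": 12},
    {"player": "p2", "country": "Brazil",  "hours": 30},
    {"player": "p3", "country": "Germany", "hours": 25},
    {"player": "p4", "country": "Japan",   "hours": 8},
]

def map_phase(record):
    yield record["country"], record["hours"]   # emit key/value pairs

def shuffle(pairs):
    groups = defaultdict(list)                 # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)                    # aggregate each group

pairs = [kv for rec in records for kv in map_phase(rec)]
totals = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(totals)  # {'Germany': 37, 'Brazil': 30, 'Japan': 8}
```

On a real cluster, Hadoop runs the map and reduce functions on many nodes in parallel and handles the shuffle over the network; the logic per record stays this simple.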


Sql Server Parallel Data Warehouse (PDW)
Posted by Thang Le Toan on 16 September 2015 03:28 PM

Fari Payandeh
Unlike Hadoop and NoSql Databases, MPP is not a new technology. Yet it is a strong contender in the “Big Data” space. The Sql Server PDW appliance boasts up to 100x performance gains over legacy data warehouses. Moreover, it is a fault-tolerant, horizontally scalable, high-capacity RDBMS. Simply put, it is an excellent solution for companies that are wholly vested in RDBMS but need to break free of its constraining factors. We will narrow down our comparison by juxtaposing Sql Server SMP with Sql Server MPP.


Performance (Velocity)
Scalability: Throwing more hardware at Sql Server SMP will eventually hit a point of diminishing returns as the size of the data sets grows. By contrast, the Sql Server MPP architecture is horizontally scalable, and performance grows linearly as we add more nodes (physical servers) to the appliance, up to 100x performance gains over legacy data warehouses.
CPU Utilization: A Database task in Sql Server SMP is bound to only one CPU, whereas a task runs on multiple CPUs in Sql Server MPP.
Resource Sharing: Sql Server MPP has a shared-nothing architecture, which allows each node to dedicate its resources to processing queries, thereby avoiding the resource contention and I/O bottlenecks caused by resource sharing.
Distributed Queries: Query execution time is reduced significantly. Each query is broken down into pieces and fed to different nodes enabling parallel processing.
Data Distribution: Sql Server MPP automatically distributes the data among different nodes. Each node processes its own data set before sending the output to the control process which in turn merges the results.
Parallel Load: Data is automatically loaded in parallel.
In-Memory Operations and Columnar Data Stores: Both Sql Server SMP and Sql Server MPP support in-memory operations and columnar data stores.
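The distributed-query and data-distribution points above follow one scatter-gather pattern, which this minimal sketch models (the row data and node count are invented for illustration): rows are hash-distributed across nodes on a distribution key, each node aggregates only its local slice in parallel, and a control process merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows hash-distributed across nodes on a distribution key, MPP-style.
N_NODES = 4
nodes = [[] for _ in range(N_NODES)]
for order_id, amount in [(1, 100), (2, 250), (3, 75), (4, 300), (5, 50)]:
    nodes[hash(order_id) % N_NODES].append(amount)

def node_query(local_rows):
    """Each node aggregates only its own slice (shared-nothing)."""
    return sum(local_rows)

# Scatter: the same query runs on every node in parallel.
with ThreadPoolExecutor(max_workers=N_NODES) as pool:
    partials = list(pool.map(node_query, nodes))

# Gather: the control process merges the partial results.
print(sum(partials))  # 775
```

Because no node touches another node’s rows, adding nodes shrinks each slice, which is the mechanism behind the near-linear scaling claimed above.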

Capacity (Volume)
Sql Server SMP can handle a few Terabytes, whereas Sql Server MPP can linearly scale out to 6 Petabytes.

High Availability
Sql Server MPP is fault tolerant. Redundancy is applied to all hardware and software components of the appliance. Moreover, the appliance runs on Microsoft Hyper-V which gives the nodes failover capabilities.

Analytics Capabilities (Variety and Variability)
PDW is part of Microsoft Analytics Platform System, which supports connectivity and query access to Hadoop and unstructured data via PolyBase data querying technology.

Maintainability
Sql Server SMP Databases are simpler and less costly to maintain. Sql Server MPP upgrades are not seamless and may require downtime; patches are applied by Microsoft.



