Wednesday, April 1, 2015

Moore’s Law and the Data Warehouse

Gordon Moore, co-founder of Intel, made an observation in 1965 that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented, and he predicted that this trend would continue for the foreseeable future. More than 45 years later, one can say he predicted correctly: the processing power of computers has continued to double roughly every one to two years.



It is a common misconception that Moore’s law matters to data warehousing only through its economics, the belief being that data warehousing is practical today simply because hardware keeps getting cheaper. Experts argue that the point is broader: the very concepts of data warehousing and analytics, not just their economics, are feasible today only because of Moore’s law.


Back in the 1990s, when the concept of the data warehouse was emerging and the first implementations were being built, the data involved was measured in mere terabytes. With the increase in processing power, more and more data could be processed, and today, amid the strong buzz around big data, the volumes being processed have grown to petabytes. Data warehouses aren’t just bigger than a generation ago; they’re faster, support new data types, serve a wider range of business-critical functions, and can deliver actionable insights to anyone in the enterprise at any time or place. All of which makes the modern data warehouse more important than ever to business agility, innovation, and competitive advantage.

Below are some recent changes in the world of data warehousing, business intelligence and big data.

1. Big data analytics in the cloud


Hadoop, a framework and set of tools for processing very large data sets, was originally designed to work on clusters of physical machines. That has changed: an increasing number of technologies are now available for processing data in the cloud. Examples include Amazon’s Redshift hosted data warehouse, Google’s BigQuery data analytics service, IBM’s Bluemix cloud platform and Amazon’s Kinesis data processing service. The future state of big data is likely to be a hybrid of on-premises and cloud.
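As a concrete illustration, the sketch below runs an analytic SQL query against one of these hosted services, Google BigQuery, from Python. The project, dataset, table and column names are hypothetical placeholders, and the example assumes the google-cloud-bigquery client library is installed and credentials are already configured.

```python
# Minimal sketch: an analytic query on Google BigQuery, one of the
# hosted big data services mentioned above. Project, dataset, table
# and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project id

query = """
    SELECT customer_region, SUM(order_total) AS revenue
    FROM `my-analytics-project.sales.orders`
    GROUP BY customer_region
    ORDER BY revenue DESC
"""

# The heavy lifting happens in the cloud; only the aggregated rows
# come back to the client.
for row in client.query(query).result():
    print(row.customer_region, row.revenue)
```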

2. Hadoop: The new enterprise data operating system


Distributed analytic frameworks, such as MapReduce, are evolving into distributed resource managers that are gradually turning Hadoop into a general-purpose data operating system. With these systems, enterprises can perform many different data manipulation and analytics operations by plugging them into Hadoop, with its distributed file system serving as the shared storage layer. As SQL, MapReduce, in-memory, stream processing, graph analytics and other types of workloads are able to run on Hadoop with adequate performance, more businesses will use Hadoop as an enterprise data hub. The ability to run many different kinds of queries and data operations against data in Hadoop will make it a low-cost, general-purpose place to put data that enterprises want to be able to analyze.
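To make that concrete, here is a minimal sketch, assuming a PySpark installation, of running a SQL workload over data stored in HDFS while YARN (Hadoop’s resource manager) schedules the work. The HDFS path and column names are hypothetical.

```python
# Minimal sketch: a SQL workload over data in Hadoop (HDFS), with YARN
# acting as the cluster resource manager. Paths and column names are
# illustrative placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hadoop-data-hub-demo")
         .master("yarn")  # let YARN, Hadoop's resource manager, schedule the job
         .getOrCreate())

# Read raw events straight out of HDFS and expose them to SQL.
events = spark.read.json("hdfs:///data/clickstream/2015/04/")  # hypothetical path
events.createOrReplaceTempView("clickstream")

daily_visits = spark.sql("""
    SELECT event_date, COUNT(DISTINCT user_id) AS unique_visitors
    FROM clickstream
    GROUP BY event_date
    ORDER BY event_date
""")
daily_visits.show()
```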

3. In-memory analytics


The use of in-memory databases to speed up analytic processing is increasingly popular and highly beneficial in the right setting. Many businesses are already leveraging hybrid transaction/analytical processing (HTAP), which allows transactions and analytic processing to reside in the same in-memory database. But in-memory is not always worth it: for systems where users need to see the same, largely unchanging data in the same way many times a day, it is a waste of money. And while HTAP lets you perform analytics faster, it only helps when all of the relevant transactions reside in that one database. The problem is that most analytics efforts today are about putting transactions from many different systems together; assuming that HTAP can serve all of your analytics just because the data sits in one database ignores that reality. You still have to integrate diverse data. Moreover, bringing in an in-memory database means there is another product to manage, secure, and figure out how to integrate and scale.
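To illustrate the single-database HTAP pattern described above, here is a toy sketch using Python’s built-in sqlite3 module with an in-memory database: transactional writes and an analytic aggregation run against the same in-memory store. Real HTAP products are dedicated in-memory engines, not SQLite, and the table and values here are made up.

```python
# Toy sketch of the HTAP pattern: transactional writes and analytic reads
# against the same in-memory database. sqlite3's ":memory:" mode stands in
# for a real in-memory HTAP engine purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # the entire database lives in RAM
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")

# "Transactional" side: individual orders arrive and are committed.
new_orders = [(1, "EMEA", 120.0), (2, "APAC", 80.5), (3, "EMEA", 42.0)]
with conn:
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", new_orders)

# "Analytical" side: aggregate the very same data, with no separate copy
# or ETL step in between.
for region, revenue in conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(region, revenue)
```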

To conclude, data warehouses have had staying power because of the concept at their core: a central data repository fed by dozens or hundreds of databases, applications and other source systems. It continues to be the best, most efficient way for companies to get an enterprise-wide view of their customers, supply chains, sales and operations. For this reason, businesses that have data warehouses are upgrading and augmenting them with technologies such as Hadoop and in-memory processing, which help them handle ‘big data’ workloads that are far larger than before.
