A walk-through of Microsoft BI & Big Data

Ahead of the SQL Server 2012 Virtual Launch Event on March 7, 2012, I was so HAPPY to receive a box of AWESOME goodies from the Microsoft BI & SQL Server teams in Redmond, my prize for winning several online contests they had organized.

My “Data Love” will show every time I use these SQL Server & MS BI branded goodies. Yes! After all, Microsoft BI is now considered a major player in the BI market.

So, what are the great things happening in the Microsoft BI & Big Data world?

I’ll answer this question by giving a quick walk-through of the new and interesting features coming with SQL Server 2012 and the new Microsoft Big Data products.

Big Data & Hadoop on Azure:

OK, below you can see the Big Picture of Microsoft Big Data, and I will walk through it to show the new Big Data products from Microsoft. In my previous blog post, I stated that a Big Data architecture is never a silo: it works together with the data warehouse and traditional data sources in a single information supply chain, within the same enterprise architecture.

In the architecture above, it’s clear that Microsoft has adopted this vision, releasing a Hadoop connector for SQL Server and a Hadoop connector for SQL Server Parallel Data Warehouse. These connectors enable the movement of large volumes of data of different types (structured & unstructured) across the enterprise.

Furthermore, at the PASS conference 2011 Microsoft announced plans to deliver a Hadoop-based distribution for Windows Server and a Hadoop-based service for Windows Azure. Hadoop on Azure has a user-friendly Metro-style UI that lets you run JavaScript MapReduce, Pig Latin or Hive jobs against your Hadoop cluster right from your browser. In addition, you can analyze Hadoop data with familiar tools such as Excel, thanks to a Hive ODBC Driver and a Hive Add-in for Excel. As I mentioned earlier, Hadoop is integrated with the enterprise architecture, which helps in building corporate BI solutions that include Hadoop data, through the integration of Hive with leading BI tools such as SQL Server Analysis Services, Reporting Services and self-service tools like PowerPivot and Power View.
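To make the MapReduce model concrete, here is a minimal sketch of the classic word count in plain Python. It only simulates the map, shuffle and reduce phases in a single process: it does not use Hadoop’s API or the JavaScript interface of Hadoop on Azure, and the function names and sample tweets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each document would be processed by a separate mapper.
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input.
    tweets = ["Big Data loves BI", "BI loves Big Data", "Hadoop on Azure"]
    print(word_count(tweets))
```

On a real cluster, the framework runs many mappers in parallel, performs the shuffle over the network and hands each key group to a reducer; the three functions above only mirror that contract.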

What’s interesting here is Microsoft’s commitment to engage with the Hadoop open source community. They also formed a strategic partnership with Hortonworks, which was founded in June 2011 by the key architects and core Hadoop committers from Yahoo!’s Hadoop software engineering team; the team is a major driving force behind the next generation of Apache Hadoop. On top of this, Microsoft announced an upcoming SQL Server ODBC Driver for Linux. YES, now you can access SQL Server straight from your Linux OS! That’s a really smart move in Microsoft’s strategy, and it shows the growing interoperability of Microsoft platforms.

Hadoop has proven to be a good ETL engine for huge volumes of unstructured data and for petabyte-scale processing of event logs (e.g. social media sentiment analysis, event-based marketing, etc.), and it is perfect for running MapReduce jobs. However, a data warehouse is still the best solution for managing structured data and for delivering interactive performance through BI tools. I won’t go deeper into this topic here; I’ll save it for upcoming blog posts. To summarize, both Hadoop and a data warehouse are great for our enterprise information management strategy, but we should learn when to use which. You can try the Hadoop on Azure CTP now by filling out this survey.
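The “Hadoop as ETL” side of this split can be sketched as turning raw, semi-structured event logs into tidy rows that a data warehouse can load. The log format, the regular expression and the function names below are hypothetical, and the code runs locally in plain Python; on Hadoop, the parsing would sit in mappers and the counting in reducers.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical raw log format: "2012-03-07T10:15:00 user=42 event=click"
LOG_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+\s+user=(\d+)\s+event=(\w+)$")


class Event(NamedTuple):
    day: str
    user_id: int
    event: str


def parse_line(line: str) -> Optional[Event]:
    """Extract/transform: turn one raw line into a structured row, or None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    day, user_id, event = match.groups()
    return Event(day, int(user_id), event)


def to_fact_rows(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Aggregate parsed events into (day, event, count) rows ready to load into a warehouse table."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:  # malformed lines are skipped rather than failing the job
            counts[(parsed.day, parsed.event)] += 1
    return sorted((day, event, n) for (day, event), n in counts.items())
```

Skipping malformed lines instead of raising is a deliberate choice here: with huge, messy inputs, it is usually better to load what parses and count the rejects separately.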

Self-service BI & Big Data tools (Power View, code-name Data Explorer, Data Mining Add-in):

Now I’ll move from Hadoop to the end-user tools and the self-service BI trend. I will start with data visualization because I simply love this field! Microsoft recently released Power View (formerly code-name Crescent), an interactive data exploration and visual presentation experience. It’s fully integrated with PowerPivot and the Business Intelligence Semantic Model. It’s also interactive enough to let you run full-screen, live, interactive boardroom presentations, even from within PowerPoint! Below you can view a demo of Power View that I created in October 2011; several features have been added since then, and you will see the final version in SQL Server 2012 soon.

Another awesome Big Data product is Microsoft code-name Data Explorer, a self-service data mash-up and ETL tool. It provides an innovative way to combine data from different sources (e.g. SQL Server databases, Windows Azure Marketplace, spreadsheets) in order to provide meaningful insights. I designed the funny poster on the left to illustrate the powerful & rich insights we can get from a data mash-up experiment 😉

The idea of data mash-up is quite important and can be summed up as: “Who else has data that would make your data better?” If you check Google’s new Privacy Policy, you will notice that their business model is about aggregating, or mashing up, everything they know about you in order to provide you with tailored features.
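A data mash-up boils down to joining your own data with someone else’s on a shared key, so you can derive a measure neither source holds alone. The sketch below assumes two hypothetical in-memory datasets (internal sales per region and an external population figure per region); the function name and figures are illustrative, and this is not how Data Explorer works internally.

```python
from typing import Dict, List


def mash_up(sales: Dict[str, float], population: Dict[str, int]) -> List[dict]:
    """Join two independently sourced datasets on a shared key (region)
    and derive a metric that neither source could provide alone."""
    rows = []
    for region, revenue in sales.items():
        people = population.get(region)
        if people is None:  # no matching record in the external source: inner join drops it
            continue
        rows.append({
            "region": region,
            "revenue": revenue,
            "population": people,
            "revenue_per_capita": revenue / people,
        })
    # Rank regions by the derived metric, highest first.
    return sorted(rows, key=lambda r: r["revenue_per_capita"], reverse=True)


if __name__ == "__main__":
    # Hypothetical internal sales, e.g. from a SQL Server database ...
    our_sales = {"North": 120_000.0, "South": 300_000.0, "West": 150_000.0}
    # ... and a hypothetical external dataset, e.g. from a data marketplace.
    their_population = {"North": 80_000, "South": 60_000}
    for row in mash_up(our_sales, their_population):
        print(f"{row['region']}: {row['revenue_per_capita']:.2f} per person")
```

Regions missing from the external source are dropped, which makes this an inner join; keeping them with an empty value would be the left-join alternative.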

As you can see, we all participate in creating data, especially on blogs, Facebook, Twitter, etc. This participation will definitely change the world in areas such as health care, education, the environment and business. To illustrate, a doctor in the future will be able to explore a visualization comparing your unique health profile to a database of millions of patients worldwide, correlating symptoms, traits and diseases. Thus, the doctor can identify what is most likely to work for your condition while you are on the operating table.

Finally, another great self-service tool is the Data Mining Add-in for Excel, which is a great move toward predictive analytics: it analyzes patterns from the past that tell us about the future. It is useful in several scenarios, such as market basket analysis, churn analysis, campaign analysis and text analysis.
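Market basket analysis is usually expressed as association rules scored by support (how often items appear together) and confidence (how often the second item appears given the first). The add-in relies on Analysis Services mining algorithms for this; the Python sketch below is only a brute-force illustration of those two measures over item pairs, with hypothetical basket data and function names.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(baskets: List[Set[str]], min_support: float) -> List[Rule]:
    """Return pairwise rules whose items co-occur in at least `min_support` of all baskets."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules: List[Rule] = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Confidence of "a -> b" is P(b | a) = count(a and b) / count(a), and vice versa.
        rules.append((a, b, support, together / item_counts[a]))
        rules.append((b, a, support, together / item_counts[b]))
    return sorted(rules, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical shopping baskets.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"beer", "chips"},
        {"bread", "butter"},
    ]
    for a, b, support, confidence in pair_rules(baskets, min_support=0.5):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting every pair is fine for a handful of baskets; real association-rule algorithms prune candidate itemsets so they scale to millions of transactions and to itemsets larger than two.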

In-Memory Computing:

Although this is a Microsoft post, allow me to go back three months, to when I watched an SAP keynote that introduced a new equation inspired by Einstein’s famous formula E = mc². SAP’s cool equation is:

E = mc(imc)², where E: Enterprise, m: mobile, c: cloud computing, and imc: in-memory computing.

Interesting, right? Well, that’s pretty much the future, and SAP is doing a good job with its SAP HANA in-memory appliance.

You will see two products in SQL Server 2012 which are based on in-memory computing.

The first is VertiPaq, an in-memory storage engine in SQL Server Analysis Services that keeps data in RAM to deliver what traditional disk-based UDM cubes do. It’s used by PowerPivot for SharePoint & Excel, the upcoming Business Intelligence Semantic Model and Power View in SQL Server 2012. The advantage of this in-memory storage architecture over traditional disk-based SSAS storage is that data retrieval and calculations run much faster, because disk I/O is avoided. We can now hold an entire database table of millions of rows in memory, thanks to the steady decrease in RAM costs and the larger amounts of RAM in personal computers.
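Part of VertiPaq’s speed comes from storing data column by column and compressing it, for example with dictionary encoding, so that repeated values are kept once and scans work on small integers. The toy Python sketch below illustrates that idea under simple assumptions (one dictionary-encoded text column and one numeric column); it is not VertiPaq’s actual storage format, and the class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


class EncodedColumn:
    """A text column stored as a dictionary of distinct values plus one integer code per row."""

    def __init__(self, values: Sequence[str]) -> None:
        self.dictionary: List[str] = []   # code -> distinct value
        index: Dict[str, int] = {}        # value -> code
        self.codes: List[int] = []
        for value in values:
            if value not in index:
                index[value] = len(self.dictionary)
                self.dictionary.append(value)
            self.codes.append(index[value])

    def decode(self, row: int) -> str:
        return self.dictionary[self.codes[row]]


class ColumnStore:
    """Toy in-memory table: an encoded attribute column and a plain numeric measure column."""

    def __init__(self, category: Sequence[str], amount: Sequence[float]) -> None:
        if len(category) != len(amount):
            raise ValueError("columns must have the same length")
        self.category = EncodedColumn(category)
        self.amount = list(amount)

    def sum_by_category(self) -> Dict[str, float]:
        # Aggregate on the small integer codes, then decode each group only once.
        totals: Dict[int, float] = defaultdict(float)
        for code, value in zip(self.category.codes, self.amount):
            totals[code] += value
        return {self.category.dictionary[c]: t for c, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical sales rows.
    store = ColumnStore(["Bikes", "Helmets", "Bikes", "Bikes"], [100.0, 20.0, 150.0, 50.0])
    print(store.category.codes, store.category.dictionary)
    print(store.sum_by_category())
```

Because a query reads only the columns it needs, and low-cardinality columns shrink to a short dictionary plus compact codes, far more rows fit in RAM than a row-by-row layout would allow.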

The second is the Business Intelligence Semantic Model (BISM), a relational-based model that relies on in-memory processing to replicate the functionality of a traditional UDM analysis cube. However, it won’t replace the UDM cube.

Let’s imagine a future of in-memory computing where we can instantly analyze massive data that changes by the nanosecond.

Mobile BI:

Big Data is not the only focus for Microsoft BI; they are also showing promise in Mobile BI. Their vision is: “Deliver highly interactive and immersive BI experiences across different devices to all users wherever they are”. They announced three phases for delivering their BI solutions on iOS, WP7 devices and Windows 8. Here is the detailed list of phases:

Phase 1 – 1HCY12:

Enable existing SharePoint-based BI assets (SSRS operational reports, Excel Services, PerformancePoint) to run in various browsers (including iOS).

Phase 2 – 2HCY12:

Provide touch-based, touch-optimized experiences on multiple devices. Prototypes were shown at the PASS conference 2011 on WP7, Android and iPad devices.

Phase 3 – Windows 8 Wave:

Provide an immersive experience on Windows 8 devices.

Last week, Microsoft released the Kinect for Windows SDK. But what could this release make possible in the BI sector? I will leave you with the inspiring video below, which shows how to explore Power View on any wall after turning it into a touch screen. It was built using the SDK’s Skeletal Tracking feature and Microsoft Speech for speech recognition.