Scaling Storage and Analysis of Data Using Distributed Data Grids

One of the most important new methods for overcoming performance bottlenecks in a large class of applications is data-parallel programming on a distributed data grid. This method is predicted to have important applications in cloud computing over the next couple of years. eWeek Knowledge Center contributor William L. Bain describes ways in which a distributed data grid can be used to implement powerful, Java-based applications for parallel data analysis.

In the current Information Age, companies must store and analyze large amounts of business data. Companies that can efficiently search data for important patterns will have a competitive edge over others. An e-commerce Web site, for example, needs to be able to monitor online shopping carts in order to see which products are selling faster than others. Another example is a financial services company, which needs to hone its equity trading strategy as it optimizes its response to rapidly changing market conditions.

Businesses facing these challenges have turned to distributed data grids (also called distributed caches) in order to scale their ability to manage rapidly changing data and to sort through that data to identify patterns and trends that require a quick response. Distributed data grids offer a few key advantages.

Distributed data grids store data in memory instead of on disk for quick access. Additionally, they run seamlessly across multiple servers to scale performance. Lastly, they provide a quick, easy-to-use platform for running “what if” analyses on the data they store. By breaking the sequential bottleneck, they can take performance to a level that stand-alone database servers cannot match.

Three simple steps for building a fast, scalable data storage and analysis solution:

1. Store rapidly changing business data directly in a distributed data grid rather than on a database server

Distributed data grids are designed to plug directly into the business logic of today’s enterprise applications and services. They match the in-memory view of data already used by business logic by storing data as collections of objects rather than as relational database tables. Because of this, distributed data grids are easy to integrate into existing applications using simple APIs (which are available for most modern languages, such as Java, C# and C++).

Distributed data grids run on server farms, so their storage capacity and throughput scale simply by adding more grid servers. When hosted on a large server farm or in the cloud, a distributed data grid can store and quickly access larger quantities of data than a stand-alone database server can.
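As a rough illustration of this first step, the sketch below stores shopping carts as plain objects and spreads them across several partitions by hashing each key. It is a minimal, single-process sketch, not the API of any particular grid product: the class name GridStoreSketch, the cart identifiers, and the use of one in-memory map per "grid server" are all assumptions made for illustration. A real grid would place each partition on a separate machine and handle replication and rebalancing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each "grid server" is modeled as an in-process map.
class GridStoreSketch<K, V> {
    private final List<Map<K, V>> servers = new ArrayList<>();

    GridStoreSketch(int serverCount) {
        for (int i = 0; i < serverCount; i++) {
            servers.add(new ConcurrentHashMap<>());
        }
    }

    // Choose the partition that owns a key by hashing the key.
    private Map<K, V> ownerOf(K key) {
        return servers.get(Math.floorMod(key.hashCode(), servers.size()));
    }

    void put(K key, V value) {
        ownerOf(key).put(key, value);
    }

    V get(K key) {
        return ownerOf(key).get(key);
    }

    // Total number of objects held across all partitions.
    int size() {
        int total = 0;
        for (Map<K, V> server : servers) {
            total += server.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Carts are stored as ordinary objects (here, lists of product names).
        GridStoreSketch<String, List<String>> carts = new GridStoreSketch<>(4);
        carts.put("cart-1001", List.of("laptop", "mouse"));
        carts.put("cart-1002", List.of("novel"));
        System.out.println(carts.get("cart-1001")); // prints [laptop, mouse]
        System.out.println(carts.size());           // prints 2
    }
}
```

In this sketch, capacity grows by constructing the store with a larger serverCount; the business logic keeps calling put and get on objects and never sees which partition holds them.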

2. Integrate the distributed data grid with database servers in an overall storage strategy

Distributed data grids are used to complement, not replace, database servers, which are the authoritative repositories for transactional data and long-term storage. With an e-commerce Web site, for example, a distributed data grid would hold shopping carts to efficiently manage a large workload of online shopping traffic. A back-end database server would meanwhile store completed transactions, inventory and customer records.

Carefully separating the application code used for business logic from the code used for data access is an important factor in integrating a distributed data grid into an enterprise application’s overall storage strategy. Distributed data grids naturally fit into business logic, which manages data as objects. This code is where rapid access to data is required and also where distributed data grids provide the greatest benefit. The data access layer, in contrast, usually focuses on converting objects into a relational form for storage in database servers (or vice versa).

A distributed data grid can be integrated with a database server so that it automatically retrieves data from the database server when that data is missing from the grid. This is especially useful for certain types of data, such as product or customer information, which is stored in the database server and retrieved when needed by the application. Most types of rapidly changing business logic data, however, can be stored solely in a distributed data grid without ever being written out to a database server.
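The automatic fallback to the database described above is often called a read-through pattern. The following is a minimal sketch of that idea, assuming a single in-memory map in place of the grid and a caller-supplied function in place of the database query. The names ReadThroughSketch and databaseLoader, and the product identifier, are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of read-through access: look in the grid first,
// and load from the database only when the object is missing.
class ReadThroughSketch<K, V> {
    private final Map<K, V> grid = new ConcurrentHashMap<>();
    private final Function<K, V> databaseLoader; // stands in for a database query

    ReadThroughSketch(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    V get(K key) {
        // computeIfAbsent calls the loader only on a miss and keeps the result.
        return grid.computeIfAbsent(key, databaseLoader);
    }

    public static void main(String[] args) {
        AtomicInteger databaseCalls = new AtomicInteger();
        ReadThroughSketch<String, String> products = new ReadThroughSketch<>(id -> {
            databaseCalls.incrementAndGet();
            return "Product record for " + id;
        });

        products.get("sku-42"); // miss: loaded from the "database"
        products.get("sku-42"); // hit: served from the grid
        System.out.println(databaseCalls.get()); // prints 1
    }
}
```

Only the first request for a given key reaches the loader, which is the behavior that makes this pattern a good fit for slowly changing data such as product or customer records.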

3. Analyze grid-based data by using simple analysis code and the MapReduce programming pattern

After a collection of objects, such as a Web site’s shopping carts, has been hosted in a distributed data grid, it is important to be able to scan this data for patterns and trends. Researchers have developed a two-step method called MapReduce for analyzing large volumes of data in parallel.

In the first step, each object in the collection is analyzed for a pattern of interest by writing and running a simple algorithm that assesses one object at a time. This algorithm is run in parallel on all objects so that all of the data is analyzed quickly. In the second step, the results generated by running this algorithm are combined to determine an overall result (which will hopefully identify an important trend).

Take an e-commerce developer, for example. The developer could write simple code that analyzes each shopping cart to rate which product categories are generating the most interest. This code could be run on all shopping carts throughout the day in order to identify important shopping trends.
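The shopping cart example can be sketched in a few lines of Java. This is an illustrative sketch only: it uses a Java parallel stream on one machine as a stand-in for the grid running the analysis on all servers, and it models each cart as a list of product category names. The class name CartAnalysisSketch, its method names, and the sample categories are assumptions, not part of any grid product.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two-step pattern applied to shopping carts.
class CartAnalysisSketch {

    // Step 1 (map): analyze a single cart and count its product categories.
    static Map<String, Integer> analyzeCart(List<String> cartCategories) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String category : cartCategories) {
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    // Step 2 (reduce): combine two partial results into a new map.
    static Map<String, Integer> combine(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new TreeMap<>(a);
        b.forEach((category, count) -> merged.merge(category, count, Integer::sum));
        return merged;
    }

    // Run the per-cart analysis in parallel, then merge all partial results.
    static Map<String, Integer> categoryInterest(List<List<String>> carts) {
        return carts.parallelStream()
                .map(CartAnalysisSketch::analyzeCart)
                .reduce(new TreeMap<>(), CartAnalysisSketch::combine);
    }

    public static void main(String[] args) {
        List<List<String>> carts = List.of(
                List.of("electronics", "books"),
                List.of("electronics"),
                List.of("toys", "electronics"));
        System.out.println(categoryInterest(carts));
        // prints {books=1, electronics=3, toys=1}
    }
}
```

Note that analyzeCart looks at one cart at a time and combine never modifies its inputs, so neither method needs to know how the work is split up or in what order partial results are merged.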

Distributed data grids offer an ideal platform for analyzing data with this MapReduce programming pattern. Because they store data as memory-based objects, the analysis code is easy to write and debug as simple “in-memory” code. Programmers don’t need to learn parallel programming techniques or understand how the grid works. Distributed data grids also provide the infrastructure needed to automatically run this analysis code on all grid servers in parallel and then combine the results. The net result is that the application developer can easily harness the full scalability of the grid to quickly discover data patterns and trends that are important to the success of an enterprise. For more information, please visit www.nubifer.com.
