Top Cluster Considerations

One of the greatest strengths of Java 2 Platform, Enterprise Edition (J2EE) application servers is the ability to scale solutions to meet increased performance and availability demands. The inherent clustering and failover capabilities built into products such as the IBM WebSphere Application Server Network Deployment Edition take care of most of the dirty work; however, there are important application considerations that can't be overlooked. If an application isn't designed for cluster awareness, functional or performance issues can surface when you deploy it to a cluster.

Expecting your application to run only in a single server Java Virtual Machine (JVM) is self-limiting. With a little more upfront work, you can ensure cluster readiness and be more confident that your application will scale when the need arises. This article discusses design and implementation considerations for cluster awareness. While many of these are the tried-and-true best practices for any J2EE application, you'll see how essential they become when you move to a cluster. In addition, there are more subtle considerations, often realized the hard way, that cause an application to fail in a cluster.

When an application is moved into a clustered environment, multiple copies of the application are executing at the same time, with some type of workload management distributing and balancing the work. The copies may be running on different JVMs on the same physical server or across different servers, and subsequent requests to the same application may end up executing on different servers. Data, resource sharing, and timing therefore have to be carefully designed.

This article discusses the following areas that require special consideration when designing your application for cluster awareness.

  1. Follow user session state best practices
  2. Design configuration data access to avoid local files and in-memory updates
  3. Design application and database for data concurrency and idempotence
  4. Maximize local object accesses within tiers
  5. Leverage messaging to enable background and/or batch processing
  6. Design and partition caches
  7. Configure shared resources appropriately
  8. Use application server provided workload management and replication
  9. TEST, TEST, TEST

Follow User Session State Best Practices
HTTP is a stateless protocol requiring the user state to be stored on the server side using HTTP sessions and other J2EE components such as Stateful Session EJBs. When your application is moved into a clustered environment, different requests from the same user may go to different application servers. This means that all application servers need to have access to these types of session data to maintain availability across user requests. Application servers like WebSphere provide a sophisticated infrastructure to support sharing HTTP session data across servers. This includes performance optimizations in the workload routing and caching so that under normal operating situations, the HTTP session object is actually resident on the server that the request is sent to, eliminating the need for the runtime to repopulate the session object from the database or shared memory.

Though your application server may provide algorithms that minimize the need for a session to be repopulated from the database, any HTTP session replication and failover capability requires adherence to several best practices to perform effectively:

  1. A session should be small and avoid complex object graphs
  2. Session objects must be serializable
  3. References to unnecessary objects that are dependent on the current runtime environment should be avoided
  4. Session wrappers must call setAttribute() whenever a stored object changes so that the container can detect the change

One technique that helps keep your session small is to mark selected fields as transient, so that WebSphere serializes objects selectively and the application re-creates the omitted data when it's needed. Brown and Botzum discuss this technique in "Improving HttpSession Performance with Smart Serialization" (see reference 1).
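
The transient-field idea can be sketched in a few lines. This is a minimal illustration rather than the code from the referenced article; the class name CartSummary and the loadItems() lookup are hypothetical, and the roundTrip() helper only imitates what session replication does to the object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical session attribute: only the small customer key is serialized. */
class CartSummary implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String customerId;            // small, travels with the session
    private transient List<String> cachedItems; // derived data, skipped by serialization

    CartSummary(String customerId) {
        this.customerId = customerId;
    }

    String getCustomerId() {
        return customerId;
    }

    /** Rebuilds the derived data lazily, for example after a failover to another server. */
    List<String> getItems() {
        if (cachedItems == null) {
            cachedItems = loadItems(customerId);
        }
        return cachedItems;
    }

    boolean isItemsLoaded() {
        return cachedItems != null;
    }

    /** Placeholder for a database or service lookup (assumption for this sketch). */
    private static List<String> loadItems(String customerId) {
        List<String> items = new ArrayList<>();
        items.add("item-for-" + customerId);
        return items;
    }

    /** Serializes and deserializes the object, imitating session replication. */
    static CartSummary roundTrip(CartSummary in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CartSummary) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

After a round trip the copy still knows its customer ID, but the item list is gone until getItems() is called again, which is the trade: a smaller serialized session in exchange for a reload on the rare request that lands on a different server.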

Design Configuration Data to Avoid Local Files and In-Memory Updates
Another type of data that's common is application configuration data or properties. Simple single-server solutions often read configuration data from a local file and store it in a Java object. This is commonly a set of static fields or a singleton. The configuration data is then accessed and updated by the application and may be persisted back to the local file.

These techniques don't work in a cluster. Remember, the configuration data needs to be accessible by the application running on different servers, and any changes have to be visible to all running copies of the application.

Common techniques that work in a cluster include keeping the configuration data in a database or an LDAP server. If your design uses a file system, you'll have to set up either file replication (using FTP, for example) or a shared file system. Remember, if you use any of these techniques, you'll need to document the mechanism your application requires, since these are not capabilities supplied by the application server runtime.
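
As a rough sketch of the database or LDAP approach, each server can read configuration through a small accessor that treats the shared store as the master copy and keeps values locally only for a short time. Everything here is an assumption made for illustration: the SharedConfigStore interface, the in-memory stand-in that replaces a real database, and the choice to pass the current time in as a parameter so that the behavior is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over the shared store (a database table, LDAP, ...). */
interface SharedConfigStore {
    String read(String key);
    void write(String key, String value);
}

/** In-memory stand-in for the shared store, used only to make the sketch runnable. */
class InMemoryStore implements SharedConfigStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    public String read(String key) {
        return rows.get(key);
    }

    public void write(String key, String value) {
        rows.put(key, value);
    }
}

/** One instance per server: caches values briefly, never treats its copy as the master. */
class ClusterConfig {
    private static final class Entry {
        final String value;
        final long loadedAt;

        Entry(String value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final SharedConfigStore store;
    private final long maxAgeMillis;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    ClusterConfig(SharedConfigStore store, long maxAgeMillis) {
        this.store = store;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the cached value, re-reading the shared store once the copy is too old. */
    String get(String key, long nowMillis) {
        Entry e = cache.get(key);
        if (e == null || nowMillis - e.loadedAt >= maxAgeMillis) {
            e = new Entry(store.read(key), nowMillis);
            cache.put(key, e);
        }
        return e.value;
    }

    /** Writes through to the shared store so other servers can pick up the change. */
    void set(String key, String value) {
        store.write(key, value);
        cache.remove(key);
    }
}
```

With this shape, a change made on one server becomes visible on the others within maxAgeMillis. Whether that delay is acceptable depends on the setting, so it's worth documenting alongside the mechanism itself.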

Design Application and Database for Data Concurrency and Idempotence
Once an application is clustered, there are multiple copies of the application working against the same shared runtime data. This will usually increase the amount of concurrent access to data. It can also mean that the same user can log in and make requests simultaneously from multiple browsers. This increased data concurrency can lead to invalid requests and potential data corruption. When you design your application for database access, use locking within transactions that is strong enough to prevent data corruption, but keep the lock levels as low as possible to avoid deadlocks. Concurrent access deadlock problems may not show up until you move to a cluster.

For example, Message Driven Beans (MDBs) that access data in a random order with exclusive locks will work in a single application server environment, but will often deadlock when moved to a cluster. MDBs need to use a fixed order for any data or tables that are locked exclusively. This can be done by sorting the data before accessing it. Another alternative, when record processing order doesn't matter, is to partition the queues, allowing multiple consumers of the requests.
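
The fixed-order rule can be sketched as follows. The RecordLocker interface is a hypothetical stand-in for whatever takes the exclusive lock in your application (a SELECT ... FOR UPDATE on a row, for instance); the only point of the sketch is that every consumer sorts the record IDs the same way before locking anything.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical hook that takes an exclusive lock on one record. */
interface RecordLocker {
    void lockExclusive(long recordId);
}

class OrderedLocking {
    /** Returns the record IDs in ascending order with duplicates removed. */
    static List<Long> lockOrder(Collection<Long> recordIds) {
        return new ArrayList<>(new TreeSet<>(recordIds));
    }

    /** Locks every record in the shared order, so two consumers cannot wait on each other in a cycle. */
    static void lockAll(Collection<Long> recordIds, RecordLocker locker) {
        for (long id : lockOrder(recordIds)) {
            locker.lockExclusive(id);
        }
    }
}
```

If one message touches records 7, 3, and 1 and another touches 1 and 7, both consumers now ask for record 1 first. One of them waits, but neither can end up holding a lock the other needs while waiting for a lock the other holds.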

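Another challenge for a clustered application is ID generation for database keys. If high performance is needed, one common design pattern is to add a table to the database that's used to allocate pools of keys to each server in the cluster. Each server then hands out keys from its own pool in memory and goes back to the table only when the pool is used up.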
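
A minimal sketch of this key-pool pattern follows. The KeyTable interface and its in-memory implementation are assumptions that stand in for the database table; in a real application, reserveBlock() would be an atomic update of a counter row, typically in its own short transaction.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the key allocation table in the database. */
interface KeyTable {
    /** Atomically reserves blockSize consecutive keys and returns the first one. */
    long reserveBlock(int blockSize);
}

/** In-memory version of the table, used only to make the sketch runnable. */
class InMemoryKeyTable implements KeyTable {
    private final AtomicLong next = new AtomicLong(1);

    public long reserveBlock(int blockSize) {
        return next.getAndAdd(blockSize);
    }
}

/** One instance per server: hands out keys locally and refills from the shared table. */
class KeyGenerator {
    private final KeyTable table;
    private final int blockSize;
    private long next;     // next key to hand out
    private long blockEnd; // first key beyond the current block

    KeyGenerator(KeyTable table, int blockSize) {
        this.table = table;
        this.blockSize = blockSize;
    }

    synchronized long nextKey() {
        if (next >= blockEnd) {
            // Pool exhausted (or never filled): reserve a fresh block from the table.
            next = table.reserveBlock(blockSize);
            blockEnd = next + blockSize;
        }
        return next++;
    }
}
```

Keys stay unique across servers, but they are no longer handed out in strict time order, and a server that stops leaves the rest of its block unused. Both are usually acceptable for surrogate keys.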


Maximize Local Object Accesses Within Tiers
When an application is running as a single instance all accesses are local. Moving to a clustered environment means, however, that accesses to EJBs, Web Services, and other "remotable" components may no longer be local. Remote object invocations are expensive compared to local calls, requiring request and return object serialization and de-serialization.

Care should be taken to ensure as much local object access as possible:

  1. Deploy all application components in the same tier when feasible. For example, placing Web and EJB components in the same tier gives the Web components local access to the EJBs. Sometimes security or other requirements preclude this.
  2. Use the EJB "local" interfaces to avoid pass-by-value semantics for request and response objects, which require serialization and de-serialization.
  3. Design coarse-grained APIs between components to reduce application "chattiness," but at the same time ensure that only the minimum required data is sent and retrieved between components.

Leverage Messaging to Enable Background and/or Batch Processing
Performance is often the requirement that motivates moving from a single instance to a clustered environment. To further improve performance and reduce the load on heavily contended resources, the application can be modified to perform some work asynchronously. For example, an order processing application may choose to process inventory updates in the background. When a new order is being processed, updating the inventory data during the order transaction isn't essential. Processing the inventory data synchronously actually increases the response time for the user and further burdens the database and other resources in the transaction.

In J2EE, JMS can be used in a transaction to create a durable and reliable message for background processing. MDBs are usually used for message processing and provide a high-performing mechanism for asynchronous updates. Often MDBs allow work to pile up and do updates and other processing in "batch" mode (many updates at once) to improve performance further. At times this can even be done when there's a reduced load on the application.
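
The "batch" idea can be illustrated without a JMS provider. The sketch below deliberately swaps in a java.util.concurrent queue for the JMS destination and a plain method for the MDB, so it shows only the shape of the technique: requests enqueue small updates, and a background consumer drains many of them at once and merges them before touching the database. The class names and the inventory example fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical message body: a change in stock for one SKU. */
class InventoryUpdate {
    final String sku;
    final int quantityDelta;

    InventoryUpdate(String sku, int quantityDelta) {
        this.sku = sku;
        this.quantityDelta = quantityDelta;
    }
}

/** In-process stand-in for a queue plus its consumer; not JMS and not an MDB. */
class InventoryBatcher {
    private final BlockingQueue<InventoryUpdate> queue = new LinkedBlockingQueue<>();

    /** Called during order processing: cheap, no inventory tables touched. */
    void enqueue(String sku, int quantityDelta) {
        queue.add(new InventoryUpdate(sku, quantityDelta));
    }

    /** Called in the background: drains up to maxBatch updates and nets them per SKU. */
    Map<String, Integer> drainBatch(int maxBatch) {
        List<InventoryUpdate> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        Map<String, Integer> netChange = new HashMap<>();
        for (InventoryUpdate update : batch) {
            netChange.merge(update.sku, update.quantityDelta, Integer::sum);
        }
        return netChange;
    }
}
```

Three orders against two SKUs become two inventory updates instead of three. What this stand-in does not give you is durability or transactional delivery, which is why the article recommends JMS for the real thing.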

Design and Partition Caches for Cluster Awareness
Another important area to consider for clustered applications is caching. Caching heavily used data can significantly improve response time, as well as reduce processing time and the load on resources such as databases. Most application servers provide caching capabilities in various flavors such as EJB "read-only beans," Java object caching, Web Services caching, servlet/JSP/JSP tag caching, etc.

Some application servers support these caching facilities across a cluster. This requires complex mechanisms that provide key features such as invalidating cached data across systems when it's updated, sharing cached data across a cluster for data that's very expensive to retrieve, etc. Using the capabilities provided by your application server is the preferred approach. Leveraging these facilities can greatly improve performance and reduce interruption in processing due to failovers.

In addition, there are special considerations the designer must be aware of when leveraging application caching in a cluster. As data is updated and invalidated in one cache, there are policies governing when and how often the data is updated across the cluster of caches. Common examples include EOS (end of service) updates and TBW (time-based writes). With an EOS policy, the data is synchronized at the end of a request. With TBW multiple updates are synchronized in a batch mode at certain time intervals. This of course leads to potentially longer time frames where the data isn't synchronized. TBW provides a higher-performance alternative for application data, such as a product catalog that doesn't have high integrity requirements and isn't updated very frequently. Customer or Order data, on the other hand, would generally require a more timely approach such as EOS. So the application design must take into account the caching policies to achieve the desired behavior and performance.

Data partitioning mechanisms are provided by some application servers such as WebSphere Extended Deployment (XD). These facilities can be used to partition data access from HTTP, EJBs, JDBC, etc. to dedicated systems in the cluster. For example, in the order processing example given above, the inventory database can be partitioned and accessed by a dedicated set of EJBs. This type of partitioning can significantly reduce data concurrency and locking while providing extended options and performance for caching. Be aware that leveraging WebSphere XD data partitioning will require changes to your application.

Consider Cluster Impacts on Shared Resources
At deployment time, you have to make sure that the shared resources are configured appropriately to handle the additional load. For example, if you've configured a single application server instance to have 20 pooled connections to the back-end database for optimal performance, this number will be multiplied by the number of servers in the cluster. Thus a cluster of five application servers could open as many as 100 pooled connections, which is probably more than the database was sized to handle. Be aware of this multiplicative effect for all shared resources and pools.
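
The arithmetic is simple enough to put in a helper and check at deployment time. The sketch below is only an illustration; the class and method names are hypothetical, and the database connection limit is a number you would have to get from your database administrator.

```java
class PoolSizing {
    /** Connections the database may see if every server fills its pool. */
    static int totalConnections(int perServerPoolSize, int serverCount) {
        return Math.multiplyExact(perServerPoolSize, serverCount);
    }

    /** Largest per-server pool size that keeps the whole cluster within the database limit. */
    static int perServerLimit(int databaseMaxConnections, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        return databaseMaxConnections / serverCount;
    }
}
```

For the example above, totalConnections(20, 5) gives 100. If the database could only accept, say, 60 connections from this application, perServerLimit(60, 5) would cap each server's pool at 12.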

Another potential gotcha in a cluster with multiple physical machines is the server clock time. It's important to synchronize the system clocks to avoid unwanted behavior due to session timeouts on failover, data time-stamping, or other application-specific time processing.

Use Application Server-Provided Workload Management and Replication
Application servers like WebSphere provide capabilities for cluster scalability, including workload management and failover, HTTP session affinity, HTTP session and cache replication services, etc. As we discussed in the sections on HTTP session and caching, these services are complex to implement and can have a significant impact on overall cluster performance. Our recommendation is to leverage the platform cluster capabilities rather than trying to roll your own solutions.

Additionally, conflicts may arise if you roll your own workload management and affinity techniques and then combine them with your application server capabilities. Except in highly unusual circumstances, use the platform capabilities.

TEST, TEST, TEST
To run successfully in a cluster, an application needs to be tested in a clustered environment. This will help expose many of the subtle problems that can arise, such as the ones discussed above. It's also important to test edge cases such as simultaneous logins from the same user, machine, or application server. Additionally, you'll want to test failover scenarios as described by David Purcell in "Moving to a Cluster" (see reference 2).

Summary
When designing your application, there is a set of best practices that should be followed so your application works and performs when moved to a cluster environment. Your application server platform may provide powerful capabilities such as workload management and session replication but if your application session data isn't serializable, or your application writes configuration data to the local file system, your application is going to fail in a cluster. So, design your application to work in a cluster and be sure to test it carefully to avoid the cluster gotchas.

Acknowledgements
The authors would like to thank Andrew Spyker, Tom Alcott, Matt Hogstrom, Ken Hygh, and Tony Tuel for sharing their first-hand experiences in how to design applications to work in a clustered environment.

References
1  "Improving HttpSession Performance with Smart Serialization," Kyle Brown and Keys Botzum, IBM developerWorks, 11/23/03,
www-128.ibm.com/developerworks/websphere/library/bestpractices/httpsession_performance_serialization.html

2  "Moving to a Cluster," David Purcell, Java Developer's Journal, December 2004,
http://jdj.sys-con.com/read/47354.htm

© 2008 SYS-CON Media