One of the greatest strengths of Java Platform 2, Enterprise Edition (J2EE) application servers is the ability to scale solutions to meet increased performance and availability demands. The inherent clustering and failover capabilities built into products such as the IBM WebSphere Application Server Network Deployment Edition take care of most of the dirty work; however, there are important application considerations that can't be overlooked. If an application isn't designed for cluster awareness, functional or performance issues can surface when deploying your application to a cluster.
Expecting your application to run only in a single server Java Virtual Machine (JVM) is self-limiting, and with a little more upfront work, you can ensure cluster readiness, and be more confident that your application will scale when the need arises. This paper discusses design and implementation considerations for cluster awareness. While many of these best practices are the tried-and-true best practices for any J2EE application, you'll see how essential they become when you move to a cluster. In addition, there are more subtle considerations - often realized the hard way - that cause an application to fail in a cluster.
When an application is moved into a clustered environment, multiple versions of the application are now executing, with some type of workload management distributing and balancing the work. The application may be running on different JVMs on the same physical server or across different servers. Data, resource sharing, and timing have to be carefully designed. Subsequent requests to the same application may end up executing on different servers.
This article discusses the following areas that require special consideration when designing your application for cluster awareness.
Though your application server may provide algorithms that minimize the need for a session to be repopulated from the database, any HTTP session replication and failover capability requires adherence to several best practices to perform effectively:
Design Configuration Data to Avoid Local Files and In-Memory Updates
Another type of data that's common is application configuration data or properties. Simple single-server solutions often read configuration data from a local file and store it in a Java object. This is commonly a set of static fields or a singleton. The configuration data is then accessed and updated by the application and may be persisted back to the local file.
These techniques don't work in a cluster. Remember, the configuration data needs to be accessible by the application running on different servers, and any changes have to be visible to all running copies of the application.
Common techniques that work in a cluster include using a database or LDAP server to keep the configuration data. If your design uses a file system, you'll have to set up file replication techniques using ftp or a shared file system. Remember, if you use any of these techniques, you'll need to document the mechanism your application requires since these are not capabilities supplied by the application server runtime.
Design Application and Database for Data Concurrency and Idempotence
Once an application is clustered, there are now multiple copies of the application working against the same shared runtime data. This will usually increase the amount of concurrent access to data. It can also mean that the same user can login and make requests simultaneously from multiple browsers. This increased data concurrency can lead to invalid requests and potential data corruption. When you design your application for database access, appropriate levels of locking within transactions are required to prevent data corruption while also minimizing the lock levels to prevent deadlocks. Concurrent access deadlock problems may not show up until you move to a cluster.
For example, Message Driven Beans (MDBs) that access data in a random order with exclusive locks will work in a single application server environment, but will often be deadlocked when moved to a cluster. MDBs need to have fixed order for any data or tables that are locked exclusively. This can be done by sorting the data before accessing. Another alternative when record processing order doesn't matter is to partition the queues, allowing multiple consumers of the requests.
Another challenge for a clustered application is ID generation for database keys. If high performance is needed, one common design pattern is to add a table to the database that's used to allocate pools of keys for each cluster.
Maximize Local Object Accesses Within Tiers
When
an application is running as a single instance all accesses are local.
Moving to a clustered environment means, however, that accesses to
EJBs, Web Services, and other "remotable" components may no longer be
local. Remote object invocations are expensive compared to local calls,
requiring request and return object serialization and de-serialization.
Care should be taken to ensure as much local object access as possible:
In J2EE, JMS can be used in a transaction to create a durable and reliable message for background processing. MDBs are usually used for message processing and provide a high-performing mechanism for asynchronous updates. Often MDBs allow work to pile up and do updates and other processing in "batch" mode (many updates at once) to improve performance further. At times this can even be done when there's a reduced load on the application.
Design and Partition Caches for Cluster Awareness
Another important area to consider for clustered applications is
caching. Caching heavily used data can significantly improve response
time, as well as reduce processing time and the load on resources such
as databases. Most application servers provide caching capabilities in
various flavors such as EJB "read-only beans," Java object caching, Web
Services caching, servlet/JSP/JSP tag caching, etc.
Some application servers support these caching facilities across a cluster. This requires complex mechanisms that provide key features such as invalidating cached data across systems when it's updated, sharing cached data across a cluster for data that's very expensive to retrieve, etc. Using the capabilities provided by your application server is the preferred approach. Leveraging these facilities can greatly improve performance and reduce interruption in processing due to failovers.
In addition, there are special considerations the designer must be aware of when leveraging application caching in a cluster. As data is updated and invalidated in one cache, there are policies governing when and how often the data is updated across the cluster of caches. Common examples include EOS (end of service) updates and TBW (time-based writes). With an EOS policy, the data is synchronized at the end of a request. With TBW multiple updates are synchronized in a batch mode at certain time intervals. This of course leads to potentially longer time frames where the data isn't synchronized. TBW provides a higher-performance alternative for application data, such as a product catalog that doesn't have high integrity requirements and isn't updated very frequently. Customer or Order data, on the other hand, would generally require a more timely approach such as EOS. So the application design must take into account the caching policies to achieve the desired behavior and performance.
Data partitioning mechanisms are provided by some application servers such as WebSphere Extended Deployment (XD). These facilities can be used to partition data access from HTTP, EJBs, JDBC, etc. to dedicated systems in the cluster. For example, in the order processing example given above, the inventory database can be partitioned and accessed by a dedicated set of EJBs. This type of partitioning can significantly reduce data concurrency and locking while providing extended options and performance for caching. Be aware that leveraging WebSphere XD data partitioning will require changes to your application.
Consider Cluster Impacts on Shared Resources
At
deployment time, you have to make sure that the shared resources are
configured appropriately to handle the additional load. For example, if
you've configured a single application server instance to have 20
pooled connections to the back-end database for optimal performance,
this number will be multiplied by each server in the cluster. Thus a
cluster of five application servers would result in 100 pooled
connections. This is probably too many. Be aware of this multiplicative
effect for all shared resources and pools.
Another potential gotcha in a cluster with multiple physical machines is the server clock time. It's important to synchronize the system clocks to avoid unwanted behavior due to session timeouts on failover, data time-stamping, or other application-specific time processing.
Use Application Server-Provided Workload Management and Replication
Application servers like WebSphere provide capabilities for cluster
scalability, including workload management and failover, HTTP session
affinity, HTTP session and cache replication services, etc. As we
discussed in the sections on HTTP session and caching, these services
are both complex to implement and can have a significant impact on
overall cluster performance. Our recommendation is to leverage the
platform cluster capabilities over trying to roll-your-own solutions.
Additionally, conflicts may arise if you try to roll-your-own workload management and affinity techniques and then combine them with your application server capabilities. Except in highly unusual circumstances, use the platform capabilities.
TEST, TEST, TEST
To run a cluster successfully, an
application needs to be tested in a clustered environment. This will
help expose many of the subtle problems that can arise such as the ones
discussed above. It's also important to test edge cases such as
simultaneous logins from the same user, machine, or application server.
Additionally, you'll want to test failover scenarios as described by
David Purcell in "Moving to a Cluster."2
Summary
When designing your application, there is
a set of best practices that should be followed so your application
works and performs when moved to a cluster environment. Your
application server platform may provide powerful capabilities such as
workload management and session replication but if your application
session data isn't serializable, or your application writes
configuration data to the local file system, your application is going
to fail in a cluster. So, design your application to work in a cluster
and be sure to test it carefully to avoid the cluster gotchas.
Acknowledgements
The authors would like to thank
Andrew Spyker, Tom Alcott, Matt Hogstrom, Ken Hygh, and Tony Tuel for
sharing their first-hand experiences in how to design applications to
work in a clustered environment.
References
1 Improving HttpSession Performance with Smart Serialization, Kyle Brown and Keys Botzum, IBM developerWorks, 11/23/03;
www-128.ibm.com/developerworks/websphere/library/
bestpractices/httpsession_performance_serialization.html
2 Moving to a Cluster, David Purcell, Java Developers Journal, December 2004,
http://jdj.sys-con.com/read/47354.htm