Java and the relational database Part II

In the first article we briefly covered the basics of Spring AOP and the Hibernate cache; this time we will look at the basics of concurrent access to data. Concurrency is one of the biggest problems, and not only in the context of data access. On the other hand, you cannot imagine web projects without multithreading, certainly not ones as big as Comarch Corporate Banking. Fortunately, most of the hard work related to multithreading is done for us by the database and by a web container together with Spring, or by an application server (for example IBM WebSphere), which we did not want to use in our project, though. Thanks to this we could focus on writing the business logic rather than on synchronizing resources. Of course, complications sometimes arise, and it is worth knowing how to deal with them.

Database access concurrency

There are two main mechanisms in relational databases that help you deal with problems resulting from simultaneous access to the same objects: isolation and locking (optimistic and pessimistic).

Isolation level

The first one is the isolation level, which is usually defined once for the whole application and does not change throughout its life cycle. If necessary, however, the default behavior can be overridden, e.g. by using org.springframework.transaction.annotation.Transactional from Spring.

Different database systems may approach isolation in slightly different ways. However, the most commonly used level is READ COMMITTED, which protects against dirty reads, i.e. against reading data not yet committed by other transactions. It is a compromise between data availability, processing speed and consistency between transactions.
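
For example, if a single operation needs a stricter level, the default can be overridden per method. Below is a minimal sketch; the service name is illustrative, and it assumes the transaction manager in use supports per-transaction isolation levels (as the Hibernate-based JpaTransactionManager does):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    // Only this transaction runs with a stricter isolation level;
    // the rest of the application keeps the default (typically READ COMMITTED).
    @Transactional(isolation = Isolation.REPEATABLE_READ)
    public void generateConsistentReport() {
        // read the data that must stay consistent within this transaction
    }
}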

Locking

Locking records allows you to eliminate the lost update phenomenon. However, it should be noted that a lost update can mean two things:
  1. Two transactions try to perform an update one after the other (before either is committed) – database systems themselves prevent this.
  2. The first transaction retrieves data, the second transaction modifies the same data and commits, and then the first transaction modifies the previously retrieved data and saves it – this problem is addressed further in this article.

Optimistic locking

In most cases, especially when reads significantly outnumber writes, optimistic locking based on entity versioning works better. In this mode, data in the database remains available to other users (transactions), so the currently running process is not blocked. Consistency is verified only when saving. Hibernate can track an entity's version (using the @Version annotation), and the version is incremented just before the transaction is committed. When a transaction tries to commit outdated data (with a version older than the entity version in the database), an OptimisticLockException is thrown and the transaction itself is rolled back. This protects data already stored in the database, but may cause the effects of the current transaction to be lost. To deal with this, you would have to re-fetch the entity from the database and apply the update again; alternatively, you can switch to pessimistic locking.
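
A minimal sketch of entity versioning (the Account entity is purely illustrative):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {

    @Id
    private String id;

    private long balance;

    // Hibernate increments this value on every update; committing changes based on
    // a stale version ends with an OptimisticLockException and a rollback.
    @Version
    private long version;
}

Hibernate manages the version field itself, so it should not be modified manually.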

Pessimistic locking

What approach should be taken when data modification is critical, or we simply expect data to be modified by several processes at once (e.g. handling events concerning the same entity)? In such cases, optimistic locking may lead to mass rollbacks of transactions, which is detrimental to performance, or even to data loss if the process cannot be repeated. Here, pessimistic locking works better. In this variant we assume the worst-case scenario, i.e. that data will be processed by many processes at once.

The lock mode can be passed via the EntityManager:

entityManager.find(User.class, id, LockModeType.PESSIMISTIC_WRITE);

From the JPA level we are able to define two types of pessimistic locking:

PESSIMISTIC_READ

This is the equivalent of a shared lock – in this mode it is possible to read a locked record, but it is not possible to modify it.

However, not all databases support this lock mode, so you should use it carefully.

PESSIMISTIC_WRITE (and PESSIMISTIC_FORCE_INCREMENT)

Records will be locked for both writing and reading, so another process can access the data only when the current one releases the lock. However, it should be remembered that the lock only blocks queries with the for update clause. Ordinary select queries will still be executed without any problems, which in combination with JPA may lead to unexpected issues. This type of locking works well if you want to force sequential processing by independent threads.
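
The same lock mode can also be requested for a whole query, not only in EntityManager.find. A sketch (the JPQL itself is illustrative):

List<Product> products = entityManager
        .createQuery("select p from Product p where p.user.id = :userId", Product.class)
        .setParameter("userId", userId)
        .setLockMode(LockModeType.PESSIMISTIC_WRITE) // adds "for update" on databases that support it
        .getResultList();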

JPA and pessimistic locking

If we use JPA, pessimistic locking can be tricky. Imagine User and Product entities with a relationship between them as below:

import java.util.Set;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "USERS")
public class User {

    @Id
    @Column(name = "ID")
    private String id;

    @Column(name = "NAME")
    private String name;

    @Column(name = "PHONE")
    private String phone;

    @OneToMany(fetch = FetchType.LAZY, cascade = {CascadeType.ALL},
            mappedBy = "user", orphanRemoval = true)
    private Set<Product> products;

    // getters and setters omitted for brevity
}

@Entity
@Table(name = "PRODUCTS")
public class Product {

    @Id
    @Column(name = "ID")
    private String id;

    @Column(name = "NAME")
    private String name;

    @Column(name = "AMOUNT")
    private long amount;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "USER_ID", nullable = false)
    private User user;

    // getters and setters omitted for brevity
}


We use both entities independently in separate processes and lock them when reading:

entityManager.find(User.class, id, LockModeType.PESSIMISTIC_WRITE);
entityManager.find(Product.class, id, LockModeType.PESSIMISTIC_WRITE);

In this case, in order to fetch the entire object together with its relation, Hibernate will issue two queries to the database, only one of which will contain the for update clause. Below are example queries for fetching the Product entity together with the related user (the situation is analogous for the User entity):

Hibernate: select product0_.ID as ID1_11_0_, product0_.AMOUNT as AMOUNT2_11_0_, product0_.NAME as NAME3_11_0_, product0_.USER_ID as USER_ID4_11_0_ from PRODUCTS product0_ where product0_.ID=? for update
Hibernate: select user0_.ID as ID1_12_0_, user0_.NAME as NAME2_12_0_, user0_.PHONE as PHONE3_12_0_ from USERS user0_ where user0_.ID=?

In practice, this means that when fetching the User entity, only the record in the USERS table will be locked, and when fetching the Product entity, only the record in the PRODUCTS table. Now let's imagine two processes:

@Transactional
public void updateProduct(String id, String newUserName, long amount) {
    Product product = userRepository.loadProduct(id);

    product.setAmount(amount);
    product.getUser().setName(newUserName);

    userRepository.updateProduct(product);
}

@Transactional
public void updateUser(String id, String newUserName, long amount) {
    User user = userRepository.loadUser(id);

    user.setName(newUserName);
    user.getProducts().forEach(product -> product.setAmount(amount));

    userRepository.updateUser(user);
}

Basically, they both do almost the same thing, but each approaches it from a different side of the relationship. Let us now consider two threads (T1 and T2) executing the above methods simultaneously on the same data:

PRODUCTS
ID | AMOUNT | NAME    | USER_ID
1  | 21     | T-Shirt | 1
2  | 31     | Shoes   | 1
3  | 11     | Skirt   | 1

USERS
ID | PHONE     | NAME
1  | 123123123 | SMITH

Here are the steps taken by the individual threads:

Step 1
T1: Fetches the Product entity (ID=2) and locks the record in the PRODUCTS table.
T2: Fetches the User entity (ID=1) and locks the record in the USERS table.

Step 2
T1: Hibernate fetches the User entity (ID=1). Although the record is locked, fetching it without the for update clause is still possible.
T2: Hibernate fetches the list of Product entities (ID=1,2,3). Although the record with ID=2 is locked, fetching it without the for update clause is still possible.

Step 3
T1: Changes the User.name and Product.amount parameters.
T2: Changes the User.name and Product.amount parameters for all list items.

Step 4
T1: Updates the Product and User entities.
T2: Updates the Product and User entities.

Step 5
T1: Waits for the lock on the USERS record (ID=1) to be released.
T2: Waits for the lock on the PRODUCTS record (ID=2) to be released.

Result: DEADLOCK

The T1 thread will wait forever for the lock on the USERS record to be released, and the T2 thread will do the same for the lock on the PRODUCTS record. Most database systems will detect this situation and report an exception; otherwise the threads would never return to the pool.

The described example is obviously very simplified on the one hand and exaggerated on the other. When fetching the Product entity, we should not touch the User entity at all. What is more, in correctly modelled components, if we treat the Product as an aggregate (in terms of Domain-Driven Design), we probably should not be able to modify it from the User entity level at all. However, the technical possibility of such "shortcuts" exists, so they may be used in certain cases. You should then remember to modify only what you lock.
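
If both sides of the relation really do have to be modified in one process, one possible sketch (assuming direct access to the EntityManager; the method signature is adjusted purely for illustration) is to explicitly lock every record that will be modified and to acquire the locks in the same order in every process, so that the circular wait described above cannot occur:

@Transactional
public void updateProduct(String productId, String userId, String newUserName, long amount) {
    // Lock everything we are going to modify, always in the same order
    // (User first, then Product) in every process.
    User user = entityManager.find(User.class, userId, LockModeType.PESSIMISTIC_WRITE);
    Product product = entityManager.find(Product.class, productId, LockModeType.PESSIMISTIC_WRITE);

    user.setName(newUserName);
    product.setAmount(amount);
    // both entities are managed, so the changes are flushed when the transaction commits
}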

TransactionSynchronization

TransactionSynchronization was introduced together with the whole Spring transaction mechanism and is one of its basic building blocks. Although the TransactionSynchronization interface is not directly related to concurrency, it can be used to exchange information between threads or processes. One of its most common applications is emitting non-transactional events about object state changes.

Usually, we want such an event to be sent only when the process has completed successfully (the event is supposed to represent an action that has been performed). TransactionSynchronization allows you to hook the action into a specific point of the transaction life cycle, i.e. you can emit an event just before (beforeCommit) or just after (afterCommit) the transaction is committed.

Which place to choose to execute our logic depends on what we expect from it. If we want to be sure that this logic will be executed correctly and is critical from the point of view of the whole transaction, it should be in the beforeCommit. On the other hand, if we want to make sure that the business transaction will be executed and approved before we take further action, we should use afterCommit (or afterCompletion).

The choice between beforeCommit and afterCommit is always a compromise, and if the logic triggered by these methods changes the external state of objects, it can lead to inconsistent data. It should also be remembered that at the time the afterCommit (and afterCompletion) methods are executed, the transaction resources may not yet have been cleaned up. This means that code in these methods still participates in the transaction to which it was attached, but its changes will never be committed. Therefore, if we want to execute transactional code here, it should be marked with the REQUIRES_NEW propagation.

You can use TransactionSynchronizationManager to register TransactionSynchronization:

TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronizationAdapter() {
    @Override
    public void afterCommit() {
        // do sth
    }
});
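
If the logic in afterCommit itself needs to write to the database, it can delegate to a bean method running with REQUIRES_NEW, as mentioned above. A sketch, where AuditService and its method are purely illustrative:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AuditService {

    // Runs in its own, new transaction, because the transaction to which the
    // afterCommit callback belongs has already been committed and will not accept changes.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void storeAuditEntry(String message) {
        // persist the audit entry
    }
}

Note that such a method has to be called through another Spring-managed bean; a self-invocation would bypass the proxy and the propagation setting would be ignored.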

Alternatively, you can use the API introduced in Spring 4.2, based on @EventListener and @TransactionalEventListener, instead of registering synchronizations manually.
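
A sketch of the annotation-based variant (UserChangedEvent and the listener are illustrative; the event is an ordinary object published with ApplicationEventPublisher.publishEvent inside the transactional method):

import org.springframework.stereotype.Component;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

@Component
public class UserChangedListener {

    // Invoked only after the publishing transaction has committed successfully;
    // BEFORE_COMMIT, AFTER_ROLLBACK and AFTER_COMPLETION phases are also available.
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void onUserChanged(UserChangedEvent event) {
        // do sth, e.g. send a notification
    }
}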

It is also worth mentioning that the order in which TransactionSynchronization callbacks are executed can be controlled using the Ordered interface.
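
For example, extending the registration snippet above (the order value is arbitrary; lower values are executed first):

TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronizationAdapter() {
    @Override
    public int getOrder() {
        return 10; // lower values run earlier within the same transaction
    }

    @Override
    public void afterCommit() {
        // executed before synchronizations with a higher order value
    }
});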

Summary

That's it for today, and the next article will tell you a little bit about JTA.

Szymon Kubicz, Senior Designer – Developer, Comarch
