Hash code computation and good cache keys

Written on May 4, 2021

This article is from the series about simple yet powerful ways to fix common performance problems in Scala/Java code. Today we will review problem which can arise if you have complex cache keys.

Hashmap

The Problem

Imagine a situation where you want to cache some data to get easy performance improvements:

  val cache = new ConcurrentHashMap[String, ExpensiveComputationData]
  val cacheKey = "foo"
  def getData() = cache.computeIfAbsent(cacheKey, computeFunction)

In this example, we use ConcurrentHashMap to compute Expensive data in a thread-safe manner. In your case, you may have some kind of cache(e.g. Ehcache, Caffeine) that will work similarly.

At this point, we have a simple cache key. Some times you want to have as a key more complex structure, for example, if ExpensiveComputationData depends on lang:

  case class Config(lang: String)
  val cache = new ConcurrentHashMap[Config, ExpensiveComputationData]

This still is ok.

What if in next iteration we add more fields to Config class? Now our ExpensiveComputationData depends on lang, country, and DeviceInfo.

  case class Config(lang: String, country: String, device: DeviceInfo)

What is DeviceInfo? Potentially complex structure that in the worst case can contain hundreds of fields or nested classes.

Is that a problem? Yes. Let’s see why. When we use Config class as a key in hash map or cache -  hash code of a key is being calculated. If the structure of the key is very complex, you can see that the calculation of the hash key can become a hotspot during profiling:

Profiling hotspots

Another problem is that hashCode may be computed every time because your cache key class can be mutable.

Fix

What to do in this situation? Simplify the cache key. Determine which part of the cache key directly affects computed data. Example:

  case class Config(lang: String, country: String, device: DeviceInfo)
  case class DeviceInfo(os: String, platform: String, vendor: String, settings: Map[String, String])

  val cacheKeyConfig: Config = ???

  val cache = new ConcurrentHashMap[Config, ExpensiveComputationData]

  def computeFunction = (key: Config) => {
      externalService.computeExpensiveData(key.lang, key.country, key.device.os)
  }

  def getData() = cache.computeIfAbsent(cacheKey, computeFunction.asJava)

If we analyze this code we can see that computeFunction uses only key.lang , key.country, key.device.os members. To fix the problem of expensive hash key computation we must factor out those direct dependencies to separate structure

  case class ConfigCacheKey(lang: String, country: String, deviceOs: String)

  case class Config(lang: String, country: String, device: DeviceInfo) {
      lazy val cacheKey = ConfigCacheKey(lang, country, device.os)
  }

  def getData() = cache.computeIfAbsent(cacheKeyConfig.cacheKey, computeFunction.asJava)

Or even use String cache key:

  case class Config(lang: String, country: String, device: DeviceInfo) {
      val cacheKey = s"$lang_$country_${device.os})
  }

This kind of key is also good for external introspection, e.g. if you want to have an endpoint to check cached value by key. The downside of having a separate cache key structure is that we must sync computation and cache key, if computation will depend on some new member - that member should be included in the cache key as well.

Results

In my case, this kind of optimization gave CPU usage profit because hash code computation was the bottleneck(main hot spot).

Results CPU

Your results may vary, always profile/benchmark your code before and after optimization.


Tags: scala, caffeine, ConcurrentHashMap, hashCode, performance optimization