Using non-JPA-aware base classes for JPA embeddables

I spent yesterday fighting with Hibernate and JPA. I thought you, dear reader, might benefit from my experience.

My current job uses a monolithic codebase in a single repository. The codebase is split into a few dozen Maven projects, the first step toward breaking it up into multiple repos. Last week we started extracting out all of the data model interfaces and value objects.

One of those value objects, RecordKey, is used everywhere in the codebase, as it’s our primary key representation for every piece of data that’s stored in any form. It’s really just a UUID with some extra magic, and like a UUID it serializes to a string. For example: "AfJKjssrqYlLQIujIWff4Ay6".

A Quick Review of JPA Embedding

When working with JPA, using a RecordKey isn’t too bad—you just need a few annotations:

@Embeddable
public class RecordKey {
    public static final String COLUMN = "serialized";
    public static final int LENGTH = 24;
    @Column(name = COLUMN)
    protected String serialized;
    @Transient
    protected Composite composite;
    @Deprecated
    private RecordKey() {}
    /* ... implementation follows ... */
}

There are four things happening here:

  • @Embeddable tells JPA that RecordKey isn’t a complete entity and doesn’t get its own table—its column(s) should be included in whatever entity table uses it.

  • @Column does what you’d expect and tells JPA to store this field as a column. The COLUMN constant isn’t necessary, but I’ll show you why it’s helpful later.

  • @Transient is the opposite of @Column: it tells JPA not to store this field. In this case, we only want to store the serialized form of the key, so every other field becomes @Transient.

  • The private RecordKey no-arg constructor is there because JPA requires it. When JPA loads a RecordKey out of the database, it isn’t as smart as Jackson or other serializers—it isn’t going to try to find a constructor that matches the columns. Instead, it uses a no-arg constructor and then sets the fields directly.
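
To make that last point concrete, here’s a minimal sketch in plain Java reflection of roughly what a provider does under field access. It is not Hibernate’s actual code, and DemoKey and ReflectiveLoadSketch are hypothetical names I made up for illustration:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

public class ReflectiveLoadSketch {

    /** Hypothetical stand-in for an embeddable with a private no-arg constructor. */
    public static class DemoKey {
        private String serialized;

        private DemoKey() {}  // only here for the "provider"

        public DemoKey(final String serialized) {
            this.serialized = serialized;
        }

        @Override
        public String toString() {
            return serialized;
        }
    }

    /** Builds a DemoKey roughly the way a field-access provider would. */
    public static DemoKey load(final String columnValue) {
        try {
            // 1. Call the no-arg constructor, even though it is private.
            final Constructor<DemoKey> ctor = DemoKey.class.getDeclaredConstructor();
            ctor.setAccessible(true);
            final DemoKey key = ctor.newInstance();

            // 2. Write the column value straight into the field. No setter
            //    and no public constructor is involved.
            final Field field = DemoKey.class.getDeclaredField("serialized");
            field.setAccessible(true);
            field.set(key, columnValue);
            return key;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(load("AfJKjssrqYlLQIujIWff4Ay6"));
    }
}
```

The takeaway is that any logic living in your constructors or setters is simply skipped on load, which matters a lot later in this story.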

Using this RecordKey is then a bit of paperwork. Let’s say you have a base class AKeyedEntity that will be used for all of your entities. This is handy when everything should have a RecordKey for its primary key.

You would think it would be as easy as:

@MappedSuperclass
public abstract class AKeyedEntity {
    @Id
    protected RecordKey id;
}

The @MappedSuperclass tells JPA not to create an entity just yet—let the child classes do that. The @Id marks the field as the primary key, as you would expect.

But because we’re using a custom class for our primary key, that’s not enough. This would work initially, but we’d eventually get crazy errors about duplicate column definitions as our child classes grew in complexity.

What you actually need is this:

@MappedSuperclass
public abstract class AKeyedEntity {
    @Id
    @AttributeOverrides({
        @AttributeOverride(name = RecordKey.COLUMN, column = @Column(name = "id", length = RecordKey.LENGTH))
    })
    protected RecordKey id;
}

Those extra annotations tell JPA how it should handle this particular instance of embedding the RecordKey:

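  • The name = RecordKey.COLUMN part says “find the embedded attribute with this name”. Strictly speaking, JPA matches this against the attribute (field or property) name rather than the column name; it works here because the field and its column are both called serialized.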

  • The @Column definition says “for this embedding, replace that found column with this definition”.

In other words: “where you would create the serialized column, instead call it id”.

Not only is this ability to rename columns nice, it’s also crucial for cases where you’d have multiple RecordKeys embedded in the same table. Since we use RecordKey everywhere, this happens pretty quickly.

But this isn’t where the fighting started.

An Innocent Beginning

As I said, we were extracting RecordKey and a few other critical things out to their own Maven project. In doing so, I was reminded of all of those JPA annotations. We use RecordKey in both storage contexts and wire (stringified) contexts, such as across our APIs. The wire contexts don’t really need the JPA annotations, just the serialization annotations (which I’ve elided here). In fact, the JPA annotations introduce a dependency on JPA (or one of its implementations, such as Hibernate’s) to any project that uses RecordKey.

But I didn’t want to tackle removing the JPA annotations altogether, since that would make the class unusable in a storage capacity. So I hit on a compromise: I refactored the new POM to use the javax.persistence:persistence-api artifact, a library that has nothing but the JPA APIs. This is a fairly minimal and lightweight dependency that other projects wouldn’t trip over, but would allow our storage projects to use RecordKey as-is.

This worked … for about two days.

Licensing

That persistence-api library gives you two license options: GPL 2.0 or the lesser-known CDDL. I am not a lawyer, but the gist here is that the GPL is very much not allowed in our projects, while the CDDL may be an option once it’s been vetted by our lawyers. Another option might be to use the Hibernate version of the library, hibernate-jpa-2.1-api, which uses the non-viral EDL 1.0 license. (Yes, I know those are two different versions of JPA. But the parts we’re using are covered in both.) But again, until the lawyers have done their thing, I’m not covered. And there’s still the slight ickiness of having storage annotations on a class that isn’t only for storage.

Long story short (too late!), I needed to get the JPA annotations off that class. But I also needed library-level interoperability between our various projects, so I couldn’t just copy RecordKey and remove the JPA annotations.

There were a few options:

  • Subclass RecordKey into one that will be used for storage. This gets me the nice side-effect that the new subclass is an instanceof RecordKey.

  • Create a completely new class used for storage, and convert back and forth. Yuck.

I opted for the first.

The Slow Descent

I thought it should have been easy. Remove the annotations (and icky no-arg constructor) from RecordKey:

public class RecordKey {
    protected String serialized;
    protected Composite composite;
    /* ... implementation follows ... */
}

Then subclass it:

public class PortKey extends RecordKey {
    /* ... add constructors and static builders as necessary ...*/
}

But then I hit my first stumbling block: how could I mark that composite field as transient? I figured I could add a field with the same signature, mark it as private and annotate it, and that would mask the field in the base class. Surprisingly, this worked:

public class PortKey extends RecordKey {
    public static final String COLUMN = "serialized";
    public static final int LENGTH = 24;
    @Transient
    private Composite composite; // masks the field in RecordKey
    /* ... implementation follows ... */
}

Well, it kind of worked. The composite field was masked and transient, so no column was created for it. But I started getting weird errors around the serialized field. I tried adding @Column annotations on new getter/setter methods:

@Column(name = COLUMN)
private String getSerialized() {
    return super.serialized;
}
@Column(name = COLUMN)
private void setSerialized(final String serialized) {
    super.serialized = serialized;
}

Nope. No joy. I kept seeing weird behavior, like the serialized field ending up null when fetched or stored.

Okay, so what if we try the same masking trick?

@Column(name = COLUMN)
private String serialized;
private String getSerialized() {
    return super.serialized;
}
private void setSerialized(final String serialized) {
    this.serialized = serialized;
    super.serialized = serialized;
}

Yeah, those look funky, right? I was trying to avoid reimplementing everything from RecordKey inside of PortKey. I thought if I just kept a copy of super.serialized in this.serialized, Hibernate would use the latter for storage since it had the annotations. It wouldn’t be efficient, but it would get the job done.

But stuff still didn’t work quite right. When PortKey instances were recreated on fetch, I’d see this.serialized with the value, but super.serialized would be empty! Yep, that means Hibernate wasn’t bothering to call setSerialized—it was just setting the field directly. I tried moving the @Column annotations to the getter and setter, but that had no effect.
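
The root cause is that Java fields are hidden, not overridden: a subclass field with the same name is a second, independent storage slot. Here’s a small sketch that reproduces the symptom, with plain reflection standing in for Hibernate’s field access. Base, Sub, and FieldHidingSketch are hypothetical names, not our real classes:

```java
import java.lang.reflect.Field;

public class FieldHidingSketch {

    /** Hypothetical stand-in for RecordKey. */
    public static class Base {
        protected String serialized;

        public String baseValue() {
            return serialized;  // always reads Base.serialized
        }
    }

    /** Hypothetical stand-in for the PortKey attempt with a masking field. */
    public static class Sub extends Base {
        private String serialized;  // hides Base.serialized, does not replace it

        public void setSerialized(final String s) {
            this.serialized = s;
            super.serialized = s;
        }

        public String subValue() {
            return serialized;  // reads Sub.serialized
        }
    }

    /** Writes Sub.serialized directly, bypassing the setter, as field access does. */
    public static Sub loadViaFieldAccess(final String value) {
        try {
            final Sub sub = new Sub();
            final Field field = Sub.class.getDeclaredField("serialized");
            field.setAccessible(true);
            field.set(sub, value);
            return sub;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final Sub loaded = loadViaFieldAccess("AfJKjssrqYlLQIujIWff4Ay6");
        System.out.println(loaded.subValue());   // the value written by reflection
        System.out.println(loaded.baseValue());  // null: the parent's field was never set
    }
}
```

Anything in the parent class that reads its own field directly sees null after a load like this, which matches the weirdness I was seeing.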

Then I thought of another way I could get Hibernate to play along:

@PostLoad
protected void postLoad() {
    super.serialized = this.serialized;
}

A method annotated with @PostLoad is called after an entity is fetched from the database, to do any last-minute initialization. But … it turns out that JPA doesn’t specify that this should happen for embedded objects. Hibernate doesn’t do it.

Sigh.

The Working Solution

Here’s what finally did work. First, RecordKey required some modification.

public class RecordKey {
    protected String serialized;
    protected Composite composite;
    public RecordKey(final String s) {
        setSerialized(s);
        setComposite(/* magic stuff here */);
    }
    /* ... other constructors and static builders here ... */
    public boolean equals(final Object other) {
        if (this == other) return true;
        final String s = getSerialized(); // NOT this.serialized
        if (other instanceof String) return s.equals(other);
        return other instanceof RecordKey && Objects.equals(s, other.toString());
    }
    public int hashCode() {
        return toString().hashCode(); // NOT this.serialized
    }
    protected String getSerialized() {
        return serialized;
    }
    protected void setSerialized(final String s) {
        this.serialized = s;
    }
    public String toString() {
        return getSerialized(); // NOT this.serialized
    }
    /* ... additional implementation follows ... */
}

Note that I never access the serialized field directly outside of the getSerialized and setSerialized methods. Not even in the constructor. I know that this is idiomatic “best practices” Java … but let’s be honest: how often do you do that? Most of the time when you are within the class itself, you’re just going to use the field directly unless there’s some complicated logic.

Then the PortKey subclass:

@Embeddable
public class PortKey extends RecordKey {
    public static final String COLUMN = "serialized";
    public static final int LENGTH = 24;
    @Transient
    @Deprecated
    private RecordKey.Composite composite;  // this is here to override the field in the parent
    @Column(name = COLUMN, length = LENGTH)
    private String serialized;
    public PortKey(final String serialized) {
        super(serialized);
    }
    @Deprecated // This is for JPA only
    private PortKey() {
        super(); // needs an accessible no-arg constructor in RecordKey
    }
    @Override
    protected String getSerialized() {
        // because Hibernate
        return this.serialized == null ? super.serialized : this.serialized;
    }
    @Override
    protected void setSerialized(final String s) {
        super.setSerialized(s); // because Hibernate
        this.serialized = s;
    }
}

It’s definitely heavier-weight than I’d like. And I hate that I have to duplicate the serialized form. But it does work, and it did allow me to remove the JPA dependency from the project with RecordKey.
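
If you want to convince yourself that this design holds together, here’s a stripped-down sketch that mimics a field-access load with plain reflection and checks that the parent’s toString() and equals() still see the value, because they go through the overridden getter. BaseKey and StoredKey are hypothetical, simplified stand-ins for RecordKey and PortKey, with the JPA annotations left off so it runs on the JDK alone:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Objects;

public class GetterOverrideSketch {

    /** Simplified, hypothetical stand-in for RecordKey. */
    public static class BaseKey {
        protected String serialized;

        protected BaseKey() {}

        public BaseKey(final String s) {
            setSerialized(s);
        }

        protected String getSerialized() {
            return serialized;
        }

        protected void setSerialized(final String s) {
            this.serialized = s;
        }

        @Override
        public boolean equals(final Object other) {
            return other instanceof BaseKey
                && Objects.equals(getSerialized(), ((BaseKey) other).getSerialized());
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(getSerialized());
        }

        @Override
        public String toString() {
            return getSerialized();  // never the field directly
        }
    }

    /** Simplified, hypothetical stand-in for PortKey (JPA annotations omitted). */
    public static class StoredKey extends BaseKey {
        private String serialized;  // the copy a provider would read and write

        public StoredKey(final String s) {
            super(s);
        }

        private StoredKey() {
            super();
        }

        @Override
        protected String getSerialized() {
            return this.serialized == null ? super.serialized : this.serialized;
        }

        @Override
        protected void setSerialized(final String s) {
            super.setSerialized(s);
            this.serialized = s;
        }
    }

    /** No-arg constructor plus a direct field write, standing in for the provider. */
    public static StoredKey loadViaFieldAccess(final String value) {
        try {
            final Constructor<StoredKey> ctor = StoredKey.class.getDeclaredConstructor();
            ctor.setAccessible(true);
            final StoredKey key = ctor.newInstance();
            final Field field = StoredKey.class.getDeclaredField("serialized");
            field.setAccessible(true);
            field.set(key, value);
            return key;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final BaseKey loaded = loadViaFieldAccess("AfJKjssrqYlLQIujIWff4Ay6");
        final BaseKey plain = new BaseKey("AfJKjssrqYlLQIujIWff4Ay6");
        System.out.println(loaded);                // the value, via the overridden getter
        System.out.println(loaded.equals(plain));  // true
        System.out.println(plain.equals(loaded));  // true
    }
}
```

Note that the parent’s own serialized field is still null after the simulated load. Everything works only because nothing in the parent reads that field without going through getSerialized().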

We have to do a bit of extra work on the storage side to make sure we always upgrade RecordKey to PortKey to ensure it gets stored correctly. But this is done easily enough, as there are only a few places where non-storage code calls directly into storage code with a plain RecordKey.

public static PortKey from(final RecordKey recordKey) {
    if (recordKey == null) return null;
    if (recordKey instanceof PortKey) return (PortKey) recordKey;
    return new PortKey(recordKey.toString());
}

And there you go: how to use non-JPA-aware classes as base classes for JPA embeddables.

Bonus: Jackson JSON Serialization

As mentioned earlier, our RecordKey is also used in wire (JSON serialized) contexts. Getting RecordKey to serialize as a string is actually pretty easy, and I figure if you’ve read this far you probably need to do the same thing.

@JsonDeserialize(using = RecordKey.Deserializer.class)
@JsonSerialize(using = RecordKey.Serializer.class)
public class RecordKey {
    /* ... all of the stuff from above ... */
    public static class Deserializer extends JsonDeserializer<RecordKey> {
        @Override
        public RecordKey deserialize(
            final JsonParser jp,
            final DeserializationContext ctxt
        ) throws IOException {
            if (jp.getCurrentToken() != JsonToken.VALUE_STRING) {
                throw ctxt.instantiationException(RecordKey.class, "Expected a string representation of a RecordKey");
            }
            return new RecordKey(jp.getText());
        }
        @Override
        public Object deserializeWithType(
            final JsonParser jp,
            final DeserializationContext ctxt,
            final TypeDeserializer typeDeserializer
        ) throws IOException {
            return deserialize(jp, ctxt);
        }
    }
    public static class Serializer extends JsonSerializer<RecordKey> {
        @Override
        public void serialize(
            final RecordKey value,
            final JsonGenerator jgen,
            final SerializerProvider provider
        ) throws IOException {
            jgen.writeString(value.toString());
        }
        @Override
        public void serializeWithType(
            final RecordKey value,
            final JsonGenerator jgen,
            final SerializerProvider provider,
            final TypeSerializer typeSer
        ) throws IOException {
            serialize(value, jgen, provider);
        }
    }
}

That’s pretty easy … though I admit I would expect it to be easier. I’d expect you to be able to get away with a single annotation:

@JsonFormat(shape = JsonFormat.Shape.STRING)
public class RecordKey /* ... */

Intuitively, I’d expect that to use toString() to get the serialized value and a single-arg String constructor to be used for deserialization. But that’s never worked when I’ve tried it.

Of course, you’ll then need to copy the deserializer over to the PortKey subclass so that it creates those instead of RecordKey.