Continuing the series about building the components of modern search: in my last post, I wrote about ontology. Now, I’d like to discuss why we would even need to know about changes in the ontology. It may seem that if something is connected to an ontology node and that node moves, gets soft deleted (marked as “deleted” but not actually removed), or gets renamed, there is no need to do anything. The link stays and lets the end user query and traverse the new positions in the ontology in any direction. That is certainly true as far as it goes, but it doesn’t account for all the uses of ontology in an enterprise. Let’s take a look at ontology’s role at Upwork.
Aside from the actual entities (such as Profile or Job Post) that are linked to the ontology (for example, in the form of required skills), we have the query engine and the search indices, which also operate in terms of the ontology. The query engine translates the original query entered by a user, with all applicable filters, into ontology terms and uses these terms to search the appropriate index. On the one hand, it’s important for search quality that the query engine and the index use the same version of the ontology. On the other hand, the index is built offline: converting millions or billions of index entries into the query’s version of the ontology at query time just isn’t feasible.
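To make the version mismatch concrete, here is a minimal sketch of that translation step. All names and node ids here are hypothetical, not Upwork’s actual schema; the point is that the query engine emits ontology node ids, and those ids only match if the index was built against the same ontology version.

```python
# Hypothetical version-7 ontology: human-readable term -> node id.
ONTOLOGY_V7 = {
    "machine learning": "skill:1042",
    "data science": "category:17",
}

def translate_query(text: str, filters: dict, ontology: dict) -> dict:
    """Map the user's free text and filter values onto ontology node ids."""
    terms = [ontology[t] for t in [text.lower()] if t in ontology]
    filter_terms = {k: ontology.get(v, v) for k, v in filters.items()}
    return {"terms": terms, "filters": filter_terms}

query = translate_query("Machine Learning", {"category": "data science"}, ONTOLOGY_V7)
# query == {"terms": ["skill:1042"], "filters": {"category": "category:17"}}
# These ids only match documents in an index that was also built with version 7.
```

If the index was built with version 8, where `category:17` no longer exists, the translated query silently misses documents, which is exactly the synchronization problem discussed below.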
There are two main ways to announce changes to the ontology itself: we can either publish events every time there is a structural change, or we can use an adapter. Publishing events naturally facilitates versioning, as it removes the burden of maintaining versions from the ontology team and lets the team combine the events into complete batches. For example, if an ontology update includes splitting one of the top-level nodes (“categories” in business parlance), there will be a lot of events related to the repositioning of lower-level nodes (“skills”, for example), but all of that can be nicely packaged into one consistent update.
After such a package is published, the ontology doesn’t even need to remember what the old structure was like. Of course, the downside is that every business team (Catalog, Profiles, etc.) now needs to implement an entity-specific mechanism for applying these patches.
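A category split as described above might be packaged roughly like this. The event shapes are a sketch of my own, assuming a simple versioned-batch design rather than any particular Upwork implementation:

```python
from dataclasses import dataclass, field

@dataclass
class NodeEvent:
    kind: str        # e.g. "split", "move", "rename", "soft_delete"
    node_id: str
    payload: dict

@dataclass
class OntologyUpdate:
    # One atomic package taking consumers from one version to the next.
    from_version: int
    to_version: int
    events: list = field(default_factory=list)

update = OntologyUpdate(from_version=7, to_version=8)
# The top-level category split itself...
update.events.append(NodeEvent("split", "category:17",
                               {"into": ["category:17a", "category:17b"]}))
# ...plus one move event per lower-level skill repositioned by the split.
for skill in ["skill:1042", "skill:2311"]:
    update.events.append(NodeEvent("move", skill, {"new_parent": "category:17a"}))
```

A consumer applies the whole package or none of it, landing exactly on version 8; this atomicity is what keeps the many low-level move events consistent with the split that caused them.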
An alternative approach, as mentioned earlier, is building an adapter maintained by the ontology team. The adapter has a simple interface allowing the caller to request ontology information for any version, given information from any other version.
Now the burden of version maintenance falls on the ontology team. The team needs to implement the adapter as a high-throughput, low-latency service (see my other blog posts about services), since it will be called from every process consuming the ontology. However, adding new consumers becomes really easy. Consumers don’t have to worry about applying ontology change events as they happen; they can even lag a few versions behind if their business logic allows it. Search, in particular, no longer needs to synchronize updates to the query engine with updates to the indices, as the engine can use the adapter to translate the query’s ontology tags into the index’s ontology tags. We can even have different indices at different ontology versions.
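The adapter contract can be sketched as follows. In practice this would be a high-throughput RPC service, and the mapping data and node ids are hypothetical; the in-memory class below only illustrates the interface, reusing the category-split example:

```python
class OntologyAdapter:
    """Translate node ids between ontology versions (in-memory sketch)."""

    def __init__(self):
        # (from_version, to_version) -> {old_id: [new_ids]}
        # Only nodes whose position changed need an entry.
        self._mappings = {
            (7, 8): {"category:17": ["category:17a", "category:17b"]},
        }

    def translate(self, node_id: str, from_version: int, to_version: int) -> list:
        mapping = self._mappings.get((from_version, to_version), {})
        # A node with no mapping entry is unchanged and maps to itself.
        return mapping.get(node_id, [node_id])

adapter = OntologyAdapter()
# A query tagged with version-7 terms can be rewritten for a version-8 index:
split_ids = adapter.translate("category:17", from_version=7, to_version=8)
# split_ids == ["category:17a", "category:17b"]
```

Note that the query engine expands a single old tag into every new node it became, which is what lets one query engine serve indices stuck at different ontology versions.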
Upwork implemented the event-based mechanism of change management. At the time, we had just a few ontology consumers and didn’t anticipate getting more. The Ontology team was resource-constrained by building the ontology itself, and adding the adapter would’ve stretched the implementation beyond a reasonable time to market. However, if the implementation happened today, an adapter would be a better choice. We may yet implement the adapter to decrease complexity, particularly if more ontology consumers materialize as new Upwork products.