Ontology in a modern website

Anton V Goldberg
6 min read · Nov 20, 2020

--

Anywhere there are products, ontology provides a meaningful way to understand and organize them. No search or browse experience on any marketplace is possible without an ontology giving people and computers shared context. One of the greatest temptations you have to fight when designing an ontology is to create a fixed structure embodying everything you know about your business. And why not? After all, zoologists have their zoological classification, botanists have their classification of plants with its kingdoms, and geologists classify their rocks. Every time a new plant is discovered, it fits neatly into the classification. The number of known plants increases, but the classification itself almost never changes. All classifications represent languages: vocabularies people use to discuss the things being classified. That’s the key difference between biology and a website selling products to regular people: all biologists use the same vocabulary, while a layperson might have no idea how to name or even describe the product they are looking for. I most often run into this problem at home improvement stores, where the only way for me to find something is to describe the function it performs to an employee and have them walk me to the proper shelf.

A website’s internal ontology must connect its products to the vocabulary of its users. As such, it changes as often as the language itself. For instance, in the case of job search, some occupations change as often as new tools appear, for example whenever a new web development framework is published. It is then the task of our ontology to put the framework in the right place (somewhere under “web development”) and to connect it to similar frameworks, to the programming language it uses, and so on. It is not only individual skills that change; large parts of the ontology need to be updated relatively often. Just consider whether “development” in the “cloud” is the same as “on premise”, “hybrid”, or “private cloud” development and, if it isn’t, where the difference will show up. A whole area might spring up practically overnight and rise from obscurity to the mainstream, as “reinforcement learning” did. Clearly, dramatic changes are not specific to job search and home improvement; they exist in any area where a website needs to understand the vocabulary of regular people.

Another temptation is to put special people (curators) in the loop for every change. There are many reasons this sounds tempting: people will help avoid SEO (search engine optimization) problems, potential embarrassment when terms show up in the wrong places (as constantly happens on Facebook), and so on. The problem, of course, is that people become a bottleneck. Not everyone can curate an ontology: a curator needs to understand the ontology’s structure as a whole and have expertise in the area being curated. Now your site has to wait until a person looks at every change, and meanwhile it has no idea what its users are looking for. All these risks can be mitigated without human involvement. We can set up multiple sources of automated updates: parsing various pages on the Internet, analysing user queries, taking users’ suggestions, and so on. Every change runs through ML models that establish its location in the ontology, and the change is then automatically plugged into the right place. Every so often we want to rearrange things to fight entropy and make sure the ontology is in the best possible shape, and that’s where the curated release comes in.
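To make the automated placement step concrete, here is a minimal sketch. It is not the ML model the article has in mind: a plain Jaccard overlap between sets of context words stands in for it, and the node names, context words, and the functions `jaccard` and `place_term` are all assumptions made up for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of context words, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def place_term(term_context, node_contexts):
    """Pick the existing node whose context words best overlap the new term's.

    Stand-in for a real ML model: it only compares word sets.
    """
    return max(
        node_contexts,
        key=lambda node: jaccard(term_context, node_contexts[node]),
    )


# Hypothetical context words, as might be collected from parsed pages
# or user queries. None of these come from a real data set.
NODE_CONTEXTS = {
    "web development": {"browser", "frontend", "javascript", "html", "framework"},
    "machine learning": {"model", "training", "dataset", "prediction"},
}

# Context observed around a newly published (hypothetical) framework.
new_framework_context = {"javascript", "frontend", "framework", "components"}

parent = place_term(new_framework_context, NODE_CONTEXTS)
score = jaccard(new_framework_context, NODE_CONTEXTS[parent])
```

Here the new framework lands under “web development” without anyone reviewing it. A production system would presumably use a learned model and richer signals, with the periodic curated release correcting any misplacements.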

Speaking of the ontology’s structure, the idea that comes to everyone is to model it on other known taxonomies: plants, rocks, and so on. Let’s consider the difference a true ontology makes to the site’s functionality. For example, let’s take a website matching job offers to candidate profiles. One of the main friction points is getting candidates to specify the skills they have and employers to specify the skills they need for a particular job. One approach is to build a taxonomy that solves this particular problem by guiding users from the general to the specific, so that in the end all of them speak the same language and the website can match offers to profiles. An example of such a taxonomy is below. While it certainly works, there are a few obvious issues.

If something (like “pandas”, 1) fits into several places, you need to build several hierarchies (2, 3) leading to it. After you have found it in one place, finding all the other places where it appears requires a separate query. You can’t build any kind of hypothesis from the taxonomy, because you don’t know how close “Data Mining” is to “Machine Learning”, so you can’t suggest anything to your users based on where “pandas” sits in the hierarchy the user actually followed to find it. Similarly, your matching algorithm has to rely on exact skill matches. For example, if a job requires “pandas” and “python” while a profile specifies only “pandas”, the matching algorithm has no basis for assuming that anybody with “pandas” experience necessarily knows “python”. Suppose there is another tool called “SciPy”. In a single-taxonomy model we don’t know whether “SciPy” is really similar to “pandas” or performs a different function entirely. Consequently, we don’t know whether it needs to be present in all the places where “pandas” is present, or whether people who know “pandas” can master “SciPy” easily. Finally, we never know how complete our taxonomy is, as it is likely built by curators and can only be updated manually or from similar taxonomies (unlikely to exist, as this one is purpose-built).
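The limitations of a pure taxonomy can be shown with a small sketch. The tree below is a hypothetical stand-in for the article's figure, not a reproduction of it, and `find_paths` and `exact_match` are illustrative helper names.

```python
# Hypothetical skills taxonomy as nested dicts; leaves are empty dicts.
# "pandas" has to be duplicated under every branch that should lead to it.
TAXONOMY = {
    "Data Science": {
        "Data Mining": {"pandas": {}},
        "Machine Learning": {"pandas": {}},
    },
    "Software Development": {
        "Programming Languages": {"python": {}},
    },
}


def find_paths(tree, term, prefix=()):
    """Return every root-to-node path that ends at `term` (a full tree walk)."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == term:
            paths.append(path)
        paths.extend(find_paths(children, term, path))
    return paths


def exact_match(required, offered):
    """Taxonomy-only matching: every required skill must be listed verbatim."""
    return set(required) <= set(offered)


# Finding the other places where "pandas" lives means searching the whole tree.
pandas_paths = find_paths(TAXONOMY, "pandas")

# The job asks for both skills, the profile lists only one of them.
job_skills = {"pandas", "python"}
profile_skills = {"pandas"}
matched = exact_match(job_skills, profile_skills)
```

The match fails even though a “pandas” user almost certainly knows “python”, because nothing in the tree relates the two nodes.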

All of these questions and problems can be resolved if we restructure the taxonomy into a more connected ontology. In fact, some of these issues can be resolved simply by adding different types of links to the existing structure. For example, adding a “depends on” type of link (dotted green arrow) helps us figure out whether people who know “pandas” and “SciPy” know “python” as well. Perhaps they can do not only Data Science jobs but also other types of jobs that require “python”!
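As a rough illustration of what a typed link buys us, the sketch below stores links as (source, type, target) triples and expands a profile's skills by following “depends on” links. The link names, the triple representation, and the helper functions are assumptions chosen for this example.

```python
from collections import defaultdict

# Typed links as (source, link_type, target) triples. Illustrative data only.
LINKS = [
    ("pandas", "depends_on", "python"),
    ("pandas", "used_in", "Data Science"),
]


def build_index(links):
    """Index targets by (source, link_type) for quick lookup."""
    index = defaultdict(set)
    for source, link_type, target in links:
        index[(source, link_type)].add(target)
    return index


def implied_skills(skills, index):
    """Expand a skill set by following 'depends_on' links transitively."""
    known = set(skills)
    stack = list(skills)
    while stack:
        skill = stack.pop()
        for dependency in index.get((skill, "depends_on"), ()):
            if dependency not in known:
                known.add(dependency)
                stack.append(dependency)
    return known


def matches(required, offered, index):
    """A profile matches if its listed plus implied skills cover the job."""
    return set(required) <= implied_skills(offered, index)


INDEX = build_index(LINKS)
profile = implied_skills({"pandas"}, INDEX)
is_match = matches({"pandas", "python"}, {"pandas"}, INDEX)
```

With the extra link type, a profile listing only “pandas” now satisfies a job that requires both “pandas” and “python”.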

We can go one step further and rearrange our taxonomy into several connected ones and create a real ontology as in the picture below.

While the ontology here is far from ideal, it solves practically all the issues we noticed in the taxonomy above. The taxonomies presented here can be measured for completeness, and they already allow independent automated updates. For example, we can add new programming languages as they appear. A new occupation can be added by parsing and processing texts related to that occupation (textbooks, specialized websites, other job search sites) to set up links between the occupation and its tools and other aspects. Continuing the job search example, a well-developed ontology makes it possible to answer a query such as “develop eCommerce website” with several proposals for a team composition and with individuals ready to be hired for that team. Links need to have attributes. The most widely used attribute is the version number, which records the ontology version a link belongs to. With such links you can easily find the differences between ontology versions (with one query to the graph) or recover the state of the graph at any date or version in the past (very useful for various ML purposes). Having different link types and an interconnected structure allows you to use the graph for different purposes: helping users browse your products, searching for the best fit, suggesting search criteria or keywords, and converting text terms to semantics.
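One possible way to encode version attributes on links is sketched below. The article does not specify a schema, so the `added_in` and `removed_in` fields, the `Link` class, and the sample data are assumptions; in a graph database, the same idea would be a filter on edge properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: str
    link_type: str
    target: str
    added_in: int                     # ontology version that introduced the link
    removed_in: Optional[int] = None  # version that retired it; None if still live

    def alive_at(self, version: int) -> bool:
        """True if the link exists in the given ontology version."""
        if version < self.added_in:
            return False
        return self.removed_in is None or version < self.removed_in


def snapshot(links, version):
    """Recover the set of links as they stood at a given version."""
    return {
        (link.source, link.link_type, link.target)
        for link in links
        if link.alive_at(version)
    }


def diff(links, old, new):
    """Links added and removed between two ontology versions."""
    before, after = snapshot(links, old), snapshot(links, new)
    return {"added": after - before, "removed": before - after}


# Hypothetical history: "pandas" moves from one parent to another.
LINKS = [
    Link("pandas", "depends_on", "python", added_in=1),
    Link("pandas", "child_of", "Data Mining", added_in=1, removed_in=3),
    Link("pandas", "child_of", "Machine Learning", added_in=2),
]

changes = diff(LINKS, 1, 3)
state_v2 = snapshot(LINKS, 2)
```

Because links are retired rather than deleted, any past state can be rebuilt from the same data, which is what makes historical snapshots cheap to produce for ML training.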

In conclusion, it’s hard to imagine a modern website with search and browse capabilities that does not use an ontology for semantic search and matching, as well as for guiding users in their discovery of its products.
