As you build any system, you will see a gradual, or sometimes sudden, explosion in demand for the data in it. This may be driven by the growing capabilities of the system, by growing use of its data, or both. What you build for your system is also needed by others, if not now then later. Soon other teams in your company, or external partners and customers, will want to access your system's data and integrate with it or build on it. Sure, you can and will provide APIs to expose your data. But that alone is not sufficient. For your data to be truly useful, you must also provide contextual information about it, that is, metadata.
Metadata gives meaning to your data and provides instructions and insights to its users: your application developers and your end users.
In some systems I have worked on, designing and managing metadata was an afterthought. When the need arose, we did our best to reconstruct a metadata system around the data stack we had already built. This is challenging on many fronts. It is prudent to think ahead and design for all your data needs, including metadata.
So how do you design, store, and manage your metadata? You need to think about categorization, classification, versioning, usage, quality, and other aspects. You might start out with your own metadata repository, but beware: it soon becomes a complex system in its own right, taking precious time and capacity away from your development team just to keep it running. Your team may not have the relevant expertise, either. I look for solutions that my team does not have to build. There are commercial data catalog and metadata management solutions, but they are not very affordable in terms of infrastructure, cost, and complexity.
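To make those aspects a little more concrete, here is a minimal sketch of what a single metadata record for one dataset might capture. The class name `TableMetadata`, its fields, and the `revise` helper are all hypothetical, chosen only for illustration; they are not taken from OpenMetadata or any other product.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone


@dataclass
class TableMetadata:
    """Hypothetical metadata record for one dataset (illustrative fields only)."""
    name: str                 # fully qualified dataset name
    description: str          # human-readable meaning of the data
    owner: str                # who to ask about this dataset
    category: str             # categorization, e.g. a business domain
    tags: list[str] = field(default_factory=list)  # classification, e.g. "PII"
    version: int = 1          # versioning: bumped on every revision
    quality_checks_passed: bool = True  # quality: result of the latest checks
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def revise(self, **changes) -> "TableMetadata":
        """Return a new record with the given changes and the next version number.

        The original record is left untouched, so older versions can be kept
        for history. Do not pass `version` or `updated_at` in `changes`.
        """
        return replace(
            self,
            version=self.version + 1,
            updated_at=datetime.now(timezone.utc),
            **changes,
        )


# Usage: describe a dataset, then revise its description.
orders = TableMetadata(
    name="sales.orders",
    description="One row per customer order",
    owner="data-platform-team",
    category="sales",
    tags=["PII"],
)
orders_v2 = orders.revise(description="One row per order, including cancellations")
print(orders.version, orders_v2.version)
```

Keeping each revision as a new immutable-by-convention record is just one design choice; even this toy version hints at why a homegrown repository grows quickly once search, lineage, and access control enter the picture.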
I want an open-source, open system. So I was delighted to come across a project called OpenMetadata in this post by its creators, Suresh Srinivas (ex-Hortonworks, ex-Uber) and Sriharsha Chintalapani. The announcement captures the problem space and highlights how they are reimagining the metadata ecosystem.
I like the schema-first approach (I'm glad they are using JSON), and at first glance it looks pretty comprehensive. Hopefully, it can become the basis of a much-needed standard for metadata. They just published their first benchmark, which showed great results. This is the real stuff!
Suresh and Sriharsha bring a lot of their experience to this project: see this, this, and this.
So, to all you developers and teams out there: if you are doing anything significant with data, invest time in metadata, and definitely take a look at OpenMetadata and give it a try. I am very excited about the future of OpenMetadata.