Radial Theory has designed a highly customizable master data management solution, Radial MDM, that is actively relied upon to support the ongoing data engineering needs of multiple clients.

Radial MDM is implemented in Java, is compatible with a wide variety of data sources and sinks (including Apache Kafka, Apache NiFi, AWS S3, AWS SQS, Elasticsearch, and various databases), and can be deployed in a variety of configurations, both on-premises and in the cloud.

Client: Multiple clients, including a large global real-estate company and several non-profit organizations
Services: Data analytics, systems integration and custom development
Year: 2019 - present
Key Technologies: Java, Python, Elasticsearch, Apache Kafka, AWS SQS, AWS S3, AWS Lambda, Tableau

What is Radial MDM?

Radial MDM is a Master Data Management system that we designed to address a set of thorny issues common to many of the data engineering practices we’ve worked with. These challenges include:

Uncorrelated data sources

We wanted a platform upon which data sourced from many different systems could be brought together and unified, akin to a traditional Data Warehouse.

Divergent data dialects

We wanted a tool that could help normalize data retrieved from multiple independent sources, so that this data could be treated in a common dialect when aggregated or queried.
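As a rough illustration of what "a common dialect" can mean in practice, the sketch below maps records from two sources onto one shared schema. The source names (`crm`, `billing`), field names, and date formats are hypothetical and chosen only for the example; this is not Radial MDM's actual configuration format.

```python
from datetime import datetime

# Hypothetical field mappings: each source's column names -> a common schema.
FIELD_MAPS = {
    "crm": {"FullName": "name", "EMail": "email", "Created": "created_at"},
    "billing": {
        "customer_name": "name",
        "email_addr": "email",
        "signup_date": "created_at",
    },
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"crm": "%m/%d/%Y", "billing": "%Y-%m-%d"}


def normalize(source: str, record: dict) -> dict:
    """Translate one source-specific record into the common dialect."""
    mapping = FIELD_MAPS[source]
    out = {common: record.get(raw) for raw, common in mapping.items()}
    if out.get("name"):
        # Collapse repeated whitespace and use consistent capitalization.
        out["name"] = " ".join(out["name"].split()).title()
    if out.get("email"):
        out["email"] = out["email"].strip().lower()
    if out.get("created_at"):
        parsed = datetime.strptime(out["created_at"], DATE_FORMATS[source])
        out["created_at"] = parsed.date().isoformat()
    out["source"] = source  # keep provenance for later stages
    return out
```

Once both sources pass through `normalize`, a record for the same person carries the same field names and value formats regardless of where it came from, which is what makes later aggregation and querying straightforward.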

Unsanitized data

Different systems accumulate data under different constraints, according to different needs. As a result, systems often have different policies on how data duplication and missing data are handled (or not handled). We wanted to make a rich set of tooling available in real time that could deal with such situations on an as-needed basis.
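One way to picture such tooling is a small, policy-driven filter applied to records as they stream in. The sketch below is a minimal example under assumed policies: the required field, the default value, and the "exact duplicate" rule are all hypothetical and would differ per deployment.

```python
from typing import Iterable, Iterator

# Hypothetical policy: records missing a required field are rejected,
# while missing optional fields are filled with a default value.
REQUIRED = ("email",)
DEFAULTS = {"country": "unknown"}


def sanitize(records: Iterable[dict]) -> Iterator[dict]:
    """Yield cleaned records, skipping incomplete ones and exact duplicates."""
    seen = set()
    for record in records:
        if any(not record.get(field) for field in REQUIRED):
            continue  # policy: drop records missing a required field
        cleaned = dict(record)
        for field, default in DEFAULTS.items():
            if not cleaned.get(field):
                cleaned[field] = default
        # Exact-duplicate detection on the cleaned field/value pairs.
        key = tuple(sorted(cleaned.items()))
        if key in seen:
            continue
        seen.add(key)
        yield cleaned
```

Because `sanitize` is a generator, records are processed one at a time as they arrive rather than in a batch, which loosely mirrors the as-needed handling described above.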

Enabling New Functionality

Radial MDM was also designed to enable a set of capabilities, previously unavailable to our clients, that we knew would be useful in many types of projects. These include:

Data Clustering and Aggregation

A process in which data from multiple sources is combined into groupings of strongly related data sets, bound by conceptual units such as identities, time and place ("events"), or other grouping factors. These clusters can then be used to generate derived data structures, such as aggregate Entity Profiles summarizing the related items in a grouping, or Master Records representing unique clusters of records that refer to the same entity across all data sources (also known as "Entity Resolution").
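To make the idea of a Master Record concrete, here is a minimal sketch that collapses one cluster of records into a single record. The merge rule used here (take the most frequent non-empty value for each field) is an assumption made for illustration; the source does not specify how Radial MDM resolves conflicting values.

```python
from collections import Counter
from typing import List


def master_record(cluster: List[dict]) -> dict:
    """Build one record per cluster from the most common non-empty values."""
    # Collect field names in first-seen order across the whole cluster.
    fields = []
    for record in cluster:
        for field in record:
            if field not in fields:
                fields.append(field)

    master = {}
    for field in fields:
        values = [r[field] for r in cluster if r.get(field) not in (None, "")]
        if values:
            # Assumed rule: the most frequent value wins; ties go to the
            # value that was seen first.
            master[field] = Counter(values).most_common(1)[0][0]
    return master
```

Other merge rules, such as preferring the most recently updated source or a designated system of record, would fit the same shape by swapping out the line that picks the winning value.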

Subscriptions and Publications

A mechanism through which interested subscribers can receive ongoing updates to clusters of interest in custom data formats. In this way, Radial MDM can be used as part of a data pipeline that aggregates and transforms data from multiple sources and delivers it to one or more subscribers.

Architecture

Radial MDM was designed to be a highly configurable and modular platform for connecting data engineering processes. At the core of Radial MDM are three types of pipeline elements: Indexers, Linkers, and Profilers.

Figure: Radial MDM high-level architecture

Indexers

Indexers take in data from disparate sources and apply data cleansing, normalization and augmentation operations.
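An Indexer can be thought of as an ordered chain of small record-to-record steps. The sketch below shows that composition with one cleansing step, one normalization step, and one augmentation step. The step functions and the `index` helper are hypothetical names for illustration, not part of Radial MDM's API.

```python
from typing import Callable, Iterable, Iterator, List

Step = Callable[[dict], dict]


def trim_strings(record: dict) -> dict:
    """Cleansing: strip stray whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def lowercase_email(record: dict) -> dict:
    """Normalization: compare emails case-insensitively downstream."""
    if isinstance(record.get("email"), str):
        return {**record, "email": record["email"].lower()}
    return record


def add_email_domain(record: dict) -> dict:
    """Augmentation: derive a new field from an existing one."""
    email = record.get("email") or ""
    if "@" in email:
        return {**record, "email_domain": email.rsplit("@", 1)[1]}
    return record


def index(records: Iterable[dict], steps: List[Step]) -> Iterator[dict]:
    """Apply each configured step, in order, to every incoming record."""
    for record in records:
        current = dict(record)
        for step in steps:
            current = step(current)
        yield current
```

Keeping each step a plain function makes the chain easy to reconfigure per source: a deployment can reorder, add, or remove steps without touching the others.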

Linkers

Linkers process normalized, enriched data and apply clustering and deduplication logic in order to group related data items into “clusters”, according to configured matching rules.
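The following sketch shows one common way to turn pairwise matching rules into clusters: treat each match as an edge and group records with a union-find structure, so that matches chain transitively. The two rules and the field names (`email`, `name`, `zip`) are hypothetical, and the all-pairs comparison is only suitable for small inputs; it is not a description of how Radial MDM's Linkers are implemented.

```python
from itertools import combinations
from typing import Callable, Dict, List

Rule = Callable[[dict, dict], bool]


def same_email(a: dict, b: dict) -> bool:
    """Hypothetical rule: two records match if they share a non-empty email."""
    return bool(a.get("email")) and a.get("email") == b.get("email")


def same_name_and_zip(a: dict, b: dict) -> bool:
    """Hypothetical rule: match on identical, non-empty name and zip."""
    return (
        bool(a.get("name")) and a.get("name") == b.get("name")
        and bool(a.get("zip")) and a.get("zip") == b.get("zip")
    )


def link(records: List[dict], rules: List[Rule]) -> List[List[dict]]:
    """Group records into clusters; any rule matching a pair links the pair."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if any(rule(records[i], records[j]) for rule in rules):
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the lowest index as root so output order is stable.
                parent[max(ri, rj)] = min(ri, rj)

    clusters: Dict[int, List[dict]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())
```

Note that linking is transitive here: if record A matches B by email and B matches C by name and zip, all three land in one cluster even though A and C never match directly.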

Profilers

Profilers monitor the set of clusters, and assemble and deliver bespoke cluster summaries (“profiles”) to interested data subscribers downstream, according to their subscription configurations.
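A subscription, in this reading, bundles three things: which clusters a subscriber cares about, how the profile should be shaped, and where it should be delivered. The sketch below models that with plain callables. The `Subscription` and `Profiler` classes and their method names are hypothetical stand-ins for illustration, not Radial MDM's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subscription:
    """Hypothetical subscription configuration for one downstream consumer."""
    wants: Callable[[List[dict]], bool]        # is this cluster of interest?
    render: Callable[[str, List[dict]], dict]  # build the custom profile
    deliver: Callable[[dict], None]            # hand it to the subscriber


class Profiler:
    """Fan out cluster changes to every subscription that asks for them."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, subscription: Subscription) -> None:
        self.subscriptions.append(subscription)

    def on_cluster_changed(self, cluster_id: str, cluster: List[dict]) -> None:
        for sub in self.subscriptions:
            if sub.wants(cluster):
                sub.deliver(sub.render(cluster_id, cluster))


# Usage: a subscriber interested only in clusters that span several records.
inbox: List[dict] = []
profiler = Profiler()
profiler.subscribe(Subscription(
    wants=lambda cluster: len(cluster) > 1,
    render=lambda cid, cluster: {
        "id": cid,
        "size": len(cluster),
        "sources": sorted({r["source"] for r in cluster}),
    },
    deliver=inbox.append,
))
profiler.on_cluster_changed("c1", [{"source": "crm"}, {"source": "billing"}])
```

In a real deployment, `deliver` would more plausibly write to a queue, topic, or warehouse loader than to an in-memory list; the list is used here only to keep the example self-contained.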

Technologies Used

The core elements of Radial MDM are written in Java, while pipeline processes are implemented in Python. The framework can be deployed flexibly using containers, AWS Lambda functions, or service processes, as needed for a given solution stack.

Radial MDM was designed to work with and feed downstream subscribers, such as Data Warehouse systems or other data stream ingesters.

For More Information

Radial Theory is considering contributing Radial MDM to the open source community, in which case we’ll provide an update and link to more documentation. You may also contact us if you think a system like this can help with your business, or are interested in learning more.
