ML Feature Stores: It All Starts with Feature Engineering

Feature stores are an underrated resource, especially when building machine learning (ML) models. They act as an accessible, centralised location where you can store the important features that ML models use, both for training and for making predictions.

In other words, a library for your organisation’s curated data.

Feature stores are so incredible, in fact, that they:

  • Make it easier to develop and deploy models.
  • Allow for better accessibility in analytics and other ML projects.
  • Are an ideal option for companies looking to get the absolute best out of their ML investments.

When Uber introduced the idea back in 2017, who would have known that it would become essential to so many successful data-driven projects?

Now, giants like Apple, Salesforce, Twitter and Meta (Facebook) all make effective use of them.

So what are feature stores? Why does your organisation need one? And when should you start building one out?

In this three-part Machine Learning Feature Store series, we answer some of these pressing questions with the help of our machine learning experts, Christiaan Viljoen and Dominic Kafka.

The Traditional ML Approach Just Doesn’t Cut It

Besides the challenges that big data brings to the table, traditional ML methods haven’t exactly made model deployment efficient or effective.

You see, when data scientists create a new model, they must first identify which features it requires. This isn’t always simple, because there often isn’t one single location where all of those features can be found.

Instead, new features are generated for each model, or live across many different sources, making the development process fractured and potentially wasteful.

According to Continual, traditional approaches to ML face several obstacles:

  • Time: Data scientists spend copious amounts of time transforming data before they can start building models.
  • Unclean Data: When they start any new use case, there is often not enough clean data to work with.
  • Tracking: Notebook-based data science makes data harder to track and manage. Notebooks are a great tool for running experiments and executing code, but they lack the capabilities needed to develop and track a ready-to-deploy ML solution.
  • Talent: People without the necessary expertise are unable to make use of the ML tools at their disposal.
  • No Unifying Data Layer: There is no single data layer that serves both the online (real-time prediction) and offline (model training) data requirements.
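To make the unifying-data-layer idea more concrete, here is a minimal sketch of what a feature store does at its core, assuming a purely in-memory store. The class `InMemoryFeatureStore` and its methods `write`, `get_online` and `get_as_of` are hypothetical names invented for illustration; real feature stores add persistence, feature definitions, versioning and low-latency serving. The point is that one write path feeds both the offline read (the value as it was known at a past moment, for training without leaking future data) and the online read (the latest value, for live predictions).

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class InMemoryFeatureStore:
    """Toy feature store (hypothetical, for illustration only).

    Every value is stored with a timestamp, so the same history can answer
    both offline (point-in-time) and online (latest value) queries.
    """

    def __init__(self) -> None:
        # (entity_id, feature_name) -> list of (timestamp, value), sorted by timestamp
        self._history = defaultdict(list)

    def write(self, entity_id: str, feature: str, ts: datetime, value: float) -> None:
        rows = self._history[(entity_id, feature)]
        rows.append((ts, value))
        rows.sort(key=lambda r: r[0])  # keep history ordered for point-in-time lookups

    def get_online(self, entity_id: str, feature: str):
        """Latest known value, as a model serving live predictions would read it."""
        rows = self._history.get((entity_id, feature))
        return rows[-1][1] if rows else None

    def get_as_of(self, entity_id: str, feature: str, ts: datetime):
        """Value as it was known at `ts`, as a training job would read it."""
        rows = self._history.get((entity_id, feature), [])
        times = [r[0] for r in rows]
        i = bisect_right(times, ts)  # number of rows written at or before ts
        return rows[i - 1][1] if i else None


# Usage: the timestamps are hypothetical; the amounts echo the bank example later on.
store = InMemoryFeatureStore()
store.write("bob", "monthly_spend", datetime(2022, 1, 31), 3000.0)
store.write("bob", "monthly_spend", datetime(2022, 2, 28), 2500.0)
print(store.get_online("bob", "monthly_spend"))                       # 2500.0
print(store.get_as_of("bob", "monthly_spend", datetime(2022, 2, 15)))  # 3000.0
```

Because training and serving both read from the same stored history, the two views of the data cannot quietly drift apart, which is the gap the last bullet describes.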

A feature store helps to tackle challenges in speed, performance, volume and complexity.

Even maintenance becomes far more manageable with the help of automation and reusability.

That’s why feature stores and feature engineering are becoming a highly attractive option for organisations looking to simplify operational ML.

Feature Engineering: The Power Behind Features & Feature Stores

Feature engineering is the process of transforming raw data into features that can be used by machine learning models for predictions and analytics.

It involves extracting important information from the data, cleaning it up, and converting it into a format that the model can understand.
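Here is a small illustration of cleaning data and converting it into a format a model can understand, assuming raw records arrive as text. The `RawTransaction` record, the `parse_amount` and `to_features` helpers and the `CHANNELS` vocabulary are all hypothetical; the steps shown (parsing text into numbers, dropping records that cannot be interpreted, one-hot encoding a categorical field) are common feature-engineering steps rather than a prescribed method.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary of channels a transaction can come through.
CHANNELS = ("branch", "atm", "till")


@dataclass
class RawTransaction:
    customer: str
    amount: str   # raw text as captured, e.g. "£50", "£1,000" or "" if missing
    channel: str  # free text, e.g. " ATM "


def parse_amount(raw: str) -> Optional[float]:
    """Strip the currency symbol and thousands separators; None if unparseable."""
    cleaned = raw.strip().lstrip("£").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_features(tx: RawTransaction) -> Optional[dict]:
    """Clean one raw record and convert it into numeric model inputs."""
    amount = parse_amount(tx.amount)
    if amount is None:
        return None  # drop records we cannot interpret
    channel = tx.channel.strip().lower()
    features = {"amount": amount}
    # One-hot encode the channel so the model sees numbers, not strings.
    for c in CHANNELS:
        features[f"channel_{c}"] = 1.0 if channel == c else 0.0
    return features


raw = [
    RawTransaction("bob", "£50", " ATM "),
    RawTransaction("bob", "£1,000", "branch"),
    RawTransaction("bob", "", "till"),  # missing amount: dropped during cleaning
]
rows = [f for f in (to_features(t) for t in raw) if f is not None]
```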

Dominic Kafka:

“You’re taking raw data and you’re doing things to it such that it is more interpretable and more meaningful.”

“There are two versions of this. The first is aggregation, where there is a lot of data that needs to be aggregated and summarised in a way that is meaningful to me and also to the machine learning model.”

“Then there’s transformation. In other forms of data capture, the signals you capture might be non-linear. The data might be both non-linear and highly dense, in which case you might want to do the opposite: take it apart a bit more, applying transformations to subsets of the data so that it’s more separable, clearer or easier to understand.”
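Dominic doesn’t name a specific transformation, so the following is only one common example of the idea, chosen for illustration: points forming a ring around a central cluster cannot be split by a straight line in their raw (x, y) coordinates, but re-expressing each point by its distance from the origin makes the two groups separable with a single threshold. The sample points and the `to_polar_features` helper are hypothetical.

```python
import math


def to_polar_features(x: float, y: float) -> tuple[float, float]:
    """Re-express a 2-D point as (radius, angle) features."""
    return math.hypot(x, y), math.atan2(y, x)


# Hypothetical readings: class 0 sits near the origin, class 1 on a ring around it.
inner = [(0.5, 0.0), (0.0, -0.4), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, 2.1), (-1.5, -1.5), (1.4, -1.4)]

# In raw (x, y) space the ring surrounds the inner points, so no straight line
# separates the classes; after the transformation, one threshold on radius does.
threshold = 1.0
assert all(to_polar_features(x, y)[0] < threshold for x, y in inner)
assert all(to_polar_features(x, y)[0] > threshold for x, y in outer)
```

A linear model given only the radius feature can now separate the classes, even though it could not with the original coordinates.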

Christiaan Viljoen:

“Let’s talk about a bank as an example. Wherever your banking system tracks transactions, your goal is to extract that transaction data and apply machine learning to it. Using that data, you’re able to build machine learning models that make predictions about scenarios that might occur in the future, or about the best course of action to take.”

“Transactions happen every day, in huge volumes, across the country, whether through bank branches, ATMs, tills, etc. When you actually build a machine learning model, raw data like ‘Transaction one: Bob drew £50; Transaction two: Bob sent his mom £1000; Transaction three: Netflix subscription is £15’ is not useful to it.”

“What is useful is taking all of the transactions he made in a week or a month and summing them. For example: in January he spent £3000, in February he spent £2500, etc. Over time, we can see how it changes. Did it increase? Did it decrease?”
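Christiaan’s point can be sketched in a few lines of Python, assuming each transaction is a `(customer, date, amount)` tuple. The helpers `monthly_spend` and `month_over_month` are hypothetical names; the first three sample transactions mirror the quote above (with made-up dates), and the February ones are invented purely to show a change between months.

```python
from collections import defaultdict
from datetime import date


def monthly_spend(transactions):
    """Sum transaction amounts per (customer, year, month)."""
    totals = defaultdict(float)
    for customer, day, amount in transactions:
        totals[(customer, day.year, day.month)] += amount
    return dict(totals)


def month_over_month(totals, customer):
    """Return (year, month, total, change) rows for one customer.

    `change` is the difference from the previous month that had transactions
    (None for the first month).
    """
    months = sorted((y, m) for (c, y, m) in totals if c == customer)
    rows, previous = [], None
    for y, m in months:
        total = totals[(customer, y, m)]
        change = None if previous is None else total - previous
        rows.append((y, m, total, change))
        previous = total
    return rows


txs = [
    ("bob", date(2022, 1, 3), 50.0),    # cash withdrawal
    ("bob", date(2022, 1, 9), 1000.0),  # money sent to his mom
    ("bob", date(2022, 1, 15), 15.0),   # Netflix subscription
    ("bob", date(2022, 2, 15), 15.0),   # hypothetical February activity
    ("bob", date(2022, 2, 20), 200.0),  # hypothetical February activity
]
print(month_over_month(monthly_spend(txs), "bob"))
# [(2022, 1, 1065.0, None), (2022, 2, 215.0, -850.0)]
```

Three raw rows become a single January figure, and the change column captures whether spending increased or decreased. Monthly totals and changes like these are the kind of features that could then be written to a feature store and reused across models.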

Your Solution To Feature Engineering + An AMA With Our Experts!

Feature stores are an important part of the modern data-driven landscape. They are a great way to improve the efficiency and productivity of your team. They offer great cost, resource and time savings – and they bring simplicity to a world of complexity.

Do you want to build your own feature store, but don’t know where to start?

Come and join us! We’re hosting an AMA with Christiaan Viljoen and Dominic Kafka on all things ML and feature stores.

Whether you want to learn:

  • How feature stores work,
  • Whether they’re the right fit for your business,
  • Or how to actually build one,

our experts are here to guide you!

Register for this fantastic opportunity today: Why You Need A Feature Store (And When To Build One)

[PSA: Keep your eyes peeled for the next two pieces in our Machine Learning Feature Store series! Or get the free eBook!]

