Keeping up with our ML feature store series?
Great! Then you’re one step closer to unwrapping the value of a feature store – and the impact one will have on your business.
(If not: part 1 & part 2)
But first, you need to identify whether or not you’re ready for a feature store:
- Does your organisation need one?
- Or would it be better off without it?
It’s important to establish whether you need a feature store in the first place. So we’ve got your favourite ML experts, Dominic Kafka and Christiaan Viljoen, answering those exact questions!
Why You Need A Feature Store
Feature stores aren’t necessary for every company.
If your company is not doing any data analysis or machine learning, then there is no real need for one. Also, if you only have a small amount of data that doesn’t change rapidly, then a feature store may not yield the returns that you hope for.
So why should an organisation start looking into feature stores?
While there are plenty of reasons to invest in one, repeatability, centralisation and auditability stand out.
Repeatability & Centralisation
Data often sits in various spaces around organisations. Having a feature store centralises your data and makes it easier to access.
Dominic Kafka:
“Feature stores allow teams to avoid spending time and effort redoing work that has already been done. Instead of data scientists having to start their modelling from scratch, using a feature store allows them to use models already developed, saving time and money.”
“An example is when you have two consumers of these features: Business analysts and the machine learning models. Both require access to the same source, because you don’t want to have duplication there, either. With a feature store in place, you eliminate duplication, re-modelling and any room for error. You don’t want them to have to rebuild them and introduce any room for error.”
Viljoen adds:
“As we all know, data often sits in various spaces around organisations. Having a feature store centralises your data and makes it easier to access.”
“For example, you have data sitting in a SQL warehouse and some other data sitting in S3 or in MongoDB. If everyone connects to the same data source, it becomes your single source of truth.”
By organising and documenting all of a product’s different capabilities, you can create consistency across teams, as well as make it easier when searching through documentation or trying out new functionality, as everything will be in one place.
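To make the “single source of truth” idea concrete, here is a minimal sketch in Python. It assumes the rows have already been pulled from each system, so the three lists below are hypothetical stand-ins for a SQL warehouse, S3 and MongoDB, and the `consolidate` helper and the field names are made up for illustration. It shows one way scattered records could end up in a single view keyed by customer.

```python
from collections import defaultdict

# Hypothetical rows, standing in for data already read from three systems.
warehouse_rows = [
    {"customer_id": 1, "monthly_income": 2000.0},
    {"customer_id": 2, "monthly_income": 3500.0},
]
object_store_rows = [{"customer_id": 1, "monthly_debt": 500.0}]
document_db_rows = [{"customer_id": 2, "account_age_days": 730}]


def consolidate(*sources):
    """Merge rows from every source into one record per customer_id.

    If two sources supply the same field, the later source wins.
    """
    merged = defaultdict(dict)
    for rows in sources:
        for row in rows:
            key = row["customer_id"]
            for field, value in row.items():
                if field != "customer_id":
                    merged[key][field] = value
    return dict(merged)


feature_table = consolidate(warehouse_rows, object_store_rows, document_db_rows)
```

Everyone who reads `feature_table` sees the same values for the same customer, which is the point Viljoen makes above. A real feature store adds storage, freshness and access control on top of this.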
(Side Note: Don’t Repeat Yourself)
To avoid the unnecessary cost of repetition, both Viljoen and Kafka use the principle of don’t repeat yourself (DRY) to guide them when feature engineering.
“As soon as you find yourself repeatedly doing something, you need to write a piece of code that replicates that. It’s the same for a feature store. You can either repeat yourself by creating the same features over and over, or you can just use what’s already there.” – Viljoen
The DRY principle stems from the 1999 book The Pragmatic Programmer by Andy Hunt and Dave Thomas. Defined as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system,” it is the means of reducing repetition in code and optimising productivity.
By applying this principle, you keep code clean and easy to read, making it easier to maintain and update over time.
You can apply the same principle to data. By extracting your features into a reusable format, you avoid the need to rebuild them every time you want to use them.
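Here is a rough sketch of what DRY can look like for features. It is not how any particular feature store product works: the `FEATURES` registry, the `feature` decorator and the example feature names are all hypothetical. The idea it illustrates is that each feature is defined once, and both the model and the business analyst pull from that one definition.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry: feature name -> function that computes it.
FEATURES: Dict[str, Callable[[dict], float]] = {}


def feature(name: str):
    """Register a feature definition once so every consumer reuses it."""
    def decorator(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        if name in FEATURES:
            raise ValueError(f"feature {name!r} is already defined")
        FEATURES[name] = fn
        return fn
    return decorator


@feature("debt_to_income")
def debt_to_income(record: dict) -> float:
    return record["monthly_debt"] / record["monthly_income"]


@feature("account_age_years")
def account_age_years(record: dict) -> float:
    return record["account_age_days"] / 365.0


def build_features(record: dict, names: List[str]) -> Dict[str, float]:
    """Compute the requested features from their single shared definitions."""
    return {name: FEATURES[name](record) for name in names}


customer = {"monthly_debt": 500.0, "monthly_income": 2000.0, "account_age_days": 730}
model_input = build_features(customer, ["debt_to_income", "account_age_years"])
analyst_view = build_features(customer, ["debt_to_income"])
```

Because a second definition under the same name raises an error, nobody can quietly create a slightly different `debt_to_income`. The duplicate work gets caught at the moment it is written.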
Auditability
Biased data and algorithms skew decision-making in ways that can disadvantage low-income and minority groups. For example, there are cases of banking software that favours applicants of only a select few races.
With a feature store, you can easily identify what data your model has been trained on and compare that to the actual feeds it’s receiving.
“If you’re making a credit card prediction, for example, you have to first identify if the person qualifies to get the credit card. You want to make sure that the features that you use for that prediction are saved somewhere and the way that you got to those features is documented. So that it can be traced easily.”
“Especially if there are biases in your data, e.g. not giving a certain population group credit cards, which can attract big problems for the company.”
This makes iterating much easier because you’re able to see exactly where problems stem from – as well as when and how things were improved upon.
End-to-end lineage ensures there are no questions left unanswered. Feature stores give you access to information about why predictions were made at any point in time, making this approach ideal for testing out new models and also tuning existing ones.
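As a simple illustration of the auditing described above, the sketch below records where a feature came from and compares what a model was trained on with what it receives in production. Everything in it is an assumption made for the example: the `FeatureRecord` fields, the source name `warehouse.loans`, the two helper functions and the sample values.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class FeatureRecord:
    """Hypothetical lineage entry: what a feature is and how it was derived."""
    name: str
    source: str
    transformation: str
    version: int


def compare_feature_sets(trained_on: List[str], live_feed: List[str]) -> Dict[str, List[str]]:
    """List features the model was trained on but no longer receives, and vice versa."""
    trained, served = set(trained_on), set(live_feed)
    return {
        "missing_in_feed": sorted(trained - served),
        "unexpected_in_feed": sorted(served - trained),
    }


def mean_shift(training_values: List[float], live_values: List[float]) -> float:
    """Difference between the live mean and the training mean of one feature."""
    return mean(live_values) - mean(training_values)


lineage = FeatureRecord(
    name="debt_to_income",
    source="warehouse.loans",  # hypothetical table name
    transformation="monthly_debt / monthly_income",
    version=1,
)
report = compare_feature_sets(
    trained_on=["debt_to_income", "account_age_years"],
    live_feed=["debt_to_income", "postcode"],
)
shift = mean_shift([0.25, 0.5, 0.75], [0.5, 0.75, 1.0])
```

In this made-up case the report flags `account_age_years` as missing from the live feed and `postcode` as unexpected, and the live `debt_to_income` values sit 0.25 higher on average than in training. That is the kind of gap an auditor would want traced back through the lineage record.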
Cost & Time Benefits
Feature stores provide a number of benefits that can save organisations both time and money.
Dominic Kafka: “Especially around computing costs. When data is dispersed across different locations or formats, it can be difficult and expensive to compute. A feature store helps to consolidate all that data into a manageable format, making things easier and more affordable.”
Feature stores also make data more accessible. Data analysts and scientists need to be able to quickly access the data they need in order to build models and analyse results. A feature store simplifies this process by providing an easily accessible location for all data-related features. This includes data that is stored in different formats or locations.
To Build, Or Not To Build, A Feature Store
While feature stores and their incredible benefits are certainly an attractive option to many enterprises, it’s critical to know whether or not you should build one in the first place.
Your Infrastructure Sucks
The necessary infrastructure, tools and talent are crucial requirements to successfully build and deploy a feature store. They affect how long it takes to complete one and determine whether or not your organisation is ready for one in the first place.
Dominic Kafka: “An important question that needs consideration is where the feature store is being built. While some organisations will opt for something on-prem, a recommended choice would be to build it in the cloud if you want to do ML maturely. I don’t see a lot of opportunities to make ML work scalably in the future without cloud.”
Viljoen: “It also depends on the position that the company is currently in. A bank that stores a lot of their data in AWS, with pipelines to that data and an existing SQL feature store would allow us to have a minimal viable product for them in a much quicker time than if a company had none of that. You want to know if they’ve got some form of a feature store, or have taken any actions towards building one out.”
You Don’t Quite Understand Your Data
Another good way to gauge your organisation’s readiness for a feature store is how well you understand your data. If the data is already going through transformations (being converted, processed and stored in an effective way), then a feature store is more promising.
Kafka: “Does the organisation do repeatable transformations? Are they repeatedly doing similar transformations? If so, they’ve ticked the box. They’re ready to do some form of feature store. If they have that, they have a platform. Whether it’s on-prem, or in the cloud.”
Your Solution To Feature Engineering
Feature stores are an important part of the modern data-driven landscape.
They are a great way to improve the efficiency and productivity of your team. They offer great cost, resource and time savings. And they bring simplicity to a world of complexity.
Are you interested in building your own feature store, but don’t know where to start?
We can help you!
Whether you’re wanting to learn more about:
- how feature stores work,
- if they’re the right fit for your business, or
- the steps needed to build one out…
… Our experts are here to guide you.