Kaggle’s H&M Fashion Recommendation Workshop Recap – Jul 27, 2022

August 1, 2022


On July 27th, we hosted our first Kaggle H&M Fashion Recommendations workshop. In this workshop, we showed how to turn Kaggle’s H&M Personalized Fashion Recommendations competition into a Kubeflow Pipeline using the KFP SDK and the Kale JupyterLab extension. In this blog post, we’ll recap some highlights from the workshop and summarize the Q&A!

First, thanks for voting for your favorite charity!

With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give workshop attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s vote was the National Pediatric Cancer Foundation (NPCF), a nonprofit organization dedicated to funding research to eliminate childhood cancer. We are pleased to be making a donation of $100 to them on behalf of the workshop attendees. Again, thanks to all of you who attended and voted!

About the Kaggle Competition

H&M Group is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Their online store offers shoppers an extensive selection of products to browse through. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately, they might not make a purchase. To enhance the shopping experience, product recommendations are key. More importantly, helping customers make the right choices also has positive implications for sustainability, as it reduces returns, and thereby minimizes emissions from transportation.

In this competition, H&M Group invites you to develop product recommendations based on data from previous transactions, as well as from customer and product metadata. The available metadata ranges from simple data, such as garment type and customer age, to text data from product descriptions, to image data from garment images.
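To make the task concrete, here is a minimal popularity-based baseline in plain Python. It is our own illustrative sketch, not the model used in the workshop or an official solution: the `(customer_id, article_id, date)` record layout and the `recommend` function are hypothetical, and the default of 12 recommendations per customer is chosen to match the list length scored by the competition’s MAP@12 metric.

```python
from collections import Counter, defaultdict

def recommend(transactions, k=12):
    """Return a ranked list of up to k article IDs per customer.

    transactions: iterable of hypothetical (customer_id, article_id, date)
    tuples, with dates as ISO strings so they sort chronologically.
    Each customer's own purchases come first (newest first); any remaining
    slots are filled with the globally most-purchased articles.
    """
    transactions = list(transactions)
    popularity = Counter(article for _, article, _ in transactions)
    top_global = [article for article, _ in popularity.most_common()]

    # Build each customer's de-duplicated purchase history, newest first.
    history = defaultdict(list)
    for customer, article, _ in sorted(transactions, key=lambda t: t[2], reverse=True):
        if article not in history[customer]:
            history[customer].append(article)

    recs = {}
    for customer, recent in history.items():
        ranked = recent[:k]  # slicing copies, so history is not mutated
        for article in top_global:
            if len(ranked) >= k:
                break
            if article not in ranked:
                ranked.append(article)
        recs[customer] = ranked
    return recs

# Toy data, invented for illustration only.
txns = [
    ("c1", "a1", "2020-09-01"),
    ("c1", "a2", "2020-09-10"),
    ("c2", "a2", "2020-09-05"),
    ("c2", "a3", "2020-09-06"),
    ("c3", "a2", "2020-09-07"),
]
recs = recommend(txns, k=3)
```

Customers with no purchase history are not covered by this sketch; a natural fallback is to give them `top_global[:k]`. Stronger solutions typically add candidate generation and a learned ranker on top of a baseline like this.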

What topics were covered in the workshop?

Overview of Kubeflow
Installing Kubeflow
About the H&M Personalized Fashion Recommendations Competition
Turning the H&M Fashion Recommendations Competition into a Kubeflow Pipeline with the KFP SDK
Turning the H&M Fashion Recommendations Competition into a Kubeflow Pipeline with the Kale JupyterLab extension
Comparing the Methods

What did I miss?

Here’s a short teaser from the 45-minute workshop, where Jimmy walked us through a notebook that turns Kaggle’s H&M Fashion Recommendations competition into a Kubeflow Pipeline using the Kale JupyterLab extension.

Ready to get started with Kubeflow?

Arrikto’s Kubeflow as a Service is the easiest way to get Kubeflow deployed and have a pipeline running in under 5 minutes. It comes with a 7-day free trial, and no credit card is required. Click to get started.

Scalable Machine Learning Applications in Retail

Arrikto introduces some new use cases where machine learning is being used to solve retail problems. You can now schedule a workshop for your team here.

Missed the July 27 workshop?

If you were unable to join us last week, but would still like to attend a workshop in the future, register for one of these upcoming workshops.

You can also sign up for the next H&M Fashion Recommendations Workshop on Oct 26 directly here.

Q&A from the training

Below is a summary of some of the questions that popped into the Q&A box during the workshop. [Edited for readability and brevity.]

What are the pros and cons of developing and deploying pipelines with Kale vs the KFP SDK?

The advantages of using the open source Kale JupyterLab extension include:

No need to do repetitive installs of Python packages for each component
No need to have any special knowledge of Kubeflow Pipelines’ domain specific language
Pipeline steps, hyperparameters, GPU usage, metrics tracking, resolving dependencies and injected data are all handled via visual annotations rather than code
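To illustrate what “resolving dependencies” between pipeline steps means, here is a small plain-Python sketch that runs hypothetical step functions in dependency order and passes each step’s output to the steps that need it. This is not Kale’s API or how Kale is implemented internally; the step names and the `STEPS`/`run_pipeline` structure are invented for illustration. With Kale, you express the same step ordering through cell annotations instead of writing this wiring yourself.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step functions standing in for annotated notebook cells.
# Each step receives the outputs of the steps it depends on as keyword args.
def load_data():
    return [("c1", "a1"), ("c1", "a2"), ("c2", "a2")]

def count_items(load_data):
    counts = {}
    for _, article in load_data:
        counts[article] = counts.get(article, 0) + 1
    return counts

def top_item(count_items):
    return max(count_items, key=count_items.get)

# Hypothetical dependency declarations: step name -> (function, upstream steps).
STEPS = {
    "load_data": (load_data, []),
    "count_items": (count_items, ["load_data"]),
    "top_item": (top_item, ["count_items"]),
}

def run_pipeline(steps):
    """Execute steps in topological order, feeding upstream outputs downstream."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        outputs[name] = func(**{dep: outputs[dep] for dep in deps})
    return outputs

results = run_pipeline(STEPS)
```

In a real Kubeflow pipeline, each step runs in its own container, so outputs cannot simply live in an in-memory dictionary; they have to be serialized and handed between steps. That marshalling is part of what Kale automates.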

A current limitation of the Kale extension is that its snapshotting capabilities only support Arrikto’s Rok storage medium. Also, the latest version of Kale is only available on Arrikto’s Kubeflow as a Service. Note that both of these issues will be resolved in the near future!

Is it possible for the MLOps pipeline to have a “manual step” where managers or leads can review the results first before proceeding with the next step?

You would need to code in some sort of alert that pauses the pipeline and then lets it continue once a specific command or input is received. It’s Python at the end of the day, so anything is possible.
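As a minimal sketch of such a gate, assuming the approval signal can be checked from Python (for example a flag file, a database row, or a chat-bot reply), a pipeline step could poll until a reviewer approves or a timeout expires. The `wait_for_approval` function and the `is_approved` callable are hypothetical names for illustration, not part of the KFP SDK or Kale.

```python
import time

def wait_for_approval(is_approved, poll_interval=5.0, timeout=3600.0):
    """Block until is_approved() returns True or timeout seconds elapse.

    is_approved: hypothetical zero-argument callable that reports whether a
    manager or lead has signed off. Returns True if approval arrived,
    False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_approved():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)

# Example: simulate a reviewer who approves on the third check.
checks = {"count": 0}

def fake_reviewer():
    checks["count"] += 1
    return checks["count"] >= 3

approved = wait_for_approval(fake_reviewer, poll_interval=0.0, timeout=10.0)
rejected = wait_for_approval(lambda: False, poll_interval=0.0, timeout=0.0)
```

Inside a pipeline step, you would typically raise an exception when `wait_for_approval` returns False, so the step is marked as failed and downstream steps do not run. Sending the alert itself (email, Slack, etc.) would happen just before the wait begins.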