On Aug 3rd, we hosted our first Kaggle JPX Tokyo Stock Exchange Prediction workshop. In this workshop we showed how to turn Kaggle’s JPX Tokyo Stock Exchange Prediction competition into a Kubeflow pipeline using the KFP SDK and the Kale JupyterLab extension. In this blog post we’ll recap some highlights from the workshop and give a summary of the Q&A!
First, thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give workshop attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s voting was the International Committee of the Red Cross (ICRC). Since 1863, the ICRC’s mission has been to protect and assist victims of armed conflict and promote understanding and respect for international humanitarian law. We are pleased to be making a donation of $100 to them on behalf of the workshop attendees. Again, thanks to all of you who attended and voted!
About the Kaggle Competition
Many quantitative trading efforts analyze financial markets and formulate investment strategies. Creating and executing such a strategy requires both historical and real-time data, which is difficult to obtain, especially for retail investors. This competition provides financial data for the Japanese market, allowing retail investors to analyze the market to the fullest extent.
This competition compares models against real future returns after the training phase is complete. It involves building portfolios from the stocks eligible for predictions (around 2,000 stocks). You have access to financial data from the Japanese market, such as stock information and historical stock prices, to train and test your models.
What topics were covered in the workshop?
- Overview of Kubeflow
- Overview of Kubeflow Notebooks
- Overview of Kubeflow Pipelines
- Develop and Run the JPX Tokyo Stock Exchange Prediction Pipeline
- Overview of Kale
- Automate the JPX Tokyo Stock Exchange Prediction Pipeline with Kale
- Next Steps
What did I miss?
Here’s a short teaser from the 60-minute workshop where Jimmy walked us through a notebook that turns Kaggle’s JPX Tokyo Stock Exchange Prediction competition into a Kubeflow pipeline using the Kale JupyterLab extension.
Ready to get started with Kubeflow?
Missed the Aug 3 workshop?
If you were unable to join us last week, but would still like to attend a workshop in the future, register for one of these upcoming workshops.
Q&A from the training
Below is a summary of some of the questions that popped into the Q&A box during the workshop. [Edited for readability and brevity.]
Can you explain setting up artifact storage on-prem and in the cloud, and making it so one pipeline can work in either environment?
We use Rok as our artifact storage in Arrikto Kubeflow as a Service and Enterprise Kubeflow offerings. Rok is a Kubernetes storage provider (integrated via Kubernetes’s CSI) that stores snapshots of PVCs containing referenceable artifacts in whatever object storage service is close to where Kubernetes is installed. So, for an on-prem environment Rok’s underlying store would be the object storage service residing on-prem (e.g., MinIO, Ceph, or any S3-compatible proprietary solution). For cloud environments Rok’s underlying store would be the cloud provider’s object storage service (e.g., AWS S3, Google GCS, Azure Blob). Rok allows for transparent, peer-to-peer syncing of snapshots containing artifacts across Kubernetes clusters, independently of where they reside and which underlying object storage service they are using. It also allows for mounting clones of these snapshots instantly as new PVCs to pipelines. So not only can you have the same pipeline executed across different environments and magically find the artifacts it needs locally, but in the future you will be able to split a single pipeline to run different steps across on-prem and cloud environments.
How can I make parameters appear on the run page? I am familiar with passing parameters from a config file or from globals in the Python DSL, but not making them appear on the run tab on the GUI as fields to complete when submitting a run.
Let’s assume you already have a run that has been defined. For example, we have a pipeline we’ve defined from a Notebook for Kaggle’s Tokyo Stock Market competition.
If you look at the “Config” tab for this pipeline run you can see the “Run Parameters” defined — specifically, “LR” and “N_EST”.
Whether you use “Create Run” or “Clone a Run” in the Kubeflow UI to start a pipeline, you’ll see the run parameters in the UI pre-filled with the values defined in the notebook, and you can modify them before execution.
Is there a step-by-step guide on how to get started on Kubeflow?
Yes! I would recommend signing up for our free Kubeflow Fundamentals four-part training series.
Is there any software that needs to be installed on a machine to get started with Kubeflow, or are there cloud-based options available?
You have three options:
Number 1: You can install it locally, assuming you have Kubernetes and sufficient resources to support a Kubeflow deployment. Because most folks do not have sufficient free resources on their laptops, almost everyone opts for either a deployment in the cloud or a managed service.
Number 2: You can install Kubeflow in the cloud on a Kubernetes cluster using a packaged distribution from one of the major cloud providers.
Number 3: You can also skip the whole installation step, and just sign up for Arrikto’s Kubeflow as a Service, which is the fastest and most painless way to get to a Kubeflow UI without having to have any special knowledge about Kubernetes.
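For option 1 above, one common approach (not the only one) is to install from the kubeflow/manifests repository with kustomize. This is a sketch that assumes you already have a running Kubernetes cluster with kubectl and kustomize installed:

```shell
# Clone the Kubeflow manifests repo (pin a release branch in practice)
git clone https://github.com/kubeflow/manifests.git
cd manifests

# Build and apply everything; retry until all CRDs are registered
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 10
done
```

The retry loop is there because some resources depend on CustomResourceDefinitions that take a moment to register, so the first few applies can partially fail.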