New versions of how-to-guides for computing at scale (#302)
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Computing at Scale with TimeGPT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may have to forecast sales for thousands of products across hundreds of stores. Similarly, when dealing with electricity consumption data, you may need to predict consumption for thousands of households across various regions.\n",
    "\n",
    "Nixtla's `TimeGPT` enables you to use several distributed computing frameworks to manage large datasets efficiently. `TimeGPT` currently supports `Spark`, `Dask`, and `Ray` through `Fugue`.\n",
    "\n",
    "In this notebook, we will explain how to leverage these frameworks using `TimeGPT`.\n",
    "\n",
    "**Outline:**\n",
    "1. [Getting Started](#1-getting-started)\n",
    "2. [Forecasting at Scale](#2-forecasting-at-scale)\n",
    "3. [Important Considerations](#3-important-considerations)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started\n",
    "\n",
    "To use `TimeGPT` with any of the supported distributed computing frameworks, you first need an API key, just as you would in the non-distributed case.\n",
    "\n",
    "Upon [registration](https://dashboard.nixtla.io/), you will receive an email asking you to confirm your signup. After confirming, you will gain access to your dashboard, where you will find your API key under `API Keys`. Next, integrate your API key into your development workflow with the Nixtla SDK. For guidance on how to do this, please refer to the [Setting Up Your Authentication Key tutorial](https://docs.nixtla.io/docs/setting_up_your_authentication_api_key)."
   ]
  },
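  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch (assuming the `nixtla` SDK is installed and your key is stored in the `NIXTLA_API_KEY` environment variable, which the client reads by default), instantiating and validating the client looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nixtla import NixtlaClient\n",
    "\n",
    "# Reads the key from the NIXTLA_API_KEY environment variable;\n",
    "# alternatively, pass it explicitly: NixtlaClient(api_key='YOUR_API_KEY')\n",
    "nixtla_client = NixtlaClient()\n",
    "\n",
    "# Returns True if the key is valid\n",
    "nixtla_client.validate_api_key()"
   ]
  },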
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecasting at Scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `TimeGPT` with any of the supported distributed computing frameworks is straightforward, as `TimeGPT` will read a `pandas` DataFrame and then use the corresponding framework. Thus, the usage is almost identical to the non-distributed case.\n",
    "\n",
    "1. Instantiate a `NixtlaClient` class.\n",
    "2. Load your data as a `pandas` DataFrame.\n",
    "3. Initialize the distributed computing framework.\n",
    "    - [Spark](https://docs.nixtla.io/docs/1_computing_at_scale_spark)\n",
    "    - [Dask](https://docs.nixtla.io/docs/2_computing_at_scale_dask)\n",
    "    - [Ray](https://docs.nixtla.io/docs/3_computing_at_scale_ray)\n",
    "4. Use any of the `NixtlaClient` class methods.\n",
    "5. Stop the distributed computing framework, if necessary.\n",
    "\n",
    "These are the general steps that you will need to follow to use `TimeGPT` with any of the supported distributed computing frameworks. For a detailed explanation and a complete example, please refer to the guide for the specific framework linked above."
   ]
  },
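  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps above can be sketched with `Spark` as an example. This is a sketch, not a definitive implementation: it assumes `nixtla`, `pyspark`, and `fugue` are installed, and that `data.csv` is a hypothetical long-format file with `unique_id`, `ds`, and `y` columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nixtla import NixtlaClient\n",
    "from pyspark.sql import SparkSession\n",
    "\n",
    "# 1. Instantiate the client (API key read from NIXTLA_API_KEY)\n",
    "nixtla_client = NixtlaClient()\n",
    "\n",
    "# 3. Initialize the distributed computing framework (Spark here)\n",
    "spark = SparkSession.builder.getOrCreate()\n",
    "\n",
    "# Load the data as a Spark DataFrame ('data.csv' is a hypothetical path)\n",
    "spark_df = spark.read.csv('data.csv', header=True, inferSchema=True)\n",
    "\n",
    "# 4. Use any NixtlaClient method on the distributed DataFrame\n",
    "fcst_df = nixtla_client.forecast(df=spark_df, h=12, time_col='ds', target_col='y')\n",
    "fcst_df.show()\n",
    "\n",
    "# 5. Stop the distributed computing framework, if necessary\n",
    "spark.stop()"
   ]
  },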
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "::: {.callout-important}\n",
    "Parallelization in these frameworks is performed across the individual time series within your dataset. Therefore, your dataset must include multiple time series, each with a unique id.\n",
    ":::"
   ]
  },
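  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For instance, a dataset in the expected long format has one row per series-timestamp pair, with the series identified by a `unique_id` column. A minimal illustrative sketch with two toy series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Two toy series, each identified by a unique id\n",
    "df = pd.DataFrame({\n",
    "    'unique_id': ['store_1'] * 3 + ['store_2'] * 3,\n",
    "    'ds': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'] * 2),\n",
    "    'y': [10.0, 12.0, 11.0, 20.0, 21.0, 19.0],\n",
    "})\n",
    "\n",
    "# Number of series available to parallelize over\n",
    "df['unique_id'].nunique()"
   ]
  },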
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Important Considerations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### When to Use a Distributed Computing Framework\n",
    "\n",
    "Consider using a distributed computing framework if your dataset:\n",
    "\n",
    "- Consists of millions of observations over multiple time series.\n",
    "- Is too large to fit into the memory of a single machine.\n",
    "- Would be too slow to process on a single machine.\n",
    "\n",
    "### Choosing the Right Framework\n",
    "\n",
    "When selecting a distributed computing framework, take into account your existing infrastructure and the skill set of your team. Although `TimeGPT` can be used with any of the supported frameworks with minimal code changes, choosing the right one should align with your specific needs and resources. This will ensure that you leverage the full potential of `TimeGPT` while handling large datasets efficiently."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}