{ "cells": [ { "cell_type": "markdown", "id": "83b05792", "metadata": {}, "source": [ "# Shifted Beta-Geometric Model with Cohorts and Covariates" ] }, { "cell_type": "markdown", "id": "3913b9b2", "metadata": {}, "source": [ "The Shifted Beta-Geometric (sBG) model was first introduced in [\"How to Project Customer Retention\"](https://faculty.wharton.upenn.edu/wp-content/uploads/2012/04/Fader_hardie_jim_07.pdf) by Fader & Hardie in 2007. It is well suited to predicting customer behavior in business cases involving contract renewals or recurring subscriptions, and the original model has been extended in PyMC-Marketing to support multidimensional cohorts and covariates. In this notebook we first reproduce the results of the paper, then proceed to a comprehensive example with EDA and additional predictive methods." ] }, { "cell_type": "markdown", "id": "78347062-1ccc-4237-a925-5617ba1ed5eb", "metadata": {}, "source": [ "## Setup Notebook" ] }, { "cell_type": "code", "execution_count": 3, "id": "5a4844d3", "metadata": {}, "outputs": [], "source": [ "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as mtick\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sb\n", "import xarray as xr\n", "from dateutil.relativedelta import relativedelta\n", "from pymc_extras.prior import Prior\n", "\n", "from pymc_marketing import clv\n", "\n", "# Plotting configuration\n", "az.style.use(\"arviz-darkgrid\")\n", "plt.rcParams[\"figure.figsize\"] = [12, 7]\n", "plt.rcParams[\"figure.dpi\"] = 100\n", "plt.rcParams[\"figure.facecolor\"] = \"white\"\n", "plt.rcParams[\"figure.constrained_layout.use\"] = True\n", "\n", "%load_ext autoreload\n", "%autoreload 2\n", "%config InlineBackend.figure_format = \"retina\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "890cb1f0-35f5-4549-8446-f328fadda19f", "metadata": {}, "outputs": [], "source": [ "seed = sum(map(ord, \"sBG Model\"))\n", "rng = np.random.default_rng(seed)" ] }, { "cell_type": 
"markdown", "id": "6b5f1993-c3ea-41fd-bfd0-51f28c6a52b1", "metadata": {}, "source": [ "## Load Data" ] }, { "cell_type": "markdown", "id": "17a30927-3273-42ed-85e7-f3ab42de2451", "metadata": {}, "source": [ "Data must be aggregrated in the following format for model fitting:\n", "\n", "- `customer_id` is an index of unique identifiers for each customer\n", "- `recency` indicates the most recent time period a customer was still active\n", "- `T` is the maximum observed time period for a given cohort\n", "- `cohort` indicates the cohort assignment for each customer\n", "\n", "For active customers, `recency` is equal to `T`, and all customers in a given cohort share the same value for `T`. If a customer cancelled their contract and restarted at a later date, a new `customer_id` must be assigned for the restart.\n", "\n", "Sample data is available in the PyMC-Marketing repo. To see the code used to generate this data, refer to `generate_sbg_data()` in `scripts/clv_data_generation.py` in the repo." ] }, { "cell_type": "code", "execution_count": 5, "id": "fb5c2b35-2d72-4967-a572-73d754405232", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | customer_id | recency | T | cohort |
|---|---|---|---|---|
| 0 | 1 | 1 | 8 | highend |
| 1 | 2 | 1 | 8 | highend |
| 2 | 3 | 1 | 8 | highend |
| 3 | 4 | 1 | 8 | highend |
| 4 | 5 | 1 | 8 | highend |
| ... | ... | ... | ... | ... |
| 1995 | 1996 | 8 | 8 | regular |
| 1996 | 1997 | 8 | 8 | regular |
| 1997 | 1998 | 8 | 8 | regular |
| 1998 | 1999 | 8 | 8 | regular |
| 1999 | 2000 | 8 | 8 | regular |

2000 rows × 4 columns
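
The format rules stated above (unique `customer_id`, `recency` no larger than `T`, one `T` per cohort, `recency == T` for active customers) can be checked before fitting. The following is a minimal sketch in plain Python; the helper name `summarize_sbg_rows` and the four example rows are made up for illustration and are not part of PyMC-Marketing or of the sample dataset.

```python
from collections import Counter


def summarize_sbg_rows(rows):
    """Check rows laid out as (customer_id, recency, T, cohort) and count
    active vs. churned customers per cohort."""
    seen_ids = set()
    cohort_T = {}
    counts = Counter()
    for row in rows:
        cid, recency, T, cohort = (
            row["customer_id"], row["recency"], row["T"], row["cohort"]
        )
        if cid in seen_ids:
            raise ValueError(f"duplicate customer_id: {cid}")
        seen_ids.add(cid)
        if recency > T:
            raise ValueError(f"recency exceeds T for customer {cid}")
        # Every customer in a cohort must share the same T.
        if cohort_T.setdefault(cohort, T) != T:
            raise ValueError(f"cohort {cohort!r} has more than one value of T")
        # recency == T means the customer was still active at the end of the window.
        status = "active" if recency == T else "churned"
        counts[(cohort, status)] += 1
    return dict(counts)


# Made-up rows for illustration only (not the notebook's sample data).
rows = [
    {"customer_id": 1, "recency": 1, "T": 8, "cohort": "highend"},
    {"customer_id": 2, "recency": 8, "T": 8, "cohort": "highend"},
    {"customer_id": 3, "recency": 3, "T": 8, "cohort": "regular"},
    {"customer_id": 4, "recency": 8, "T": 8, "cohort": "regular"},
]
summary = summarize_sbg_rows(rows)
print(summary)
```

With a pandas DataFrame, the same checks could be run on `df.to_dict("records")`.
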

| | regular | highend |
|---|---|---|
| 0 | 100.0 | 100.0 |
| 1 | 63.1 | 86.9 |
| 2 | 46.8 | 74.3 |
| 3 | 38.2 | 65.3 |
| 4 | 32.6 | 59.3 |
| 5 | 28.9 | 55.1 |
| 6 | 26.2 | 51.7 |
| 7 | 24.1 | 49.1 |
| 8 | 22.3 | 46.8 |
| 9 | 20.7 | 44.5 |
| 10 | 19.4 | 42.7 |
| 11 | 18.3 | 40.9 |
| 12 | 17.3 | 39.4 |

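
Assuming the table above reports the percentage of each segment still active after each period, the period-over-period retention rate is the ratio of consecutive survival values, r_t = S(t) / S(t-1). The sketch below computes these ratios from the tabulated numbers; the function name `retention_rates` is illustrative.

```python
# Percent of customers still active after 0..12 periods, copied from the table above.
regular = [100.0, 63.1, 46.8, 38.2, 32.6, 28.9, 26.2, 24.1, 22.3, 20.7, 19.4, 18.3, 17.3]
highend = [100.0, 86.9, 74.3, 65.3, 59.3, 55.1, 51.7, 49.1, 46.8, 44.5, 42.7, 40.9, 39.4]


def retention_rates(survival_pct):
    """Return r_t = S(t) / S(t-1) for t = 1..len(survival_pct) - 1."""
    return [curr / prev for prev, curr in zip(survival_pct, survival_pct[1:])]


for name, series in (("regular", regular), ("highend", highend)):
    rates = retention_rates(series)
    print(name, [round(r, 3) for r in rates])
```

For the regular segment the ratios increase from period to period, which is the kind of rising retention curve the sBG model attributes to heterogeneity in churn probabilities across customers.
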
<xarray.Dataset> Size: 264kB\n",
"Dimensions: (chain: 4, draw: 1000, cohort: 2)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999\n",
" * cohort (cohort) <U7 56B 'highend' 'regular'\n",
"Data variables:\n",
" phi (chain, draw, cohort) float64 64kB 0.1377 0.3894 ... 0.1613 0.3723\n",
" kappa (chain, draw, cohort) float64 64kB 5.257 1.867 ... 3.852 1.749\n",
" alpha (chain, draw, cohort) float64 64kB 0.7238 0.7268 ... 0.6213 0.6509\n",
" beta (chain, draw, cohort) float64 64kB 4.533 1.14 3.895 ... 3.231 1.098\n",
"Attributes:\n",
" created_at: 2025-12-16T11:10:26.842039+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: pymc\n",
" inference_library_version: 5.25.1\n",
" sampling_time: 6.430373191833496\n",
" tuning_steps: 1000<xarray.Dataset> Size: 528kB\n",
"Dimensions: (chain: 4, draw: 1000)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999\n",
"Data variables: (12/18)\n",
" lp (chain, draw) float64 32kB -3.3e+03 ... -3.299e+03\n",
" perf_counter_start (chain, draw) float64 32kB 1.659e+05 ... 1.659e+05\n",
" perf_counter_diff (chain, draw) float64 32kB 0.001657 ... 0.003421\n",
" acceptance_rate (chain, draw) float64 32kB 0.9945 0.9736 ... 0.9995\n",
" n_steps (chain, draw) float64 32kB 3.0 7.0 3.0 ... 3.0 7.0\n",
" divergences (chain, draw) int64 32kB 0 0 0 0 0 0 ... 0 0 0 0 0 0\n",
" ... ...\n",
" step_size_bar (chain, draw) float64 32kB 0.6497 0.6497 ... 0.5986\n",
" largest_eigval (chain, draw) float64 32kB nan nan nan ... nan nan\n",
" index_in_trajectory (chain, draw) int64 32kB -3 3 -2 3 -4 ... 3 -2 -1 2 5\n",
" reached_max_treedepth (chain, draw) bool 4kB False False ... False False\n",
" energy (chain, draw) float64 32kB 3.301e+03 ... 3.301e+03\n",
" smallest_eigval (chain, draw) float64 32kB nan nan nan ... nan nan\n",
"Attributes:\n",
" created_at: 2025-12-16T11:10:26.851747+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: pymc\n",
" inference_library_version: 5.25.1\n",
" sampling_time: 6.430373191833496\n",
" tuning_steps: 1000<xarray.Dataset> Size: 32kB\n",
"Dimensions: (customer_id: 2000)\n",
"Coordinates:\n",
" * customer_id (customer_id) int64 16kB 1 2 3 4 5 ... 1996 1997 1998 1999 2000\n",
"Data variables:\n",
" dropout (customer_id) float64 16kB 1.0 1.0 1.0 1.0 ... 8.0 8.0 8.0 8.0\n",
"Attributes:\n",
" created_at: 2025-12-16T11:10:26.854134+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: pymc\n",
" inference_library_version: 5.25.1<xarray.Dataset> Size: 80kB\n",
"Dimensions: (index: 2000)\n",
"Coordinates:\n",
" * index (index) int64 16kB 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999\n",
"Data variables:\n",
" customer_id (index) int64 16kB 1 2 3 4 5 6 ... 1996 1997 1998 1999 2000\n",
" recency (index) int64 16kB 1 1 1 1 1 1 1 1 1 1 ... 8 8 8 8 8 8 8 8 8 8\n",
" T (index) int64 16kB 8 8 8 8 8 8 8 8 8 8 ... 8 8 8 8 8 8 8 8 8 8\n",
" cohort (index) object 16kB 'highend' 'highend' ... 'regular' 'regular'| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| alpha[highend] | \n", "0.670 | \n", "0.112 | \n", "0.477 | \n", "0.874 | \n", "0.002 | \n", "0.003 | \n", "2305.0 | \n", "2512.0 | \n", "1.0 | \n", "
| alpha[regular] | \n", "0.703 | \n", "0.065 | \n", "0.591 | \n", "0.829 | \n", "0.001 | \n", "0.001 | \n", "3235.0 | \n", "3186.0 | \n", "1.0 | \n", "
| beta[highend] | \n", "3.821 | \n", "0.867 | \n", "2.313 | \n", "5.370 | \n", "0.020 | \n", "0.024 | \n", "1897.0 | \n", "2300.0 | \n", "1.0 | \n", "
| beta[regular] | \n", "1.181 | \n", "0.152 | \n", "0.904 | \n", "1.464 | \n", "0.003 | \n", "0.002 | \n", "2480.0 | \n", "2659.0 | \n", "1.0 | \n", "
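
To connect the `alpha` and `beta` estimates to retention, recall the closed forms from the original sBG paper: the retention rate in period t is r_t = (beta + t - 1) / (alpha + beta + t - 1), and the survival function is the running product of those rates. The sketch below plugs in the posterior means from the summary table. This is only a rough point-estimate check (a full analysis would evaluate the curve for every posterior draw), and the function names are illustrative rather than part of the PyMC-Marketing API.

```python
def sbg_retention(alpha, beta, t):
    """Retention rate r_t = (beta + t - 1) / (alpha + beta + t - 1), for t >= 1."""
    return (beta + t - 1) / (alpha + beta + t - 1)


def sbg_survival(alpha, beta, t):
    """Survival S(t) as the product of retention rates r_1 ... r_t, with S(0) = 1."""
    survival = 1.0
    for period in range(1, t + 1):
        survival *= sbg_retention(alpha, beta, period)
    return survival


# Posterior means (alpha, beta) taken from the summary table above.
posterior_means = {"highend": (0.670, 3.821), "regular": (0.703, 1.181)}

for cohort, (alpha, beta) in posterior_means.items():
    curve = [round(100 * sbg_survival(alpha, beta, t), 1) for t in range(5)]
    print(cohort, curve)
```

The printed values can be compared against the observed survival percentages to judge how closely the fitted model tracks the data.
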
| | customer_id | recency | T | cohort | highend_customer |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 8 | population | 1 |
| 1 | 2 | 1 | 8 | population | 1 |
| 2 | 3 | 1 | 8 | population | 1 |
| 3 | 4 | 1 | 8 | population | 1 |
| 4 | 5 | 1 | 8 | population | 1 |
| ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | 8 | 8 | population | 0 |
| 1996 | 1997 | 8 | 8 | population | 0 |
| 1997 | 1998 | 8 | 8 | population | 0 |
| 1998 | 1999 | 8 | 8 | population | 0 |
| 1999 | 2000 | 8 | 8 | population | 0 |

2000 rows × 5 columns

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha[1] | 0.671 | 0.107 | 0.481 | 0.872 | 0.004 | 0.003 | 863.0 | 1106.0 | 1.01 |
| alpha[2] | 0.671 | 0.107 | 0.481 | 0.872 | 0.004 | 0.003 | 863.0 | 1106.0 | 1.01 |
| alpha[3] | 0.671 | 0.107 | 0.481 | 0.872 | 0.004 | 0.003 | 863.0 | 1106.0 | 1.01 |
| alpha[4] | 0.671 | 0.107 | 0.481 | 0.872 | 0.004 | 0.003 | 863.0 | 1106.0 | 1.01 |
| alpha[5] | 0.671 | 0.107 | 0.481 | 0.872 | 0.004 | 0.003 | 863.0 | 1106.0 | 1.01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| beta[1996] | 1.197 | 0.155 | 0.921 | 1.484 | 0.005 | 0.005 | 894.0 | 1096.0 | 1.00 |
| beta[1997] | 1.197 | 0.155 | 0.921 | 1.484 | 0.005 | 0.005 | 894.0 | 1096.0 | 1.00 |
| beta[1998] | 1.197 | 0.155 | 0.921 | 1.484 | 0.005 | 0.005 | 894.0 | 1096.0 | 1.00 |
| beta[1999] | 1.197 | 0.155 | 0.921 | 1.484 | 0.005 | 0.005 | 894.0 | 1096.0 | 1.00 |
| beta[2000] | 1.197 | 0.155 | 0.921 | 1.484 | 0.005 | 0.005 | 894.0 | 1096.0 | 1.00 |

4000 rows × 9 columns

| | customer_id | recency | T | cohort | highend_customer |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 8 | 2025-01 | 1 |
| 1 | 2 | 1 | 8 | 2025-01 | 1 |
| 2 | 3 | 1 | 8 | 2025-01 | 1 |
| 3 | 4 | 1 | 8 | 2025-01 | 1 |
| 4 | 5 | 1 | 8 | 2025-01 | 1 |
| ... | ... | ... | ... | ... | ... |
| 13995 | 13996 | 2 | 2 | 2025-07 | 0 |
| 13996 | 13997 | 2 | 2 | 2025-07 | 0 |
| 13997 | 13998 | 2 | 2 | 2025-07 | 0 |
| 13998 | 13999 | 2 | 2 | 2025-07 | 0 |
| 13999 | 14000 | 2 | 2 | 2025-07 | 0 |

14000 rows × 5 columns

Sampler Progress

- Total Chains: 4
- Active Chains: 0
- Finished Chains: 4
- Sampling for 2 minutes
- Estimated Time to Completion: now

| Draws | Divergences | Step Size | Gradients/Draw |
|---|---|---|---|
| 2000 | 0 | 0.45 | 15 |
| 2000 | 0 | 0.44 | 7 |
| 2000 | 0 | 0.42 | 15 |
| 2000 | 0 | 0.43 | 15 |

<xarray.Dataset> Size: 898MB\n",
"Dimensions: (chain: 4, draw: 1000, phi_interval___dim_0: 7,\n",
" kappa_interval___dim_0: 7,\n",
" dropout_covariate: 1, cohort: 7,\n",
" customer_id: 14000)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 ... 996 997 998 999\n",
" * phi_interval___dim_0 (phi_interval___dim_0) int64 56B 0 1 2 3 4 5 6\n",
" * kappa_interval___dim_0 (kappa_interval___dim_0) int64 56B 0 1 2 3 4 5 6\n",
" * dropout_covariate (dropout_covariate) object 8B 'highend_customer'\n",
" * cohort (cohort) object 56B '2025-01' ... '2025-07'\n",
" * customer_id (customer_id) int64 112kB 1 2 3 ... 13999 14000\n",
"Data variables:\n",
" phi_interval__ (chain, draw, phi_interval___dim_0) float64 224kB ...\n",
" kappa_interval__ (chain, draw, kappa_interval___dim_0) float64 224kB ...\n",
" dropout_coefficient_alpha (chain, draw, dropout_covariate) float64 32kB ...\n",
" dropout_coefficient_beta (chain, draw, dropout_covariate) float64 32kB ...\n",
" phi (chain, draw, cohort) float64 224kB 0.3788 ......\n",
" kappa (chain, draw, cohort) float64 224kB 1.828 ... ...\n",
" alpha_scale (chain, draw, cohort) float64 224kB 0.6925 ......\n",
" beta_scale (chain, draw, cohort) float64 224kB 1.135 ... ...\n",
" alpha (chain, draw, customer_id) float64 448MB 0.834...\n",
" beta (chain, draw, customer_id) float64 448MB 4.792...\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.660031+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: nutpie\n",
" inference_library_version: 0.15.2\n",
" sampling_time: 108.42676997184753\n",
" tuning_steps: 1000<xarray.Dataset> Size: 336kB\n",
"Dimensions: (chain: 4, draw: 1000)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999\n",
"Data variables:\n",
" depth (chain, draw) uint64 32kB 3 3 3 3 4 4 ... 4 4 4 4 4 3\n",
" maxdepth_reached (chain, draw) bool 4kB False False ... False False\n",
" index_in_trajectory (chain, draw) int64 32kB 6 -6 4 -3 9 ... 7 6 -8 -6 6\n",
" logp (chain, draw) float64 32kB -1.704e+04 ... -1.704e+04\n",
" energy (chain, draw) float64 32kB 1.705e+04 ... 1.705e+04\n",
" diverging (chain, draw) bool 4kB False False ... False False\n",
" energy_error (chain, draw) float64 32kB 0.2241 -0.1556 ... -0.3027\n",
" step_size (chain, draw) float64 32kB 0.4522 0.4522 ... 0.4336\n",
" step_size_bar (chain, draw) float64 32kB 0.4522 0.4522 ... 0.4336\n",
" mean_tree_accept (chain, draw) float64 32kB 0.8101 0.8465 ... 0.9595\n",
" mean_tree_accept_sym (chain, draw) float64 32kB 0.8918 0.8972 ... 0.9227\n",
" n_steps (chain, draw) uint64 32kB 7 7 15 7 15 ... 15 31 15 15\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.619159+00:00\n",
" arviz_version: 0.22.0<xarray.Dataset> Size: 224kB\n",
"Dimensions: (customer_id: 14000)\n",
"Coordinates:\n",
" * customer_id (customer_id) int64 112kB 1 2 3 4 5 ... 13997 13998 13999 14000\n",
"Data variables:\n",
" dropout (customer_id) float64 112kB 1.0 1.0 1.0 1.0 ... 2.0 2.0 2.0 2.0\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.659331+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: pymc\n",
" inference_library_version: 5.25.1<xarray.Dataset> Size: 224kB\n",
"Dimensions: (customer_id: 14000, dropout_covariate: 1)\n",
"Coordinates:\n",
" * customer_id (customer_id) int64 112kB 1 2 3 4 ... 13998 13999 14000\n",
" * dropout_covariate (dropout_covariate) <U16 64B 'highend_customer'\n",
"Data variables:\n",
" dropout_data (customer_id, dropout_covariate) float64 112kB 1.0 ......\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.655943+00:00\n",
" arviz_version: 0.22.0\n",
" inference_library: pymc\n",
" inference_library_version: 5.25.1<xarray.Dataset> Size: 672kB\n",
"Dimensions: (index: 14000)\n",
"Coordinates:\n",
" * index (index) int64 112kB 0 1 2 3 4 ... 13996 13997 13998 13999\n",
"Data variables:\n",
" customer_id (index) int64 112kB 1 2 3 4 5 ... 13997 13998 13999 14000\n",
" recency (index) int64 112kB 1 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2\n",
" T (index) int64 112kB 8 8 8 8 8 8 8 8 8 ... 2 2 2 2 2 2 2 2\n",
" cohort (index) object 112kB '2025-01' '2025-01' ... '2025-07'\n",
" highend_customer (index) int64 112kB 1 1 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 0 0<xarray.Dataset> Size: 898MB\n",
"Dimensions: (chain: 4, draw: 1000, phi_interval___dim_0: 7,\n",
" kappa_interval___dim_0: 7,\n",
" dropout_covariate: 1, cohort: 7,\n",
" customer_id: 14000)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 ... 996 997 998 999\n",
" * phi_interval___dim_0 (phi_interval___dim_0) int64 56B 0 1 2 3 4 5 6\n",
" * kappa_interval___dim_0 (kappa_interval___dim_0) int64 56B 0 1 2 3 4 5 6\n",
" * dropout_covariate (dropout_covariate) object 8B 'highend_customer'\n",
" * cohort (cohort) object 56B '2025-01' ... '2025-07'\n",
" * customer_id (customer_id) int64 112kB 1 2 3 ... 13999 14000\n",
"Data variables:\n",
" phi_interval__ (chain, draw, phi_interval___dim_0) float64 224kB ...\n",
" kappa_interval__ (chain, draw, kappa_interval___dim_0) float64 224kB ...\n",
" dropout_coefficient_alpha (chain, draw, dropout_covariate) float64 32kB ...\n",
" dropout_coefficient_beta (chain, draw, dropout_covariate) float64 32kB ...\n",
" phi (chain, draw, cohort) float64 224kB 0.3648 ......\n",
" kappa (chain, draw, cohort) float64 224kB 2.659 ... ...\n",
" alpha_scale (chain, draw, cohort) float64 224kB 0.9699 ......\n",
" beta_scale (chain, draw, cohort) float64 224kB 1.689 ... ...\n",
" alpha (chain, draw, customer_id) float64 448MB 0.54 ...\n",
" beta (chain, draw, customer_id) float64 448MB 2.945...\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.615801+00:00\n",
" arviz_version: 0.22.0<xarray.Dataset> Size: 336kB\n",
"Dimensions: (chain: 4, draw: 1000)\n",
"Coordinates:\n",
" * chain (chain) int64 32B 0 1 2 3\n",
" * draw (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999\n",
"Data variables:\n",
" depth (chain, draw) uint64 32kB 2 0 2 3 3 2 ... 3 4 3 3 3 3\n",
" maxdepth_reached (chain, draw) bool 4kB False False ... False False\n",
" index_in_trajectory (chain, draw) int64 32kB 1 0 -2 5 -3 ... -1 -5 -3 1 4\n",
" logp (chain, draw) float64 32kB -1.881e+04 ... -1.704e+04\n",
" energy (chain, draw) float64 32kB 1.885e+04 ... 1.704e+04\n",
" diverging (chain, draw) bool 4kB False True ... False False\n",
" energy_error (chain, draw) float64 32kB -0.297 0.0 ... -0.3335\n",
" step_size (chain, draw) float64 32kB 1.439 0.2431 ... 0.4336\n",
" step_size_bar (chain, draw) float64 32kB 1.439 0.4998 ... 0.4336\n",
" mean_tree_accept (chain, draw) float64 32kB 1.0 0.0 ... 0.7465 0.8491\n",
" mean_tree_accept_sym (chain, draw) float64 32kB 0.7342 0.0 ... 0.8206\n",
" n_steps (chain, draw) uint64 32kB 3 1 3 7 7 3 ... 31 7 7 7 7\n",
"Attributes:\n",
" created_at: 2025-12-16T10:36:01.622916+00:00\n",
" arviz_version: 0.22.0| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| alpha_scale[2025-01] | \n", "0.670 | \n", "0.057 | \n", "0.567 | \n", "0.776 | \n", "0.001 | \n", "0.001 | \n", "3615.0 | \n", "2966.0 | \n", "1.0 | \n", "
| alpha_scale[2025-02] | \n", "0.700 | \n", "0.066 | \n", "0.576 | \n", "0.824 | \n", "0.001 | \n", "0.001 | \n", "3633.0 | \n", "3542.0 | \n", "1.0 | \n", "
| alpha_scale[2025-03] | \n", "0.738 | \n", "0.074 | \n", "0.604 | \n", "0.876 | \n", "0.001 | \n", "0.001 | \n", "3809.0 | \n", "2978.0 | \n", "1.0 | \n", "
| alpha_scale[2025-04] | \n", "0.795 | \n", "0.094 | \n", "0.627 | \n", "0.966 | \n", "0.001 | \n", "0.002 | \n", "4007.0 | \n", "2671.0 | \n", "1.0 | \n", "
| alpha_scale[2025-05] | \n", "0.854 | \n", "0.137 | \n", "0.614 | \n", "1.101 | \n", "0.002 | \n", "0.002 | \n", "4185.0 | \n", "3264.0 | \n", "1.0 | \n", "
| alpha_scale[2025-06] | \n", "1.007 | \n", "0.299 | \n", "0.586 | \n", "1.545 | \n", "0.005 | \n", "0.009 | \n", "4845.0 | \n", "2831.0 | \n", "1.0 | \n", "
| alpha_scale[2025-07] | \n", "3.749 | \n", "34.215 | \n", "0.341 | \n", "5.671 | \n", "0.600 | \n", "7.920 | \n", "5479.0 | \n", "2460.0 | \n", "1.0 | \n", "
| beta_scale[2025-01] | \n", "1.125 | \n", "0.128 | \n", "0.887 | \n", "1.352 | \n", "0.002 | \n", "0.002 | \n", "3390.0 | \n", "2809.0 | \n", "1.0 | \n", "
| beta_scale[2025-02] | \n", "1.177 | \n", "0.146 | \n", "0.918 | \n", "1.452 | \n", "0.003 | \n", "0.003 | \n", "2910.0 | \n", "2730.0 | \n", "1.0 | \n", "
| beta_scale[2025-03] | \n", "1.244 | \n", "0.161 | \n", "0.954 | \n", "1.542 | \n", "0.003 | \n", "0.003 | \n", "3331.0 | \n", "2498.0 | \n", "1.0 | \n", "
| beta_scale[2025-04] | \n", "1.342 | \n", "0.196 | \n", "0.997 | \n", "1.704 | \n", "0.003 | \n", "0.003 | \n", "3574.0 | \n", "2620.0 | \n", "1.0 | \n", "
| beta_scale[2025-05] | \n", "1.448 | \n", "0.276 | \n", "0.977 | \n", "1.965 | \n", "0.004 | \n", "0.005 | \n", "3706.0 | \n", "3144.0 | \n", "1.0 | \n", "
| beta_scale[2025-06] | \n", "1.739 | \n", "0.566 | \n", "0.912 | \n", "2.725 | \n", "0.009 | \n", "0.018 | \n", "4453.0 | \n", "2754.0 | \n", "1.0 | \n", "
| beta_scale[2025-07] | \n", "6.601 | \n", "61.150 | \n", "0.617 | \n", "10.162 | \n", "1.064 | \n", "14.775 | \n", "5644.0 | \n", "2340.0 | \n", "1.0 | \n", "
| dropout_coefficient_alpha[highend_customer] | \n", "-0.172 | \n", "0.112 | \n", "-0.368 | \n", "0.051 | \n", "0.003 | \n", "0.002 | \n", "1054.0 | \n", "1416.0 | \n", "1.0 | \n", "
| dropout_coefficient_beta[highend_customer] | \n", "-1.428 | \n", "0.132 | \n", "-1.670 | \n", "-1.175 | \n", "0.004 | \n", "0.003 | \n", "1030.0 | \n", "1484.0 | \n", "1.0 | \n", "
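
One way to read the coefficients above is through a log-linear link of the form `alpha = alpha_scale * exp(-coefficient * x)`, and likewise for `beta`. This link is an assumption inferred from the numbers shown in this notebook (the customer-level `alpha` and `beta` draws relative to the cohort-level scales), not a statement of the library's documented behavior. Under that assumption, the sketch below compares the implied mean churn probability, `alpha / (alpha + beta)`, for `highend_customer` equal to 0 and 1 in the 2025-01 cohort. The helper name `customer_params` is hypothetical.

```python
import math


def customer_params(alpha_scale, beta_scale, coef_alpha, coef_beta, x):
    """Assumed link (not confirmed API): parameter = scale * exp(-coefficient * x)."""
    alpha = alpha_scale * math.exp(-coef_alpha * x)
    beta = beta_scale * math.exp(-coef_beta * x)
    return alpha, beta


# Posterior means for the 2025-01 cohort, taken from the summary table above.
alpha_scale, beta_scale = 0.670, 1.125
coef_alpha, coef_beta = -0.172, -1.428

mean_churn = {}
for x in (0, 1):
    alpha, beta = customer_params(alpha_scale, beta_scale, coef_alpha, coef_beta, x)
    # The mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta).
    mean_churn[x] = alpha / (alpha + beta)
    print(f"highend_customer={x}: alpha={alpha:.3f}, beta={beta:.3f}, "
          f"mean churn probability={mean_churn[x]:.3f}")
```

If the assumed link holds, the strongly negative `dropout_coefficient_beta` raises `beta` for high-end customers and so lowers their expected churn probability, in line with the cohort-level fit earlier in the notebook.
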
| | cohort | customer_id | retention |
|---|---|---|---|
| 0 | 2025-01 | 510 | 0.936442 |
| 1 | 2025-01 | 511 | 0.936442 |
| 2 | 2025-01 | 512 | 0.936442 |
| 3 | 2025-01 | 513 | 0.936442 |
| 4 | 2025-01 | 514 | 0.936442 |
| ... | ... | ... | ... |
| 7011 | 2025-07 | 13996 | 0.749574 |
| 7012 | 2025-07 | 13997 | 0.749574 |
| 7013 | 2025-07 | 13998 | 0.749574 |
| 7014 | 2025-07 | 13999 | 0.749574 |
| 7015 | 2025-07 | 14000 | 0.749574 |

7016 rows × 3 columns