AFT Model theory.

We can model the time to failure as: $$ \log T_i = \mu + \xi_i $$ where $\xi_i \sim p(\xi|\theta)$ and $\mu$ is the most likely log time of death (the mode of the distribution of $\log T_i$). We model the log of the death time so that we do not need to restrict $\mu + \xi_i$ to be positive.
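
As a minimal sketch of this generative form (plain NumPy rather than torchlife; the value of $\mu$ and the choice of a standard Gumbel error distribution are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
mu = 2.0                                       # illustrative most likely log time of death
xi = rng.gumbel(loc=0.0, scale=1.0, size=5)    # error term xi_i ~ p(xi | theta), here Gumbel(0, 1)
log_T = mu + xi                                # log T_i = mu + xi_i may be any real number
T = np.exp(log_T)                              # T_i itself is always positive
print(T)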

In the censored case, where $t_i$ is the time at which an instance was censored and $T_i$ is the unobserved time of death, we have: $$ \begin{aligned} \log T_i &= \mu(x_i) + \xi_i > \log t_i\\ \therefore \xi_i &> \log t_i - \mu(x_i) \end{aligned} $$ Note that $\mu$ is now a function of the features $x$. Writing $y_i = 1$ for an observed death and $y_i = 0$ for a censored observation, the log likelihood of the data $\mathcal{D}$ can then be shown to be: $$ \log p(\mathcal{D}) = \sum_{i=1}^N \mathbb{1}(y_i=1)\log p\big(\xi_i = \log t_i - \mu(x_i)\big) + \mathbb{1}(y_i=0)\log p\big(\xi_i > \log t_i - \mu(x_i)\big) $$
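
The same likelihood can be sketched numerically. This is only an illustration under the assumption of a standard Gumbel error distribution, with made-up values; it is not the torchlife loss implementation:

import numpy as np
from scipy.stats import gumbel_r

t  = np.array([1.5, 3.0, 4.2])   # observed times (death or censoring), made-up values
y  = np.array([1, 0, 1])         # y_i = 1 if the death was observed, 0 if censored
mu = np.array([0.2, 0.9, 1.1])   # mu(x_i) predicted from the features, made-up values

xi = np.log(t) - mu              # residual: log t_i - mu(x_i)
log_lik = np.where(
    y == 1,
    gumbel_r.logpdf(xi),         # observed death: log p(xi_i = log t_i - mu(x_i))
    gumbel_r.logsf(xi),          # censored:       log p(xi_i > log t_i - mu(x_i))
).sum()
print(log_lik)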

class AFTModel[source]

AFTModel(distribution:str, input_dim:int, h:tuple=()) :: Module

Accelerated Failure Time model parameters:

  • distribution: the distribution that the error term is assumed to follow
  • input_dim: input dimensionality of the variables
  • h (optional): number of hidden nodes in each hidden layer, given as a tuple
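
For example, a model with a Gumbel error distribution, three input features, and a single hidden layer of ten nodes could be constructed as follows (the feature count and hidden layer size here are placeholders):

model = AFTModel("Gumbel", input_dim=3, h=(10,))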

Modelling based on time, (death) event, and a few feature variables:

# import pandas as pd

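# Load the FLChain dataset and rename the follow-up time and death event columns to 't' and 'e'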
# url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/survival/flchain.csv"
# df = pd.read_csv(url).iloc[:,1:]
# df.rename(columns={'futime':'t', 'death':'e'}, inplace=True)

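# Pick a few features and build the training data; create_dl presumably returns a DataBunch plus scalers for the time and feature columns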
# cols = ["age", "sample.yr", "kappa"]
# db, t_scaler, x_scaler = create_dl(df[['t', 'e'] + cols])

# death_rate = 100*df["e"].mean()
# print(f"Death occurs in {death_rate:.2f}% of cases")
# print(df.shape)
# df.head()
# from fastai.basics import Learner
# from torchlife.losses import aft_loss

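# Wrap the AFT model and loss in a fastai Learner and run the learning rate finder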
# model = AFTModel("Gumbel", input_dim=len(cols))  # distribution and number of input features, per the signature above
# learner = Learner(db, model, loss_func=aft_loss)
# # wd = 1e-4
# learner.lr_find(start_lr=1, end_lr=10)
# learner.recorder.plot()
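
Once a learning rate has been chosen from the plot, training would proceed with one of fastai's fit methods; the epoch count and learning rate below are placeholders rather than values from the original notebook:

# learner.fit_one_cycle(5, 1e-2)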