We can model the time to failure as: $$ \log T_i = \mu + \xi_i $$ where $\xi_i \sim p(\xi|\theta)$ and $\mu$ is the most likely log time of death (the mode of the distribution of $\log T_i$, given that the noise distribution has its mode at zero). We model the log of the death time so that we do not need to restrict $\mu + \xi_i$ to be positive: $T_i = e^{\mu + \xi_i}$ is positive for any real-valued $\mu + \xi_i$.
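To make this concrete, here is a minimal sketch (assuming Gumbel noise, which the model below also uses) that draws synthetic survival times from this formulation; `mu` and `beta` are illustrative values, not fitted parameters.

```python
import torch
from torch.distributions import Gumbel

# Illustrative parameters (assumptions, not fitted values)
mu = 2.0    # most likely log time of death (mode of log T)
beta = 0.5  # scale of the noise distribution

# xi ~ Gumbel(0, beta): noise around the mode of log T
xi = Gumbel(loc=0.0, scale=beta).sample((5,))

log_T = mu + xi   # log survival times, unrestricted in sign
T = log_T.exp()   # survival times, always positive
print(T)
```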
In the censored case, where $t_i$ is the time at which an instance was censored and $T_i$ is the unobserved time of death, we have: $$ \begin{aligned} \log T_i &= \mu(x_i) + \xi_i > \log t_i\\ \therefore \xi_i &> \log t_i - \mu(x_i) \end{aligned} $$ Note that $\mu$ is now a function of the features $x$. Writing $y_i = 1$ for an observed death and $y_i = 0$ for a censored observation, the log likelihood of the data $\mathcal{D}$ can then be shown to be: $$ \log p(\mathcal{D}) = \sum_{i=1}^N \mathbb{1}(y_i=1)\log p(\xi_i = \log t_i - \mu(x_i)) + \mathbb{1}(y_i=0)\log p(\xi_i > \log t_i - \mu(x_i)) $$ That is, an observed death contributes its density, while a censored observation contributes the probability that death occurs after the censoring time.
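To tie the formula to code, the sketch below evaluates this log likelihood with standard Gumbel noise. It is an illustration of the equation, not torchlife's actual `aft_loss`; the function name and the fixed `scale` are assumptions.

```python
import torch
from torch.distributions import Gumbel

def aft_log_likelihood(log_t, mu, y, scale=1.0):
    """Hypothetical sketch of the log likelihood above (not torchlife's aft_loss).
    log_t: log of observed/censoring times, mu: predicted mode of log T,
    y: 1 for an observed death, 0 for a censored observation."""
    xi = log_t - mu
    death_term = Gumbel(loc=0.0, scale=scale).log_prob(xi)  # log p(xi = log t - mu)
    # log p(xi > log t - mu) = log(1 - F(xi)), with Gumbel CDF F(x) = exp(-exp(-x/scale))
    censored_term = torch.log1p(-torch.exp(-torch.exp(-xi / scale)))
    return torch.where(y == 1, death_term, censored_term).sum()

# Toy usage with made-up numbers
log_t = torch.tensor([2.0, 5.0, 1.5]).log()
mu = torch.tensor([1.0, 1.2, 0.3])
y = torch.tensor([1, 0, 1])
print(aft_log_likelihood(log_t, mu, y))
```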
Modelling based on the time and (death) event variables, along with a few features:
```python
import pandas as pd

# flchain: a survival dataset of serum free light chain measurements
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/survival/flchain.csv"
df = pd.read_csv(url).iloc[:, 1:]  # drop the redundant index column
df.rename(columns={'futime': 't', 'death': 'e'}, inplace=True)

cols = ["age", "sample.yr", "kappa"]
# create_dl returns a fastai DataBunch plus scalers for time and features
db, t_scaler, x_scaler = create_dl(df[['t', 'e'] + cols])

death_rate = 100 * df["e"].mean()
print(f"Death occurs in {death_rate:.2f}% of cases")
print(df.shape)
df.head()
```
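We then wrap the model and loss in a fastai `Learner` and run the learning-rate finder; `AFTModel` (with `"Gumbel"` selecting the noise distribution $p(\xi|\theta)$) and `aft_loss` come from torchlife.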
```python
from fastai.basics import Learner
from torchlife.losses import aft_loss

# AFT model with Gumbel-distributed noise; predicts mu(x)
model = AFTModel("Gumbel", t_scaler, x_scaler)
learner = Learner(db, model, loss_func=aft_loss)
# wd = 1e-4
learner.lr_find(start_lr=1, end_lr=10)  # sweep learning rates
learner.recorder.plot()                 # plot loss vs. learning rate
```
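Assuming the plot suggests a usable learning rate (the value below is a hypothetical one read off such a plot), training would then proceed with fastai's usual fit call:

```python
learner.fit(10, lr=1e-2)  # epoch count and lr are illustrative assumptions
```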