Suppose that we have: $$ t_i = \mu + \xi_i $$ with $\xi_i\sim p(\xi_i|\theta)$. Then $t_i|\mu\sim p(t_i-\mu|\theta)$; that is, the distribution of $t_i$ is simply the distribution $p(\xi_i|\theta)$ shifted to the right by $\mu$.
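As a quick sanity check of the shift relation (using a standard normal for $p(\xi_i|\theta)$ purely as an illustrative choice), the density of $t_i = \mu + \xi_i$ evaluated via $p(t_i - \mu)$ matches the density of a normal centred at $\mu$:

```python
import torch
from torch.distributions import Normal

mu = 2.0
t = torch.linspace(-1.0, 5.0, 7)

# Density of t_i = mu + xi_i, with xi_i ~ N(0, 1), equals the
# standard normal density evaluated at t - mu ...
lp_shifted = Normal(0.0, 1.0).log_prob(t - mu)
# ... which is the same as the density of N(mu, 1) at t.
lp_direct = Normal(mu, 1.0).log_prob(t)
assert torch.allclose(lp_shifted, lp_direct)
```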
When the observation is censored ($e_i=0$), we know only that $t_i < \mu + \xi_i$: the observed time is a lower bound on the unobserved death time, since the 'death' offset $\xi_i$ is not observed.
Therefore we may write the likelihood of an observation $(t_i, e_i)$ as $$ \begin{aligned} p(t_i, e_i|\mu) &= \left(p(t_i-\mu)\right)^{e_i} \left(\int_{t_i}^\infty p(t-\mu)\, dt\right)^{1-e_i}\\ \log p(t_i, e_i|\mu) &= e_i \log p(t_i-\mu) + (1 - e_i) \log \left(1 - \int_{-\infty}^{t_i} p(t-\mu)\, dt \right) \end{aligned} $$
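This log-likelihood can be implemented directly as a loss module. A minimal sketch, under the assumption (inferred from the usage snippet below) that `AFTLoss` receives the event indicators, the per-observation log-density $\log p(t_i-\mu)$, and the log-CDF $\log\int_{-\infty}^{t_i} p(t-\mu)\,dt$:

```python
import torch

class AFTLoss(torch.nn.Module):
    """Negative log-likelihood for right-censored observations.

    Interface is an assumption: `log_pdf` holds log p(t_i - mu) and
    `log_cdf` holds log F(t_i - mu), with log_cdf strictly < 0.
    """
    def forward(self, event, log_pdf, log_cdf):
        # Censored term: log(1 - F(t_i - mu)) = log1p(-exp(log_cdf)).
        log_surv = torch.log1p(-torch.exp(log_cdf))
        # Uncensored points contribute log_pdf, censored ones log_surv.
        nll = -(event * log_pdf + (1 - event) * log_surv)
        return nll.mean()
```

Averaging rather than summing over observations is a design choice here; either works for optimisation.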
import torch

N = 5
event = torch.randint(0, 2, (N,))   # e_i: 1 = observed death, 0 = censored
log_pdf = torch.randn((N,))         # placeholder values for log p(t_i - mu)
log_cdf = -torch.rand((N,))         # placeholder log CDF values (always <= 0)
aft_loss = AFTLoss()
aft_loss(event, log_pdf, log_cdf)
We use the following loss function to fit our model (see here for the theory), where $\lambda$ is the hazard, $\Lambda$ the cumulative hazard, and $d_i$ the event indicator: $$ -\log L = \sum_{i=1}^N \left[ \Lambda(t_i) - d_i \log \lambda(t_i) \right] $$
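As a concrete instance of this loss, take a Weibull distribution with shape $k$ and scale $\sigma$ (an illustrative choice, not prescribed by the text), for which $\lambda(t) = (k/\sigma)(t/\sigma)^{k-1}$ and $\Lambda(t) = (t/\sigma)^k$:

```python
import torch

def hazard_nll(t, d, k, sigma):
    # Weibull hazard: lambda(t) = (k / sigma) * (t / sigma)**(k - 1)
    log_lam = torch.log(k / sigma) + (k - 1) * torch.log(t / sigma)
    # Weibull cumulative hazard: Lambda(t) = (t / sigma)**k
    Lam = (t / sigma) ** k
    # -log L = sum_i [ Lambda(t_i) - d_i * log lambda(t_i) ]
    return (Lam - d * log_lam).sum()

t = torch.tensor([1.2, 0.7, 3.1])
d = torch.tensor([1.0, 0.0, 1.0])   # event indicators d_i
loss = hazard_nll(t, d, torch.tensor(1.5), torch.tensor(2.0))
```

For uncensored observations ($d_i = 1$), $\Lambda(t_i) - \log\lambda(t_i) = -\log f(t_i)$, so this reduces to the ordinary negative log-density.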