How to estimate the Kaplan-Meier model.

The Kaplan-Meier (KM) estimate of the survival function is: $$ S(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right) $$ where $d_i$ is the number of deaths at time $t_i$ and $n_i$ is the number of individuals alive and still under observation just before $t_i$.
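The product above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the `KaplanMeier` class documented here: the function name `kaplan_meier` and the toy durations are hypothetical, and the inputs are assumed to be a duration array `t` and an event indicator `e` (1 for an observed death, 0 for censoring).

```python
import numpy as np

def kaplan_meier(t, e):
    """Return the distinct event times t_i and the product-limit estimate S(t_i).

    t: durations; e: 1 if the death was observed, 0 if censored.
    Hypothetical helper for illustration only.
    """
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=bool)
    event_times = np.unique(t[e])  # sorted distinct times with at least one death
    # n_i: individuals still at risk just before t_i (duration >= t_i)
    n = np.array([(t >= ti).sum() for ti in event_times])
    # d_i: deaths observed exactly at t_i
    d = np.array([((t == ti) & e).sum() for ti in event_times])
    # Cumulative product of the factors (1 - d_i / n_i)
    survival = np.cumprod(1.0 - d / n)
    return event_times, survival

# Toy data: six individuals, two of them censored (at t=2 and t=4).
times, surv = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

Censored individuals never contribute to $d_i$, but they still count toward $n_i$ for every event time up to their own duration, which is how the estimator uses partial information.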

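If observation runs long enough that every remaining individual dies, then at the last event time $d_i = n_i$. From that time point onward the estimated survival function is zero, because the final factor $\left(1 - \frac{d_i}{n_i}\right)=0$ zeroes the whole product. If the longest duration is censored instead, the estimate does not reach zero.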
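To make the zero-factor argument concrete, the sketch below compares two toy samples with the same durations: one where the longest duration is an observed death and one where it is censored. The helper `km_factors` and the data are hypothetical and only illustrate the formula.

```python
import numpy as np

def km_factors(t, e):
    """Per-event-time factors (1 - d_i / n_i) of the product-limit estimate.

    Hypothetical helper for illustration only.
    """
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=bool)
    event_times = np.unique(t[e])
    n = np.array([(t >= ti).sum() for ti in event_times])        # at risk just before t_i
    d = np.array([((t == ti) & e).sum() for ti in event_times])  # deaths at t_i
    return event_times, 1.0 - d / n

durations = [2, 4, 4, 7]

# Longest duration is an observed death: d_i == n_i there, so the last factor is 0.
_, f_event = km_factors(durations, [1, 1, 1, 1])
# Longest duration is censored: the last event time is 4, and no factor is 0.
_, f_cens = km_factors(durations, [1, 1, 1, 0])

s_event = np.prod(f_event)  # survival after the last event time, all deaths observed
s_cens = np.prod(f_cens)    # survival after the last event time, last subject censored
```

In the first case the product collapses to zero at $t = 7$; in the second it stays at the value reached at $t = 4$.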

class KaplanMeier

KaplanMeier()

data:

df.head()

| | ctryname | cowcode2 | politycode | un_region_name | un_continent_name | ehead | leaderspellreg | democracy | regime | start_year | duration | observed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 700 | 700.0 | Southern Asia | Asia | Mohammad Zahir Shah | Mohammad Zahir Shah.Afghanistan.1946.1952.Mona... | Non-democracy | Monarchy | 1946 | 7 | 1 |
| 1 | Afghanistan | 700 | 700.0 | Southern Asia | Asia | Sardar Mohammad Daoud | Sardar Mohammad Daoud.Afghanistan.1953.1962.Ci... | Non-democracy | Civilian Dict | 1953 | 10 | 1 |
| 2 | Afghanistan | 700 | 700.0 | Southern Asia | Asia | Mohammad Zahir Shah | Mohammad Zahir Shah.Afghanistan.1963.1972.Mona... | Non-democracy | Monarchy | 1963 | 10 | 1 |
| 3 | Afghanistan | 700 | 700.0 | Southern Asia | Asia | Sardar Mohammad Daoud | Sardar Mohammad Daoud.Afghanistan.1973.1977.Ci... | Non-democracy | Civilian Dict | 1973 | 5 | 0 |
| 4 | Afghanistan | 700 | 700.0 | Southern Asia | Asia | Nur Mohammad Taraki | Nur Mohammad Taraki.Afghanistan.1978.1978.Civi... | Non-democracy | Civilian Dict | 1978 | 1 | 0 |

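import numpy as np

# Fit on the leader data, renaming the duration and event columns to 't' and 'e'
km = KaplanMeier()
km.fit(df.rename(columns={'duration': 't', 'observed': 'e'}))
km.plot_survival_function();
# Estimated survival probabilities at durations 5, 20, 30 and 100
km.predict(np.array([5, 20, 30, 100]))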
array([0.33400794, 0.09099421, 0.05275089, 0.        ])
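Because the KM estimate is a step function that only changes at event times, evaluating it at arbitrary durations amounts to a lookup of the most recent step. The sketch below shows one plausible way to do that with `np.searchsorted`; it is an assumption for illustration and not necessarily how `KaplanMeier.predict` is implemented. The name `evaluate_step` and the toy curve are hypothetical.

```python
import numpy as np

def evaluate_step(event_times, survival, query):
    """Evaluate a right-continuous step survival curve at arbitrary times.

    event_times: sorted distinct event times t_i; survival: S(t_i) at each.
    Hypothetical helper for illustration only.
    """
    event_times = np.asarray(event_times, dtype=float)
    # Prepend S = 1 for queries that fall before the first event time.
    values = np.concatenate(([1.0], np.asarray(survival, dtype=float)))
    # Count of event times <= each query selects the step the query falls on.
    idx = np.searchsorted(event_times, np.asarray(query, dtype=float), side="right")
    return values[idx]

# Hypothetical toy curve, not the leader data above.
toy_times = np.array([1.0, 2.0, 3.0, 5.0])
toy_surv = np.array([0.8, 0.6, 0.4, 0.0])
result = evaluate_step(toy_times, toy_surv, [0.5, 2.0, 4.0, 100.0])
```

A query far beyond the last event time, like 100 here, returns the final step value, which matches the zero seen at 100 in the output above when the last factor of the product is zero.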