class TestData[source]
TestData(t:array,b:Optional[array]=None,x:Optional[array]=None,t_scaler:MaxAbsScaler=None,x_scaler:StandardScaler=None) ::Dataset
Create a PyTorch Dataset.
parameters:
- t: time elapsed
- b: (optional) breakpoints where the hazard is different to previous segment of time. Must include 0 as first element and the maximum time as last element
- x: (optional) features
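The constraint on `b` can be made concrete with a minimal sketch in plain Python. It is not part of the library: `check_breakpoints` and `segment_index` are hypothetical helpers, and both the strictly increasing check and the right-closed interval convention are assumptions made for illustration.

```python
from bisect import bisect_left


def check_breakpoints(t, b):
    """Raise ValueError unless b starts at 0, ends at max(t) and is strictly increasing."""
    if b[0] != 0:
        raise ValueError("first breakpoint must be 0")
    if b[-1] != max(t):
        raise ValueError("last breakpoint must equal the maximum time")
    # Assumption: segments must have positive length.
    if any(lo >= hi for lo, hi in zip(b, b[1:])):
        raise ValueError("breakpoints must be strictly increasing")


def segment_index(time, b):
    """Return i such that b[i] < time <= b[i + 1] (time 0 maps to segment 0)."""
    return max(bisect_left(b, time) - 1, 0)


t = [2.0, 5.0, 7.5, 10.0]
b = [0, 4.0, 8.0, 10.0]  # three segments: (0, 4], (4, 8], (8, 10]

check_breakpoints(t, b)
segments = [segment_index(ti, b) for ti in t]  # [0, 1, 1, 2]
```

Each segment index identifies the stretch of time over which the hazard is allowed to differ from the previous one.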
class Data[source]
Data(t:array,e:array,b:Optional[array]=None,x:Optional[array]=None,t_scaler:MaxAbsScaler=None,x_scaler:StandardScaler=None) ::TestData
Create a PyTorch Dataset.
parameters:
- t: time elapsed
- e: (death) event observed. 1 if observed, 0 otherwise.
- b: (optional) breakpoints where the hazard is different to previous segment of time.
- x: (optional) features
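How `t` and `e` are derived is not described here. As one common setup, the sketch below assumes follow-up stops at a fixed study cutoff, so subjects still alive at the cutoff are right-censored. The helper `to_time_and_event` is hypothetical and not part of the library.

```python
def to_time_and_event(durations, cutoff):
    """Right-censor true durations at cutoff and return (t, e) lists."""
    # Time elapsed is capped at the cutoff.
    t = [min(d, cutoff) for d in durations]
    # The event is observed (1) only if it happened on or before the cutoff.
    e = [1 if d <= cutoff else 0 for d in durations]
    return t, e


durations = [3.0, 12.0, 7.5, 20.0]
t, e = to_time_and_event(durations, cutoff=10.0)
# t == [3.0, 10.0, 7.5, 10.0]
# e == [1, 0, 1, 0]
```

Lists shaped like `t` and `e` here are what the `t` and `e` parameters above describe.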
class TestDataFrame[source]
TestDataFrame(df:DataFrame,b:Optional[array]=None,t_scaler:MaxAbsScaler=None,x_scaler:StandardScaler=None) ::TestData
Wrapper around the Data class that takes in a dataframe instead.
parameters:
- df: dataframe. Must have t (time) and e (event) columns; other columns are optional.
- b: breakpoints of time (optional)
class DataFrame[source]
DataFrame(data=None,index:Optional[Collection[T_co]]=None,columns:Optional[Collection[T_co]]=None,dtype:Union[ForwardRef('ExtensionDtype'),str,dtype,Type[Union[str,float,int,complex,bool]],NoneType]=None,copy:bool=False) ::NDFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
   Dict can contain Series, arrays, constants, or list-like objects.
.. versionchanged:: 0.23.0
   If data is a dict, column order follows insertion-order for
   Python 3.6 and later.
.. versionchanged:: 0.25.0
   If data is a list of dicts, column order follows insertion-order
   for Python 3.6 and later.
index : Index or array-like
   Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.
columns : Index or array-like
   Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, ..., n) if no column labels are provided.
dtype : dtype, default None
   Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False
   Copy data from inputs. Only affects DataFrame / 2d ndarray input.
See Also
DataFrame.from_records : Constructor from tuples, also record arrays.
DataFrame.from_dict : From dicts of Series, arrays, or dicts.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_table : Read general delimited file into DataFrame.
read_clipboard : Read text from clipboard into DataFrame.
Examples
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1    int64
col2    int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
Create iterable data loaders / a fastai databunch using the above:
create_dl[source]
create_dl(df:DataFrame,b:Optional[array]=None,train_size:float=0.8,random_state=None,bs:int=128)
Take a dataframe, split it into train, test and val (optional) sets, and convert it to a fastai databunch.
parameters:
- df: pandas dataframe
- b(optional): breakpoints of time. Must include 0 as first element and the maximum time as last element
- train_size: fraction of the data used for training (0.8 by default)
- bs: batch size
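The split-then-batch idea behind these parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `split_indices` and `batches` are hypothetical helpers, only a train/test split is shown (the optional validation split is omitted), and `random_state` is assumed to act as a shuffle seed.

```python
import random


def split_indices(n, train_size=0.8, random_state=None):
    """Shuffle range(n) and split it into train and test index lists."""
    idx = list(range(n))
    # A fixed random_state makes the shuffle reproducible.
    random.Random(random_state).shuffle(idx)
    n_train = int(n * train_size)
    return idx[:n_train], idx[n_train:]


def batches(indices, bs=128):
    """Yield consecutive chunks of at most bs indices."""
    for start in range(0, len(indices), bs):
        yield indices[start:start + bs]


train_idx, test_idx = split_indices(10, train_size=0.8, random_state=0)
train_batches = list(batches(train_idx, bs=3))  # chunk sizes 3, 3, 2
```

With 10 rows and `train_size=0.8`, 8 indices go to training and 2 to testing; the last batch is smaller whenever `bs` does not divide the training set evenly.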
get_breakpoints[source]
get_breakpoints(df:DataFrame,percentiles:list=[20, 40, 60, 80])
Gives the times at which death events occur at the given percentiles.
parameters:
- df: must contain columns 't' (time) and 'e' (death event)
- percentiles: list of percentages at which breakpoints occur (do not include 0 and 100)
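As a rough illustration of the percentile idea, the sketch below works on plain lists rather than a dataframe. It is not the library's code: `percentile` and `event_time_breakpoints` are hypothetical helpers, linear interpolation between neighbouring event times is an assumed choice, and padding the result with 0 and the maximum time is added here only so the output satisfies the constraint on `b` stated earlier.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (0 <= p <= 100) of a sorted, non-empty list."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def event_time_breakpoints(t, e, percentiles=(20, 40, 60, 80)):
    """Percentiles of the observed event times, padded with 0 and max(t)."""
    # Only times with an observed event (e == 1) contribute to the percentiles.
    event_times = sorted(ti for ti, ei in zip(t, e) if ei == 1)
    inner = [percentile(event_times, p) for p in percentiles]
    return [0.0] + inner + [float(max(t))]


t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e = [1, 1, 1, 1, 1, 0]  # the last subject is censored
b = event_time_breakpoints(t, e, percentiles=(25, 50, 75))
# b == [0.0, 2.0, 3.0, 4.0, 6.0]
```

Placing breakpoints at percentiles of the event times puts a comparable number of observed events in each segment.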