class TestData [source]

TestData(t:array, b:Optional[array]=None, x:Optional[array]=None, t_scaler:MaxAbsScaler=None, x_scaler:StandardScaler=None) :: Dataset
Create a PyTorch Dataset.

Parameters:
- t: time elapsed
- b: (optional) breakpoints where the hazard differs from the previous segment of time. Must include 0 as the first element and the maximum time as the last element.
- x: (optional) features
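As an illustration, here is a minimal sketch of what a Dataset built from these inputs could look like. It is not the library's implementation; the scaler handling, breakpoint encoding, and item structure of the real `TestData` class may differ.

```python
# Minimal sketch of a TestData-like Dataset; names and behaviour here are
# illustrative assumptions, not the library's actual code.
from typing import Optional
import numpy as np
import torch
from torch.utils.data import Dataset
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

class TestDataSketch(Dataset):
    def __init__(self, t: np.ndarray, x: Optional[np.ndarray] = None,
                 t_scaler: Optional[MaxAbsScaler] = None,
                 x_scaler: Optional[StandardScaler] = None):
        t = t.reshape(-1, 1).astype("float32")
        # Reuse a fitted scaler if one is passed in (e.g. from the training set),
        # otherwise fit one on this data.
        self.t_scaler = t_scaler if t_scaler is not None else MaxAbsScaler().fit(t)
        self.t = torch.from_numpy(self.t_scaler.transform(t).astype("float32"))
        self.x = None
        if x is not None:
            x = x.astype("float32")
            self.x_scaler = x_scaler if x_scaler is not None else StandardScaler().fit(x)
            self.x = torch.from_numpy(self.x_scaler.transform(x).astype("float32"))

    def __len__(self):
        return len(self.t)

    def __getitem__(self, i):
        return (self.t[i], self.x[i]) if self.x is not None else self.t[i]
```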
class Data [source]

Data(t:array, e:array, b:Optional[array]=None, x:Optional[array]=None, t_scaler:MaxAbsScaler=None, x_scaler:StandardScaler=None) :: TestData
Create a PyTorch Dataset.

Parameters:
- t: time elapsed
- e: (death) event observed. 1 if observed, 0 otherwise.
- b: (optional) breakpoints where the hazard differs from the previous segment of time.
- x: (optional) features
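For concreteness, a hedged sketch of the inputs Data expects. Only the array roles and the censoring convention come from the docs above; the import path is a guess and is left commented out.

```python
import numpy as np

t = np.array([2.0, 5.0, 3.5, 7.0])   # elapsed times
e = np.array([1, 0, 1, 0])           # 1 = death observed, 0 = censored
x = np.random.randn(4, 3)            # optional features, one row per observation
b = np.array([0.0, 3.0, 7.0])        # optional breakpoints: start at 0, end at max time

# from <package>.data import Data    # import path depends on your install
# ds = Data(t, e, b=b, x=x)          # a Dataset usable with a torch DataLoader
```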
class TestDataFrame [source]

TestDataFrame(df:DataFrame, b:Optional[array]=None, t_scaler:MaxAbsScaler=None, x_scaler:StandardScaler=None) :: TestData
Wrapper around the Data class that takes a dataframe instead.

Parameters:
- df: dataframe. Must have t (time) and e (event) columns; other columns are optional.
- b: (optional) breakpoints of time
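The dataframe-to-arrays step such a wrapper implies can be sketched as follows; treating every column other than 't' and 'e' as a feature is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "t": [2.0, 5.0, 3.5, 7.0],        # time column (required)
    "e": [1, 0, 1, 0],                # event column (required)
    "age": [61.0, 54.0, 70.0, 49.0],  # remaining columns act as features
})

t = df["t"].values
e = df["e"].values
x = df.drop(columns=["t", "e"]).values
# TestDataFrame(df, b=...) would build the underlying Dataset from arrays like these.
```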
class DataFrame [source]

DataFrame(data=None, index:Optional[Collection[T_co]]=None, columns:Optional[Collection[T_co]]=None, dtype:Union[ForwardRef('ExtensionDtype'), str, dtype, Type[Union[str, float, int, complex, bool]], NoneType]=None, copy:bool=False) :: NDFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters

- data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame. Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: if data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: if data is a list of dicts, column order follows insertion-order for Python 3.6 and later.
- index : Index or array-like. Index to use for the resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.
- columns : Index or array-like. Column labels to use for the resulting frame. Will default to RangeIndex (0, 1, 2, ..., n) if no column labels are provided.
- dtype : dtype, default None. Data type to force. Only a single dtype is allowed. If None, infer.
- copy : bool, default False. Copy data from inputs. Only affects DataFrame / 2d ndarray input.
See Also

- DataFrame.from_records : Constructor from tuples, also record arrays.
- DataFrame.from_dict : From dicts of Series, arrays, or dicts.
- read_csv : Read a comma-separated values (csv) file into DataFrame.
- read_table : Read general delimited file into DataFrame.
- read_clipboard : Read text from clipboard into DataFrame.
Examples

Constructing DataFrame from a dictionary:

    >>> d = {'col1': [1, 2], 'col2': [3, 4]}
    >>> df = pd.DataFrame(data=d)
    >>> df
       col1  col2
    0     1     3
    1     2     4

Notice that the inferred dtype is int64.

    >>> df.dtypes
    col1    int64
    col2    int64
    dtype: object

To enforce a single dtype:

    >>> df = pd.DataFrame(data=d, dtype=np.int8)
    >>> df.dtypes
    col1    int8
    col2    int8
    dtype: object

Constructing DataFrame from numpy ndarray:

    >>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    ...                    columns=['a', 'b', 'c'])
    >>> df2
       a  b  c
    0  1  2  3
    1  4  5  6
    2  7  8  9
Create iterable data loaders / a fastai databunch using the classes above:
create_dl [source]

create_dl(df:DataFrame, b:Optional[array]=None, train_size:float=0.8, random_state=None, bs:int=128)
Take a dataframe, split it into train, test, and (optional) validation sets, and convert them to a fastai databunch.

Parameters:
- df: pandas dataframe
- b: (optional) breakpoints of time. Must include 0 as the first element and the maximum time as the last element.
- train_size: training percentage
- bs: batch size
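A hedged usage sketch: the import path and the way the breakpoints are assembled into b (prepending 0 and appending the maximum time, as the parameter description requires) are assumptions, so the library calls are left commented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "t": rng.exponential(10.0, size=500),   # survival times
    "e": rng.integers(0, 2, size=500),      # event indicators
    "x0": rng.normal(size=500),             # a feature column
})

# from <package>.data import create_dl, get_breakpoints   # path depends on your install
# breaks = get_breakpoints(df)                             # event-time percentiles
# b = np.concatenate([[0.0], breaks, [df["t"].max()]])     # add 0 and the max time
# db = create_dl(df, b=b, train_size=0.8, bs=64)
```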
get_breakpoints [source]

get_breakpoints(df:DataFrame, percentiles:list=[20, 40, 60, 80])
Gives the times at which death events occur at the given percentiles.

Parameters:
- df: must contain columns 't' (time) and 'e' (death event)
- percentiles: list of percentages at which breakpoints occur (do not include 0 and 100)
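The computation described above amounts to taking percentiles of the observed event times. A self-contained sketch (not the library's code):

```python
import numpy as np
import pandas as pd

def get_breakpoints_sketch(df: pd.DataFrame, percentiles=(20, 40, 60, 80)) -> np.ndarray:
    # Only rows where the death event was actually observed contribute.
    event_times = df.loc[df["e"] == 1, "t"].to_numpy()
    return np.percentile(event_times, list(percentiles))

df = pd.DataFrame({"t": [1, 2, 3, 4, 5, 6, 7, 8], "e": [1, 1, 0, 1, 1, 0, 1, 1]})
print(get_breakpoints_sketch(df))   # times splitting observed deaths into ~equal bins
```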