analyticsdf package
analyticsdf.analyticsdataframe module
- class analyticsdf.analyticsdataframe.AnalyticsDataframe(n, p, predictor_names=None, response_vector_name=None, seed=None)[source]
Bases: object
Create an AnalyticsDataframe class.
Creates a dataframe class which uses the n, p, predictor_names, and response_vector_name arguments to initialize a dataframe.
- Args:
- n:
Number of observations.
- p:
Number of predictors.
- predictor_names:
List of strings (default = [X1, X2, … Xp]).
- response_vector_name:
String (default = Y).
- Returns:
- AnalyticsDataframe class:
predictor_matrix: a pandas DataFrame filled with NaN.
response_vector: a pandas Series filled with NaN.
- generate_response_vector_linear(predictor_name_list: list = None, beta: list = None, epsilon_variance: float = None)[source]
Generates a response vector based on a linear regression generative model.
- Args:
- predictor_name_list:
A list of predictor names in the initial AnalyticsDataframe.
- beta:
A list, coefficients of the linear model – first coefficient is the intercept
- epsilon_variance:
A scalar variance specification.
- Raises:
KeyError: If the column does not exist.
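The generative model described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name linear_response is hypothetical, and the noise is assumed to be zero-mean Gaussian with variance epsilon_variance.

```python
import random


def linear_response(rows, beta, epsilon_variance, seed=None):
    """Hypothetical helper: y = beta[0] + sum(beta[j + 1] * x[j]) + noise."""
    rng = random.Random(seed)
    # random.gauss expects a standard deviation, so take the square root.
    sigma = epsilon_variance ** 0.5
    response = []
    for x in rows:
        if len(x) + 1 != len(beta):
            raise ValueError("beta needs an intercept plus one coefficient per predictor")
        mean = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        # Assumption: zero-mean Gaussian noise.
        response.append(mean + rng.gauss(0.0, sigma))
    return response


# Two predictors, three observations; zero variance makes the result deterministic.
rows = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
y = linear_response(rows, beta=[5.0, 1.0, 2.0], epsilon_variance=0.0)
```

With epsilon_variance set to 0 the response is exactly the linear combination, which makes it easy to check that beta is ordered intercept first.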
- generate_response_vector_polynomial(predictor_name_list: list, polynomial_order: list, beta: list, interaction_term_betas: array, epsilon_variance: float)[source]
Generates a response vector based on a linear regression generative model that contains polynomial terms for one or more of the predictors and interaction terms.
- Args:
- predictor_name_list:
A list of predictor names in the initial AnalyticsDataframe.
- polynomial_order:
A list of integers that specify the order of the polynomial for each predictor with legal values of 1 to 4.
- beta:
A list of the betas (coefficients of the linear model):
– The first coefficient is the intercept.
– The next coefficients are those of the polynomial terms for the first predictor (as specified in the polynomial_order list).
– Continue in this manner for all the predictors specified in the predictor_name_list parameter.
– The list length must equal the sum of the values in polynomial_order plus one.
- interaction_term_betas:
A NumPy array-like lower triangular matrix, with both dimensions equal to the sum of the values in polynomial_order, containing the betas of any interaction terms.
- epsilon_variance:
A scalar variance specification
- Raises:
KeyError: If the column does not exist.
TypeError: If the column is not numeric.
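The ordering of the beta list can be easier to see when the terms are labelled. The following sketch lists the terms in the order described above; the helper name polynomial_term_labels is hypothetical and is not part of analyticsdf.

```python
def polynomial_term_labels(predictor_name_list, polynomial_order):
    """Hypothetical helper: label each entry of the beta list, intercept first."""
    if len(predictor_name_list) != len(polynomial_order):
        raise ValueError("one polynomial order is needed per predictor")
    labels = ["intercept"]
    for name, order in zip(predictor_name_list, polynomial_order):
        if not 1 <= order <= 4:
            raise ValueError("polynomial order must be between 1 and 4")
        # One beta per power of this predictor, from 1 up to its order.
        labels.extend(f"{name}^{power}" for power in range(1, order + 1))
    return labels


# Orders [2, 3] need sum([2, 3]) + 1 = 6 betas.
labels = polynomial_term_labels(["X1", "X2"], [2, 3])
```

Here the beta list would hold the intercept, two coefficients for X1, and three for X2, in that order.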
- property predictor_names
- property response_vector_name
- update_predictor_beta(predictor_name_list, a, b)[source]
Update the predictors of the instance as beta distributed.
- Args:
- predictor_name_list:
A list of predictor names in the initial AnalyticsDataframe.
- a:
float or array_like of floats. Alpha, positive (>0).
- b:
float or array_like of floats. Beta, positive (>0).
- Raises:
KeyError: If the column does not exist.
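As a rough illustration of what beta-distributed predictor values look like, the sketch below draws samples with the standard library instead of analyticsdf. The shape values and the seed are arbitrary choices for the example.

```python
import random

rng = random.Random(42)
a, b = 2.0, 5.0  # both shape parameters must be positive

# Every draw from a beta distribution lies in the interval [0, 1].
draws = [rng.betavariate(a, b) for _ in range(1000)]

sample_mean = sum(draws) / len(draws)
expected_mean = a / (a + b)  # theoretical mean of Beta(a, b)
```

Because the values are bounded by 0 and 1, a beta-distributed predictor is a natural fit for rates or proportions.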
- update_predictor_categorical(predictor_name=None, category_names: list | None = None, prob_vector: array | None = None)[source]
Update a predictor with categorical values.
- Args:
- predictor_name:
A predictor name in the initial AnalyticsDataframe.
- category_names:
A vector of strings that contains names of the different category values
- prob_vector:
A vector of numerics of the same length as category_names that specifies the probability (frequency) of each category value.
- Raises:
KeyError: If the column does not exist.
ValueError: If the sum of prob_vector is not equal to 1.
ValueError: If the length of prob_vector is not equal to the length of category_names.
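The two ValueError conditions above can be sketched as follows. This is a standalone illustration using the standard library; sample_categories is a hypothetical name, and the tolerance used when comparing the probability sum to 1 is an assumption.

```python
import math
import random


def sample_categories(n, category_names, prob_vector, seed=None):
    """Hypothetical helper: draw n category labels with the given probabilities."""
    if len(category_names) != len(prob_vector):
        raise ValueError("prob_vector and category_names must have the same length")
    # Assumption: compare with a small tolerance to absorb floating-point error.
    if not math.isclose(sum(prob_vector), 1.0, abs_tol=1e-9):
        raise ValueError("prob_vector must sum to 1")
    rng = random.Random(seed)
    return rng.choices(category_names, weights=prob_vector, k=n)


values = sample_categories(500, ["Red", "Green", "Blue"], [0.5, 0.3, 0.2], seed=0)
```

Checking the sum with a tolerance avoids rejecting vectors such as [0.1, 0.2, 0.7], whose floating-point sum may differ from 1 in the last digit.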
- update_predictor_multicollinear(target_predictor_name=None, dependent_predictors_list=None, beta: list | None = None, epsilon_variance: float | None = None)[source]
Update the predictor to be multicollinear with other predictors.
- Args:
- target_predictor_name:
A string, the name of the target predictor in the initial AnalyticsDataframe.
- dependent_predictors_list:
A list of the names of the predictors that the target predictor depends on.
- beta:
A list, coefficients of the linear model – first coefficient is the intercept
- epsilon_variance:
A scalar variance specification.
- Raises:
KeyError: If the column does not exist.
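One way to picture multicollinearity is a target predictor built as a linear combination of other predictors plus noise. The sketch below assumes that reading and zero-mean Gaussian noise; multicollinear_column is a hypothetical name, not the library's code.

```python
import random


def multicollinear_column(dependent_columns, beta, epsilon_variance, seed=None):
    """Hypothetical helper: target = beta[0] + sum(beta[j + 1] * column_j) + noise."""
    if len(dependent_columns) + 1 != len(beta):
        raise ValueError("beta needs an intercept plus one coefficient per column")
    rng = random.Random(seed)
    sigma = epsilon_variance ** 0.5  # standard deviation of the noise
    n = len(dependent_columns[0])
    target = []
    for i in range(n):
        value = beta[0] + sum(b * col[i] for b, col in zip(beta[1:], dependent_columns))
        target.append(value + rng.gauss(0.0, sigma))
    return target


x1 = [1.0, 2.0, 3.0]
x2 = [0.0, 1.0, 0.0]
# Zero variance gives a perfectly collinear column: x3 = 1 + 2 * x1 - x2.
x3 = multicollinear_column([x1, x2], beta=[1.0, 2.0, -1.0], epsilon_variance=0.0)
```

A larger epsilon_variance weakens the dependence, so the parameter controls how severe the multicollinearity is.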
- update_predictor_normal(predictor_name_list: list = None, mean: ndarray = None, covariance_matrix: ndarray = None)[source]
Update the predictors of the instance to be normally distributed.
- Args:
- predictor_name_list:
A list of predictor names in the initial AnalyticsDataframe.
- mean:
A numpy array or list, containing mean values.
- covariance_matrix:
A symmetric and positive semi-definite N * N matrix that defines the covariance among the N variables.
- Raises:
KeyError: If the column does not exist.
ValueError: If mean and covariance_matrix do not have the same size.
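To show how a mean vector and a covariance matrix jointly determine correlated predictors, the sketch below samples through a Cholesky factor in plain Python. This is a swapped-in textbook technique, not necessarily what analyticsdf does internally. Both helper names are hypothetical, and the factorization as written requires a strictly positive definite matrix.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (positive definite input)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_normal(n, mean, covariance_matrix, seed=None):
    """Hypothetical helper: draw n rows from N(mean, covariance_matrix)."""
    if len(mean) != len(covariance_matrix):
        raise ValueError("mean and covariance_matrix must have matching sizes")
    rng = random.Random(seed)
    lower = cholesky(covariance_matrix)
    rows = []
    for _ in range(n):
        # Independent standard normal draws, then correlate them: x = mean + L z.
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        rows.append([m + sum(lower[i][k] * z[k] for k in range(i + 1))
                     for i, m in enumerate(mean)])
    return rows


samples = sample_normal(2000, mean=[100.0, 50.0],
                        covariance_matrix=[[4.0, 1.0], [1.0, 9.0]], seed=1)
```

The diagonal of covariance_matrix holds each predictor's variance, and the off-diagonal entries hold the pairwise covariances.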
- update_predictor_uniform(predictor_name=None, lower_bound=0, upper_bound=1.0)[source]
Update a predictor to be uniformly distributed.
- Args:
- predictor_name:
String, a predictor name in AnalyticsDataframe object.
- lower_bound:
float, lower boundary of the output interval. All values generated will be greater than or equal to lower_bound. The default value is 0.
- upper_bound:
float, upper boundary of the output interval. All values generated will be less than or equal to upper_bound. The default value is 1.0.
- Raises:
KeyError: If the column does not exist.
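The effect of the two bounds can be illustrated with the standard library. The bounds and the seed below are arbitrary example values, and the snippet does not call analyticsdf.

```python
import random

rng = random.Random(7)
lower_bound, upper_bound = 18.0, 65.0

# Every draw falls between the two bounds, with equal density across the interval.
ages = [rng.uniform(lower_bound, upper_bound) for _ in range(1000)]

sample_mean = sum(ages) / len(ages)
midpoint = (lower_bound + upper_bound) / 2  # theoretical mean of the distribution
```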
- update_response_poly_categorical(predictor_name: str | None = None, betas: dict | None = None)[source]
Add categorical factor into response in a polynomial manner.
- Args:
- predictor_name:
String, a predictor name in AnalyticsDataframe object.
- betas:
A dictionary. Keys: the categorical values in the current predictor. Values: the beta value for each categorical value.
- Raises:
KeyError: If the column does not exist.
TypeError: If the predictor is not categorical.
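The shape of the betas dictionary is shown in the sketch below. It assumes that each category's beta is simply added to the existing response value, which is one plausible reading of the description and may differ from the library's behavior. The helper name add_categorical_effect is hypothetical.

```python
def add_categorical_effect(response, categories, betas):
    """Hypothetical helper: add each row's category beta to its response value."""
    missing = set(categories) - set(betas)
    if missing:
        raise KeyError(f"no beta supplied for categories: {sorted(missing)}")
    # Assumption: the categorical effect is additive.
    return [y + betas[c] for y, c in zip(response, categories)]


response = [10.0, 3.5, 8.0]
colors = ["Red", "Blue", "Red"]
adjusted = add_categorical_effect(response, colors, {"Red": 2.0, "Blue": -1.0})
```

Each key in betas must match a category value that appears in the predictor, and each value is the shift applied to rows in that category.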