analyticsdf package

analyticsdf.analyticsdataframe module

class analyticsdf.analyticsdataframe.AnalyticsDataframe(n, p, predictor_names=None, response_vector_name=None, seed=None)

Bases: object

Create an AnalyticsDataframe instance.

Creates a dataframe object that is initialized from the n, p, predictor_names and response_vector_name arguments.

Args:

n:: Number of observations.
p:: Number of predictors.
predictor_names:: List of strings (default = [X1, X2, … Xp]).
response_vector_name:: String (default = Y).

Returns:

AnalyticsDataframe instance:: predictor_matrix: a pandas DataFrame filled with NaN. response_vector: a pandas Series filled with NaN.
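
Example (an illustrative sketch of the documented defaults, not the library's code): the helper name default_layout and the use of plain dicts and lists in place of pandas objects are assumptions made to keep the sketch self-contained.

```python
import math


def default_layout(n, p, predictor_names=None, response_vector_name=None):
    """Hypothetical helper mirroring the documented defaults with plain containers."""
    if predictor_names is None:
        # Documented default: X1, X2, ..., Xp
        predictor_names = [f"X{i}" for i in range(1, p + 1)]
    if response_vector_name is None:
        # Documented default: Y
        response_vector_name = "Y"
    # One NaN-filled column of length n per predictor, plus a NaN-filled response.
    predictor_matrix = {name: [math.nan] * n for name in predictor_names}
    response_vector = [math.nan] * n
    return predictor_matrix, response_vector_name, response_vector


matrix, y_name, y = default_layout(n=3, p=2)
print(list(matrix), y_name)  # ['X1', 'X2'] Y
```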

generate_response_vector_linear(predictor_name_list: list = None, beta: list = None, epsilon_variance: float = None)

Generates a response vector based on a linear regression generative model.

Args:

predictor_name_list:: A list of predictor names in the initial AnalyticsDataframe.
beta:: A list of coefficients of the linear model; the first coefficient is the intercept.
epsilon_variance:: A scalar variance specification.

Raises:

KeyError: If the column does not exist.
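
Example (a minimal sketch of the generative model described above, not the library's implementation): the function linear_response and the dict-of-lists data layout are hypothetical. The sketch assumes epsilon_variance is a true variance, so the noise standard deviation is its square root; the library may scale the noise differently. The beta length check is likewise an assumption.

```python
import random


def linear_response(columns, predictor_name_list, beta, epsilon_variance, seed=None):
    """Hypothetical sketch: y_i = beta[0] + sum_j beta[j + 1] * x_ij + eps_i."""
    rng = random.Random(seed)
    for name in predictor_name_list:
        if name not in columns:
            raise KeyError(name)
    # Assumption: one intercept plus one coefficient per listed predictor.
    if len(beta) != len(predictor_name_list) + 1:
        raise ValueError("beta needs an intercept plus one coefficient per predictor")
    sigma = epsilon_variance ** 0.5  # assumption: noise std is sqrt(variance)
    n = len(columns[predictor_name_list[0]])
    response = []
    for i in range(n):
        mean = beta[0] + sum(
            b * columns[name][i] for b, name in zip(beta[1:], predictor_name_list)
        )
        response.append(mean + rng.gauss(0.0, sigma))
    return response


columns = {"X1": [1.0, 2.0, 3.0], "X2": [0.5, 0.0, -0.5]}
# With zero noise variance the response equals the linear predictor exactly.
y = linear_response(columns, ["X1", "X2"], beta=[10.0, 2.0, 4.0], epsilon_variance=0.0)
print(y)  # [14.0, 14.0, 14.0]
```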

generate_response_vector_polynomial(predictor_name_list: list, polynomial_order: list, beta: list, interaction_term_betas: array, epsilon_variance: float)

Generates a response vector based on a linear regression generative model that contains polynomial terms for one or more of the predictors and interaction terms.

Args:

predictor_name_list:

A list of predictor names in the initial AnalyticsDataframe.

polynomial_order:

A list of integers that specify the order of the polynomial for each predictor; legal values are 1 to 4.

beta:

A list of the betas (coefficients of the linear model):
- The first coefficient is the intercept.
- The next coefficients are the coefficients of the polynomial terms for the first predictor (as many as specified for it in the polynomial_order list).
- The list continues in this manner for all the predictors specified in the predictor_name_list parameter.
- The list length must equal the sum of the values in the polynomial_order list plus one.

interaction_term_betas:

A np.array-like lower triangular matrix, with both dimensions equal to the sum of the values in polynomial_order, containing the betas of any interaction terms.

epsilon_variance:

A scalar variance specification.

Raises:

KeyError: If the column does not exist.
TypeError: If the column is not numeric.
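
Example (a sketch of how the coefficient list lines up with polynomial_order, not the library's implementation): the function polynomial_mean is hypothetical and covers only the intercept and main-effect terms, leaving out the interaction terms and the noise. It assumes the coefficients for each predictor are ordered by ascending power (x, x^2, ...).

```python
def polynomial_mean(columns, predictor_name_list, polynomial_order, beta):
    """Hypothetical sketch of the main-effect part only (no interactions, no noise)."""
    # Documented constraint: one intercept plus one beta per polynomial term.
    if len(beta) != sum(polynomial_order) + 1:
        raise ValueError("len(beta) must equal sum(polynomial_order) + 1")
    n = len(columns[predictor_name_list[0]])
    mean = [beta[0]] * n  # start from the intercept
    k = 1  # index of the next unused coefficient
    for name, order in zip(predictor_name_list, polynomial_order):
        if not 1 <= order <= 4:
            raise ValueError("polynomial order must be between 1 and 4")
        # Assumption: terms for one predictor are ordered x, x^2, ..., x^order.
        for power in range(1, order + 1):
            for i, x in enumerate(columns[name]):
                mean[i] += beta[k] * x ** power
            k += 1
    return mean


columns = {"X1": [1.0, 2.0], "X2": [3.0, 4.0]}
# X1 contributes x and x^2, X2 contributes x only, so beta has 1 + 2 + 1 = 4 entries.
mean = polynomial_mean(columns, ["X1", "X2"], [2, 1], beta=[1.0, 2.0, 3.0, 4.0])
print(mean)  # [18.0, 33.0]
```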

property predictor_names

property response_vector_name

update_predictor_beta(predictor_name_list, a, b)

Update the predictors of the instance to be beta distributed.

Args:

predictor_name_list:: A list of predictor names in the initial AnalyticsDataframe.
a:: float or array_like of floats. Alpha, positive (>0).
b:: float or array_like of floats. Beta, positive (>0).

Raises:

KeyError: If the column does not exist.
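
Example (an illustrative sketch of how a and b shape the draws, not the library's implementation): the helper beta_column is hypothetical and handles scalar a and b only, whereas the documented arguments may also be array-like.

```python
import random


def beta_column(n, a, b, seed=None):
    """Hypothetical helper: n draws from Beta(a, b), each within [0, 1]."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


skewed_low = beta_column(1000, a=1.0, b=5.0, seed=0)   # mass concentrated near 0
skewed_high = beta_column(1000, a=5.0, b=1.0, seed=0)  # mass concentrated near 1
```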

update_predictor_categorical(predictor_name=None, category_names: list | None = None, prob_vector: array | None = None)

Update a predictor with categorical values.

Args:

predictor_name:: A predictor name in the initial AnalyticsDataframe.
category_names:: A vector of strings that contains the names of the different category values.
prob_vector:: A vector of numerics of the same length as category_names that specifies the probability (frequency) of each category value.

Raises:

KeyError: If the column does not exist.
ValueError: If the sum of prob_vector is not equal to 1.
ValueError: If the length of prob_vector is not equal to the length of category_names.
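
Example (a minimal sketch of the documented validation and sampling, not the library's implementation): the function categorical_column is hypothetical, and comparing the probability sum to 1 with math.isclose is an assumption about how floating-point tolerance might be handled.

```python
import math
import random


def categorical_column(n, category_names, prob_vector, seed=None):
    """Hypothetical sketch: draw n category values with the given probabilities."""
    if len(prob_vector) != len(category_names):
        raise ValueError("prob_vector and category_names must have the same length")
    # Assumption: a small floating-point tolerance is allowed on the sum.
    if not math.isclose(sum(prob_vector), 1.0):
        raise ValueError("prob_vector must sum to 1")
    rng = random.Random(seed)
    return rng.choices(category_names, weights=prob_vector, k=n)


column = categorical_column(1000, ["Red", "Yellow", "Blue"], [0.3, 0.4, 0.3], seed=0)
```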

update_predictor_multicollinear(target_predictor_name=None, dependent_predictors_list=None, beta: list | None = None, epsilon_variance: float | None = None)

Update the predictor to be multicollinear with other predictors.

Args:

target_predictor_name:: String, the name of the target predictor in the initial AnalyticsDataframe.
dependent_predictors_list:: A list of the names of the predictors on which the target predictor depends.
beta:: A list of coefficients of the linear model; the first coefficient is the intercept.
epsilon_variance:: A scalar variance specification.

Raises:

KeyError: If the column does not exist.
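
Example (a sketch of the idea, not the library's implementation): the target predictor is rebuilt as a linear combination of the dependent predictors plus noise. The function make_multicollinear and the dict-of-lists layout are hypothetical, and epsilon_variance is assumed to be a true variance.

```python
import random


def make_multicollinear(columns, target_predictor_name, dependent_predictors_list,
                        beta, epsilon_variance, seed=None):
    """Hypothetical sketch: target = beta[0] + sum_j beta[j + 1] * dep_j + eps."""
    rng = random.Random(seed)
    for name in [target_predictor_name, *dependent_predictors_list]:
        if name not in columns:
            raise KeyError(name)
    sigma = epsilon_variance ** 0.5  # assumption: noise std is sqrt(variance)
    n = len(columns[target_predictor_name])
    # Overwrite the target column in place with the new, correlated values.
    columns[target_predictor_name] = [
        beta[0]
        + sum(b * columns[dep][i] for b, dep in zip(beta[1:], dependent_predictors_list))
        + rng.gauss(0.0, sigma)
        for i in range(n)
    ]
    return columns


columns = {"X1": [1.0, 2.0], "X2": [3.0, 5.0], "X3": [0.0, 0.0]}
# With zero noise, X3 becomes exactly X1 + 2 * X2.
make_multicollinear(columns, "X3", ["X1", "X2"], beta=[0.0, 1.0, 2.0], epsilon_variance=0.0)
print(columns["X3"])  # [7.0, 12.0]
```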

update_predictor_normal(predictor_name_list: list = None, mean: ndarray = None, covariance_matrix: ndarray = None)

Update the predictors of the instance to be normally distributed.

Args:

predictor_name_list:: A list of predictor names in the initial AnalyticsDataframe.
mean:: A numpy array or list, containing mean values.
covariance_matrix:: A symmetric, positive semi-definite N * N matrix that defines the covariance among the N variables.

Raises:

KeyError: If the column does not exist.
ValueError: If mean and covariance_matrix do not have the same size.
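
Example (a sketch of drawing correlated normal values from a mean vector and a covariance matrix, not the library's implementation): the functions cholesky and sample_normal are hypothetical. This sketch uses a Cholesky factorization, which requires a strictly positive definite matrix; a matrix that is only semi-definite would need a different factorization.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite input)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_normal(n, mean, covariance_matrix, seed=None):
    """Hypothetical sketch: n rows drawn from N(mean, covariance_matrix)."""
    size = len(mean)
    if len(covariance_matrix) != size or any(len(row) != size for row in covariance_matrix):
        raise ValueError("mean and covariance_matrix must have matching sizes")
    rng = random.Random(seed)
    lower = cholesky(covariance_matrix)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]  # independent standard normals
        # x = mean + L z has covariance L L^T = covariance_matrix
        rows.append([mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
                     for i in range(size)])
    return rows


rows = sample_normal(4, mean=[100.0, 50.0], covariance_matrix=[[4.0, 1.0], [1.0, 9.0]], seed=1)
```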

update_predictor_uniform(predictor_name=None, lower_bound=0, upper_bound=1.0)

Update a predictor to be uniformly distributed.

Args:

predictor_name:: String, a predictor name in AnalyticsDataframe object.
lower_bound:: float, lower boundary of the output interval. All values generated will be greater than or equal to lower_bound. The default value is 0.
upper_bound:: float, upper boundary of the output interval. All values generated will be less than or equal to upper_bound. The default value is 1.0.

Raises:

KeyError: If the column does not exist.

update_response_poly_categorical(predictor_name: str | None = None, betas: dict | None = None)

Add a categorical factor to the response in a polynomial manner.

Args:

predictor_name:: String, a predictor name in AnalyticsDataframe object.
betas:: A dictionary. Key: a categorical value of the current predictor. Value: the beta value for that categorical value.

Raises:

KeyError: If the column does not exist.
TypeError: If the predictor is not categorical.
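
Example (one possible reading of the description above, not the library's implementation): each observation's existing response is shifted by the beta assigned to its category. The function add_categorical_effect is hypothetical, and the additive interpretation is an assumption.

```python
def add_categorical_effect(response, category_column, betas):
    """Hypothetical sketch: add betas[category] to each observation's response."""
    for value in category_column:
        if value not in betas:
            raise KeyError(value)
    # Assumption: the categorical effect is added to the response already generated.
    return [y + betas[value] for y, value in zip(response, category_column)]


response = [10.0, 10.0, 10.0]
colors = ["Red", "Blue", "Red"]
updated = add_categorical_effect(response, colors, {"Red": -2.0, "Blue": 5.0})
print(updated)  # [8.0, 15.0, 8.0]
```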