analyticsdf package

analyticsdf.analyticsdataframe module

class analyticsdf.analyticsdataframe.AnalyticsDataframe(n, p, predictor_names=None, response_vector_name=None, seed=None)[source]

Bases: object

Create an AnalyticsDataframe instance.

Initializes the underlying dataframe from the n, p, predictor_names and response_vector_name arguments.

Args:
n:

Number of observations.

p:

Number of predictors.

predictor_names:

List of strings (default = [X1, X2, … Xp]).

response_vector_name:

String (default = Y).

Returns:
AnalyticsDataframe class:

predictor_matrix: a pandas DataFrame filled with NaN.

response_vector: a pandas Series filled with NaN.
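
The defaults described above can be pictured with a small standard-library sketch. It does not call the analyticsdf package: `default_names` and `empty_frame` are hypothetical helpers, and plain dicts of lists stand in for the pandas objects.

```python
import math


def default_names(p):
    """Return the default predictor names X1..Xp."""
    return [f"X{i}" for i in range(1, p + 1)]


def empty_frame(n, p, predictor_names=None, response_vector_name="Y"):
    """Build NaN-filled columns: p predictors plus one response, n rows each."""
    names = list(predictor_names) if predictor_names is not None else default_names(p)
    if len(names) != p:
        raise ValueError("predictor_names must contain exactly p names")
    predictors = {name: [math.nan] * n for name in names}
    response = {response_vector_name: [math.nan] * n}
    return predictors, response


predictors, response = empty_frame(n=3, p=2)
print(list(predictors))  # ['X1', 'X2']
```

Every cell starts as NaN, so the update_* and generate_* methods documented in this module are what give the columns their values.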

generate_response_vector_linear(predictor_name_list: list = None, beta: list = None, epsilon_variance: float = None)[source]

Generates a response vector based on a linear regression generative model.

Args:
predictor_name_list:

A list of predictor names in the initial AnalyticsDataframe.

beta:

A list of coefficients of the linear model; the first coefficient is the intercept.

epsilon_variance:

A scalar variance specification.

Raises:

KeyError: If the column does not exist.
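
As a rough illustration of the generative model, the sketch below computes y = beta[0] + sum(beta[k] * x_k) + eps with eps drawn from N(0, epsilon_variance). It is a standard-library stand-in, not the package's implementation: `linear_response`, the dict-of-lists `columns`, and the `seed` argument are assumptions made for the example.

```python
import math
import random


def linear_response(columns, predictor_name_list, beta, epsilon_variance, seed=None):
    """Return one response value per row under a linear model with Gaussian noise."""
    if len(beta) != len(predictor_name_list) + 1:
        raise ValueError("beta needs one intercept plus one coefficient per predictor")
    rng = random.Random(seed)
    sigma = math.sqrt(epsilon_variance)
    # Looking up an unknown name raises KeyError.
    selected = [columns[name] for name in predictor_name_list]
    response = []
    for row in zip(*selected):
        mean = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        response.append(mean + rng.gauss(0.0, sigma))
    return response


columns = {"X1": [1.0, 2.0, 3.0], "X2": [0.0, 1.0, 0.0]}
y = linear_response(columns, ["X1", "X2"], beta=[5.0, 2.0, -1.0], epsilon_variance=0.0)
print(y)  # [7.0, 8.0, 11.0]
```

With epsilon_variance set to 0 the noise term vanishes, which makes the role of the intercept and the per-predictor coefficients easy to check by hand.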

generate_response_vector_polynomial(predictor_name_list: list, polynomial_order: list, beta: list, interaction_term_betas: array, epsilon_variance: float)[source]

Generates a response vector based on a linear regression generative model that contains polynomial terms for one or more of the predictors and interaction terms.

Args:
predictor_name_list:

A list of predictor names in the initial AnalyticsDataframe.

polynomial_order:

A list of integers that specify the order of the polynomial for each predictor with legal values of 1 to 4.

beta:

A list of the betas (coefficients of the linear model).

- The first coefficient is the intercept.
- The next coefficients are those of the polynomial terms for the first predictor (as specified in the polynomial_order list).
- This continues in the same manner for all the predictors specified in the predictor_name_list parameter.
- The list length must equal the sum of the values in polynomial_order plus one.

interaction_term_betas:

A NumPy array-like lower triangular matrix, with both dimensions equal to the sum of the polynomial_order values, containing the betas of any interaction terms.

epsilon_variance:

A scalar variance specification.

Raises:

KeyError: If the column does not exist.

TypeError: If the column is not numeric.
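
The layout of the coefficient list and the interaction matrix is easier to follow with a worked sketch. The code below is one plausible reading of the description, not the package's code: each predictor is expanded into x, x**2, ..., x**order, the coefficients after the intercept line up with those terms in order, and entry [i][j] below the diagonal of the interaction matrix is assumed to scale the product of term i and term j. `polynomial_response`, `columns`, and `seed` are hypothetical names.

```python
import math
import random


def polynomial_response(columns, predictor_name_list, polynomial_order, beta,
                        interaction_term_betas, epsilon_variance, seed=None):
    """Sketch of a polynomial generative model with pairwise interaction terms."""
    n_terms = sum(polynomial_order)
    if any(order not in (1, 2, 3, 4) for order in polynomial_order):
        raise ValueError("each polynomial order must be between 1 and 4")
    if len(beta) != n_terms + 1:
        raise ValueError("beta must have sum(polynomial_order) + 1 entries")
    rng = random.Random(seed)
    sigma = math.sqrt(epsilon_variance)
    n_rows = len(columns[predictor_name_list[0]])
    response = []
    for r in range(n_rows):
        # Expand each predictor into x, x**2, ..., x**order, in list order.
        terms = []
        for name, order in zip(predictor_name_list, polynomial_order):
            x = columns[name][r]
            terms.extend(x ** k for k in range(1, order + 1))
        value = beta[0] + sum(b * t for b, t in zip(beta[1:], terms))
        # Assumed reading: entry [i][j] below the diagonal scales terms[i] * terms[j].
        for i in range(n_terms):
            for j in range(i):
                value += interaction_term_betas[i][j] * terms[i] * terms[j]
        response.append(value + rng.gauss(0.0, sigma))
    return response


columns = {"X1": [1.0, 2.0], "X2": [3.0, 4.0]}
# Terms per row are [X1, X1**2, X2]; only the X2 * X1 interaction is non-zero.
interactions = [[0, 0, 0],
                [0, 0, 0],
                [0.5, 0, 0]]
y = polynomial_response(columns, ["X1", "X2"], [2, 1], [1.0, 2.0, 3.0, 4.0],
                        interactions, epsilon_variance=0.0)
print(y)  # [19.5, 37.0]
```

For the first row the terms are [1, 1, 3], so the main effects give 1 + 2 + 3 + 12 = 18 and the single interaction adds 0.5 * 3 * 1 = 1.5.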

property predictor_names
property response_vector_name
update_predictor_beta(predictor_name_list, a, b)[source]

Update the predictors of the instance to be beta distributed.

Args:
predictor_name_list:

A list of predictor names in the initial AnalyticsDataframe.

a:

float or array_like of floats. Alpha, positive (>0).

b:

float or array_like of floats. Beta, positive (>0).

Raises:

KeyError: If the column does not exist.
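
To show what beta-distributed predictors look like, here is a minimal standard-library sketch. `beta_columns`, the dict-of-lists `columns`, and `seed` are hypothetical, and only scalar a and b are handled, whereas the method above also accepts array-likes.

```python
import random


def beta_columns(columns, predictor_name_list, a, b, seed=None):
    """Overwrite the named columns with independent Beta(a, b) draws."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must both be positive")
    rng = random.Random(seed)
    for name in predictor_name_list:
        n_rows = len(columns[name])  # KeyError if the column is missing
        columns[name] = [rng.betavariate(a, b) for _ in range(n_rows)]
    return columns


columns = {"X1": [0.0] * 1000, "X2": [0.0] * 1000}
beta_columns(columns, ["X1"], a=2.0, b=5.0, seed=0)
sample_mean = sum(columns["X1"]) / len(columns["X1"])
# The theoretical mean is a / (a + b), about 0.286 for these parameters.
print(round(sample_mean, 3))
```

All draws fall in the interval [0, 1], and columns not named in predictor_name_list are left unchanged.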

update_predictor_categorical(predictor_name=None, category_names: list | None = None, prob_vector: array | None = None)[source]

Update a predictor with categorical values.

Args:
predictor_name:

A predictor name in the initial AnalyticsDataframe.

category_names:

A vector of strings containing the names of the different category values.

prob_vector:

A vector of numerics of the same length as category_names that specifies the probability (frequency) of each category value.

Raises:

KeyError: If the column does not exist.

ValueError: If the sum of prob_vector is not equal to 1.

ValueError: If the length of prob_vector is not equal to that of category_names.
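
The two validation rules and the sampling step can be sketched with the standard library as follows. `categorical_column` and its `seed` argument are hypothetical, and the tolerance used when checking that the probabilities sum to 1 is an assumption of this example.

```python
import math
import random


def categorical_column(n, category_names, prob_vector, seed=None):
    """Draw n category labels with the given probabilities."""
    if len(prob_vector) != len(category_names):
        raise ValueError("prob_vector and category_names must have the same length")
    if not math.isclose(sum(prob_vector), 1.0):
        raise ValueError("prob_vector must sum to 1")
    rng = random.Random(seed)
    return rng.choices(category_names, weights=prob_vector, k=n)


names = ["Red", "Yellow", "Blue"]
values = categorical_column(1000, names, [0.3, 0.4, 0.3], seed=1)
counts = {name: values.count(name) for name in names}
print(counts)  # frequencies should be roughly 300 / 400 / 300
```

Comparing with `math.isclose` rather than `==` avoids rejecting vectors such as [0.1, 0.2, 0.7] whose floating-point sum is not exactly 1.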

update_predictor_multicollinear(target_predictor_name=None, dependent_predictors_list=None, beta: list | None = None, epsilon_variance: float | None = None)[source]

Update the predictor to be multicollinear with other predictors.

Args:
target_predictor_name:

A string, the name of the target predictor in the initial AnalyticsDataframe.

dependent_predictors_list:

A list of the predictor names selected as dependents.

beta:

A list of coefficients of the linear model; the first coefficient is the intercept.

epsilon_variance:

A scalar variance specification.

Raises:

KeyError: If the column does not exist.
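
One way to picture the effect: the target column is overwritten with a linear combination of the other listed columns plus Gaussian noise, so it becomes (nearly) linearly dependent on them. The sketch below assumes exactly that; `make_multicollinear`, `columns`, and `seed` are hypothetical and the package may differ in detail.

```python
import math
import random


def make_multicollinear(columns, target_predictor_name, dependent_predictors_list,
                        beta, epsilon_variance, seed=None):
    """Replace the target column with a noisy linear combination of other columns."""
    if target_predictor_name not in columns:
        raise KeyError(target_predictor_name)
    rng = random.Random(seed)
    sigma = math.sqrt(epsilon_variance)
    sources = [columns[name] for name in dependent_predictors_list]
    columns[target_predictor_name] = [
        beta[0] + sum(b * x for b, x in zip(beta[1:], row)) + rng.gauss(0.0, sigma)
        for row in zip(*sources)
    ]
    return columns


columns = {"X1": [1.0, 2.0, 3.0], "X2": [2.0, 0.0, 1.0], "X3": [0.0, 0.0, 0.0]}
make_multicollinear(columns, "X3", ["X1", "X2"], beta=[1.0, 2.0, 3.0], epsilon_variance=0.0)
print(columns["X3"])  # [9.0, 5.0, 10.0]
```

A smaller epsilon_variance yields a stronger linear dependence between the target and the listed predictors; at 0 the dependence is exact.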

update_predictor_normal(predictor_name_list: list = None, mean: ndarray = None, covariance_matrix: ndarray = None)[source]

Update the predictors of the instance to be normally distributed.

Args:
predictor_name_list:

A list of predictor names in the initial AnalyticsDataframe.

mean:

A numpy array or list, containing mean values.

covariance_matrix:

A symmetric, positive semi-definite N * N matrix that defines the correlation among the N variables.

Raises:

KeyError: If the column does not exist.

ValueError: If mean and covariance_matrix do not have the same size.
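
To make the role of the mean vector and covariance matrix concrete, the following sketch draws correlated normal columns using only the standard library. It uses a hand-written Cholesky factorization, which requires a strictly positive-definite matrix (a narrower condition than the one stated above), and the names `cholesky`, `normal_columns`, and `seed` are hypothetical. In practice a NumPy multivariate normal sampler would be the more natural tool.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (positive-definite input)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def normal_columns(n, mean, covariance_matrix, seed=None):
    """Draw n rows from N(mean, covariance_matrix); returns one list per variable."""
    size = len(mean)
    if len(covariance_matrix) != size or any(len(row) != size for row in covariance_matrix):
        raise ValueError("mean and covariance_matrix must have matching sizes")
    rng = random.Random(seed)
    lower = cholesky(covariance_matrix)
    columns = [[] for _ in range(size)]
    for _ in range(n):
        # Transform independent standard normals z into correlated draws mean + L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]
        for i in range(size):
            columns[i].append(mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1)))
    return columns


x1, x2 = normal_columns(2000, [1.0, 2.0], [[1.0, 0.8], [0.8, 1.0]], seed=0)
print(round(sum(x1) / len(x1), 2), round(sum(x2) / len(x2), 2))
```

The size check mirrors the documented ValueError: the mean must have one entry per row and column of the covariance matrix.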

update_predictor_uniform(predictor_name=None, lower_bound=0, upper_bound=1.0)[source]

Update a predictor to be uniformly distributed.

Args:
predictor_name:

String, a predictor name in AnalyticsDataframe object.

lower_bound:

float, lower boundary of the output interval. All values generated will be greater than or equal to lower_bound. The default value is 0.

upper_bound:

float, upper boundary of the output interval. All values generated will be less than or equal to upper_bound. The default value is 1.0.

Raises:

KeyError: If the column does not exist.
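
A short standard-library sketch of the bounds behaviour described above; `uniform_column` and `seed` are hypothetical names, and the check that lower_bound does not exceed upper_bound is an assumption of this example rather than documented behaviour.

```python
import random


def uniform_column(n, lower_bound=0.0, upper_bound=1.0, seed=None):
    """Draw n values uniformly between lower_bound and upper_bound."""
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    rng = random.Random(seed)
    return [rng.uniform(lower_bound, upper_bound) for _ in range(n)]


values = uniform_column(500, lower_bound=-1.0, upper_bound=3.0, seed=42)
print(min(values) >= -1.0, max(values) <= 3.0)  # True True
```

With `random.uniform`, whether the upper end point itself can be returned depends on floating-point rounding.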

update_response_poly_categorical(predictor_name: str | None = None, betas: dict | None = None)[source]

Add a categorical factor to the response in a polynomial manner.

Args:
predictor_name:

String, a predictor name in AnalyticsDataframe object.

betas:

A dictionary. Key: a categorical value in the current predictor. Value: the beta value for this categorical type/value.

Raises:

KeyError: If the column does not exist.

TypeError: If this is not a categorical predictor.
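
The description does not spell out how the betas enter the response. The sketch below assumes the simplest reading, in which each observation's response is shifted by the beta of its category. `add_categorical_effect` is a hypothetical helper, and treating "categorical" as "string-valued" is likewise an assumption.

```python
def add_categorical_effect(response, categories, betas):
    """Return a new response with betas[category] added to each observation."""
    if not all(isinstance(value, str) for value in categories):
        raise TypeError("the predictor must hold categorical (string) values")
    # In this sketch, a category missing from betas raises KeyError.
    return [y + betas[value] for y, value in zip(response, categories)]


response = [10.0, 10.0, 10.0]
categories = ["Red", "Blue", "Red"]
updated = add_categorical_effect(response, categories, {"Red": -2.0, "Blue": 1.5})
print(updated)  # [8.0, 11.5, 8.0]
```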