Terms Explained:

  • Predictive / Supervised Learning => (modeling) => Discriminative Model
  • Descriptive / Unsupervised Learning => (modeling) => Generative Model

Epochs, Iteration & Batch size:

Epoch: One complete pass of the (deep/machine) learning algorithm over the full training dataset.
Batch: A subset of the training dataset; the batch size is the number of samples processed before the model's weights are updated.
Iteration: The process of running one batch through the model (one weight update).
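
A minimal sketch of how the three terms relate; the dataset size, batch size, and epoch count below are made-up example values, not from the text:

```python
# How epochs, batches and iterations relate (illustrative numbers only).
import math

num_samples = 10_000   # size of the full training dataset
batch_size = 32        # samples processed per weight update
num_epochs = 5         # full passes over the dataset

iterations_per_epoch = math.ceil(num_samples / batch_size)  # batches per epoch
total_iterations = iterations_per_epoch * num_epochs

print(f"iterations per epoch: {iterations_per_epoch}")  # 313
print(f"total iterations:     {total_iterations}")      # 1565
```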

Gradient

  • A gradient measures how much the output of a function changes when you change its inputs a little bit.
  • In training, the gradient measures the change in the error with respect to a change in each weight, so it tells each weight which direction to move to reduce the error.
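
A small sketch of the second point: estimating the gradient of the error with respect to a weight for a toy one-weight model. The data, starting weight, and learning rate are illustrative assumptions:

```python
# Numerically estimate d(error)/d(weight) for a toy one-weight linear model.
def error(w, xs=(1.0, 2.0, 3.0), ys=(2.0, 4.0, 6.0)):
    # Mean squared error of the prediction y_hat = w * x
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w = 1.5
eps = 1e-6
gradient = (error(w + eps) - error(w - eps)) / (2 * eps)  # central difference

print(gradient)      # negative here: increasing w would decrease the error
w -= 0.1 * gradient  # one gradient-descent step with learning rate 0.1
```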

SVM: Supervised Algorithm

In SVM, a data point is viewed as a p-dimensional vector (a list of p numbers), and we want to know whether we can separate such points with a (p-1)-dimensional hyperplane (the maximum-margin hyperplane). This is called a linear (maximum-margin) classifier. It tries to fit a line (or plane, or hyperplane) between the different classes that maximizes the distance from that line (or plane, or hyperplane) to the points of each class.
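
A quick sketch of fitting such a maximum-margin linear classifier, assuming scikit-learn is available; the tiny 2-D dataset is made up for illustration:

```python
# Fit a maximum-margin linear classifier on a tiny, made-up 2-D dataset.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]  # p = 2 dimensions
y = [0, 0, 0, 1, 1, 1]                                # two classes

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # the (p-1)-dimensional separating hyperplane
print(clf.support_vectors_)        # the points that define the maximum margin
print(clf.predict([[2, 2]]))
```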

Commonly used kernels in SVM include:

  1. Linear kernel - used when the data is linearly separable.
  2. Radial basis function (RBF) kernel - creates a decision boundary that can separate two classes much better than the linear kernel when the data is not linearly separable.
  3. Polynomial kernel - used when the data is discrete and has no natural notion of smoothness.
  4. Sigmoid kernel - the same function that is used as an activation function in neural networks.
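
As a sketch of how the kernel choice plays out in practice, again assuming scikit-learn; the moon-shaped synthetic data is only an illustration of a non-linearly-separable problem:

```python
# Compare SVM kernels on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf", "poly", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))  # RBF typically wins on this data
```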

Bagging vs Boosting:

Bagging is an acronym for ‘Bootstrap Aggregation’ and is used to decrease the variance of the prediction model. Bagging is a parallel method: it fits the individual learners independently of each other, making it possible to train them simultaneously.

Bagging generates additional training datasets from the original dataset by random sampling with replacement. Sampling with replacement may repeat some observations in each new training dataset, and every element is equally likely to appear in a new dataset.

These multiple datasets are used to train multiple models in parallel. For regression, the predictions from the different ensemble models are averaged; for classification, the majority vote from the voting mechanism decides the final prediction. Bagging decreases the variance and tunes the prediction toward the expected outcome.

  • Example of Bagging:

The Random Forest model uses Bagging on high-variance decision tree models. In addition to bootstrap sampling, it selects features at random when growing each tree; several such random trees together make a Random Forest.
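
A minimal sketch of Bagging over decision trees versus the built-in Random Forest, assuming scikit-learn; the synthetic dataset and estimator counts are illustrative:

```python
# Bagging: many high-variance decision trees trained on bootstrap samples;
# Random Forest additionally picks a random subset of features at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())
```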

Boosting is a sequential ensemble method that iteratively adjusts the weight of each observation according to the last classification: if an observation is incorrectly classified, its weight is increased. The term ‘Boosting’, in layman's terms, refers to algorithms that convert a weak learner into a stronger one. Boosting decreases the bias error and builds strong predictive models.

Data points mispredicted in each iteration are spotted, and their weights are increased. The Boosting algorithm also allocates a weight to each resulting model during training: a learner with good predictions on the training data is assigned a higher weight. When evaluating a new learner, Boosting keeps track of that learner's errors.

  • Example of Boosting:

AdaBoost uses Boosting techniques, where a weak learner must achieve an error below 50% (i.e., do better than random guessing) for it to be kept. Based on this, Boosting can keep or discard a single learner; otherwise, the iteration is repeated until a better learner is found.
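
A small sketch of Boosting with AdaBoost, assuming scikit-learn; the decision-stump base learner and synthetic data are illustrative choices:

```python
# AdaBoost: weak learners (decision stumps) fitted sequentially, each one
# re-weighting the observations the previous learners got wrong.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)  # the weak learner
model = AdaBoostClassifier(stump, n_estimators=100, random_state=0).fit(X_train, y_train)

print(model.score(X_test, y_test))
print(model.estimator_weights_[:5])  # better learners receive higher weights
```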