
Python / ML / CV Notes

Common interview-style questions across Python, ML, and CV.

Python

What’s monkey patching?

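Replacing an attribute of a module, class, or object (usually a function or method) at runtime, after it has been defined, so existing callers pick up the new behavior. Useful in mock-driven tests; best avoided in production.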
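A minimal sketch of monkey patching in a test, using a hypothetical Dice class. The first helper swaps the method by hand and restores it; the second uses the standard library's unittest.mock.patch.object, which does the same swap and undoes it automatically when the with block exits.

```python
import random
from unittest import mock


class Dice:
    """Hypothetical production class with non-deterministic behavior."""

    def roll(self) -> int:
        return random.randint(1, 6)


def total_of_two_rolls(dice: Dice) -> int:
    return dice.roll() + dice.roll()


def rolls_with_manual_patch() -> int:
    original = Dice.roll
    Dice.roll = lambda self: 3  # monkey patch: swap the method on the class at runtime
    try:
        return total_of_two_rolls(Dice())
    finally:
        Dice.roll = original  # always restore, or the patch leaks into unrelated code


def rolls_with_mock() -> int:
    # patch.object performs the same swap and restores the original on exit
    with mock.patch.object(Dice, "roll", return_value=4):
        return total_of_two_rolls(Dice())
```

Both helpers make the result deterministic, which is exactly why the technique shows up in tests.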

*args and **kwargs

* collects extra positional arguments into a tuple; ** collects extra keyword arguments into a dict:

def hello(a, b, *args, **kwargs):
    ...

Positional arguments beyond a and b go into args; name=value arguments that don't match a named parameter go into kwargs.
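A runnable variant of hello that simply returns what it received, to show where each argument lands. The same * and ** also work in the other direction, unpacking a sequence or dict at the call site.

```python
def hello(a, b, *args, **kwargs):
    # args: tuple of extra positional arguments; kwargs: dict of extra keyword arguments
    return a, b, args, kwargs


print(hello(1, 2, 3, 4, x=5, y=6))
# (1, 2, (3, 4), {'x': 5, 'y': 6})

# Unpacking at the call site: the list fills a, b, then args; the dict fills kwargs
print(hello(*[1, 2, 3], **{"z": 9}))
# (1, 2, (3,), {'z': 9})
```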

How does Python’s threading work, and is it a good idea?

Standard CPython doesn't run threads in parallel. The threading module exists, but it isn't a tool for speeding up CPU-bound work: the Global Interpreter Lock (GIL) guarantees that only one thread executes Python bytecode at any moment.

Threads still help with IO-bound concurrency (multi-threaded downloads, for example), because a thread blocked on IO releases the GIL and lets the others run. For CPU-bound parallelism, plain Python threads aren't the right tool.

Doing it properly usually means:

  • Write the hot path in C (as an extension that releases the GIL while it computes) and call it from Python.
  • Switch engines: Spark or Hadoop run Python workers in separate processes, which gets actual parallelism out of the system.
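A minimal sketch of the IO-bound case with concurrent.futures.ThreadPoolExecutor. fake_download is a hypothetical stand-in that sleeps instead of touching the network; sleeping, like blocking socket IO, releases the GIL, so four 0.2 s waits overlap. A CPU-bound function in the same pool would not get faster; concurrent.futures.ProcessPoolExecutor is the process-based alternative in the standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url: str) -> str:
    """Hypothetical IO-bound task: time.sleep stands in for waiting on the network."""
    time.sleep(0.2)  # the GIL is released while sleeping, as during real socket IO
    return f"done: {url}"


def download_all(urls: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        results = list(pool.map(fake_download, urls))  # map keeps input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/file{i}" for i in range(4)]
    results, elapsed = download_all(urls)
    print(results)
    print(f"{elapsed:.2f}s")
```

On a typical machine the elapsed time lands near 0.2 s rather than 0.8 s, because the threads spend their time waiting, not computing.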

Python’s garbage collector

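  • Reference counting: every object tracks how many references point to it; when the count drops to 0, the object is freed immediately.
  • Periodic cycle detection: reference counting alone can't free objects that refer to each other when nothing outside the group refers to them, so a cycle collector periodically finds and frees such groups. Cycles are only reclaimed when that collector runs, not immediately, so avoid creating them needlessly.
  • Generational GC: the cycle collector sorts objects into generations; new objects start in the youngest, and objects that survive a collection are promoted to an older one. Younger generations are scanned more often, on the heuristic that most objects die young.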
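A small CPython-specific demonstration with the gc and weakref modules (Node is a hypothetical class). An object with no remaining references disappears as soon as its count hits zero, while a two-object cycle survives until the cycle collector runs. Other implementations such as PyPy don't use reference counting, so the first check may not hold there.

```python
import gc
import weakref


class Node:
    def __init__(self, name: str):
        self.name = name
        self.other = None


def refcount_frees_immediately() -> bool:
    obj = Node("solo")
    ref = weakref.ref(obj)  # a weak reference does not bump the refcount
    del obj                 # refcount drops to 0, so CPython frees it right here
    return ref() is None


def cycle_needs_collector() -> tuple[bool, bool]:
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a, b = Node("a"), Node("b")
        a.other, b.other = b, a   # a <-> b reference cycle
        ref = weakref.ref(a)
        del a, b                  # each refcount stays at 1 because of the cycle
        alive_before = ref() is not None
        gc.collect()              # cycle detection finds the unreachable pair
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```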

Machine learning

Logistic Regression vs Naive Bayes

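Naive Bayes rests on one strong premise, the conditional independence assumption: features are independent of each other given the class. To classify, compute each class's posterior (the prior times the product of per-feature likelihoods) and pick the largest. For two classes, the posterior can be written as a logistic function of the summed per-feature log-likelihood ratios, and you compare it against a threshold; that's NB classification.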

Bayes' rule + conditional independence = Naive Bayes
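Written out, with x = (x_1, ..., x_n) the feature vector: the first step is Bayes' rule, the proportionality uses conditional independence and drops P(x), which is the same for every class.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```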

Logistic Regression models the log-odds of the posterior as a linear function of the features, log(p / (1 - p)) = w·x + b, so the probability itself is p = sigmoid(w·x + b).

The two methods land on different weights because NB assumes independence and LR doesn't. NB needs no gradient descent: its parameters come straight from counting per-class feature frequencies. LR has no closed-form solution, so it's fit iteratively (e.g. gradient descent), and correlations between features get absorbed into its weights.
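A pure-Python sketch contrasting the two fitting procedures on a tiny made-up dataset with two binary features. Naive Bayes is fit by counting (Bernoulli likelihoods with Laplace smoothing, an assumption added here); logistic regression is fit by batch gradient descent on the average log loss, with an arbitrary learning rate and epoch count.

```python
import math

# Tiny made-up dataset: two binary features, binary label.
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(y = c) and P(x_j = 1 | y = c) by counting, with Laplace smoothing alpha."""
    params = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        p_feat = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                  for j in range(len(X[0]))]
        params[c] = (prior, p_feat)
    return params


def nb_posterior(params, x):
    """P(y = 1 | x): prior times product of per-feature likelihoods, then normalize."""
    scores = {}
    for c, (prior, p_feat) in params.items():
        s = prior
        for xj, p in zip(x, p_feat):
            s *= p if xj == 1 else 1 - p
        scores[c] = s
    return scores[1] / (scores[0] + scores[1])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_lr(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log loss; no closed form exists."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def lr_posterior(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Here nb_posterior(fit_bernoulli_nb(X, y), [1, 0]) works out to 0.16 / 0.22 = 8/11 by hand, with no iteration at all, while fit_lr has to walk its way to weights that separate the same points.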

Different optimization targets, too:

  • LR maximizes the conditional likelihood p(y|x): a discriminative model.
  • NB maximizes the joint likelihood p(x, y): a generative model.

Discriminative vs generative

Discriminative: predicts unknown y from known x by modeling P(y|x). Doesn’t care about the joint distribution. Usually does well on classification and regression. Most discriminative models are inherently supervised — not easy to repurpose for unsupervised settings.

Generative: models the joint P(x, y). It can be sampled to generate new observations, and Bayes' rule turns it into the conditional P(y|x) when needed. It's more flexible than a discriminative model at expressing complex dependencies between variables.

Common discriminative models:

  • Logistic Regression
  • Linear Regression
  • Support Vector Machine
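  • Boosting — combine a set of “weak learners” (classifiers performing only slightly better than random guessing) into a “strong learner”. It grew out of a question posed by Michael Kearns and Leslie Valiant: can a set of weak learners be combined into a single strong learner?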
  • Conditional Random Field — a discriminative probabilistic model, used for labeling sequences (NLP, biological sequences).
  • Artificial Neural Network (ANN) — a mathematical model loosely modeled on biological neural systems, approximating functions via large networks of artificial neurons.
  • Random Forest — an ensemble of decision trees; the output class is the mode across trees.
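AdaBoost (Freund and Schapire) is one concrete boosting algorithm; the notes don't name a specific one. Below is a minimal sketch of discrete AdaBoost with decision stumps on hypothetical 1D toy data, where no single stump classifies all six points but a weighted vote of three does.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump (the weak learner): polarity if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity


def adaboost_fit(xs, ys, rounds):
    """Discrete AdaBoost on 1D inputs with labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = [min(xs) - 1] + sorted(set(xs))  # the first one gives a constant stump
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error under the current weights
        best = None
        for t in thresholds:
            for p in (1, -1):
                err = sum(w for x, yv, w in zip(xs, ys, weights) if stump_predict(x, t, p) != yv)
                if best is None or err < best[0]:
                    best = (err, t, p)
        err, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get a bigger vote
        ensemble.append((alpha, t, p))
        # Upweight misclassified points, downweight correct ones, renormalize
        weights = [w * math.exp(-alpha * yv * stump_predict(x, t, p))
                   for x, yv, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score > 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # positive, negative, positive: no single threshold splits this
ensemble = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(ensemble, x) for x in xs])
```

Each round reweights the points the ensemble still gets wrong, so the next stump concentrates on them; on this toy set three rounds are enough to fit every point.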

LR vs SVM

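  • Both are linear classifiers at their core and both start as binary classifiers; each can handle non-linear problems, SVM via kernels and LR via non-linear feature transforms.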
  • SVM maximizes a margin; LR maximizes likelihood. SVM emits a class label, not a probability.
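  • Different loss functions: LR uses log loss, SVM uses hinge loss. LR's probabilistic output is easier to interpret; SVM's margin objective (minimizing ||w||²) acts as built-in L2 regularization, while LR needs a regularization term added explicitly.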
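A sketch of the two losses as functions of the margin m = y·f(x), with labels y in {-1, +1} and raw score f(x). Hinge loss is exactly zero once a point is past the margin (so only points near or inside it, the support vectors, shape the SVM), while log loss never reaches zero, so every point keeps nudging LR.

```python
import math


def log_loss(margin: float) -> float:
    """LR loss: log(1 + exp(-y * f(x))); positive for every finite margin."""
    return math.log1p(math.exp(-margin))


def hinge_loss(margin: float) -> float:
    """SVM loss: max(0, 1 - y * f(x)); zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


for m in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(f"margin={m:+.1f}  log={log_loss(m):.3f}  hinge={hinge_loss(m):.3f}")
```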

CV

How do you implement image resizing in code?

Interpolation:

  • Nearest neighbor — simplest and roughest.
  • Bilinear — smoother.
  • Lanczos — sharper.
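A pure-Python sketch of nearest-neighbor and bilinear resizing for a grayscale image stored as a list of rows. The half-pixel-center coordinate mapping and edge clamping are assumptions; libraries differ in these details, so the output won't necessarily match OpenCV pixel for pixel.

```python
import math


def resize_nearest(img, out_h, out_w):
    """Each output pixel copies the single closest source pixel."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for dy in range(out_h):
        sy = min(int((dy + 0.5) * in_h / out_h), in_h - 1)
        row = []
        for dx in range(out_w):
            sx = min(int((dx + 0.5) * in_w / out_w), in_w - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out


def _source_coord(d, in_size, out_size):
    """Map output index d to a continuous source coordinate (half-pixel centers), clamped."""
    s = (d + 0.5) * in_size / out_size - 0.5
    return min(max(s, 0.0), in_size - 1)


def resize_bilinear(img, out_h, out_w):
    """Each output pixel blends the 2x2 source pixels around its source coordinate."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for dy in range(out_h):
        sy = _source_coord(dy, in_h, out_h)
        y0 = int(math.floor(sy))
        y1, fy = min(y0 + 1, in_h - 1), sy - y0
        row = []
        for dx in range(out_w):
            sx = _source_coord(dx, in_w, out_w)
            x0 = int(math.floor(sx))
            x1, fx = min(x0 + 1, in_w - 1), sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx      # blend along x, upper row
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx   # blend along x, lower row
            row.append(top * (1 - fy) + bottom * fy)             # then blend along y
        out.append(row)
    return out
```

For example, resize_bilinear([[0, 10]], 1, 4) gives [[0.0, 2.5, 7.5, 10.0]]: the two end pixels are clamped copies and the middle two are blends, where nearest neighbor would just repeat 0 and 10.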

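2D Lanczos applies the 1D windowed-sinc kernel along both axes, so each output pixel is a weighted sum over a square neighborhood of source pixels: 8×8 for Lanczos-4 (OpenCV's INTER_LANCZOS4), 6×6 for Lanczos-3.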
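A sketch of where those weights come from, assuming a = 4 to match the 8×8 neighborhood. The 1D kernel is sinc(x)·sinc(x/a) for |x| < a; the 2a taps are normalized to sum to 1 (a common choice, not a universal one); the 2D weights are the outer product of the per-axis weights.

```python
import math


def lanczos_kernel(x: float, a: int = 4) -> float:
    """Windowed sinc: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights_1d(frac: float, a: int = 4) -> list[float]:
    """Weights for the 2a source pixels around a sample point offset by frac in [0, 1)."""
    taps = [lanczos_kernel(i - frac, a) for i in range(-a + 1, a + 1)]
    total = sum(taps)
    return [t / total for t in taps]  # normalize so flat regions keep their value


def lanczos_weights_2d(fx: float, fy: float, a: int = 4) -> list[list[float]]:
    """Separable 2D weights: outer product of the x and y weights (2a x 2a grid)."""
    wx, wy = lanczos_weights_1d(fx, a), lanczos_weights_1d(fy, a)
    return [[wyi * wxj for wxj in wx] for wyi in wy]
```

The output pixel is then the sum of each source pixel in the 8×8 neighborhood times its weight; some weights are negative, which is where Lanczos gets its sharpness (and its occasional ringing).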

Is OpenCV’s default channel order BGR or RGB?

BGR. It's a historical choice that stuck; in practice, convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before handing images to code that expects RGB.

What’s HOG (Histogram of Oriented Gradient)?

A feature descriptor for object detection. HOG builds a histogram of gradient orientations over small image regions. Paired with an SVM, it had real success in pedestrian detection.

The intuition: an object’s appearance and shape can be captured fairly well by the distribution of gradient (or edge) directions locally — and gradient mostly lives at edges.

Implementation:

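  1. Compute the gradient (magnitude and orientation) at every pixel.
  2. Divide the image into small connected regions (cells) and, for each cell, build a histogram of gradient orientations, with each pixel voting by its gradient magnitude.
  3. Group neighboring cells into overlapping blocks and normalize the histograms within each block; this local contrast normalization is what makes the descriptor robust to lighting changes.
  4. Concatenate the normalized block histograms into the final descriptor.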
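A simplified pure-Python sketch of the steps above on a grayscale image (list of rows): central-difference gradients, 9 unsigned orientation bins over 0 to 180 degrees, 8×8-pixel cells, and 2×2-cell blocks with L2 normalization. These are commonly used parameters, but the hard binning (no interpolation between neighboring bins or cells) is a simplification of the full method.

```python
import math


def cell_histogram(img, r0, c0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for r in range(r0, r0 + cell):
        for c in range(c0, c0 + cell):
            # Central differences, clamped at the image border
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned: 0 to 180
            hist[int(angle // (180.0 / bins)) % bins] += mag
    return hist


def hog_descriptor(img, cell=8, block=2, bins=9, eps=1e-6):
    n_rows, n_cols = len(img) // cell, len(img[0]) // cell
    cells = [[cell_histogram(img, i * cell, j * cell, cell, bins) for j in range(n_cols)]
             for i in range(n_rows)]
    feature = []
    # Slide a block x block window over the cell grid (stride one cell, so blocks overlap)
    for i in range(n_rows - block + 1):
        for j in range(n_cols - block + 1):
            vec = [v for di in range(block) for dj in range(block) for v in cells[i + di][j + dj]]
            norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)  # L2 block normalization
            feature.extend(v / norm for v in vec)
    return feature
```

On a 16×16 image with a vertical edge down the middle, every gradient is horizontal, so all energy lands in bin 0 of each cell: the 36-value descriptor is 0.5 at the four bin-0 slots and 0 elsewhere.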

Strengths:

  • Operating on small local cells makes the descriptor largely robust to geometric and illumination changes, since those changes mostly show up over larger regions than a single cell.
  • With coarse spatial sampling, fine angular sampling, and strong local normalization, a pedestrian’s small limb movements don’t meaningfully shift the descriptor.

Canny edge detection

  1. Denoise. Gradient operators amplify edges and also noise, so smooth first (typically Gaussian blur).
  2. Compute gradients to get candidate edges. Strong gradient doesn’t guarantee an edge, but every real edge has a strong gradient.
  3. Non-maximum suppression. Along each pixel's gradient direction, keep the pixel only if it's the local maximum; this thins wide edges down to one-pixel-wide lines.
  4. Double thresholding. Pick low and high. Above high is a strong edge; below low is dropped; in between is a weak edge — kept if it connects to a strong edge, dropped otherwise.
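A sketch of step 4 alone (double thresholding with hysteresis), assuming the gradient magnitudes have already gone through non-maximum suppression. Weak pixels survive only if they connect to a strong pixel through 8-connected neighbors; "above high" is read here as strictly greater than high.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Return a boolean edge map from a 2D grid of (already thinned) gradient magnitudes."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed with strong edges
    for r in range(h):
        for c in range(w):
            if mag[r][c] > high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow from strong edges into connected weak pixels (low <= mag <= high)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not edges[nr][nc] and mag[nr][nc] >= low:
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges


mag = [
    [0, 0, 0, 0, 0],
    [0, 9, 5, 5, 0],  # strong pixel followed by two connected weak pixels
    [0, 0, 0, 0, 0],
    [0, 0, 0, 5, 0],  # isolated weak pixel: dropped
    [0, 0, 0, 0, 0],
]
edges = hysteresis(mag, low=4, high=8)
```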