Manual Machine Learning Hyperparameter Tuning: How to Optimize Models by Hand

Manual hyperparameter tuning means refining a model's settings through deliberate trial and adjustment to improve results. Tuning by hand also gives you clearer insight into how your model behaves and why it performs the way it does. This guide explains how to optimize machine learning models by hand.

We will cover several practical tuning approaches, including one-at-a-time tuning, a manual grid search, and random search techniques. We will also include Python code examples that illustrate learning-rate tuning ideas and model evaluation methods.

Prerequisites

  • Ability to write and execute Python scripts with confidence, including handling library installations.
  • Hands-on familiarity with scikit-learn for training and evaluating machine learning models such as SVC and GradientBoostingClassifier.
  • Clear understanding of the difference between model parameters and hyperparameters, as well as training/validation/test splits and evaluation metrics like accuracy, F1-score, and AUC.
  • A Python setup with numpy, pandas, matplotlib, and scikit-learn installed.
  • Comfort with topics such as the bias-variance trade-off, loss functions, and gradient-driven optimization.

Model Parameters vs. Hyperparameters

Model parameters are the internal weights or coefficients that a machine learning model learns from data during training. These learned values directly shape predictions (for instance, a neural network’s output depends on its trained weights).

By contrast, hyperparameters are external settings chosen by the user to steer training. They stay fixed while training runs, because they control how learning happens rather than being learned from the dataset.
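
To make the distinction concrete, here is a minimal sketch. It uses scikit-learn's LogisticRegression, which is chosen only for illustration and is not part of the walkthrough that follows; the value C=0.5 is likewise an arbitrary assumption. Arguments passed to the constructor are hyperparameters, while attributes that end in an underscore (such as coef_) are learned during fit.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # scaling helps the solver converge

# Hyperparameters: chosen by us before training starts (values are arbitrary)
clf = LogisticRegression(C=0.5, max_iter=1000)

# Model parameters: learned from the data during fit()
clf.fit(X_scaled, y)

print("Hyperparameter C:", clf.get_params()["C"])
print("Learned coefficient array shape:", clf.coef_.shape)
print("Learned intercept:", clf.intercept_)
```

Changing C and refitting would produce different coefficients, but C itself never changes during training.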

Why Tune Hyperparameters Manually?

Modern machine learning libraries include automated hyperparameter tuning tools. Still, manual hyperparameter tuning can be helpful in certain scenarios. Here are some common cases:

Small Datasets or Simple Models

For small tasks or straightforward algorithms, manual tuning may be the quickest path, and automated approaches can be excessive. Each training run is fast, so trying a handful of settings by hand costs little.

Resource Constraints

Automated hyperparameter searches can demand significant compute because they evaluate many combinations. Carefully testing a small set of configurations by hand can still produce a solid model when compute is limited.

Expert Intuition

Experienced practitioners often have a sense of which hyperparameter ranges tend to work, based on theory or prior work. Using that intuition to manually explore values can lead to strong outcomes faster than uninformed automated searches.

Manual Hyperparameter Tuning for SVM: A Step-by-Step Guide

Consider a binary classification task (for example, predicting whether tumors are benign or malignant). We will use the scikit-learn Breast Cancer Wisconsin dataset. We will also build a support vector machine classifier using its default settings.

Establish a Baseline Model

Begin by fitting a baseline model with default hyperparameters. This baseline provides a reference point by giving initial performance metrics that you can try to improve. Consider the following code:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Load dataset and split into train/validation sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a baseline SVM model with default hyperparameters
model = SVC(kernel='rbf', probability=True, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

# Evaluate baseline performance
print("Baseline accuracy:", accuracy_score(y_val, y_pred))
print("Baseline F1-score:", f1_score(y_val, y_pred))
print("Baseline AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

Output:

Baseline accuracy: 0.9298245614035088
Baseline F1-score: 0.9459459459459459
Baseline AUC: 0.9695767195767195

The model achieves about 93% validation accuracy with default settings, while the F1-score (~0.95) indicates a strong precision-recall balance. The AUC (~0.97) suggests excellent ranking ability. These results serve as your baseline for comparison.

Why use multiple metrics? Accuracy by itself can be deceptive, especially when classes are imbalanced. This evaluation includes F1-score to balance precision and recall, plus AUC to reflect ranking quality across classification thresholds. You can prioritize whichever metric best matches your goal (such as F1 or AUC).
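
The point about accuracy being deceptive can be seen with a small synthetic example. The labels below are hypothetical (95 negatives and 5 positives), and the "model" simply predicts the majority class every time.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical, heavily imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 silences the warning raised when no positives are predicted
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f"Accuracy: {acc:.2f}")  # 0.95
print(f"F1-score: {f1:.2f}")   # 0.00
```

Accuracy looks strong even though the classifier never finds a single positive case, which is exactly what the F1-score exposes.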

Choose Hyperparameters to Tune

Although models often include many settings, not all of them matter equally. A common best practice is to start with a small set of the most impactful hyperparameters. For this SVM example, we will focus on two main hyperparameters: C and gamma.

C (Regularization parameter): Controls the balance between model complexity and training error.

  • Low C: Stronger regularization, which can increase bias but reduce variance, raising the chance of underfitting.
  • High C: Weaker regularization, allowing closer fit to training data. This can reduce bias but increase variance, raising the chance of overfitting.

Gamma (Kernel coefficient): Determines how strongly a single training example can shape the decision boundary.

  • Low gamma: Wider influence per point, producing smoother, more general decision boundaries.
  • High gamma: Influence becomes very local, which can create complex boundaries.
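
To see why gamma controls how far a point's influence reaches, it can help to evaluate the RBF kernel directly. The kernel is K(a, b) = exp(-gamma * ||a - b||^2). The sketch below uses two hypothetical points and compares a hand-written formula with scikit-learn's rbf_kernel helper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical points whose squared distance is 4
a = np.array([[0.0, 0.0]])
b = np.array([[2.0, 0.0]])

manual = {}
library = {}
for gamma in [0.01, 0.1, 1.0]:
    # K(a, b) = exp(-gamma * ||a - b||^2)
    manual[gamma] = float(np.exp(-gamma * np.sum((a - b) ** 2)))
    library[gamma] = float(rbf_kernel(a, b, gamma=gamma)[0, 0])
    print(f"gamma={gamma:<4} -> similarity={manual[gamma]:.4f}")
```

As gamma grows, the similarity between the same two points falls toward zero, so each training example only affects predictions in its immediate neighborhood.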

Tune Hyperparameters One-by-One (Manual Trial and Error)

You can modify one key hyperparameter at a time to observe the model’s response. This trial-and-error approach is often the simplest way to begin hyperparameter tuning.

Procedure:

  • Start by selecting one hyperparameter to vary (for example, C in an SVM).
  • Pick a set of candidate values based on intuition or defaults. Because hyperparameters often have non-linear effects, testing C on a logarithmic scale can be helpful (e.g., 0.01, 0.1, 1, 10, 100).
  • For each value, train the model and measure performance on the validation set.
  • Print or plot results to identify the strongest value and use it as your next reference point.
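
The logarithmic candidate list mentioned in the procedure can be generated rather than typed by hand. This is a small sketch using NumPy; the variable name C_candidates is only illustrative.

```python
import numpy as np

# Five values evenly spaced on a log10 scale from 1e-2 to 1e2
C_candidates = np.logspace(-2, 2, num=5)
print(C_candidates)
```

Changing num to 9 would add the half-decade points in between, which is handy once you know the rough magnitude you care about.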

Let’s consider the following example:

for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(kernel='rbf', C=C, probability=True, random_state=42)
    model.fit(X_train, y_train)
    val_ac = accuracy_score(y_val, model.predict(X_val))
    print(f"C = {C:<5} | Validation Accuracy = {val_ac:.3f}")

Output:

C = 0.01  | Validation Accuracy = 0.842
C = 0.1   | Validation Accuracy = 0.912
C = 1     | Validation Accuracy = 0.930
C = 10    | Validation Accuracy = 0.930
C = 100   | Validation Accuracy = 0.947

  • Low C (e.g., 0.01) means strong regularization, which leads to underfitting (low accuracy).
  • Moderate C (around 1) often delivers a practical balance between bias and variance.
  • Very high C (e.g., 100) gave the highest validation accuracy in this run, but it weakens regularization and raises the risk of overfitting, so a gain this small should be confirmed before you rely on it.

Next, you could hold C at its default value of 1 and tune gamma in the same way:

for gamma in [1e-4, 1e-3, 1e-2, 0.1, 1]:
    model = SVC(kernel='rbf', C=1, gamma=gamma, probability=True, random_state=42)
    model.fit(X_train, y_train)
    val_ac = accuracy_score(y_val, model.predict(X_val))
    print(f"gamma = {gamma:<6} | Validation Accuracy = {val_ac:.3f}")

Output:

gamma = 0.0001 | Validation Accuracy = 0.930
gamma = 0.001  | Validation Accuracy = 0.895
gamma = 0.01   | Validation Accuracy = 0.640
gamma = 0.1    | Validation Accuracy = 0.632
gamma = 1      | Validation Accuracy = 0.632

  • γ = 1e-4: Very small γ values create smoother decision boundaries that help avoid overfitting, and this value delivers the top accuracy of 0.930.
  • γ ≥ 1e-3: As γ increases, the RBF kernel becomes more sensitive to individual samples and can overfit. Validation accuracy drops to 0.895 and then to about 0.63, which is roughly the share of the majority class in this dataset, so the model has effectively stopped discriminating between classes.
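
One way to check the overfitting explanation is to compare training accuracy with validation accuracy for each gamma. The sketch below repeats the same split and settings used above (minus probability=True, which predict does not need) and records the gap; a large positive gap suggests the model is memorizing the training set. The dictionary name gap_by_gamma is only illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

gap_by_gamma = {}
for gamma in [1e-4, 1e-2, 1]:
    model = SVC(kernel="rbf", C=1, gamma=gamma)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    val_acc = accuracy_score(y_val, model.predict(X_val))
    # Train-minus-validation gap: larger values point to overfitting
    gap_by_gamma[gamma] = train_acc - val_acc
    print(f"gamma={gamma:<6} train={train_acc:.3f} val={val_acc:.3f}")
```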

Key Takeaways

  • Manual one-at-a-time tuning helps you develop intuition about how hyperparameters influence results.
  • Use the validation set for selecting hyperparameters, and reserve the test set for final evaluation to reduce overfitting risk.
  • Relying on multiple metrics (such as F1 and AUC) supports stronger evaluation and can reveal overfitting and bias/variance problems earlier.
  • Once you identify promising ranges manually, you can move to more systematic methods such as manual grid search, random search, or Bayesian optimization.

Manual Grid Search (Systematic Exploration)

After early one-by-one exploration, a manual grid search helps you test hyperparameter combinations. This approach defines a grid of candidate values for each hyperparameter and evaluates every combination. Here, we sweep C over 0.1, 1, 10, 50, and set γ to 1e−4, 1e−3, 0.01, 0.1 while training an RBF-kernel SVM for each pair. We then evaluate each model on the validation set. The (C, γ) pair that yields the strongest validation accuracy is selected as best_params, while we also keep track of F1 and AUC.

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

param_grid = {
    "C":     [0.1, 1, 10, 50],
    "gamma": [1e-4, 1e-3, 0.01, 0.1]
}

best_ac = 0.0
best_f1  = 0.0
best_auc = 0.0
best_params = {}

for C in param_grid["C"]:
    for gamma in param_grid["gamma"]:
        model = SVC(kernel='rbf',
                    C=C,
                    gamma=gamma,
                    probability=True,
                    random_state=42)
        model.fit(X_train, y_train)

        # Predictions and probabilities
        y_v_pred  = model.predict(X_val)
        y_v_proba = model.predict_proba(X_val)[:, 1]

        # metrics computation
        ac = accuracy_score(y_val, y_v_pred)
        f1  = f1_score(y_val, y_v_pred)
        auc = roc_auc_score(y_val, y_v_proba)

        # Track the best by accuracy; switch to f1 or auc as needed
        if ac > best_ac:
            best_ac    = ac
            best_f1     = f1
            best_auc    = auc
            best_params = {"C": C, "gamma": gamma}

        print(f"C={C:<4}  gamma={gamma:<6}  => "
              f"Accuracy={ac:.3f}  F1={f1:.3f}  AUC={auc:.3f}")

print(
    "\nBest combo:", best_params,
    f"with Accuracy={best_ac:.3f}, F1={best_f1:.3f}, AUC={best_auc:.3f}"
)

Output:

Best combo: {'C': 1, 'gamma': 0.0001} with Accuracy=0.930, F1=0.944, AUC=0.958

The grid search reaches the same accuracy as the baseline, but F1 and AUC are slightly lower. This suggests the default SVM hyperparameters are already close to optimal for F1/AUC, and the manual grid search may have been too coarse or focused on the wrong ranges.

Key points for manual grid search:

  • The grid resolution matters. A coarse grid (few values) can miss the best solution. A finer grid with many values improves the odds of finding the best pairing, but requires more training runs.
  • For larger grids, cross-validation can provide more reliable evaluation per combination, but it increases compute costs.
  • Grid search runs into the curse of dimensionality: adding hyperparameters or adding more candidate values can rapidly expand combinations. In such cases, random search becomes a useful alternative.
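
The cost and reliability points above can be sketched together. The example below enumerates a deliberately tiny grid with itertools.product and scores each pair with 5-fold cross-validation on the training portion only. The grid values and the choice of five folds are assumptions made for illustration.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

param_grid = {"C": [1, 10], "gamma": [1e-4, 1e-3]}
combos = list(product(param_grid["C"], param_grid["gamma"]))
print("Combinations to evaluate:", len(combos))  # 2 x 2 = 4

cv_results = {}
for C, gamma in combos:
    model = SVC(kernel="rbf", C=C, gamma=gamma)
    # 5-fold CV means five fits per combination (20 fits for this grid)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_results[(C, gamma)] = scores.mean()
    print(f"C={C:<3} gamma={gamma:<6} mean CV accuracy={scores.mean():.3f}")

best_combo = max(cv_results, key=cv_results.get)
print("Best by mean CV accuracy:", best_combo)
```

Adding a third hyperparameter with three candidate values would triple the number of combinations, and cross-validation multiplies that again by the number of folds.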

Manual Random Search for Hyperparameter Optimization

To begin a manual random search, define sensible ranges for each hyperparameter. Then, for multiple trials, sample random values from within those ranges. In our experiment, both hyperparameters are drawn from log-uniform distributions: C between 0.1 and 100, and gamma between 1e-5 and 1e-2.

The function below lets you run random search while tracking multiple metrics. It enables you to find strong hyperparameters for Accuracy, F1-Score, and ROC AUC in a single run.

import random
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def random_search_svm(X_train, y_train, X_val, y_val, ntrials=10):
    """
    Run random search across C and gamma parameters for an RBF SVM model.
    Monitor optimal hyperparameters for peak Accuracy, F1-score, and ROC AUC values.
    """
    best = {
        'accuracy': {'score': 0, 'params': {}},
        'f1':       {'score': 0, 'params': {}},
        'auc':      {'score': 0, 'params': {}}
    }

    for i in range(1, ntrials + 1):
        # Log-uniform sampling
        C = 10 ** random.uniform(-1, 2)       # 0.1 to 100
        gamma = 10 ** random.uniform(-5, -2)  # 1e-5 to 1e-2

        # model training
        model = SVC(kernel='rbf', C=C, gamma=gamma,
                    probability=True, random_state=42)
        model.fit(X_train, y_train)

        # Prediction and evaluation
        y_pred = model.predict(X_val)
        y_proba = model.predict_proba(X_val)[:, 1]
        ac = accuracy_score(y_val, y_pred)
        f1 = f1_score(y_val, y_pred)
        auc = roc_auc_score(y_val, y_proba)

        # Print trial results
        print(f"Trial {i}: C={C:.4f}, gamma={gamma:.5f} | "
              f"Acc={ac:.3f}, F1={f1:.3f}, AUC={auc:.3f}")

        # For each metric, we will update the best
        if ac > best['accuracy']['score']:
            best['accuracy'].update({'score': ac, 'params': {'C': C, 'gamma': gamma}})
        if f1 > best['f1']['score']:
            best['f1'].update({'score': f1, 'params': {'C': C, 'gamma': gamma}})
        if auc > best['auc']['score']:
            best['auc'].update({'score': auc, 'params': {'C': C, 'gamma': gamma}})

    # For each metric, print summary of best hyperparameters
    print("\nBest hyperparameters by metric:")
    for metric, info in best.items():
        params = info['params']
        score = info['score']
        print(f"- {metric.capitalize()}: Score={score:.3f}, Params= C={params.get('C'):.4f}, gamma={params.get('gamma'):.5f}")

When you call random_search_svm(X_train, y_train, X_val, y_val, ntrials=20) to run a deeper search, you may see output similar to the following:

Best hyperparameters by metric:
- Accuracy: Score=0.939, Params= C=67.2419, gamma=0.00007
- F1: Score=0.951, Params= C=59.5889, gamma=0.00002
- Auc: Score=0.987, Params= C=59.5889, gamma=0.00002

Keep in mind that results can differ across runs because the trials are random. The output highlights which (C, γ) pairs deliver the best performance for each metric. In this run, the random search found settings that beat both the baseline and the coarse grid search on all three metrics.
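
Because random_search_svm draws from Python's global random generator without a seed, each call samples different values. If you want repeatable trials, one option is a dedicated, seeded generator. The sketch below is a hypothetical helper (sample_log_uniform is not part of the function above), and the seed value 42 is arbitrary.

```python
import random


def sample_log_uniform(rng, low_exp, high_exp):
    """Draw one value whose base-10 exponent is uniform in [low_exp, high_exp]."""
    return 10 ** rng.uniform(low_exp, high_exp)


def draw_trials(seed, n_trials):
    """Return n_trials (C, gamma) pairs drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        C = sample_log_uniform(rng, -1, 2)       # 0.1 to 100
        gamma = sample_log_uniform(rng, -5, -2)  # 1e-5 to 1e-2
        trials.append({"C": C, "gamma": gamma})
    return trials


trial_params = draw_trials(seed=42, n_trials=5)
replay = draw_trials(seed=42, n_trials=5)  # same seed, same draws

for p in trial_params:
    print(f"C={p['C']:.4f}, gamma={p['gamma']:.5f}")
print("Replay matches:", trial_params == replay)
```

Logging the seed alongside the results lets you rerun exactly the same set of trials later.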

Evaluate Model Performance and Select the Best Model

Our validation metrics show that the default RBF-SVM delivered a strong baseline: 0.9298 accuracy, 0.9459 F1 score, and 0.9696 AUC.

Evaluating all pairs of C values {0.1, 1, 10, 50} with γ values {1e-4, 1e-3, 0.01, 0.1} matched the baseline accuracy at 0.930 but did not improve on it. This coarse grid search also produced a lower AUC of 0.958 and an F1 score of 0.944, suggesting it may have missed a better kernel region.

By comparison, focused random sampling across ranges C ∈ [0.1, 100] and γ ∈ [1e-5, 1e-2] identifies three separate “best” configurations:

  • Accuracy-optimized: C ≈ 67.24, γ ≈ 7e-5 (Accuracy = 0.939)
  • F1-optimized: C ≈ 59.59, γ ≈ 2e-5 (F1 = 0.951)
  • AUC-optimized: C ≈ 59.59, γ ≈ 2e-5 (AUC = 0.987)

Compared to the baseline, this random-search run increases accuracy by nearly 0.01, F1 by 0.005, and AUC by almost 0.02. The AUC gain is the most meaningful of the three; on a validation set of about 114 samples, the accuracy difference corresponds to a single prediction.

Primary Objective:

  • If your main goal is overall correctness, select the Accuracy-optimized model.
  • If you need balanced precision and recall, pick the F1-optimized model.
  • If you care most about ranking performance across thresholds, choose the AUC-optimized model.
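
The selection rule above can be written as a small helper. The candidate records below are hypothetical numbers invented for illustration, not results from the experiments in this guide, and select_best is an illustrative name.

```python
# Hypothetical validation results for three candidate configurations
candidates = [
    {"name": "config_a", "accuracy": 0.94, "f1": 0.950, "auc": 0.975},
    {"name": "config_b", "accuracy": 0.93, "f1": 0.952, "auc": 0.985},
    {"name": "config_c", "accuracy": 0.92, "f1": 0.940, "auc": 0.990},
]


def select_best(rows, metric):
    """Return the candidate with the highest value for the chosen metric."""
    return max(rows, key=lambda row: row[metric])


for metric in ("accuracy", "f1", "auc"):
    winner = select_best(candidates, metric)
    print(f"Best by {metric}: {winner['name']} ({winner[metric]:.3f})")
```

Deciding on the metric before you look at the results keeps the choice tied to your objective rather than to whichever number happens to look best.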

If you plan to deploy the model, retrain it on the combined training and validation data, then evaluate it once on a test set that played no part in tuning.

The code example below shows how to split the dataset three ways (training, validation, and test), merge the training and validation partitions, refit the model using the chosen settings, and evaluate on the test set using accuracy, F1-score, and ROC AUC.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
import numpy as np

# 1. Load the data and split off a hold-out test set (train+val vs. test)
X, y = load_breast_cancer(return_X_y=True)
X_tem, X_test, y_tem, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Caution: this call uses the same arguments as the earlier train/validation
# split, so X_test holds the same rows as the earlier validation set. For an
# unbiased test score, rerun the tuning steps on the X_train/X_val split below.

# 2. Split X_tem into training and validation sets for tuning
X_train, X_val, y_train, y_val = train_test_split(
    X_tem, y_tem, test_size=0.25, random_state=42, stratify=y_tem
)
# 0.25 of the remaining 80% is 20% of the full data, giving a 60/20/20 split

# 3. Merge training and validation data for the final fit
X_merged = np.vstack([X_train, X_val])
y_merged = np.hstack([y_train, y_val])

# 4. Retrain using our best hyperparameters
best_C = 59.5889      # replace with your chosen C
best_gamma = 2e-05    # replace with your chosen gamma

f_model = SVC(
    kernel='rbf',
    C=best_C,
    gamma=best_gamma,
    probability=True,
    random_state=42
)
f_model.fit(X_merged, y_merged)

# 5. Evaluate on hold-out test set
y_test_pred = f_model.predict(X_test)
y_test_proba = f_model.predict_proba(X_test)[:, 1]

test_ac = accuracy_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)
test_auc = roc_auc_score(y_test, y_test_proba)

print("Final model test accuracy:   ", test_ac)
print("Final model test F1-score:    ", test_f1)
print("Final model test ROC AUC:     ", test_auc)

Running this block produces hold-out test metrics that estimate how the tuned SVM performs on unseen data, provided the test rows were not used during tuning. You can compare the final accuracy, F1, and AUC against the baseline and the grid/random search outcomes to check whether the validation gains carry over.

Tuning the Learning Rate

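The learning rate is a key hyperparameter in gradient-based training, most prominently in neural networks. It also appears in boosted tree ensembles, where it scales the contribution of each new tree. For illustration, let’s use scikit-learn's GradientBoostingClassifier as the model.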

from sklearn.ensemble import GradientBoostingClassifier

for lr in [0.001, 0.01, 0.1, 0.3, 0.7, 1.0]:
    model = GradientBoostingClassifier(n_estimators=50, learning_rate=lr, random_state=42)
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    print(f"Learning rate {lr:.3f} => Validation Accuracy = {val_acc:.3f}")

Output:

Learning rate 0.001 => Validation Accuracy = 0.632
Learning rate 0.010 => Validation Accuracy = 0.939
Learning rate 0.100 => Validation Accuracy = 0.947
Learning rate 0.300 => Validation Accuracy = 0.956
Learning rate 0.700 => Validation Accuracy = 0.956
Learning rate 1.000 => Validation Accuracy = 0.965

With a learning rate of 0.001, each of the 50 trees contributes so little that the model barely moves from its starting prediction, and validation accuracy stays at 0.632. Accuracy jumps to 0.939 at 0.01 and keeps improving slowly, reaching roughly 95–96% for learning rates between 0.3 and 1.0. The best validation accuracy here appears at a learning rate of 1.0, although the differences among 0.3, 0.7, and 1.0 amount to about one validation sample, so they should not be over-interpreted.

Developers often review learning rates on a logarithmic ladder using values such as 0.0001, 0.001, 0.01, 0.1, and 1.0. After finding the right magnitude, you can test intermediate values (such as 0.05, 0.1, 0.2, 0.3) to narrow down the best choice.
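
The "zoom in" step can be sketched as a small helper that builds log-spaced candidates around the best coarse value. The function name refine_around, the factor of 3, and the starting value 0.1 are all assumptions chosen for illustration.

```python
import numpy as np


def refine_around(best_value, factor=3.0, num=5):
    """Return `num` log-spaced candidates from best_value/factor to best_value*factor."""
    return np.geomspace(best_value / factor, best_value * factor, num=num)


coarse_best = 0.1  # hypothetical winner from the coarse ladder
fine_candidates = refine_around(coarse_best)
print(np.round(fine_candidates, 4))
```

With an odd num, the coarse winner sits in the middle of the refined list, so the second pass always re-tests it alongside its neighbors.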

Adjusting Regularization and Other Hyperparameters

Many models expose a regularization parameter that you can tune:

  • In logistic regression and SVMs, C is the inverse of the regularization strength; linear models also expose the penalty type (L1 or L2) as a hyperparameter.
  • Neural networks use dropout rate and L2 weight decay as regularization hyperparameters.
  • Decision trees use max_depth or min_samples_leaf to limit complexity through regularization.

Manually tuning these parameters uses the same trial-and-error and search strategies described earlier.
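
As one example of applying the same loop to a different regularization setting, the sketch below sweeps max_depth for a decision tree and records both training and validation accuracy. The depth values are arbitrary choices, and the split mirrors the one used for the SVM baseline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

depth_results = {}
for max_depth in [1, 2, 4, 8, None]:  # None lets the tree grow without a depth cap
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    tree.fit(X_train, y_train)
    depth_results[max_depth] = {
        "actual_depth": tree.get_depth(),
        "train": accuracy_score(y_train, tree.predict(X_train)),
        "val": accuracy_score(y_val, tree.predict(X_val)),
    }
    r = depth_results[max_depth]
    print(f"max_depth={str(max_depth):<4} actual={r['actual_depth']:<2} "
          f"train={r['train']:.3f} val={r['val']:.3f}")
```

If training accuracy keeps climbing with depth while validation accuracy stalls or drops, the extra depth is adding variance rather than useful capacity.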

Tips and Best Practices for Manual Hyperparameter Tuning

Manual tuning can feel overwhelming because there are many options and interactions. Applying the practices below helps ensure each experiment is intentional, repeatable, and easy to interpret.

  • Coarse-to-Fine Search: Start with a broad, log-scale range (very low to very high). Find the strongest region, then “zoom in” with smaller step sizes around that area.
  • One Change at a Time: Adjust only one hyperparameter per experiment. This makes it obvious which setting caused the performance shift.
  • Keep a Log: Write down every hyperparameter configuration and its results (for example, in a notebook or printed output). This avoids repeated trials and helps reveal trends.
  • Use Validation Effectively: Always evaluate on held-out data. If data is limited, use k-fold cross-validation to ensure improvements reflect generalization rather than noise fitting.
  • Mind Interactions: After isolating single-parameter effects, test combinations (for example, learning rate + batch size, number of trees + learning rate). Hyperparameters can interact, and the best pair may differ from separate optima.
  • Don’t Tune Too Many at Once: Concentrate on 2–3 high-impact hyperparameters and leave less important ones at default values.
  • Stop When Returns Diminish: Define a target improvement (for example, +2% accuracy). If additional trials bring only tiny gains, stop manual tuning or switch to automated methods for finer optimization.
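
The "Keep a Log" practice can be as simple as a list of records turned into a pandas DataFrame. In this sketch, log_trial is a hypothetical helper and the file name in the final comment is only a suggestion; the three accuracy values are copied from the one-at-a-time C sweep earlier in this guide.

```python
import pandas as pd

experiment_log = []


def log_trial(log, model_name, params, **metrics):
    """Append one experiment record (hyperparameters plus metrics) to the log."""
    log.append({"model": model_name, **params, **metrics})


# Values copied from the C sweep shown earlier
log_trial(experiment_log, "SVC", {"C": 0.01}, val_accuracy=0.842)
log_trial(experiment_log, "SVC", {"C": 1}, val_accuracy=0.930)
log_trial(experiment_log, "SVC", {"C": 100}, val_accuracy=0.947)

# Sort so the strongest configuration appears first
log_df = pd.DataFrame(experiment_log).sort_values("val_accuracy", ascending=False)
print(log_df.to_string(index=False))
# log_df.to_csv("tuning_log.csv", index=False)  # optional: persist between sessions
```

Keeping hyperparameters and metrics in the same row makes it easy to spot trends and to avoid rerunning a configuration you have already tried.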

Practical Scenarios Where Manual Tuning Shines

Manual hyperparameter tuning can be especially valuable in several real-world situations.

Small Datasets or Quick Experiments

Small datasets support fast training runs. That makes multiple manual experiments a realistic option when working with limited data.

Debugging and Prototyping

If a model performs poorly, manually changing a single parameter can help you pinpoint the issue. For instance, if increasing max_depth does not raise validation accuracy, the bottleneck may not be model complexity but data quality.

Educational and Intuitive Understanding

Manual hyperparameter tuning forces you to understand how each configuration changes model behavior. This is useful for learning and debugging. The bias-variance tradeoff becomes easier to grasp when you directly observe how behavior shifts with more complexity or stronger constraints.

When Automated Tuning Fails

Automated searches can get stuck in suboptimal regions and may be misled by noise. Human intuition can detect patterns in outcomes and decide on new directions (for example, “all high values of parameter X perform poorly, so we will test a very low value of X”).

FAQ

Why manually tune ML parameters instead of using automation?

Using manual control builds intuition, reduces compute usage, and can resolve problems faster than black-box search methods.

What parameters should I tune first in a model?

Prioritize tuning the hyperparameters that most strongly influence performance:

  • For neural networks, the learning rate is typically the first hyperparameter to tune. After that, tune the number of layers/units, batch size, and regularization strength, including L2 and dropout.
  • For tree-based models, important settings include tree depth, number of trees (estimators), and learning rate (for boosted trees).
  • For SVM, tune C and kernel gamma when using the RBF kernel.
  • For k-Nearest Neighbors, tune the number of neighbors k.

Can manual tuning improve model accuracy significantly?

Manual tuning can help, especially if the initial hyperparameters are a poor match. Many models’ defaults are not optimal for every problem. In the SVM example above, tuning raised validation AUC from about 0.970 to 0.987, while accuracy improved by less than one percentage point.

Is manual parameter tuning still relevant today?

Absolutely. Even in the AutoML era, practitioners still rely on manual sweeps to satisfy performance, latency, and regulatory requirements.

Conclusion

Data scientists and machine learning engineers should keep manual hyperparameter tuning as a core skill. Testing different settings through one-at-a-time experiments, manual grid searches, or random searches helps you understand how key parameters shape the bias-variance trade-off and overall model performance.

Using best practices such as coarse-to-fine searches, consistent logging, and strong validation makes every trial meaningful and reproducible. When you run experiments on a GPU server, you have full control over your compute environment, which suits manual hyperparameter adjustment with frameworks like PyTorch or TensorFlow.

The manual approach helps you uncover issues and teach concepts while enabling informed choices when time or compute resources are limited.

Source: digitalocean.com
