
Dataset Split Calculator

The 80/10/10 split is standard, but it is wrong for small datasets — under 10,000 samples, go 70/15/15 so your validation set actually means something. For 1M+ samples, 98/1/1 works because even 1% gives you 10,000 test examples. This calculator gives exact sample counts for any ratio and dataset size, so you stop eyeballing and start getting reproducible splits.


By SplitGenius Team · Updated February 2026

To split a dataset for machine learning, divide total samples into train, validation, and test sets. The standard split is 80/10/10. For a 50,000-sample dataset: 40,000 train, 5,000 validation, 5,000 test. Smaller datasets should use 70/15/15 for more reliable evaluation. Enter your dataset size below.
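The arithmetic above can be sketched in a few lines of Python. This is a minimal helper, not any particular library's API; the convention of giving leftover samples to the training set is an assumption for illustration:

```python
def split_counts(n_samples, train=0.8, val=0.1, test=0.1):
    """Return exact (train, val, test) sample counts for a dataset.

    Validation and test counts are rounded to whole samples; any
    remainder goes to the training set so the three counts always
    sum to n_samples (an assumed convention, not a standard).
    """
    assert abs(train + val + test - 1.0) < 1e-9, "ratios must sum to 1"
    n_val = round(n_samples * val)
    n_test = round(n_samples * test)
    n_train = n_samples - n_val - n_test
    return n_train, n_val, n_test

print(split_counts(50_000))                   # (40000, 5000, 5000)
print(split_counts(10_000, 0.7, 0.15, 0.15))  # (7000, 1500, 1500)
```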


Dataset Split by Size — Sample Counts

Exact sample counts for common dataset sizes using the standard 80/10/10 train/validation/test split.

| Total Samples | Train (80%) | Validation (10%) | Test (10%) |
|---------------|-------------|------------------|------------|
| 1,000 | 800 | 100 | 100 |
| 5,000 | 4,000 | 500 | 500 |
| 10,000 | 8,000 | 1,000 | 1,000 |
| 50,000 | 40,000 | 5,000 | 5,000 |
| 100,000 | 80,000 | 10,000 | 10,000 |
| 1,000,000 | 800,000 | 100,000 | 100,000 |

How This Calculator Works

1

Enter Your Dataset

Provide the total sample count, split ratios (or pick a preset), and an optional random seed. Takes under 30 seconds.

2

Get Exact Counts

See an instant breakdown of train, validation, and test sample counts for your ratios.

3

Share & Reproduce

Copy a shareable link so collaborators can recreate the exact same split configuration.


How to Split a Dataset for Machine Learning

Every ML model needs three data splits: training data to learn patterns, validation data to tune hyperparameters, and test data to evaluate final performance. Getting the ratios wrong leads to overfitting, unreliable metrics, or wasted data.
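A three-way split can be sketched in plain Python. The function name and the default seed of 42 are illustrative; in practice libraries such as scikit-learn's `train_test_split` (called twice) do the same job:

```python
import random

def train_val_test_split(samples, train=0.8, val=0.1, seed=42):
    """Shuffle a list and cut it into train/validation/test parts.

    The test set takes whatever remains after train and val, so the
    three parts cover every sample exactly once. A fixed seed makes
    the split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

tr, va, te = train_val_test_split(range(1000))
print(len(tr), len(va), len(te))  # 800 100 100
```

Fixing the seed is what makes the split reproducible: rerunning with the same seed returns the same partition, while a different seed yields a different, equally valid one.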

Recommended Split Ratios by Dataset Size

| Dataset Size | Recommended Split | Why |
|--------------|-------------------|-----|
| < 1,000 | Use cross-validation | Too small for a fixed split — k-fold gives better estimates |
| 1,000 – 10,000 | 70 / 15 / 15 | Larger val/test sets for reliable evaluation |
| 10,000 – 100,000 | 80 / 10 / 10 | Standard split — enough data for all three sets |
| 100,000 – 1M | 90 / 5 / 5 | Even 5% gives 5K+ samples for evaluation |
| > 1M | 98 / 1 / 1 | 1% still gives 10K+ samples — maximize training data |
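The size bands in the table can be encoded as a small lookup helper (the function name and exact boundary handling are assumptions for illustration):

```python
def recommended_split(n_samples):
    """Suggest a (train, val, test) ratio for a dataset size,
    following the size bands in the table above. Below 1,000
    samples a fixed split is too noisy, so k-fold cross-validation
    is suggested instead.
    """
    if n_samples < 1_000:
        return "use k-fold cross-validation"
    if n_samples < 10_000:
        return (0.70, 0.15, 0.15)
    if n_samples < 100_000:
        return (0.80, 0.10, 0.10)
    if n_samples < 1_000_000:
        return (0.90, 0.05, 0.05)
    return (0.98, 0.01, 0.01)

print(recommended_split(50_000))     # (0.8, 0.1, 0.1)
print(recommended_split(5_000_000))  # (0.98, 0.01, 0.01)
```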

Stratified vs Random Splitting

Random splitting shuffles all samples and assigns them to splits randomly. Works well when your data is balanced and IID (independent and identically distributed).

Stratified splitting preserves the class distribution in each split. If your dataset is 90% class A and 10% class B, each split will maintain that 90/10 ratio. Always use stratified splits for imbalanced classification problems.
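A stratified split can be sketched by grouping samples per class and carving the test fraction off each group separately. The helper below is illustrative; in practice scikit-learn's `train_test_split(..., stratify=labels)` handles this:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, test_frac=0.1, seed=42):
    """Split so every class appears in the test set at its overall
    frequency: shuffle within each class, then take test_frac of
    each class for the test set.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append((sample, label))
    rng = random.Random(seed)
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# 90% class A / 10% class B: the imbalance is preserved in the test set
labels = ["A"] * 900 + ["B"] * 100
train, test = stratified_split(range(1000), labels)
print(len(test), sum(1 for _, y in test if y == "B"))  # 100 10
```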

Common Mistakes

  • Data leakage: Never normalize or preprocess data before splitting. Fit scalers on training data only, then apply to val/test.
  • Temporal leakage: For time-series data, split chronologically — don't shuffle. Future data must never appear in training.
  • Peeking at test data: Only evaluate on the test set once, at the very end. Use the validation set for all tuning decisions.
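The first bullet can be made concrete. Below, standardization statistics come from the training set only and are then applied unchanged to the test set; the `standardize` helper is a sketch, not a library function:

```python
def standardize(train, test):
    """Fit mean/std on the training set only, then apply the same
    transform to the test set: the ordering that prevents test
    statistics from leaking into preprocessing.
    """
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    std = var ** 0.5 or 1.0  # guard against a zero-variance feature

    def scale(xs):
        return [(x - mean) / std for x in xs]

    return scale(train), scale(test)

train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]  # an outlier that exists only in the test set
train_s, test_s = standardize(train, test)
print(round(test_s[0], 2))  # 6.71
```

Computing the mean and std over train and test combined would shift both statistics toward the outlier, quietly leaking test information into the model's inputs.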

For splitting amounts by ratio or percentage, use our ratio calculator or percentage split calculator.