Split Test Group Configuration Tool
The date range for pages covers the 90 days up to the website's max processed date (or up to today, if the max processed date is not set).
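As an illustration only, the date range above could be computed as in the sketch below. The function name `page_date_range` and its parameters are hypothetical and do not come from the tool, and whether the end date is inclusive is an assumption.

```python
from datetime import date, timedelta

def page_date_range(max_processed_date=None, today=None, window_days=90):
    """Return (start, end) for the page date range.

    Hypothetical helper: falls back to today when no max processed
    date is set, then goes back `window_days` days from that end date.
    """
    end = max_processed_date or today or date.today()
    start = end - timedelta(days=window_days)
    return start, end

# With a max processed date of 28 April 2025, the range starts on 28 January 2025.
start, end = page_date_range(max_processed_date=date(2025, 4, 28))
print(start, end)
```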
Balancing approach
The split test config tool uses a greedy balancing approach, based on principles from the Subset Sum and Partition Problem algorithm families. URLs are first sorted by click count in descending order so that the most impactful pages are placed first. Each URL is then assigned to the Control or Test group by comparing the total clicks accumulated in each group so far and adding the URL to the group with the lower total. This keeps total clicks roughly even between the groups while respecting the maximum group size. If both groups are full, the remaining URLs are excluded. The algorithm is a greedy heuristic: it balances the groups one URL at a time, without global rebalancing, which makes it efficient for large datasets while still producing a fair and balanced split.
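The greedy assignment described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function name `greedy_split`, the input shape (a dict of URL to total clicks) and the tie-breaking order are assumptions. When the group with the lower total is already full, the sketch skips the URL, in line with the GroupsFull exclusion described in this article.

```python
def greedy_split(clicks_by_url, max_group_size):
    """Greedily split URLs into Control and Test groups by click totals.

    Hypothetical sketch: `clicks_by_url` maps URL -> total clicks.
    Returns (groups, totals, excluded).
    """
    groups = {"control": [], "test": []}
    totals = {"control": 0, "test": 0}
    excluded = []

    # Place the highest-click URLs first.
    ordered = sorted(clicks_by_url.items(), key=lambda kv: kv[1], reverse=True)
    for url, clicks in ordered:
        lowest = min(totals.values())
        # Only a group that has room AND currently holds the lowest total
        # may receive the URL; ties go to Control (an assumption).
        candidates = [
            name for name in groups
            if len(groups[name]) < max_group_size and totals[name] == lowest
        ]
        if not candidates:
            excluded.append(url)
            continue
        target = candidates[0]
        groups[target].append(url)
        totals[target] += clicks

    return groups, totals, excluded

pages = {"/a": 100, "/b": 80, "/c": 60, "/d": 40, "/e": 20, "/f": 10}
groups, totals, excluded = greedy_split(pages, max_group_size=3)
print(groups)    # {'control': ['/a', '/d', '/e'], 'test': ['/b', '/c', '/f']}
print(totals)    # {'control': 160, 'test': 150}
print(excluded)  # []
```

Because each URL is placed once and never moved, the run time is dominated by the initial sort.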
A URL can also be excluded from the allocation for any of the following reasons:
Anomaly:
- When a URL has an extremely high number of clicks
- We calculate the mean and standard deviation of clicks for all pages and mark pages as anomalies if their click count deviates from the mean by more than 2 standard deviations.
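A rough sketch of the Anomaly check is shown below. The function name `find_anomalies` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the article does not say which variant the tool uses.

```python
from statistics import mean, pstdev

def find_anomalies(clicks_by_url, threshold=2.0):
    """Return the URLs whose clicks deviate from the mean by more than
    `threshold` standard deviations.

    Hypothetical sketch: uses the population standard deviation.
    """
    values = list(clicks_by_url.values())
    if len(values) < 2:
        return set()
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All pages have identical clicks, so nothing can be an outlier.
        return set()
    return {
        url for url, clicks in clicks_by_url.items()
        if abs(clicks - mu) > threshold * sigma
    }

# Nine pages with 10 clicks each and one page with 1000 clicks.
pages = {f"/page-{i}": 10 for i in range(9)}
pages["/big"] = 1000
print(find_anomalies(pages))  # {'/big'}
```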
LowClicks:
- The page has had 0 clicks for more than half of the 90 days, so it is excluded from the results
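The LowClicks rule could be checked along these lines. The helper `is_low_clicks` and its input (a list of daily click counts for the window) are hypothetical, and treating days that are missing from the data as zero-click days is an assumption.

```python
def is_low_clicks(daily_clicks, window_days=90):
    """Return True if the page had 0 clicks on more than half of the days.

    Hypothetical sketch: `daily_clicks` holds one click count per day.
    Days missing from the list are counted as zero-click days.
    """
    zero_days = sum(1 for clicks in daily_clicks if clicks == 0)
    zero_days += max(0, window_days - len(daily_clicks))
    return zero_days > window_days / 2

print(is_low_clicks([0] * 46 + [3] * 44))  # True: 46 of 90 days had 0 clicks
print(is_low_clicks([0] * 45 + [3] * 45))  # False: exactly half is not "more than half"
```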
NoGscData:
- We did not find any Google Search Console (GSC) data for this page
GroupsFull:
- Both groups are at maximum capacity in terms of URL count, so this page is excluded
OR
- If one of the groups is at maximum capacity and the sum of the clicks in the other group is greater than that of the full group, we exclude the page to maintain an even split
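The two GroupsFull conditions can be expressed as a single check, sketched below. The function name `is_groups_full` and its parameters are hypothetical; the sketch only mirrors the two rules stated above and is not the tool's implementation.

```python
def is_groups_full(control_total, test_total,
                   control_count, test_count, max_group_size):
    """Return True if the next page should be excluded as GroupsFull.

    Hypothetical sketch of the two rules:
    1. Both groups have reached the maximum URL count.
    2. One group is full and the other group already has more clicks,
       so adding the page would widen the gap.
    """
    control_full = control_count >= max_group_size
    test_full = test_count >= max_group_size

    if control_full and test_full:
        return True
    if control_full and test_total > control_total:
        return True
    if test_full and control_total > test_total:
        return True
    return False

# Control is full with 500 clicks; Test has room but already has 520 clicks.
print(is_groups_full(500, 520, control_count=50, test_count=48, max_group_size=50))  # True
# Control is full with 500 clicks; Test has room and only 450 clicks.
print(is_groups_full(500, 450, control_count=50, test_count=48, max_group_size=50))  # False
```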
Updated on: 28/04/2025