Maximum number of samples 10,000?

#2
by rchereji - opened

Hi all. When I try to fit a TabPFNRegressor() model on a large dataset, I get the following error:

ValueError: Number of samples 8744913 in the input data is greater than the maximum number of samples 10000 officially supported by TabPFN. Set `ignore_pretraining_limits=True` to override this error!

If I initialize the model with TabPFNRegressor(ignore_pretraining_limits=True), I no longer get the previous error, but I get the following one instead:

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 1748982 quantiles and 10000 samples.

Is there a way to use TabPFNRegressor with datasets that are larger than 10,000 samples?

For reproducibility, here is an example that generates these kinds of errors:

from sklearn.datasets import make_regression
from tabpfn import TabPFNRegressor

# Parameters
n_samples = 1000000
n_features = 100
noise = 0.1
random_state = 42

# Generate the dataset
X, y = make_regression(n_samples=n_samples, n_features=n_features, noise=noise, random_state=random_state)

# First try: raises the "Number of samples ... is greater than the maximum
# number of samples 10000" ValueError
model = TabPFNRegressor()
model.fit(X, y)

# Second try (run separately, since the first fit raises): raises the
# "number of quantiles cannot be greater than the number of samples" ValueError
model2 = TabPFNRegressor(ignore_pretraining_limits=True)
model2.fit(X, y)
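
The obvious fallback is to subsample down to the 10,000-row limit, but I'd rather use the full dataset if possible. For context, a rough sketch of that fallback (uniform random subsampling, reusing the variables above; model3 is just an illustrative name):

import numpy as np

# Stopgap: fit on a uniform random subsample of 10,000 rows
# (discards most of the data, hence the question above)
rng = np.random.default_rng(random_state)
idx = rng.choice(n_samples, size=10_000, replace=False)
model3 = TabPFNRegressor()
model3.fit(X[idx], y[idx])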
Leogrin (Prior Labs org)

Thanks a lot for reporting! I opened an issue on GitHub: https://github.com/PriorLabs/TabPFN/issues/169
Note that even with this bug, it seems to work above 10K samples; your example ran at up to 50K in my tests (though I'm not sure whether performance is affected by the bug).
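
For reference, a sketch of roughly what I ran (same setup as your repro, scaled down to 50K rows; treat it as an illustration, not a supported configuration):

from sklearn.datasets import make_regression
from tabpfn import TabPFNRegressor

# ~50K samples: above the official 10K limit, but fit without the
# quantile error in my tests once the limit check is overridden
X, y = make_regression(n_samples=50_000, n_features=100, noise=0.1, random_state=42)
model = TabPFNRegressor(ignore_pretraining_limits=True)
model.fit(X, y)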

Leogrin changed discussion status to closed