How robust are cross-country comparisons of PISA scores to the scaling model used?