Finally coming back to this, you say "we need to account for bias" - but positivity bias is endemic in science anywhere without pre-registration of hypotheses and experimental methods (which is everywhere). Positivity bias is where you report results when they match your hypothesis and abandon studies when you see them going against your hypothesis, because a contrary result makes you as a researcher look like a chump and your research look "failed". It exists throughout science, and it has its worst impact on questions where the answer could well be finely balanced (e.g. if trans women do perform similarly to cis women, then researchers looking for greater performance may only publish results which show a slight advantage, and researchers looking for lesser performance might avoid publishing results which show a slight advantage).

The simplicity of the challenge that I've set above is that it is not sensitive to being influenced by that sort of bias. It doesn't rely on good behaviour by researchers. It's really simple. Are trans women beating the best cisgender women? Yes or no. It would be extremely difficult for trans women to collude collectively to cheat this test. Any number of trans women may sandbag their own personal performance, but it would only take one talented trans woman athlete who actually wants the win to "pass" the test, and that's if there was widespread trans collusion to pervert results (which I believe is an absurdly cynical starting point with respect to bias -- but to be clear, my point here is that even with a lot of bias, my proposed challenge above would be difficult to pervert).