Yeah, I've studied Shulala's data many times. When the testing first came out, I looked over it a lot and replicated the tests in my own fashion on the same monster. I didn't test for the same thing; I simply tested each level of Treasure Hunter assuming independent drops, with sample sizes varying from 200 to 300 for each level. (I wish I still had this data floating around somewhere... I might...)
I did not record how many "No drop" outcomes I ran into; however, the major flaw in coming to any conclusion based on Shulala's data is the sample size. It's small, as you admitted, and unfortunately far too small to draw any conclusions from, even with a difference of 24 drops.
I broke down her data into the individual drops, and came up with this:
Code:
TH   Drop 1       Drop 2       Drop 3       Drop 4       Ttl   Kills
0    0 (0%)       7 (6.6%)     15 (14.2%)   20 (18.9%)   42    106
1    1 (0.99%)    6 (5.9%)     14 (13.9%)   38 (37.6%)   59    101
2    3 (3%)       8 (8%)       13 (13%)     55 (55%)     79    100
3    7 (7%)       9 (9%)       31 (31%)     41 (41%)     88    100
4    2 (2%)       15 (15%)     26 (26%)     52 (52%)     95    100
Hopefully that formats properly. Looking at this, it seems quite clear that the wildly varying results between TH2, TH2+1, and TH2+2 are caused by the extremely small sample. It's no surprise that TH2 and TH+2 ran into the same number of no-drop outcomes.
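To put a rough number on how noisy 100 kills is, here's a quick sketch. It assumes each kill is independent and uses a plain normal-approximation interval, which is a choice made for illustration and not how the original testing was analyzed. The function name `wald_interval` is made up for this example; the counts are the Drop 4 figures from the rows labeled 2, 3 and 4 in the table above.

```python
import math

def wald_interval(drops, kills, z=1.96):
    """Approximate 95% interval for a drop rate (normal approximation)."""
    p = drops / kills
    half_width = z * math.sqrt(p * (1 - p) / kills)
    return p - half_width, p + half_width

# Drop 4 counts out of 100 kills, from the rows labeled 2, 3 and 4 in the table.
drop4 = {2: 55, 3: 41, 4: 52}
intervals = {th: wald_interval(drops, 100) for th, drops in drop4.items()}

for th, (low, high) in intervals.items():
    print(f"TH row {th}: {low:.1%} to {high:.1%}")
```

Each interval comes out at roughly 10 percentage points either side of the observed rate, and all three overlap, so under these assumptions the 55 / 41 / 52 swing can't be told apart from noise.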
Don't forget that while 88 to 95 is an 8% increase, it's also being stacked four times. In a perfect world, if you have a monster with 4 different drops and increase the drop rate on each one by exactly 1 percentage point, your total increase in drops is going to be more than 1%. If we use TH2 -> TH2+1 as an example, over 100 kills it would go from 79 to 83 drops (adding 1 drop to each item), which is a 5% increase. If we use TH2+1 -> TH2+2 as an example, it would go from 88 to 92 drops, which is a 4.5% increase.
With this in mind, an 8% increase is not at all out of the realm of possibility (in fact, it's well within it).
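The stacking arithmetic can be checked with a few lines. This is only a sketch of the perfect-world case described above: it assumes 100 kills, so that one extra percentage point on an item means exactly one extra drop, and `relative_increase` is a made-up helper name.

```python
def relative_increase(total_before, items=4, extra_per_item=1):
    """Percent increase in total drops when every item gains extra_per_item drops."""
    total_after = total_before + items * extra_per_item
    return 100 * (total_after - total_before) / total_before

th2_to_th2p1 = relative_increase(79)    # 79 -> 83 drops
th2p1_to_th2p2 = relative_increase(88)  # 88 -> 92 drops
observed = 100 * (95 - 88) / 88         # 88 -> 95 drops, as recorded in the table

print(f"{th2_to_th2p1:.1f}% {th2p1_to_th2p2:.1f}% {observed:.1f}%")
```

A one-point bump per item already accounts for more than half of the observed jump from 88 to 95, before any random noise is considered.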
I feel like I've rambled on, but my main point is that Shulala's sample is far too small, and the results too inconclusive, to even consider drawing a hypothesis from.
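One thing a lot of people don't realize is that if you're testing for a difference of 1 percentage point, the sample size you need depends on the base rate. The noise in an observed drop rate scales with p(1 - p), so an item that drops at a 50% base rate needs far more kills than an item that drops at a 1% base rate to show the same absolute difference. Either way, you'll need far more than 100 kills per level to draw an accurate conclusion.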
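For a rough sense of the numbers, here's a sketch using the textbook normal-approximation formula for comparing two proportions. The 5% significance level and 80% power are assumed defaults, not anything taken from the testing discussed here, and `kills_per_level` is just an illustrative name.

```python
import math
from statistics import NormalDist

def kills_per_level(p_base, p_new, alpha=0.05, power=0.80):
    """Approximate kills needed at EACH level to detect p_base vs p_new."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_new - p_base) ** 2)

low_base = kills_per_level(0.01, 0.02)   # 1% item, +1 percentage point
high_base = kills_per_level(0.50, 0.51)  # 50% item, +1 percentage point

print(low_base, high_base)
```

Under those assumptions this works out to roughly 2,300 kills per level for 1% vs 2% and roughly 39,000 for 50% vs 51%, both far beyond 100 kills.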