So, to put it as simply as I can: test dummy dps is not mathematically significant. There is a caveat to this, and I'll get to it later. But for now, onwards.
There are big reasons (and lots of small ones) why test dummies don't matter.
1. Test Dummies Are Not An Isolated Environment
Which is to say that when a warrior comes up, hits your test dummy and applies Sunder, he's ruining your test by introducing an outside dps buff. Is it a big change? No - you're a balance druid. But your treants being buffed is significant. If a mage is trying fire and puts Scorch up, then that 5% crit is big. Or the ret pally who's adding 3% crit.
But wait - this is controllable, right? We can get a true, isolated test dummy dps test. So for the sake of argument let's scratch this and move on to the next problem:
This is true for all raid buffs - all of your spells scale differently with each and every stat. Everything has a different spell power coefficient. IS doesn't scale with crit at all (MF only does with 2t9), and Wrath and Starfire also vary significantly due to Eclipse. But that's just the tip of the iceberg - the biggest problem for a moonkin on test dummies is haste. Leaving aside that past the soft cap Starfire is the only thing that scales decently with it, you run into the problem that the cap moves.
The normal soft haste cap is right about 400. But that's raid buffed and it includes the 5% spell haste from a shaman. Take away that buff and the cap shoots up to 585. Any moonkin above 400 haste is going to see inflated test dummy dps. Now, you could get a test dummy raid together - I suppose it's possible, and if anyone's done it I'd love to hear about it. But that does make it pretty useless for comparison purposes unless everyone does that.
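Here is a rough sketch of where those two numbers come from. Everything in it beyond the 5% shaman buff is an assumption, not something stated above: that the soft cap is the point where Wrath's 1.5 second cast hits the 1 second global cooldown floor, that haste sources multiply together, that Nature's Grace is worth 20% and Celestial Focus and Improved Moonkin Form 3% each, and that 32.79 haste rating buys 1% haste at level 80. The function and constant names are made up for the example.

```python
# Assumed WotLK-era values; none of these come from the post itself.
HASTE_RATING_PER_PERCENT = 32.79  # rating needed for 1% spell haste at level 80
WRATH_CAST_TIME = 1.5             # seconds, assuming Starlight Wrath is talented
GCD_FLOOR = 1.0                   # seconds; casting faster than this gains nothing

NATURES_GRACE = 1.20              # assumed 20% haste proc
CELESTIAL_FOCUS = 1.03            # assumed 3% haste talent
IMPROVED_MOONKIN_FORM = 1.03      # assumed 3% haste aura
WRATH_OF_AIR = 1.05               # the 5% shaman spell haste buff


def soft_haste_cap(multipliers):
    """Haste rating at which Wrath's cast time reaches the GCD floor."""
    buff_haste = 1.0
    for m in multipliers:
        buff_haste *= m  # haste sources are assumed to stack multiplicatively
    total_needed = WRATH_CAST_TIME / GCD_FLOOR  # 1.5x total haste required
    from_rating = total_needed / buff_haste     # what gear still has to cover
    return max(0.0, (from_rating - 1.0) * 100.0 * HASTE_RATING_PER_PERCENT)


SELF_BUFFS = [NATURES_GRACE, CELESTIAL_FOCUS, IMPROVED_MOONKIN_FORM]

raid_cap = soft_haste_cap(SELF_BUFFS + [WRATH_OF_AIR])
solo_cap = soft_haste_cap(SELF_BUFFS)
print(f"with shaman haste:    {raid_cap:.0f}")
print(f"without shaman haste: {solo_cap:.0f}")
```

Under those assumptions the two results land right around 400 and 585. The point is not the exact figures but that a single missing buff shifts the cap by nearly 200 rating.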
Which is to say, the statistical sample size N (the number of tests you do) that is actually significant.
If you want a lot of math (real math with letters, not the number math most people use) then I'd suggest reading here. Yes it's wikipedia - and yes, it's only accurate on average. But that is still accurate significantly more often than test dummy dps, so I'd recommend not arguing with it.
Ignore the other problems. Let's assume that you've managed 100 tests of a decent length so they actually resemble something close to what a balance druid gets (~3 minutes or more). Let's assume you made them all equal length - that no one outside interfered and human error was zero.
Then maybe, maybe you can take those results and come up with something within 10% of the "real" number - which is to say, the number that accurately represents your unbuffed test dummy dps and totally ignores raid scaling and all the other issues. And what you've got still isn't very useful, because all you can say is that you probably do more dps than the other guy if you ever find yourself trying to solo Patchwerk at level 80.
3. You're Killing Patchwerk
The problem here is that you're never going to kill him in a raid.
There are fights in ICC that allow melee to pull a patchwerk*
*except some of them can't use certain moves at certain times because AOE is a bad idea**
**oh, and the ones where even if they can probably sometimes stand still the entire time, occasionally they have to move due to fight mechanics.
There aren't any fights where casters can - the closest thing is Deathwhisper (if your raid lets you) and even then you need to be capable of stopping dps so you actually transition correctly, or of reacting when you see a Curse about to put your Starfire on cooldown for 15 seconds at the start of Lunar Eclipse.
No one fights Patchwerk anymore. It's just not happening - maybe in the entry raids at 85, sure, and it will be fun. But mechanics get more complicated and difficult as time goes on, and that involves movement - interrupts, all sorts of things you need to worry about. Unless your test dummy dps involves those too (and remember they have to be consistent for it to be statistically valid), well, all you're doing is killing a level 60 boss and talking about your dps - and it's just not important.
Test dummy dps isn't really important, but that's not quite the same thing as saying test dummies are useless. I use them all the time.
Test dummies are excellent for testing a lot of simple things. Want to know how Moonfire works with the glyph and Imp Moonfire? Test dummies will tell you. Want to test the new 4t10 bonus when you get it? Test dummies are good for that (although keep in mind there are bugs that only occur on test dummies, so it's not conclusive). Now that doesn't mean you can use test dummies to compare something like 4t10 to 2t9, but it does mean you can probably get solid numbers from them, and from that it's math. EJ uses test dummies for specific purposes all the time - validly - but they also make sure they control the test and they get a big N.
The real purpose of test dummies, however, is practice. When my rotation changed in 3.2 I spent a lot of time on test dummies getting used to it. I use test dummies when I want to make sure Power Auras will track Eclipse like I want it to. I use test dummies when I'm working on my UI, because I want to see as much as I would in a raid - DBM Test bars, cooldown bars going, everything tracking. That means I know it works, and more importantly I get used to looking in the right place for the information.
I also usually treat test dummies as close as I can to real boss fights - any player who rides by is a boss mechanic. Shamans are usually coldflame, Warriors are void zones, Mages are AOE explosions and Hunters I, er, ignore. But you get the idea - make it as dynamic as you can. The key to practice is to do what you actually have to do in a raid, and if you use test dummies for that then I'll salute you.