As I, and others, have said ad nauseam, "calculating the number of viewers from webserver logs is like a radio station trying to measure the number of listeners from the power broadcast by the antenna."
a) It is certainly true that the ratings experts don't sample all viewers. However, if you do good sampling on a nice 'smooth' population, good statistics will save you (but see below).
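To make the happy case concrete, here is a minimal sketch; the population size, the 12% "true" audience share, and the sample sizes are all made-up numbers. With a homogeneous population, a plain random sample recovers the share, and the error shrinks roughly as 1/sqrt(n):

```python
import random

# Hypothetical "smooth" population of 1,000,000 users, 12% of whom
# visit the site in question. All numbers here are made up.
random.seed(1)
POP_SIZE = 1_000_000
TRUE_SHARE = 0.12
population = [random.random() < TRUE_SHARE for _ in range(POP_SIZE)]

# A plain random sample recovers the share, with error ~ 1/sqrt(n).
for n in (100, 1_000, 10_000):
    sample = random.sample(population, n)
    est = sum(sample) / n
    stderr = (est * (1 - est) / n) ** 0.5
    print(f"n={n:>6}: estimate={est:.3f} +/- {stderr:.3f} (true={TRUE_SHARE})")
```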
b) The ratings experts say they are sampling from the most visited sites, and then they trot out their own numbers to prove it. It is unlikely that they get a true picture of Internet traffic. For instance, many websites don't want to be counted. Others value decentralization, which makes it almost impossible for outside observers to get a handle on audience share. However, I think that in the 1 Megahit/day class and above they probably are sampling at the large data centers, so in those cases they may even have a better idea of traffic.
As bandwidth became cheaper and more available, it became harder to count traffic below the 1 Megahit/day class. Unless they put a sampling box on every machine in the world, gauging traffic to a given website is extremely difficult.
And what about distributed and peer-to-peer services like BitTorrent, Gnutella, Napster, and Freenet, to which these methods cannot apply at all?
So they are almost certainly undersampling the sites in the 0.1 Megahit/day class... like us.
I don't think that a) holds, because the Internet user population is not smooth. Rather, it is a collection of disparate groups, as on Usenet, with limited overlapping interests. While their numbers might hold for generalist sites such as Yahoo or Google, they don't apply to sites like ours, which serve a collection of disparate interest groups with micromanaged traffic. In fact, our sites are really more like Usenet newsgroups or email than like the web. Counting viewers on such a medium is an unreasonable task; producing "results," on the other hand, is quite easy.
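Here is a minimal sketch of how a lumpy population breaks the happy case above; the group sizes and the 10:1 recruitment bias toward generalist users are assumptions, not measurements. When the panel is recruited where the generalists are, the niche audience all but disappears from the estimate:

```python
import random

# Hypothetical fragmented population: one big generalist group plus many
# small, barely-overlapping interest groups (Usenet-style). All numbers
# are made up for illustration.
random.seed(1)
population = ["generalist"] * 900_000
population += [f"group{i}" for i in range(100) for _ in range(1_000)]

# Assume panel recruitment is biased 10:1 toward generalist users.
weights = [10 if g == "generalist" else 1 for g in population]
panel = random.choices(population, weights=weights, k=10_000)

true_share = sum(1 for g in population if g != "generalist") / len(population)
panel_share = sum(1 for g in panel if g != "generalist") / len(panel)
print(f"true niche-audience share:  {true_share:.3f}")   # 0.100
print(f"panel niche-audience share: {panel_share:.3f}")  # far lower
```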
The Web itself is not a smooth, homogeneous structure; it has blobs and tendrils and a rich structure -- almost biological in complexity. There are huge sites inside corporations and universities that transfer petabytes of data daily and are not even mentioned by the net ratings Gods.
Thus you have a fragmented medium and a fragmented user base, and little chance of estimating viewership from server logs. What works is microtargeting, as we well know.
As was just mentioned, all of this is beside the point anyway. The web is a fine thing, but it should not be considered in isolation. First of all, it is an interactive medium, and counting the traffic going in only one direction is silly. It is the back-and-forth motion that makes a website effective, and that motion involves much more than hits or impressions. Other important ingredients in the stew include CGI forms, redistribution, accessibility, and email.
The web is more than just a broadcast medium. The cost of entering the market is near zero, so you may eventually have as many websites as users. For example, the number of porn sites is growing faster than the number of porn viewers. Heehee... take that and put it in your models.
Old broadcast models developed for TV and radio need not apply. For instance, email is not a broadcast medium (misused as a broadcast medium, it is called spamming). It can be targeted, contextual, and many times more effective than a website alone. Email, if you think about it, is the logical extension of the fragmentation of the web -- except that it came first!
In conclusion: rather than looking at traffic, look at how effective the traffic is. That is to say, would you prefer 1,000,000 viewers with 0 sales, or 1 viewer who is converted into a sale? In the past year, 90% of my sales originated with a website, and every one of those sales was closed through email. Instead of trying to count millions of viewers, I found it much more rewarding to count the money.
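If you keep even a crude ledger, "counting the money" is a few lines of bookkeeping. A minimal sketch, with every record and amount made up for illustration:

```python
from collections import defaultdict

# Hypothetical sales ledger: count the money by channel instead of the
# hits. Every record and number here is invented for illustration.
sales = [
    {"origin": "website", "closed_via": "email", "amount": 1200.00},
    {"origin": "website", "closed_via": "email", "amount": 450.00},
    {"origin": "referral", "closed_via": "phone", "amount": 300.00},
]
page_views = 1_000_000  # raw traffic over the same (hypothetical) period

revenue = defaultdict(float)
for s in sales:
    revenue[(s["origin"], s["closed_via"])] += s["amount"]

total = sum(revenue.values())
for (origin, via), amt in sorted(revenue.items()):
    print(f"{origin:>8} -> {via:<6} ${amt:>8.2f} ({amt / total:.0%})")
print(f"revenue per million views: ${total / (page_views / 1e6):,.2f}")
```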