October 20, 2003

Google pitfalls

It's strange, but I've come across two rather fundamental (yet unrelated) problems with Google searches in the last half hour.

First problem. If you use Linux or some other Open Source project, you've probably got stuck on something you want to do that isn't well explained in the documentation. So you go and look up the problem on Google, and get 350,000 results. Of these, most of the pages are from back in 2000, when everyone was coming across the issue and generating a lot of traffic on forums and mailing lists about how to fix the problem, what has to be implemented to get it to work, etc. This traffic is all very relevant to your search terms, but not very useful if all you want to know is how to do whatever it is. Usually, buried on the 32nd page of results is a page from a couple of months ago that answers your question concisely but, because of the way it is written, is not high up in the ranking.

This, in my eyes, is a major problem. Google doesn't seem to give much importance to how new a page is, and therefore the usefulness of the results is skewed in some cases. Ironically, the problem is worse when searching in Google Groups, but since it allows sorting of results by date, it can be overcome much more easily by just displaying the results in reverse chronological order.

Second problem. This isn't really a problem for me, but I was thinking about it as problem number one happened, and when I tried it my suspicions were confirmed.

Imagine yourself as a complete newbie. You've been told that the Internet is good for many things, and that everything you need can be found by searching Google. So far, so good, and many of us might actually give that kind of advice. Now, you've heard that the Internet can be used for sending email, so you search Google for "how to send email", or maybe just "sending email".

If you've clicked on the links above, you will have seen that there is absolutely nothing in that result list that might help you on your quest. To find some helpful results you'd need to search for something like "free email services" or "how to get an email account", search terms that betray decent knowledge about the subject area in the first place.

I think it's obvious from these examples that in order to find anything on Google, you need to have a fairly clear idea of what you're looking for, which defeats the point when you're looking something up because you have no idea what it's about. While not a bug per se, I think this is an unfortunate side-effect of the searching methods used.

These two issues can be serious hurdles standing in between you and the data you need. Basically, in order to narrow your search, you need to know more about what you're searching for. And you can't find out more until you've found what you're looking for. It's a vicious circle.

It just goes to show that Google is not the ultimate knowledge searching tool. And while this is unfortunate for you and me because sometimes we might not be able to find the data we need, it means that there is still room for improvement and maybe (gasp!) competition.

And given Google's recent spate of bad press, this may well be a Good Thing(tm).

Posted by Dave at 08:45 PM | Comments (0)

October 11, 2003

For the discerning pre-teen girl

Have you heard about BarbieOS?

This year, Mattel is upping the ante by making the B-Book into a full-fledged desktop replacement targeted specifically at toddler through preteen girls who are currently Windows users but may be seeking alternatives, possibly due to increasing licensing fees or out of a desire to break free of vendor lock-i

Brilliant stuff.

Posted by Dave at 03:01 PM | Comments (0)

Uni-what??

Joel Spolsky talks about Unicode and Character Sets. Essential reading for non-experts such as myself.

Actually, I think the problem that programmers have with character sets is twofold: 1. they use tools that don't understand character sets and therefore never learn about them (Joel mentions PHP), and 2. they use tools that do understand character sets but don't make charset-related issues obvious. An example of the latter is Apache's Xerces. Anyone who has ever spent serious time researching a weird XML parse error, only to find out that the XML document they were using had the wrong charset in the header, knows what I mean.
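To make the "wrong charset in the header" case concrete, here is a rough Python sketch of a sanity check you could run before handing a document to a parser. It has nothing to do with how Xerces works internally; the function names are made up for illustration, and it assumes an ASCII-compatible encoding (so no UTF-16 files).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8"?>
_DECL = re.compile(rb"^<\?xml[^>]*?encoding=[\"']([A-Za-z0-9._-]+)[\"']")


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default


def charset_mismatch(data: bytes) -> bool:
    """True if the bytes cannot be decoded with the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not know about.
        return True
    return False


# The header claims UTF-8, but the file was actually saved as ISO-8859-1.
bad = '<?xml version="1.0" encoding="UTF-8"?><name>José</name>'.encode("iso-8859-1")
# Same bytes, but this time the header tells the truth.
good = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'.encode("iso-8859-1")

print(charset_mismatch(bad))
print(charset_mismatch(good))
```

Note that this only catches the loud failures. A UTF-8 file labelled as ISO-8859-1 decodes without complaint and just gives you garbage characters, which is exactly the kind of non-obvious problem I mean.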

Anyway, read the article, it's worth it.

Posted by Dave at 02:34 AM | Comments (0)

Manipulation

I try not to talk about politics because this is not that kind of blog, but sometimes it's just too much. You may have heard about the Spanish diplomat murdered in Iraq recently.

It's a serious story, but I get annoyed at the way different media portray it. Today in the TV news and newspapers, TVE (Televisión Española, Spanish National Television) called the guy an attaché to the Spanish embassy. TeleCinco (independent Spanish TV) referred to him as a military something-or-other, and El Pais, a major newspaper, called him a spy.

I won't get into my own opinion, but the language used by each medium makes their bias quite clear. I found the government-backed TVE message to be the most disturbing, which maybe makes my bias quite clear. Either way, it annoys me.

Like Rage Against the Machine says, There's nothing proper 'bout your propaganda...

Posted by Dave at 02:09 AM | Comments (0)

October 10, 2003

The Crypto Software Wars

I received a GPG-signed piece of email yesterday. This is no big deal, except that I didn't have anything set up on my machine to cope with it, so I took it as an opportunity to set up some crypto software.

I do this every so often, you see. I move to a new desktop machine, reinstall or upgrade the operating system and suddenly I have a blank page to configure the desktop experience. Over the years, I've become used to setting up the basics automatically - development tools, browser/email/messengers/etc, office apps and other necessities, but there are always things to do which aren't automatic because they are not used as often, and crypto software is one of those things.

So every year or so I get to investigate anew, see all the available software and decide on what to install and how to make it work. As usual, every time I set these things up it's in a different environment - now I'm using Linux (RedHat 9, to be precise), and I recently switched to Mozilla Thunderbird for mail.

Thankfully, even Linux projects are quite professional these days, so installation was painless, everything from RPMs or, for Firebird, from .xpis. This is what I installed:

Installing in that order should work fine. Run kgpg from the command line to create or import your keypairs, and you're done. Much simpler than when I first started experimenting with PGP back in '96 or so, back when using crypto software was more like a war (hence the title of this post).

This was all done from my work machine, so when I get home I'll replicate the process and then post my new collection of keys to gather dust for another year or so. :)

Posted by Dave at 02:07 PM | Comments (0)

October 03, 2003

Dear Sir...

Nigerian SCO Connection at ArsTechnica.

(via Jon Udell)

Posted by Dave at 03:24 PM | Comments (0)

October 02, 2003

Google Fraud

Russ has posted an entry about Google AdSense, possible fraud and abusive terms and conditions. It's good reading, and since I can't add anything to it I'm going to go off on a tangent from there.

The idea that caught my eye is in the update to the article - the fraud-monitoring software. I'm wondering how traffic that isn't fake can seem like bad traffic to a monitoring algorithm.

My first thought is inspired by one of my minor annoyances in recent times - HTTP proxies.

It is conceivable that a certain number of people behind a proxy find an advertisement on a site interesting and click on it, all within a short space of time. If the algorithm isn't proxy-savvy (i.e. doesn't know how to interpret X-Forwarded-For headers) or the proxy isn't compliant (not all of them are), then the monitoring software could be fooled by honest traffic.

This is not a far-fetched idea: most big ISPs use proxies, which they automatically configure from their install CDs. And some, in particular Telefonica here in Spain, give you no choice by installing a transparent proxy, which is the root of all evil. :) If the AdSense monitoring software is not proxy-aware, then it would throw out sites like ADSLNet or BandaAncha because anyone in Spain using ADSL will appear to come from the proxy IP (currently 80.58.4.42, if you really want to know), even if the clicks were from many different people.
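For what it's worth, here is a minimal Python sketch of what "proxy-aware" could mean for a click counter. This is pure guesswork on my part, not a description of what AdSense actually does; the function names and the sample addresses (other than the proxy IP mentioned above) are invented for illustration, and it assumes the headers arrive as a plain dictionary keyed by the exact name "X-Forwarded-For".

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def client_ip(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Best guess at the originating client for one request.

    X-Forwarded-For is a comma-separated list; by convention the first
    entry is the original client. The header is easy to spoof, so a real
    system should only trust it when it comes from a known proxy.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    # Fall back to the connecting address when the header is absent.
    return first or remote_addr


def clicks_per_client(
    requests: Iterable[Tuple[str, Mapping[str, str]]]
) -> Counter:
    """Count ad clicks per originating client, not per connecting IP."""
    return Counter(client_ip(addr, hdrs) for addr, hdrs in requests)


# Three different people behind the same proxy, plus one direct visitor.
PROXY = "80.58.4.42"
sample = [
    (PROXY, {"X-Forwarded-For": "10.0.0.1"}),
    (PROXY, {"X-Forwarded-For": "10.0.0.2"}),
    (PROXY, {"X-Forwarded-For": "10.0.0.3, 192.0.2.7"}),
    ("198.51.100.9", {}),
]
counts = clicks_per_client(sample)
print(counts)
```

Counting by connecting IP alone would show three clicks from the proxy, which is what a naive fraud detector would flag. And if the proxy doesn't send the header at all, the fallback leaves you right back where you started, with everything attributed to the proxy.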

And of course, this causes other problems for me. I don't know how other people in Spain keep track of how many individual users use their site, but I can't do it since I use shared hosting and can't change the Apache config to log real IPs instead of proxy IPs. I get lots of hits in my logs from the proxy, but I am also behind that proxy and I use my domain for quite a few other things, so most of those hits are probably mine. :) Anyway, if you're in Spain and using ADSL, leave me a comment so I know you're there!

Posted by Dave at 12:33 PM | Comments (0)