Thursday, April 20, 2006 4:30 PM PT Posted by Erika Ingvald
Many of us rely on web comparison shopping for all kinds of things, flight tickets and computers included, and a number of agents are available to help us out. But it often takes a lot of time just to coax these services into including products you know are out there. The databases behind them don't interpret your queries the way a human travel agent would: web agents don't weigh evidence or make intelligent guesses.
This is how it works: users are given pricing information for thousands of products, reconciled from dozens of online retailers who may describe the same product in slightly different ways. Because of those small differences, it has until now been hard for comparison-shopping software to tell whether two listings describe the same product or different ones.
Now, with sponsorship from Internet giant Yahoo, Stanford Professor Jennifer Widom and her colleagues have created a new kind of database, Trio, that handles exactly this problem. Her solution is to assign a "confidence value" (a probability between 0 and 1) to each retailer's data records, and then to combine multiple records that are likely to represent the same product. That is about as close as you can get to letting a database 'feel' similarities between data.
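To make the idea concrete, here is a minimal Python sketch of confidence-weighted record merging. It is not Trio's actual interface or algorithm: the `Listing` type, the `normalize` matching rule and `merge_listings` are hypothetical names for illustration, and combining confidences with a "noisy-OR" (1 minus the product of 1 − c over all sources) assumes the retailers' records are independent.

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Listing:
    retailer: str
    title: str
    price: float
    confidence: float  # probability in [0, 1] that this record is accurate


def normalize(title: str) -> str:
    # Hypothetical matching rule: lowercase, strip punctuation, collapse spaces.
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def merge_listings(listings):
    # Group records whose normalized titles match, i.e. that likely describe the same product.
    groups = defaultdict(list)
    for item in listings:
        groups[normalize(item.title)].append(item)

    merged = {}
    for key, items in groups.items():
        # Noisy-OR: the product is real unless every source is wrong (assumes independence).
        p_all_wrong = math.prod(1.0 - it.confidence for it in items)
        merged[key] = {
            "confidence": 1.0 - p_all_wrong,
            "best_price": min(it.price for it in items),
            "sources": sorted(it.retailer for it in items),
        }
    return merged
```

For example, two listings titled "Canon PowerShot SD600" (confidence 0.8) and "canon powershot sd600 " (confidence 0.6) collapse into one product with combined confidence 1 − 0.2 × 0.4 = 0.92. A real system would use a far more forgiving similarity test than exact matching after normalization.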

"It really turned me on when I realized this could improve Internet comparison shopping. It's nice to have automated a way to generate something likely to be the truth, which gives a reflection of reality," Widom says.
According to Jennifer Widom, Trio is capable of more than improving comparison shopping. It could also help nail criminals and improve the work of fellow researchers.
Jennifer Widom got the idea when she heard about the yearly Christmas Bird Count [when ornithologists and amateur birders all over the country make a joint effort to map birds]. She assumed that data collected in such an effort must vary in reliability depending on the source, and wanted to build a database that could keep track of the source of each and every observation, along with a record of how uncertain each observation is.
Another possible application is to load a database with all the weighted statements from witnesses to a crime. If a witness is discredited late in an investigation, the confidence values of his statements can be lowered throughout the entire database. Widom's database can then recalculate every related confidence value that the discredited testimony has influenced, and let the investigator shift her suspicions toward someone else.
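The witness scenario relies on lineage: each derived conclusion remembers which source records it came from. The Python sketch below is a simplified illustration of that idea, not Trio's implementation; the `LineageDB` class, its methods, and the noisy-OR rule for combining independent statements are all assumptions made for the example. Because a conclusion's confidence is recomputed from its lineage on demand, discrediting a witness automatically changes every conclusion that depended on him.

```python
import math
from dataclasses import dataclass


@dataclass
class Statement:
    witness: str
    claim: str         # the conclusion this statement supports, e.g. a suspect's name
    confidence: float  # probability in [0, 1]


class LineageDB:
    """Hypothetical toy database that tracks which statements support each claim."""

    def __init__(self):
        self.statements = {}  # statement id -> Statement
        self.lineage = {}     # claim -> set of supporting statement ids

    def add_statement(self, sid, stmt):
        self.statements[sid] = stmt
        self.lineage.setdefault(stmt.claim, set()).add(sid)

    def confidence(self, claim):
        # Noisy-OR over the claim's lineage, assuming the statements are independent.
        ids = self.lineage.get(claim, set())
        return 1.0 - math.prod(1.0 - self.statements[s].confidence for s in ids)

    def discredit(self, witness, factor=0.0):
        # Scale down every statement by this witness; derived confidences follow automatically.
        for stmt in self.statements.values():
            if stmt.witness == witness:
                stmt.confidence *= factor
```

With Alice (0.7) and Bob (0.5) pointing at suspect X and Carol (0.6) pointing at suspect Y, X starts at 0.85 and Y at 0.6; after `discredit("Alice")`, X drops to 0.5 and Y becomes the stronger lead.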
What will happen to the technique in the future remains to be seen. Since Trio is open source, it is available to anyone who has a data set to test it on. That is the catch: for the moment there is no known relevant real-world data around, so test data has to be generated artificially.
"I'm surprised, but most researchers I've interviewed in order to get hold of a proper set of data have thrown any such information away, just because there was no database around to handle such information. I hope this will change now that we're around," Jennifer Widom says.