Understanding Deep Web Search Interfaces: A Survey
(Published, SIGMOD Record 2010)
Ritu Khare, Yuan An, and Il-Yeol Song, The iSchool at Drexel.

Description of Datasets

| Approach | Datasets/Resources | No. of sites | No. of interfaces | Avg. no. of form elements | Tot. no. of form elements | Avg. no. of text-labels | Tot. no. of attributes | Test domains |
|---|---|---|---|---|---|---|---|---|
| CombMatch '01 |  |  | 100 | 4 |  |  |  |  |
| LITE '01 | Semiconductor Research Corporation, The Semiconductor Reference Site, Hoover Online Business Network, Lycos Companies Online | 52 | 100 | 4.6 (min=1, max=12) | 460 |  |  | DB Technology, Movies, Semiconductors |
| HSP '04 | TEL-8, Invisible-net | 500 | 150, 30 |  |  |  |  | Automobile, Airfare, Books, Car Rental, Hotel, Jobs, Movies, Music, Real Estate |
| DEQUE '05 | AutoTrader, ZIPFind, Amadeus, PubMed, ClassicCar, Lycos Companies Online, Powell's Books, AA Flight Search, Phuket Hotel Guide, Mobile.de, Yahoo RealEstate | 11 | 58 |  | 160 |  |  | Automobile, Airfare, Books, Car Rental, Hotel, Medical, Real Estate, Scientific Publication, Shopping |
| LEX '07 | TEL-8, Invisible-net | 146 | 184 |  | 1582 |  | 1117 | Books, Electronics, Games, Movies, Music, Toys, Watches |
| LabelEx '08 | FFC*, TEL-8 |  | 2884, 296 |  |  |  |  | Automobile, Airfare, Books, Movies |
| SchemaTree '09 | ICQ, TEL-8, LEX's datasets |  | 100, 243, 134 | 6.57 (min=1, max=18); 5.08 (min=1, max=11) |  | 3.75 (min=1, max=10); 1.57 (min=1, max=8) |  | Automobile, Airfare, Books, Car Rental, Electronics, Games, Hotel, Jobs, Movies, Music, Real Estate, Toys, Watches |
| HMM '09 | TEL-8, Completeplanet, Beaucoup, NAR |  | 500 |  |  |  |  | Automobile, Biology, Health, Movies, References and Education |
| ExQ '09 | ICQ |  | 100 | 6.5 (min=2, max=14) |  | 2.32 (min=1, max=7) |  | Airfare, Automobile, Books, Jobs, Real Estate |

* Provided by the authors upon request.
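The average and total form-element counts in the table are related by simple arithmetic: for LITE '01, 100 interfaces at 4.6 elements each gives the reported total of 460. The sketch below repeats that check and derives the implied average for the two rows that report a total but no average. It assumes each total was counted over the same interfaces listed in that row. The derived averages are illustrative and were not reported by the respective authors, and the function names are hypothetical.

```python
import math

# (approach, no. of interfaces, reported avg. form elements, reported total)
# Values are copied from the table; None marks a cell the table leaves empty.
ROWS = [
    ("LITE '01", 100, 4.6, 460),
    ("DEQUE '05", 58, None, 160),
    ("LEX '07", 184, None, 1582),
]


def implied_average(interfaces, total):
    """Average form elements per interface implied by a total count."""
    return total / interfaces


def is_consistent(interfaces, average, total, tolerance=0.05):
    """True if the reported average matches total / interfaces within a tolerance."""
    return math.isclose(implied_average(interfaces, total), average, abs_tol=tolerance)


for name, interfaces, average, total in ROWS:
    derived = implied_average(interfaces, total)
    if average is None:
        # Derived here only; the source table leaves this cell empty.
        print(f"{name}: implied average of {derived:.2f} form elements per interface")
    else:
        ok = is_consistent(interfaces, average, total)
        print(f"{name}: reported {average}, implied {derived:.2f}, consistent={ok}")
```

A tolerance is used instead of exact equality because the reported averages are rounded.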

        Datasets used in important Deep Web projects: QI Project, WISE-iExtractor, WISE-Integrator, MetaQuerier

Acknowledgements: We sincerely thank the authors of LEX and ExQ for providing useful insights and comments.