Downloads
Available resources from Sen's research

The SARD Corpus
(Spatially-strAtified Route Directions Corpus)

The SARD Corpus is the Spatially-strAtified Route Directions Corpus, built for Sen Xu's thesis study on regional variation in spatial language usage. The corpus consists of 11,254 webpages in their original HTML format. Most of these webpages (estimated at above 93%) contain human-generated route directions in English from the United States (10,055 documents), the United Kingdom (710 documents), and Australia (489 documents). The corpus is organized hierarchically: nation -> postal region (state) -> postal code (one per document). It is made publicly available as a resource for other researchers interested in spatial language, spatial cognition, and related topics. Please feel free to download it, and contact Sen Xu with any questions.
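
The hierarchy described above suggests a simple way to tally documents per nation and per postal region. The following is only a minimal sketch, not part of the corpus distribution: it assumes the corpus is unpacked as nested directories (nation/region/document file), and the function name `count_documents` and the directory and file names in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_documents(corpus_root):
    """Count documents per nation and per (nation, region) pair.

    Assumes a layout of <root>/<nation>/<region>/<document file>;
    this layout is an assumption, not a documented corpus format.
    """
    root = Path(corpus_root)
    per_nation = Counter()
    per_region = Counter()
    # Exactly three levels below the root: nation, region, document.
    for path in root.glob("*/*/*"):
        if not path.is_file():
            continue
        nation, region = path.relative_to(root).parts[:2]
        per_nation[nation] += 1
        per_region[(nation, region)] += 1
    return per_nation, per_region
```

For example, `count_documents("SARD")` would return two counters, one keyed by nation directory name and one keyed by (nation, region) pairs, provided the unpacked corpus follows the assumed layout.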


We are currently resolving copyright issues so that the SARD Corpus can be made available to the public.
In the meantime, please contact Sen Xu to request the link to download the corpus.

Factsheet of the SARD Corpus (one page pdf)

Sen Xu's Master's Thesis
Exploring Regional Variation in Spatial Language: A Case Study on Spatial Orientation by Using Volunteered Spatial Language Data
offers more detail on how the SARD Corpus was built and presents a case study analyzing it.

Events Database from RSS feeds

The Events Database is generated from a list of news RSS feeds using TABARI and an extended event ontology based on CAMEO, with spatial and temporal entity extraction. You can view a presentation on building the Events Database from RSS feeds here.

Penn State University | GeoVISTA Center | maskVISTA | Bryant Smith