The SARD Corpus is the Spatially-strAtified Route Directions Corpus, built for Sen Xu's thesis study on regional variation in spatial language usage. The corpus consists of 11,254 webpages in their original HTML format. Most of the webpages (evaluated at over 93%) contain human-generated route directions in English from the United States (10,055 documents), the United Kingdom (710 documents), and Australia (489 documents). The corpus is organized hierarchically: nation -> postal region (state) -> postal code (each document). It is made publicly available as a resource for other researchers interested in spatial language, spatial cognition, and related topics. Please feel free to download it and contact Sen Xu if you have any questions.
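As an illustration of the nation -> postal region -> postal code hierarchy, the following is a minimal sketch of how the documents could be counted per nation once the corpus is on disk. It assumes the hierarchy is stored as nested folders with one .html file per postal code; the actual folder names, file extension, and the function name count_documents are assumptions made for this example, not part of the corpus documentation.

```python
from collections import Counter
from pathlib import Path


def count_documents(corpus_root):
    """Count documents per nation in a nation/region/postal-code tree.

    Assumed (hypothetical) layout:
        <corpus_root>/<nation>/<postal region>/<postal code>.html
    """
    root = Path(corpus_root)
    counts = Counter()
    # Match files exactly three levels below the root.
    for path in root.glob("*/*/*.html"):
        # The first path component below the root is the nation folder.
        nation = path.relative_to(root).parts[0]
        counts[nation] += 1
    return counts
```

If the layout matches these assumptions, calling count_documents on the corpus folder returns a Counter keyed by nation folder name, which can be compared against the document totals listed above.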
We are currently working on the copyright status of the SARD Corpus so that it can be made available to the public.
Please contact Sen Xu to request the link to download the corpus.
Sen Xu's Master's thesis, "Exploring Regional Variation in Spatial Language: A Case Study on Spatial Orientation by Using Volunteered Spatial Language Data", offers more details on how the SARD Corpus was built and presents a case study analyzing the corpus.