Time-measurement expressions from the British National Corpus
This dataset is based on a set of sentences extracted from the British National Corpus (BNC). Each sentence includes one or more time-measurement expressions consisting of a cardinal numeral followed by a time noun, then a second noun, with or without an intervening modifier or modifiers. Semantically, the combination of the numeral plus time noun represents a measurable quantity of the second noun. Here are some examples:
· That meant she had a good eight hours’ start before anyone need even think about her absence. (BNC FNT 13)
· Students who undertake the four-year sandwich course spend the third year in industrial placement. (BNC B3C 1798)
· An optional 10-minute speed test (2105) may be taken by candidates entering for the proficiency examination, without additional fee. (BNC HBP 2000)
· It is worth reflecting what a most remarkable contribution women have made to Save The Children throughout its seventy four years history. (BNC JNG 258)
· The consultant, Dr. Nigel Cox, has been a given a one-year suspended jail sentence for attempting to murder a terminally ill patient. (BNC K21 1072)
In the dataset, the sentences are parsed to show the constituents of the phrases containing the time-measurement expressions. They are also annotated with a variety of metatextual, orthographic, morphosyntactic, length, frequency, and semantic variables. The sentences were extracted from the corpus using the Simple Query Syntax of the Lancaster Interface to the BNC (Hoffmann et al. 2008)[1].
The dataset itself is in the file called:
Bell-PorteroMunoz_timeMeasurementExpressionsBNC_figshare_2021-09-Dec.txt
The dataset consists of a table of 17591 rows and 70 columns. There is one row per corpus hit and one column per variable.
The structure of the dataset is outlined in the file called:
DataStructure_Bell-PorteroMunoz_timeMeasurementExpressionsBNC_figshare_2021-09-Dec.txt
Note that both of these files are tab-delimited tables with UTF-8 character encoding.
To open the downloaded files in Microsoft Excel on either a Windows computer or a Mac:
1. Start Excel, then choose File > Open, and navigate to the file.
2. In the Text Import Wizard, select ‘Delimited’ and set ‘File origin’ to ‘Unicode (UTF-8)’.
3. Next, select ‘Tab’ from the list of delimiters and the double quote mark from the ‘Text qualifier’ drop-down menu.
4. Select ‘General’ as the Column data format.
If you open the files directly from a folder without following the steps above, some characters may not display correctly.
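For readers who prefer to load the files programmatically rather than in Excel, the following is a minimal Python sketch using the same settings as the steps above (tab delimiter, double quote as text qualifier, UTF-8). The function names load_tab_delimited and describe are illustrative only. The sketch assumes that the first line of the file is a header row of variable names, which is not stated above, and it uses the ‘utf-8-sig’ codec as a precaution in case the file begins with a byte-order mark.

```python
import csv
from pathlib import Path


def load_tab_delimited(path):
    """Read a UTF-8, tab-delimited table and return (header, rows).

    Assumption: the first line of the file is a header row.
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a leading
    # byte-order mark if one is present (a precaution, not a known fact
    # about these files). newline="" lets the csv module handle line
    # endings itself.
    with Path(path).open(encoding="utf-8-sig", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quotechar='"')
        header = next(reader)
        rows = [row for row in reader]
    return header, rows


def describe(header, rows):
    """Return (number of data rows, number of columns)."""
    return len(rows), len(header)
```

Calling load_tab_delimited on the downloaded dataset file and passing the result to describe gives the table dimensions, which can be compared with the 17591 rows and 70 columns reported above. If the header assumption does not hold, the row count will be off by one.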
The dataset and the methodology used to create it are described in full in the following paper:
Bell, Melanie J. & Portero Muñoz, Carmen. 2022. Time-measurement constructions in English: A corpus-based exploration. In Lotte Sommerer & Evelien Keizer (eds.), English Noun Phrases from a Functional-Cognitive Perspective: Current issues [Studies in Language Companion Series 221], 312–362. John Benjamins Publishing Company.
We are grateful to John Potter and Satu Vartiainen for checking the parsing of the data.
The work was partly funded by a European Science Foundation NetWords travel grant to Carmen Portero Muñoz, who conducted the work within the framework of the research group HUM693 Lingüística Cognitiva y Funcional (LINCOFU) (Autonomous Government of Andalusia).
[1] Hoffmann, Sebastian, Evert, Stefan, Smith, Nicholas, Lee, David & Berglund-Prytz, Ylva. 2008. Corpus Linguistics with BNCweb – A Practical Guide. English Corpus Linguistics, Vol. 6. Frankfurt: Peter Lang.