Data Transformation

Working with data extracted from Project Gutenberg, such as the Privy Purse Expenses of Henry VIII 1529-1532, has highlighted differences from my usual sources, which are taken from commercial systems. Firstly, data cleansing uses techniques designed for textual analysis, such as replacing archaic English characters and symbols and grouping words into stop words, names, places and so on. This helps with the translation into modern English, which in turn facilitates further classification. Secondly, values are recorded in Roman numerals and also have to be ‘translated’: first into the equivalent Tudor currency and then, via a user-defined parameter, into an equivalent current value, which helps to bring the figures to life. Other values have to be searched for and manipulated, such as wages, which are recorded in various ways, from pence per day to annuities.

Some knowledge of the context of the data is also required, for example that some people are referred to in different ways. Anne Boleyn, for instance, is never named as such but appears instead as Mistress Anne, Lady Anne, Lady Anne Rochford, Lady Marquess of Pembroke and other variants. Calendar tables are another challenge: Excel cannot handle dates prior to 1900, so workarounds such as proxy years have to be put in place.
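
As a rough illustration of the numeral, alias and date steps described above, here is a minimal Python sketch. It is not the method used for the Power BI model. The function names, the input layout (pounds, shillings and pence already split into separate numeral strings), the modern-value multiplier and the 400-year proxy offset are all assumptions made for the example. The only fixed facts it relies on are the pre-decimal ratios of 12 pence to the shilling and 20 shillings to the pound, and the habit in accounts of this period of writing a final "i" as "j".

```python
from datetime import date

# Letter values for Roman numerals. "j" counts as "i" because accounts of
# this period often write the last i as j, e.g. "iiij" for 4.
ROMAN_VALUES = {"i": 1, "j": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xxiij' to an integer (empty string -> 0)."""
    total = 0
    previous = 0
    # Read right to left: a letter smaller than the one after it is subtracted.
    for letter in reversed(numeral.strip().lower()):
        value = ROMAN_VALUES[letter]
        total += -value if value < previous else value
        previous = value
    return total


def to_pence(pounds: str = "", shillings: str = "", pence: str = "") -> int:
    """Total an amount in pence: 12 pence to the shilling, 20 shillings to the pound."""
    return roman_to_int(pounds) * 240 + roman_to_int(shillings) * 12 + roman_to_int(pence)


def modern_value(tudor_pence: int, multiplier: float) -> float:
    """Scale a Tudor amount to modern pounds using a user-supplied multiplier."""
    return tudor_pence / 240 * multiplier


# Hypothetical alias table built from the variants listed in the text.
ALIASES = {
    "mistress anne": "Anne Boleyn",
    "lady anne": "Anne Boleyn",
    "lady anne rochford": "Anne Boleyn",
    "lady marquess of pembroke": "Anne Boleyn",
}


def canonical_name(name: str) -> str:
    """Map a recorded name to one canonical person, leaving unknown names unchanged."""
    return ALIASES.get(name.strip().lower(), name)


def proxy_date(year: int, month: int, day: int, offset: int = 400) -> date:
    """Shift a date forward by whole years so that it falls after 1900 (assumed offset)."""
    return date(year + offset, month, day)
```

For example, `to_pence("vij", "x", "iiij")` returns 1804 (7 pounds, 10 shillings and 4 pence), and `proxy_date(1532, 2, 29)` gives 29 February 1932, which a spreadsheet calendar table can hold. The multiplier is deliberately left without a default, since the choice of conversion rate is up to the user.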

The effort is worth it though, as visualising the daily activities and expenses (via Power BI) of all those involved in maintaining Henry VIII’s palaces and lifestyle, as he struggles with his ‘Great Matter’, is highly rewarding.

Casacolori – The Colourful Past

Working With Historical Data Sources