IRIScan Desk 6 Business and ChatGPT

This month I’m taking a different tack: rather than focusing on any given document, I’ll look at a couple of tools you can use to help access these documents. Modern technology provides many tools that give those less skilled in ancient languages very good access to medieval texts, especially if someone has previously transcribed them.

Many texts can be found online these days, but certainly not all, and probably not even the majority. Far more have been published as books over the last couple of centuries, in increasing numbers but often in quite limited print runs. These can be read as books, but some of us can’t manage that, and the rest quite often want electronic access to the text, to read on Kindles or do whatever else we want. Of those texts that have been digitized, many were OCRed a while ago, when the results were frankly often atrocious.

As a result of my declining eyesight I recently purchased a document scanner, the IRIScan Desk 6 Business, their top of the range scanner, although there are other models. The main difference between this model and the Pro is the resolution of the camera, and as I’m aware I’m going to need increasing magnification as time goes on, I went for the better one. Scanner is a bit of a misnomer, because it is actually a camera on a mount, rather than what many of us would consider a scanner. I chose the IRIScan not just for the images but also for the software, which can OCR any image to produce Word or text files, or speech output that can be saved as MP3 or WAV as well as played from within the scanner software. It also has a book mode which allows it to scan the pages of a book and automatically flatten a curved page, although this isn’t perfect. The IRIScan’s flattening is AI based in some fashion: there is a more expensive model from another company that uses lasers to map the actual curvature of the page and correct for it, but that one didn’t have any text-to-speech capability. The user interface is a little dated, but what is behind it is really quite remarkable. Later, I shall show the results for a short excerpt from the History of William Marshal.

The second tool is something you may have heard hyped in the media recently, namely ChatGPT, a text prediction AI from OpenAI. I used the free version based on GPT 3.5 (there is a new premium version 4.0). There has been a huge amount of hype about the glory and doom this AI heralds (based mostly on the premium version), but it quickly becomes apparent when you interact with it that there is no general intelligence there, at least with v3.5. However, once you lay those expectations aside, what you do have is still an amazing and useful piece of technology, and for the purposes of this article, a truly impressive translator. It isn’t perfect, but then again neither am I in the translations I’ve presented previously in this series. It is certainly far better than older tools we might be familiar with, such as Google Translate. For this article I gave it a Cely letter to modernize, my transcription of a 16th century Latin legal document, and the Anglo-Norman text from the History of William Marshal that I obtained from the scan above. Given that it takes mere seconds to provide a translation, the quality is excellent and more than suitable for most uses.

The first example is a complete workflow from scan to translation. For this I chose the second ‘paragraph’ of the History of William Marshal from the first volume of a three-volume set by the Anglo-Norman Text Society, which is presented with the Anglo-Norman on the left and a translation on the right. This OCR covers lines 23 to 38 on page 1. The IRIScan OCR text was perfect, except for a quirk of using spaces instead of line breaks on apparently random occasions, which is a bit weird. I’ve corrected these by hand, but that was still far easier than having to transcribe the entire text manually. So with the line breaks corrected we end up with the text:

Al tens le rei Estiemble avint,
Kui Engleterre a peine tint
E Normendie molement
Tant ku’il la perdi folement,
Kuns chevaliers proz e loials,
Sire Johan li Mareschals,
Qui tant esteit de grant emprise
E de grant ovre e de grant mise
Qu’entor lui out plente de buens;
E si n’ esteit il mie kuens
Ne baron de tresgrant richesce,
Mais tant foisona sa largesse
Ke tute gent s’en merveilloent.
Neis cil kui point ne l’amoent,
Qui en avoient grant envire,
En conveneit souvent bien dire.
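
The line-break clean-up described above was done by hand, but finding the affected lines could be partly automated. The following is a minimal Python sketch under stated assumptions, not anything the IRIScan software provides. It assumes that a genuine verse line in this poem stays under about 40 characters (the sixteen lines above run from 20 to 33), so any longer OCR line has probably had a line break replaced by a space. The function name, the threshold and the merged sample line are all illustrative.

```python
def flag_merged_lines(ocr_text, max_chars=40):
    """Return (line_number, line) pairs for lines too long to be a single verse line."""
    suspects = []
    for number, line in enumerate(ocr_text.splitlines(), start=1):
        # A line over the threshold is likely two verse lines joined by a space.
        if len(line.strip()) > max_chars:
            suspects.append((number, line))
    return suspects


# Hypothetical OCR output: the second line is two verse lines run together.
sample = (
    "Al tens le rei Estiemble avint,\n"
    "Kui Engleterre a peine tint E Normendie molement\n"
    "Tant ku’il la perdi folement,"
)
for number, line in flag_merged_lines(sample):
    print(number, line)
```

This only points at the suspect lines; deciding where to split them still needs a human eye, since capital letters also occur mid-line in names such as Johan and Engleterre.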

Then I logged into ChatGPT (you do need an account to use it) and asked whether it could translate Anglo-Norman to English, to which it replied, “Yes, I can translate Anglo-Norman to English. Anglo-Norman is a variety of Old French that was spoken in England after the Norman Conquest in 1066, and it was the language of the ruling class and the legal and administrative system in England for several centuries. While it is similar to modern French in many ways, there are also significant differences in vocabulary, grammar, and pronunciation. If you have a specific text or phrase that you would like me to translate, please let me know and I will do my best to assist you.”

So I gave it the text above and received the following translation:

At the time when King Stephen reigned,
When England barely held on
And Normandy languished
Until it was foolishly lost,
There were brave and loyal knights,
Sir John the Marshal,
Who was of great enterprise
And of great work and great expense,
And had around him plenty of good men;
And he was not a count
Nor a baron of very great wealth,
But his generosity was so abundant
That all people marveled at it.
Even those who did not love him,
Who envied him greatly,
Had to admit it was often said.

For comparison I present the IRIScan OCR of the ANTS’s translation of the same lines:

In the reign of King Stephen,
who ruled over England with difficulty,
and Normandy weakly
until he was foolish enough to lose it,
there was a brave and trustworthy knight
called John the Marshal;
he was so enterprising,
his affairs on such a scale,
that he surrounded himself with many worthy men.
And yet he was no earl,
no baron with fabulous wealth,
yet his generosity so increased
that all were amazed by it.
Even those who had no love for him,
who were very jealous of him,
often could not help but speak well of him.

My second trial was a Cely letter as transcribed by Henry Malden in the 19th century, the original being late 15th century English:

(Anno lxxvj)
Welbelovyd Brother I recomaund me herttely to yow ferthermore informynge yow that the xiij day of Aprell the zeere above sayd I Robard Cely have ressayvyd of Wylliam Eston mersar of London xijli ster: to pay at Andewarpe in sencyon martte the xxiiij day of June for every nobyll of vjs viijd ster: vijs xd flemeshe and I pray yow to delyver to the sayd Wylliam Eston xijli starlynge at the same ratte takynge a byll of ys honde to paye at London the sayd xijli at a day as longe hafter the day as I toke the mony wys beffore. In wettnes herof I sette my seelle at London the xiij day of Aprell per Robard Cely.

Addressed: A George Cely.

ChatGPT’s modernization:

In the year 1476, Dear Brother, I commend myself heartily to you. Furthermore, I inform you that on the thirteenth day of April of the aforementioned year, I, Robert Cely, have received from William Eston, merchant of London, 12 pounds sterling to pay at Antwerp in Saint John’s Market on the 24th day of June, for every noble at 6 shillings and 8 pence sterling, 7 shillings and 10 pence Flemish. And I request that you deliver to the aforementioned William Eston 12 pounds sterling at the same rate, taking a bill of his hand to pay at London the said 12 pounds at a day as long after the day as I took the money wise before. In witness hereof, I set my seal at London on the thirteenth day of April, by Robert Cely.

Addressed to George Cely.

Given that I’m used to translation tools being tripped up by even minor spelling errors, ChatGPT has coped almost flawlessly with the English presented here. I’m particularly impressed with the way it has picked out the prices and quantities and expanded them in full. What I found particularly intriguing was the way that the date lxxvj, which is effectively just ’76, has been expanded to 1476. This is correct, but not stated explicitly in the original text. My guess is that ChatGPT’s training data may well have included the Malden texts, which would have given a dating context to these letters as a whole. It may mean that ChatGPT hasn’t come to this text completely cold, but it is an impressive result nonetheless.
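
The numerals in the letter can be checked mechanically. The following is a small Python sketch, with a hypothetical function name, that reads scribal Roman numerals in which the final i is written as j and four is written additively as iiij. It is an illustration of the convention rather than a full parser, and it makes no attempt to reject malformed numerals.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def medieval_roman_to_int(numeral):
    """Convert a scribal Roman numeral such as 'lxxvj' or 'xxiiij' to an integer."""
    # Scribes often wrote the final 'i' as 'j'; treat the two as the same digit.
    digits = numeral.lower().replace("j", "i")
    total = 0
    for position, char in enumerate(digits):
        value = ROMAN_VALUES[char]
        has_next = position + 1 < len(digits)
        next_value = ROMAN_VALUES[digits[position + 1]] if has_next else 0
        # A smaller digit before a larger one is subtracted (ix = 9);
        # otherwise digits simply add up, which also covers 'iiii' for 4.
        total += -value if value < next_value else value
    return total


for numeral in ("lxxvj", "xiij", "xxiiij", "xij", "viij"):
    print(numeral, medieval_roman_to_int(numeral))
```

Note that lxxvj only yields 76. The century has to come from context, which is exactly the part ChatGPT supplied.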

You can see my modernization at https://www.wikitree.com/wiki/Space:Cely_Papers:_Letter_SC_1/53/3:_13_April_1476:_Robert_Cely_to_George_Cely

The final document was the quitclaim from Magdalen College, Oxford to Robert Huckvale that I wrote about a couple of months or so ago:

Nov[er]int universi p[er] present~ nos Thomam Knollys president~ Collegij s[an]c[t]e marie magdalenen universitate Oxon٣ & scholarebus eiusdem collegij remisisse relaxasse & om[n]io p[ro] nobis & Successurib[us] quiet[e] clamasse Rob[er]to Huckvale de Oxon com[itatu] oxon٣ om[n]es & o[mn]imod]es Act[i]ones tam reales q[ua]m personales fect[as] querel[as] debit[as] debat[as] & demand[as] qu[e] unq[ua]m h[ab]uim[us] h[ab]em[us] seu quovismo[d]o in futur[um] h[ab]ere pot[er]im[us] rat[i]one compot~ vel rat[i]one alia quacuj[us] A p[ri]ncipio mundi usq[ue] in diem confect[i]o[n]is present[ium]/ In cui[us] rei testimo[niu]m hinc present[i] scripto n[ost]ro Ego prefat~ Thom[a]s Knollys Sigillu[m] meu[m] apposui dat[um] in sc[r]i[pt]o Collegij n[ost]re p[re]dict~ septi[m]o die Julij Anno R[egi]s Henr[ici] Octavi vicesimo tertio p[er] me thome knollys <illegible>

I presented this text, complete with all the expansions and unexpanded contractions, and got the following:

We, Thomas Knollys, President of St. Mary Magdalen College, University of Oxford, and scholars of the same College, have remitted, released and forever discharged Robert Huckvale of Oxford in the county of Oxford, all and every action, both real and personal, suits, debts and demands, which we ever had, have or might have in the future, by reason of account or any other reason whatsoever, from the beginning of the world until the day of the completion of these presents. In witness whereof, in testimony of this matter, we have affixed our seal to this present writing. Given at our aforesaid College on the seventh day of July, in the twenty-third year of the reign of King Henry VIII, by me, Thomas Knollys.

This is pretty good, although the first four words have been omitted. So I said “You left out the first four words.” ChatGPT responded with “I apologize for that. Here is the complete translation: To all present, we, Thomas Knollys …” The ellipsis is my own to avoid repeating the rest of the translation.

My translation can be found at https://www.wikitree.com/wiki/Space:Quitclaim_-Thomas_Knollys_to_Robert_Huckvale-1531-_Oxfordshire_P20/2D/1

It’s worth noting that this document only uses the regnal year, so I finished by asking “What year is this document from?” and received the response:

“The document does not explicitly state the year it was written. However, it mentions that it was written during the 23rd year of the reign of King Henry VIII of England. King Henry VIII’s reign began in 1509 and ended in 1547. Therefore, the document was likely written in the year 1531 (23 years after 1509).”

I asked a follow-up question, which starts to show up the limits of ChatGPT because, whilst it answers, it misses the obvious logical conclusion: “Can you be more precise knowing the exact date that Henry VIII’s reign started?”

The reply: “Yes, King Henry VIII’s reign began on April 21, 1509. Therefore, if the document was written during the 23rd year of his reign, it would have been written between April 21, 1531, and April 20, 1532.” ChatGPT has missed the explicit date of 7th July in the document, which places it in 1531.
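
The step ChatGPT missed is simple enough to write down. The following Python sketch, with hypothetical function names, combines a regnal year with a day and month to get the calendar year. It takes the accession date from ChatGPT’s reply as given, which is an assumption: reference tables usually count Henry VIII’s regnal years from 22 April, although that makes no difference for a date in July. It does not handle 29 February.

```python
from datetime import date


def regnal_year_start(accession, regnal_year):
    """First day of the given regnal year, counting the accession year as year 1."""
    return date(accession.year + regnal_year - 1, accession.month, accession.day)


def calendar_year_for(accession, regnal_year, month, day):
    """Calendar year of a day and month that falls within a given regnal year."""
    start = regnal_year_start(accession, regnal_year)
    candidate = date(start.year, month, day)
    # Dates earlier in the calendar than the anniversary belong to the following year.
    if candidate < start:
        candidate = date(start.year + 1, month, day)
    return candidate.year


# Accession date as quoted in ChatGPT's reply; treated as an assumption here.
ACCESSION = date(1509, 4, 21)
print(calendar_year_for(ACCESSION, 23, 7, 7))
```

With these inputs the 23rd regnal year starts on 21 April 1531, so 7 July falls in 1531, whereas a date in January of the same regnal year would fall in 1532.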

So, whilst ChatGPT is an extremely useful translation tool, far superior to those such as Google Translate, you do need to keep an eye on its output. It will omit or mangle things, so you need to sanity check the results, much as you would when using a calculator. However, it does give an excellent overview within seconds, which is far faster than I can translate Latin; I’m just not fluent enough to do it in real time. The questions over the date also demonstrate to me that, even so, it doesn’t have a general understanding of what it is translating. It’s a feeling that I get throughout my interactions: it is an extremely clever chat simulator (with some extremely useful skills), but there’s no actual reasoning going on behind the façade. However, for our purposes in quickly understanding documents, that façade is well worth checking out.

Combined with a document scanner, it lets you translate printed texts pretty rapidly, which is especially useful for the many books that were printed in limited runs and/or don’t have an easily accessible digitized version. In the meantime I’m going to check out ChatGPT’s suggestions for OCR software that can read historic handwriting.