Sorry, Sonia – we don’t serve your type here.


About 15 months ago, we put pen to paper, or more precisely, Google Publisher, to outline the vision for our secret platform. A key component of that vision had to do with a technology called “Natural Language Processing”, or NLP, and its close relative “Natural Language Understanding”, or NLU.

So what do these technologies refer to?

Basically, they are about how computers understand human thought when it is expressed as words.

That modern technology can achieve this should no longer be a subject of debate. After all, Google, Siri and Alexa are examples of technologies that do precisely that.

However, the question is whether this technology could be transferred to the Rembau Times, and the good people of Rembau at large.

When we first drew up our plans, we knew hardly anything about this technology, and it seemed that the only avenue available was to subscribe to Google technology, which would be prohibitively expensive. Natural Language technologies could very much be lumped together under the heading “Tech Voodoo”.

However, the world these days is characterized by open platforms and open technologies. Surprisingly, you can get your hands on the latest technologies, developed by world-class companies in this arena, for free.

Yes – you heard it right. For free and with no strings attached.

To put this in perspective, consider the AI company called Snorkel AI. To date, the company has raised over US$50 million, and yet its flagship Snorkel platform is available as a free download. That is like $50 million of R&D that one can avoid paying for and still take advantage of. Or there is the case of Explosion AI, a German company which is behind the much lauded spaCy NLP platform. The spaCy platform is touted as an “industrial strength natural language” platform, and it too can be downloaded for free.

In fact, many technologies in this field can be downloaded for free.

Pew Research even conducted a survey in which a majority of people said they believe that machines will take over many jobs currently done by humans.

So what is holding back widespread adoption of this technology? Why can’t we have services that read our news and tell us what is important? Why can’t we have systems that go through tons of annual reports and tell us precisely what the merits and risks of an investment are?

There are two drawbacks, one minor, and the other a bit more major.

Let’s first take the minor drawback.

Number one, to use these platforms you need considerable technical skill. You definitely need to know how to install Python and Visual Studio Code, and then how to install the platforms themselves. Some of them come with additional requirements, such as downloading compilers from Microsoft and actually compiling the code on your machine. It is not at all like the guided installers you may have experienced when you bought your pirated version of Microsoft Office 20 years ago. It is way more complicated.

However, that is a minor issue. There are enough tutorials out there to guide people on how to download and install these platforms. So it is not a showstopper by any means.

So we now hit the major drawback.

What is preventing the widespread adoption of these technologies?

Well – it seems that even the best technology out there fails when it comes to matching a human’s ability to perceive the meaning of a word. Consider the term “Sonia”.

“Sonia” could refer to a female, such as Sonia Chew, the Taiwanese singer. It is a reasonably well-recognized female name.

But on the other hand, “SONIA” could refer to the Sterling Overnight Index Average, which is the overnight interbank interest rate benchmark for British pounds.

Now a human will have no problem in differentiating between these two versions of Sonia, but how do computers fare in this task?

Actually, this is part of a task called “Named Entity Recognition”, which is the process by which computers scan through a piece of text and decide which parts of the text refer to real-world objects, such as places, products, companies and people’s names.
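To make that a little more concrete, here is a minimal sketch of what such a scan could look like with spaCy in Python. It is not the exact script we ran. The model name en_core_web_sm, the helper names people_from_doc and tag_people, and the sample sentence in the comment are all assumptions made for illustration, and what actually gets tagged will depend on the model and its version.

```python
def people_from_doc(doc):
    """Return the text of every entity that the pipeline labelled PERSON.

    `doc` is expected to look like a spaCy Doc: it has an `ents` attribute
    whose items carry `.text` and `.label_`.
    """
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


def tag_people(text, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over `text` and list its PERSON entities.

    Assumes spaCy and the named model are already installed, e.g.
        pip install spacy
        python -m spacy download en_core_web_sm
    The model name is an assumption; any English spaCy model with an
    NER component should work the same way.
    """
    import spacy  # imported here so people_from_doc works without spaCy

    nlp = spacy.load(model_name)
    return people_from_doc(nlp(text))


# Example call with a made-up sentence (results vary by model version):
# tag_people("Jamie Dimon said the bank will reference SONIA after Brexit.")
```

Anything returned by tag_people that is not really a person, such as a benchmark rate, is exactly the kind of mistake this article is about.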

Now most of these modern platforms claim that they offer a high level of accuracy on this task, but is that really the case?

Well, we can actually answer that question.

We used the latest version of spaCy to scan through the Annual Reports of four major banks, namely JP Morgan, Citibank, Morgan Stanley and Goldman Sachs. It is quite normal for each annual report to run to over 100,000 words.

Now spaCy did a fantastic job of picking out all the parts of the text that related to a natural person. For example, it accurately tagged mentions of Jamie Dimon in the JP Morgan report and even mentions of “Low Taek Jho” in the Goldman Sachs Annual Report (well, they had to pay $3 billion, right?). That part of the task is measured by “recall”: the share of all the genuine mentions of people’s names in the report that the AI algorithm manages to tag.

From my initial scan, recall was something like 100%.

However, spaCy not only tagged Jho Low as a person, it tagged a lot of other stuff as persons as well. It tagged “SONIA”, “Brexit” and even “Citibranded” as persons. In fact, for every correct person identified by spaCy, it included about 2 other incorrect taggings, so in the end there were many false taggings of non-persons as persons. This part of the task is measured by the aptly named “precision”, the share of everything tagged as a person that really is a person, and the stats were terrible, something of the order of 40%.
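For readers who want to see how these two numbers are worked out, here is a small Python sketch. The two sets below are toy data built from the names mentioned in this article, chosen only so the arithmetic is easy to follow. They are not the real counts from the annual reports, and comparing sets of distinct names (rather than every individual mention) is a simplification.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for a set of predicted names.

    precision = correct predictions / everything predicted
    recall    = correct predictions / everything that should be found
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy data for illustration only, not the real results.
actual_people = {"Jamie Dimon", "Low Taek Jho"}
tagged_as_person = {"Jamie Dimon", "Low Taek Jho",
                    "SONIA", "Brexit", "Citibranded"}

precision, recall = precision_recall(tagged_as_person, actual_people)
print(f"precision={precision:.0%} recall={recall:.0%}")
# prints: precision=40% recall=100%
```

Every real person is found, so recall is perfect, but only two of the five tags are right, so precision is poor. That is the same pattern we saw in the reports.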

So perhaps that is why this technology has yet to be fully adopted: even the latest technology available has its limits when it comes to actually delivering accurate results.

So does it mean that The Rembau Times is stuck?

So does it mean that our secret platform, which relies heavily on Artificial Intelligence, cannot deliver on its promise and achieve results better than 40% precision, which is the current benchmark, on even the most basic task required of a computer?

 Die Meister! 

Die Besten!

Les grandes équipes!

The champions! 


For more information on what people think about robots taking over jobs, read this article from dynamiccio.
