To those who think I am anti-China, this post is my rebuttal to those allegations.
Having decided to ignore the news cycle and focus on reading the “Fast Neighborhood Graph Search using Cartesian Concatenation” paper for the past several days, we have finally managed to get a working-level understanding of it.
First, kudos to the researchers Jingdong Wang, Jing Wang, Gang Zeng, Rui Gan, Shipeng Li and Baining Guo. They were from Peking University and Microsoft Research Group. Their paper was excellently written.
Updated: Read our last para on technology independence
But before we dive into this post and spin everybody’s head around, let’s take a moment to draw a deep breath and not get carried away with this.
Yes United is 3 points clear at the top of the table, but they will face Liverpool next. Remember supporting United is a category of self-inflicted harm, given their shaky defense.
As a side note, it goes to show that if we don’t waste our time getting worked up, we can actually achieve better things. So I downloaded this paper about a week back, and at first the following paragraph was gobbledygook to me.
Also kudos to Vital Sine, who provided an excellent introduction to the Cartesian product in graphs over here. So what we did was to continually re-read parts of the paper, and we also watched the YouTube video by Yusuke Matsui over here. Matsui-san is from the University of Tokyo, one of the premier universities in the world.
We also spent our insomnia hours watching this video from Stanford NLP on TF-IDF encoding over here. Another very well composed video.
The idea is quite interesting, so let us unpack it.
Suppose you have a database with 2 columns: a Cartoons catalog and a Characters catalog. In the first, the Cartoons catalog, you have ‘Spongebob Squarepants‘ and ‘Cow and Chicken‘. In the second, the Characters catalog, you have ‘Cow’, ‘Super Cow’, ‘Chicken‘, ‘The Red Guy‘, ‘Patrick‘, ‘Spongebob‘ and ‘Mr Krabs‘.
In a traditional database query, you can find the characters of the Spongebob Squarepants cartoon by executing the query: FIND Characters WHERE Cartoons = ‘Spongebob Squarepants‘, which should return ‘Spongebob’, ‘Mr Krabs’ and ‘Patrick’.
Now what happens when the query is “Find cartoons with characters which are superheroes“? The answer should be Cow and Chicken, as Super Cow is a superhero, but how can a system find that out?
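To make the traditional lookup concrete, here is a minimal Python sketch of it. The ROWS list and the find_characters helper are our own illustrative names, not from any real database, and the rows simply restate the toy catalog above.

```python
# Toy catalog from the example above, stored as (cartoon, character) rows.
# ROWS and find_characters are hypothetical names used only for illustration.
ROWS = [
    ("Spongebob Squarepants", "Spongebob"),
    ("Spongebob Squarepants", "Patrick"),
    ("Spongebob Squarepants", "Mr Krabs"),
    ("Cow and Chicken", "Cow"),
    ("Cow and Chicken", "Super Cow"),
    ("Cow and Chicken", "Chicken"),
    ("Cow and Chicken", "The Red Guy"),
]


def find_characters(rows, cartoon):
    """Exact-match lookup: return the characters stored for one cartoon."""
    return [character for row_cartoon, character in rows if row_cartoon == cartoon]


spongebob_cast = find_characters(ROWS, "Spongebob Squarepants")
print(spongebob_cast)
```

An exact-match lookup like this only works when the query value is literally stored in a column. Nothing in the table is labelled "superhero", so the same approach has nothing to match against for a fuzzier question.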
Here is where the idea of the paper comes in. For argument’s sake, let’s say we divide our database into 2 sets, a Spongebob set and a Cow and Chicken set. What the research group says is to generate all possible combinations in the Spongebob set x the Cow and Chicken set: so we will have not only (Spongebob Squarepants, Spongebob) and (Spongebob Squarepants, Patrick), but also (Spongebob Squarepants, Cow) and (Spongebob Squarepants, Super Cow), and so on until we have (Cow and Chicken, Super Cow). Remember that pairs like (Spongebob Squarepants, Cow) and (Spongebob Squarepants, Super Cow) are ‘hypothetical relations’: they do not exist in our database, because Cow and Super Cow are characters in Cow and Chicken.
The magic is that we can compute a metric called distance for each relation, for example (Spongebob Squarepants, Super Cow) and (Cow and Chicken, Super Cow). The idea is that the (Cow and Chicken, Super Cow) relation will have a smaller distance than the (Spongebob Squarepants, Super Cow) relation, because Super Cow is more closely related to Cow and Chicken than to Spongebob Squarepants.
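The "all possible combinations" step is just a Cartesian product, and a rough Python sketch for our toy example might look like this. The names CARTOONS, CHARACTERS and REAL are our own, and this only illustrates the pairing idea in the analogy, not the paper's actual construction over feature vectors.

```python
from itertools import product

# Hypothetical names for the two catalogs in the toy example.
CARTOONS = ["Spongebob Squarepants", "Cow and Chicken"]
CHARACTERS = ["Cow", "Super Cow", "Chicken", "The Red Guy",
              "Patrick", "Spongebob", "Mr Krabs"]

# Pairs that really exist in the toy database.
REAL = {
    ("Spongebob Squarepants", "Spongebob"),
    ("Spongebob Squarepants", "Patrick"),
    ("Spongebob Squarepants", "Mr Krabs"),
    ("Cow and Chicken", "Cow"),
    ("Cow and Chicken", "Super Cow"),
    ("Cow and Chicken", "Chicken"),
    ("Cow and Chicken", "The Red Guy"),
}

# Cartesian product: every cartoon paired with every character.
all_pairs = list(product(CARTOONS, CHARACTERS))

# Pairs that are generated but not stored are the "hypothetical relations".
hypothetical = [pair for pair in all_pairs if pair not in REAL]

print(len(all_pairs), "pairs in total,", len(hypothetical), "of them hypothetical")
```

With 2 cartoons and 7 characters the product has 14 pairs, so even this tiny catalog produces as many hypothetical pairs as real ones.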
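To see how a distance can separate the two relations, here is a small Python sketch. The 2-D vectors are numbers we made up purely for illustration (how "under the sea" and how "on the farm" something is), and plain Euclidean distance is our assumption; the paper works with real feature vectors rather than hand-picked values like these.

```python
import math

# Made-up 2-D features: (under-the-sea score, on-the-farm score).
# These values are assumptions for illustration only.
CARTOON_VECTORS = {
    "Spongebob Squarepants": (0.9, 0.1),
    "Cow and Chicken": (0.1, 0.9),
}
SUPER_COW = (0.2, 0.8)


def distance(a, b):
    """Euclidean distance between two points of the same dimension."""
    return math.dist(a, b)


# Distance of the (cartoon, Super Cow) relation for each cartoon.
distances = {
    cartoon: distance(vector, SUPER_COW)
    for cartoon, vector in CARTOON_VECTORS.items()
}

# The cartoon with the smallest distance is the best match for Super Cow.
closest = min(distances, key=distances.get)
print(closest, distances)
```

With these made-up numbers the Cow and Chicken pair comes out much closer than the Spongebob Squarepants pair, which is the behaviour the analogy relies on.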
That is what is meant by the above.
The rest of the paper discusses how to expand each bridge vector until the optimal result is retrieved.
Of course, that is not compelling for our toy example above. But consider what happens if we have billions of documents in our database to search against a query. Using this method, the researchers managed to arrive at 90% accurate results with an average query time of under 5 ms.
Note that the paper seems a bit old: it was released in Dec 2013, about 7 years ago. However, I still believe the technique is considered cutting edge and will have great application. I really respect the authors for their creative and ingenious process.
Case for Technology Independence
There are a couple of things we would like to mention here. Number one, all the researchers who came up with this algorithm are from Asia. The second is that the Big Tech platforms are already putting restrictions, in their own country, on views they disagree with. Our continent has to be prepared for the eventuality that the Big Tech platforms impose their own warped Woke ideology on smaller nations in South East Asia. If tomorrow Google says that using their products is an implicit support for the LGBT movement, what will Malaysians do?
The day of us fighting each other over small politics must stop. We must not adopt the mentality of cursing others because of small disagreements. We must unite and put in plans for the region to become technologically independent so that we will not have to kow-tow to the Big Tech platforms.