Azeem Arshad

Yago Loader

18 March 2012, 23:03

Yago is a huge semantic knowledge base containing millions of facts about millions of entities (people, organizations, places, etc.). I played around with Yago some time back. Going through that material again this weekend, I came across a small loader program I had written back then.

The Yago core database is a 3.4 GB download. It comes as a set of TSV (tab-separated values) files, one file per predicate (facts in a knowledge base are represented as triples of subject, predicate and object). Every line in a file holds a tab-separated ID, subject and object; the predicate is implied by the file. There is also a converter tool that can load the TSV data into a database, but I wanted to try loading the files with MySQL's LOAD DATA INFILE statement. LOAD DATA INFILE bulk-loads a file into a table, which removes the need to run individual INSERT statements. My loader program simply reads a TSV file, writes a temporary file containing the rows to be loaded into the database table, and then runs a LOAD DATA INFILE statement on it.
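The read, rewrite and bulk-load steps can be sketched roughly as follows. This is not the original loader, just a minimal Python illustration under some assumptions: a single hypothetical `facts` table with `id`, `subject`, `predicate` and `object` columns, the predicate name supplied by the caller (for example derived from the file name), and the helper names `prepare_rows` and `load_statement` made up for this example. It only builds the statement; running it against MySQL is left to whichever client library you use.

```python
import tempfile


def prepare_rows(tsv_path, predicate):
    """Copy (id, subject, object) rows from one predicate file into a
    temporary file, inserting the predicate name as an extra column.
    Returns the path of the temporary file."""
    tmp = tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, newline="", encoding="utf-8"
    )
    with tmp, open(tsv_path, newline="", encoding="utf-8") as src:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            fact_id, subject, obj = fields[:3]
            tmp.write("\t".join((fact_id, subject, predicate, obj)) + "\n")
    return tmp.name


def load_statement(tmp_path, table="facts"):
    """Build the LOAD DATA INFILE statement for a prepared file.
    The table and column names are assumptions for this sketch."""
    # Use forward slashes and escape single quotes inside the SQL literal.
    path = tmp_path.replace("\\", "/").replace("'", "\\'")
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        "(id, subject, predicate, object)"
    )
```

With a plain LOAD DATA INFILE the file is read by the MySQL server, so the temporary file has to live somewhere the server process can read it.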

Passing the same file to both the --exclude option and the --log option makes the loader skip the TSV files that are already listed in the log. This lets us stop the load at any point and resume later from the TSV file it stopped at.
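One way this resume behaviour could work is sketched below. It assumes the log is a plain text file with one TSV file name per line, written only after a file has been loaded completely, so an interrupted file is not in the log and gets loaded again on the next run. The function names are hypothetical and not taken from the original program.

```python
import os


def read_log(log_path):
    """Return the set of file names recorded in the log (empty if no log yet)."""
    if not log_path or not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def pending_files(tsv_files, exclude_path=None):
    """Keep only the files that are not listed in the exclude file."""
    done = read_log(exclude_path)
    return [p for p in tsv_files if os.path.basename(p) not in done]


def mark_done(log_path, tsv_file):
    """Append a file name to the log once it has been loaded completely."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(os.path.basename(tsv_file) + "\n")
```

A loader built this way would loop over `pending_files(all_files, log)` and call `mark_done(log, path)` after each successful load, using the same path for both the exclude file and the log.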