How to generate training data
Step 0) characterize every pair of papers that share an author name using the 11 variables
Step 1) choose a pair according to a selection criterion and submit it to an oracle
Step 2) the oracle compares the two papers in the pair and returns 1 (most likely a match) or 0 (most likely a non-match)
Step 3) update the inferred (best-fitting) matching function with the oracle's answer
Repeat Steps 1 to 3 until the relative likelihood of having inferred the underlying matching function is high
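
The loop in Steps 1 to 3 can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation. It assumes that the matching function is a logistic regression over the 11 variables, that the selection criterion is uncertainty sampling (pick the pair whose predicted match probability is closest to 0.5), and that a label budget plus a confidence margin stands in for the relative-likelihood stopping rule. The names generate_training_data, toy_oracle, budget, and margin are hypothetical, and the pairs are random synthetic data.

```python
import math
import random

N_VARS = 11  # the 11 variables computed for each pair in Step 0


def predict(weights, x):
    """Matching function (assumed logistic): probability that a pair is a match."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit(labeled, epochs=100, lr=0.5):
    """Step 3: refit the matching function on every label gathered so far."""
    weights = [0.0] * (N_VARS + 1)  # bias followed by one weight per variable
    for _ in range(epochs):
        for x, y in labeled:
            err = predict(weights, x) - y
            weights[0] -= lr * err
            for i, v in enumerate(x):
                weights[i + 1] -= lr * err * v
    return weights


def generate_training_data(pairs, oracle, budget=50, margin=0.05):
    """Run Steps 1 to 3 until the budget is spent or no pair is uncertain."""
    unlabeled = list(pairs)
    labeled = []
    weights = [0.0] * (N_VARS + 1)
    while unlabeled and len(labeled) < budget:
        # Step 1: select the pair the current function is least sure about.
        x = min(unlabeled, key=lambda p: abs(predict(weights, p) - 0.5))
        seen_both = len({y for _, y in labeled}) == 2
        if seen_both and abs(predict(weights, x) - 0.5) > 0.5 - margin:
            break  # stand-in stopping rule: all remaining pairs are confident
        unlabeled.remove(x)
        # Step 2: the oracle returns 1 (match) or 0 (non-match).
        labeled.append((x, oracle(x)))
        # Step 3: update the inferred matching function.
        weights = fit(labeled)
    return labeled, weights


def toy_oracle(x):
    """Hypothetical oracle: calls a pair a match when its mean score exceeds 0.5."""
    return 1 if sum(x) / len(x) > 0.5 else 0


# Step 0 (synthetic stand-in): 200 pairs, each described by 11 values in [0, 1].
rng = random.Random(0)
pairs = [tuple(rng.random() for _ in range(N_VARS)) for _ in range(200)]

labeled, weights = generate_training_data(pairs, toy_oracle, budget=30)
print(len(labeled), "pairs were labeled by the oracle")
```

In practice the oracle would be a human judge or a trusted reference source rather than a function, and the labeled list returned by the loop is the training data.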