The data can be downloaded from the data folder. It contains three JSON arrays, one for each of the three categories. The correspondences per category can be found in the correspondences folder. Each line in these files describes a triple (id_from_web_page;id_product_catalog;match), with the fields delimited by semicolons. The match value is 1 if the two items match and 0 if they do not. Finally, the original pages can be found in the pages folder. The name of each file corresponds to the id_self value in the JSON data.
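
As a rough illustration of the correspondence format, the sketch below parses semicolon-delimited triples into Python tuples. It assumes that the files have no header row, that ids can be treated as plain strings, and that the files are UTF-8 encoded. None of this is stated above, so adjust as needed. The function names `parse_correspondences` and `load_correspondences` and the sample ids are hypothetical and are not part of the dataset.

```python
from typing import Iterable, List, Tuple

# (id_from_web_page, id_product_catalog, is_match)
Correspondence = Tuple[str, str, bool]


def parse_correspondences(lines: Iterable[str]) -> List[Correspondence]:
    """Parse 'id_from_web_page;id_product_catalog;match' lines into tuples."""
    pairs: List[Correspondence] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data and can be skipped
        parts = line.split(";")
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(parts)}")
        page_id, catalog_id, match = (part.strip() for part in parts)
        if match not in ("0", "1"):
            raise ValueError(f"line {line_no}: match must be 0 or 1, got {match!r}")
        pairs.append((page_id, catalog_id, match == "1"))
    return pairs


def load_correspondences(path: str) -> List[Correspondence]:
    """Read one correspondence file (encoding is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return parse_correspondences(handle)


# Made-up sample lines, only to show the shape of the result.
sample = ["page_1;prod_9;1", "page_2;prod_9;0"]
matches = [c for c in parse_correspondences(sample) if c[2]]
print(matches)  # [('page_1', 'prod_9', True)]
```

Keeping the line parser separate from the file reader makes it easy to check the format on a few lines before loading a whole file from the correspondences folder.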