Research Repository

Matrix factorization for co-training algorithm to classify human rights abuses

Gokhale, Ragini and Fasli, Maria (2019) Matrix factorization for co-training algorithm to classify human rights abuses. In: 2018 IEEE International Conference on Big Data (Big Data), 2018-12-10 - 2018-12-13, Seattle, WA, USA.

fasli_big_data_03.pdf - Accepted Version

In the human rights domain, there is a need to filter, efficiently classify and prioritize the types of violation endured by victims in order to provide the necessary rehabilitation and support. However, the domain is dominated by unstructured data, whether from victims' accounts, from doctors' or other professionals' reports, or from material available online. Classification in this domain is still largely manual, which is extremely time-consuming and slow; this is a particular problem for non-governmental charities. To this end, we have explored the application of the co-training algorithm to improve the performance of a semi-supervised learning algorithm by incorporating large amounts of unlabeled data into the training data set. However, it remains challenging to apply co-training to data that lack two independent and self-sufficient views. This paper puts forth a method of randomly dividing the available features and applying matrix factorization to discover latent features underlying the interactions between different kinds of entities present in a single-view dataset. These labeled views balance the biased information in the dataset, but still satisfy the co-training assumptions. In addition, the views are constrained such that pairs of labeled views create weak classifiers, which in turn increase the prediction accuracy when combined. Most classification approaches assign a single class to each sample or object. However, in the human rights domain, a victim can be subjected to more than one type of violation or abuse. This is a multi-label classification problem, in which a sample can be assigned to more than one class. This paper addresses all these aspects by bringing together a semi-supervised classification model that relies on the effectiveness of matrix-factorization-based collaborative filtering in order to classify stories narrated by victims into one or more types of human rights abuses.
Experimental results demonstrate the efficiency of this approach when applied on real-world stories from different victims.
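The abstract does not specify which factorization or classifiers the paper uses, so the following is only a minimal sketch of the view-construction idea under stated assumptions: the input is a non-negative story-by-term count matrix, the feature columns are split at random into two disjoint halves, and non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates stands in for the paper's matrix factorization. The function names `split_features`, `nmf` and `build_views` are hypothetical, and the pure-Python matrix helpers are for illustration, not efficiency.

```python
import random


def split_features(n_features, seed=0):
    """Randomly partition feature indices into two disjoint views."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    half = n_features // 2
    return sorted(idx[:half]), sorted(idx[half:])


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative x (n x m) as w (n x k) @ h (k x m).

    Assumption: plain NMF with multiplicative updates for the squared
    Frobenius loss stands in for the paper's (unspecified) factorization.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (W^T X) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # w <- w * (X H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def build_views(x, k, seed=0):
    """Return two latent-feature views of x, one per random feature half."""
    left, right = split_features(len(x[0]), seed)
    views = []
    for cols in (left, right):
        sub = [[row[c] for c in cols] for row in x]
        w, _ = nmf(sub, k, seed=seed)
        views.append(w)  # each row of w is one story's latent vector
    return views


if __name__ == "__main__":
    # Toy usage with made-up term counts (not data from the paper): 4 stories x 6 terms.
    X = [[2, 0, 1, 0, 3, 0],
         [0, 1, 0, 2, 0, 1],
         [3, 0, 2, 0, 1, 0],
         [0, 2, 0, 1, 0, 3]]
    view_a, view_b = build_views(X, k=2)
    print(len(view_a), len(view_b))
```

In a co-training setup, one classifier would then be trained on each latent view, and each would label its most confident unlabeled stories for the other. For the multi-label case, one option (an assumption here, not necessarily the paper's choice) is to train one binary classifier per abuse type in each view.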

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: 2018 IEEE International Conference on Big Data (Big Data)
Uncontrolled Keywords: semi-supervised learning; co-training; multi-label classification; matrix factorization; human rights violations
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health
Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Elements
Depositing User: Elements
Date Deposited: 20 Mar 2019 12:39
Last Modified: 23 Sep 2022 19:31