SSL_lib

OVERVIEW

The SSL program implements and tests several semi-supervised learning methods on multiclass or multilabel graphs with available ground-truth labels.

Two modes are available:

  • test: Takes as input a graph and labels for all nodes. Randomly samples a number of nodes (num_seeds) to act as labeled seeds and predicts the labels of the remaining ones. The experiment is repeated a predefined number of times (num_iters), and the mean Micro F1 and Macro F1 scores are reported.
  • predict: This is the operational mode. It takes a graph and a file with a subset of nodes and their labels. The selected method is run, and the predicted labels for all nodes of the graph are written to a predefined output file (outfile).
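
The evaluation loop of test mode can be sketched for the multiclass case as follows. This is a minimal illustration and not the code used by SSL: the function names `split_seeds` and `f1_scores` are hypothetical, and the F1 definitions are the standard single-label ones (pooled counts for Micro F1, an unweighted mean of per-class F1 for Macro F1).

```python
import random
from collections import Counter


def split_seeds(nodes, num_seeds, rng):
    """Randomly pick num_seeds labeled seed nodes; the rest are to be predicted."""
    seeds = set(rng.sample(sorted(nodes), num_seeds))
    return seeds, set(nodes) - seeds


def f1_scores(true_labels, predicted_labels):
    """Return (micro_f1, macro_f1) for two equal-length label sequences."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    if not classes:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was assigned wrongly
            fn[truth] += 1  # truth was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 averages the per-class F1 scores with equal weight.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


if __name__ == "__main__":
    seeds, targets = split_seeds(range(1, 11), num_seeds=3, rng=random.Random(0))
    print(len(seeds), len(targets))
    print(f1_scores([1, 1, 2, 2, 3], [1, 2, 2, 2, 3]))
```

Because Macro F1 weights every class equally, it is more sensitive than Micro F1 to mistakes on rare classes, which is why reporting both is informative.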

Methods included:

  • PPR: Personalized PageRank
  • TunedRwR: Tuned random walk with restarts ( see here )
  • AdaDIF: Adaptive Diffusions ( see here )
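
For orientation, a textbook version of personalized PageRank truncated after a fixed number of steps can be sketched as below. This is not taken from the SSL sources: the function name `truncated_ppr`, the `restart` probability and its value, and the uniform start over the seed nodes are all assumptions made for illustration. The graph is assumed to be a dict in which every node appears as a key.

```python
def truncated_ppr(adjacency, seeds, walk_length=10, restart=0.15):
    """Score nodes by a random walk with restarts, truncated at walk_length steps.

    adjacency: dict mapping every node to a collection of its neighbors.
    seeds: nodes the walk starts from (for example, seeds of one class).
    Computes sum_{k=0..walk_length} restart * (1 - restart)^k * P^k s,
    where s is uniform over the seeds and P is the random-walk transition.
    """
    nodes = list(adjacency)
    current = dict.fromkeys(nodes, 0.0)
    for seed in seeds:
        current[seed] = 1.0 / len(seeds)

    weight = restart
    scores = {n: weight * current[n] for n in nodes}
    for _ in range(walk_length):
        # One step of the walk: each node spreads its mass evenly to neighbors.
        following = dict.fromkeys(nodes, 0.0)
        for node, mass in current.items():
            neighbors = adjacency[node]
            if mass == 0.0 or not neighbors:
                continue
            share = mass / len(neighbors)
            for neighbor in neighbors:
                following[neighbor] += share
        current = following
        weight *= 1.0 - restart
        for n in nodes:
            scores[n] += weight * current[n]
    return scores


if __name__ == "__main__":
    path = {1: [2], 2: [1, 3], 3: [2]}
    print(truncated_ppr(path, seeds=[1], walk_length=2, restart=0.5))
```

In a classification setting, one common approach is to run such a diffusion once per class, starting from that class's seeds, and to assign each unlabeled node the class with the highest score.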

INPUT FILES FORMAT

SSL loads the graph in adjacency list format from a .txt file that contains edges as tab-separated pairs of node indexes in the format: node1_index \tab node2_index. Node indexes should be in the range [1, 2^64].
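
To check a graph file against this format before running SSL, a small reader along the following lines may help. It is a sketch only: `parse_edge_list` is a hypothetical helper, and treating each edge as undirected is an assumption that the README does not state.

```python
from collections import defaultdict

MAX_NODE_INDEX = 2 ** 64  # upper bound of the node index range given above


def parse_edge_list(lines, undirected=True):
    """Parse 'node1_index<TAB>node2_index' lines into {node: set(neighbors)}.

    undirected=True stores every edge in both directions (an assumption).
    """
    adjacency = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two tab-separated indexes")
        u, v = int(fields[0]), int(fields[1])
        for node in (u, v):
            if not 1 <= node <= MAX_NODE_INDEX:
                raise ValueError(f"line {line_no}: node index {node} out of range")
        adjacency[u].add(v)
        if undirected:
            adjacency[v].add(u)
        else:
            adjacency[v]  # make sure the target node is present as a key
    return dict(adjacency)


if __name__ == "__main__":
    print(parse_edge_list(["1\t2", "2\t3"]))
```

With a real file, the function can be called as `parse_edge_list(open(path))`, since it only iterates over lines.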

For multiclass graphs, the labels are loaded from a .txt file where each line is of the format: node_index \tab label. Labels have to be integers in [-127, 127].

For multilabel graphs, labels are loaded from a .txt file in compressed one-hot-matrix form (see graphs/HomoSapiens/class.txt for example).

When in test mode, all nodes must be labeled (present in the label file).

When in predict mode, any subset of nodes can be labeled.
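
The multiclass label format and the test-mode requirement can be checked with a short script such as the one below. Both helpers, `parse_multiclass_labels` and `unlabeled_nodes`, are hypothetical names introduced for this sketch and are not part of SSL.

```python
def parse_multiclass_labels(lines):
    """Parse 'node_index<TAB>label' lines into {node_index: label}."""
    labels = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'node_index<TAB>label'")
        node, label = int(fields[0]), int(fields[1])
        if not -127 <= label <= 127:
            raise ValueError(f"line {line_no}: label {label} outside [-127, 127]")
        labels[node] = label
    return labels


def unlabeled_nodes(graph_nodes, labels):
    """Return the graph nodes that are missing from the label file."""
    return sorted(set(graph_nodes) - set(labels))


if __name__ == "__main__":
    labels = parse_multiclass_labels(["1\t3", "2\t-5"])
    missing = unlabeled_nodes([1, 2, 3], labels)
    # Test mode needs an empty list here; predict mode accepts any subset.
    print(labels, missing)
```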

OUTPUT FILES FORMAT

  • Multiclass: Similar to input, each line is node_index \tab predicted_label
  • Multilabel: The output for multilabel graphs is a ranking for every node. Each line follows the format node_index: \tab pred_1 pred_2 ... pred_c, where pred_i is the i-th most probable label for this node.
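
A reader for the multilabel output might look like the sketch below. It assumes that the ranked predictions are integers separated by whitespace, which the format description suggests but does not state. The names `parse_multilabel_line` and `top_k` are hypothetical.

```python
def parse_multilabel_line(line):
    """Parse 'node_index:<TAB>pred_1 pred_2 ... pred_c' into (node, ranking)."""
    node_part, _, ranking_part = line.partition("\t")
    node = int(node_part.strip().rstrip(":"))
    # Assumption: labels are integers separated by whitespace.
    ranking = [int(token) for token in ranking_part.split()]
    return node, ranking


def top_k(ranking, k):
    """Keep the k most probable labels from a ranking (most probable first)."""
    return ranking[:k]


if __name__ == "__main__":
    node, ranking = parse_multilabel_line("7:\t3 1 2")
    print(node, top_k(ranking, 2))
```

Since each line is a full ranking rather than a label set, the caller decides how many labels to keep per node, for example a fixed k or the known number of labels of that node.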

COMPILATION

Dependencies: blas and pthread must be installed

Command line: make clean and then make

EXECUTION

Command line: ./SSL [OPTIONS]

OPTIONS

Command line optional arguments with values:

| ARGUMENT | VALUES | DEFAULT | DESCRIPTION |
|---|---|---|---|
| --mode | test, predict | test | Operational mode (see Overview) |
| --method | Tuned_RwR, AdaDIF, PPR | AdaDIF | Selection of prediction method (see Overview) |
| --graph_file | (adjacency list).txt | graphs/BlogCatalog/adj.txt | See Input Files Format |
| --label_file | (label list or one-hot).txt | graphs/BlogCatalog/class.txt | See Input Files Format |
| --outfile | (predicted labels).txt | out/label_predictions.txt | File where predictions are stored when --mode is predict (see Output Files Format) |
| --num_seeds | [1, 2^16] | 1030 | Number of nodes that are labeled (only works when --mode is test) |
| --walk_length | [1, 2^16] | 10 | Length of AdaDIF (and/or PPR) random walk |
| --lambda_trwr | >=0.0 | 1.0 | Regularization parameter for Tuned RwR method |
| --lambda_addf | >=0.0 | 5.0 | Smoothness over the graph regularization parameter for AdaDIF method |
| --num_iters | [1, 2^16] | 1 | Number of experiments performed (only works when --mode is test) |

Default values can be changed by editing defs.h and recompiling.

Command line optional arguments without values:

| ARGUMENT | RESULT |
|---|---|
| --unconstrained | Switches AdaDIF to unconstrained mode |
| --single_thread | Forces single-thread execution |
| --multiclass | Specifies multiclass input / output (default is multilabel) |
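
When SSL is driven from a script, the options above can be assembled into a command line as sketched below. The helper `build_ssl_command` is hypothetical, and passing each value as a separate argument after its option name (`--mode predict`) is an assumption about how SSL parses its arguments. The option names themselves are the ones listed in the two tables.

```python
import shlex

VALUE_OPTIONS = {
    "mode", "method", "graph_file", "label_file", "outfile",
    "num_seeds", "walk_length", "lambda_trwr", "lambda_addf", "num_iters",
}
FLAG_OPTIONS = {"unconstrained", "single_thread", "multiclass"}


def build_ssl_command(flags=(), **options):
    """Return the argument list for one SSL run.

    options: keyword arguments named after the options that take values.
    flags: names of the options that take no value.
    """
    command = ["./SSL"]
    for name, value in options.items():
        if name not in VALUE_OPTIONS:
            raise ValueError(f"unknown option: --{name}")
        # Assumption: the value follows the option as a separate argument.
        command += [f"--{name}", str(value)]
    for flag in flags:
        if flag not in FLAG_OPTIONS:
            raise ValueError(f"unknown flag: --{flag}")
        command.append(f"--{flag}")
    return command


if __name__ == "__main__":
    command = build_ssl_command(
        flags=["multiclass"],
        mode="predict",
        method="PPR",
        outfile="out/label_predictions.txt",
    )
    # Print a shell-safe version; subprocess.run(command) would execute it.
    print(" ".join(shlex.quote(part) for part in command))
```

Rejecting unknown names up front catches typos in option names before the program is started.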