SSL_lib

OVERVIEW

The SSL program implements and tests several semi-supervised learning methods on multiclass or multilabel graphs with available ground-truth labels.

Two modes are available:

  • test: Takes as input a graph and labels for all of its nodes. Randomly samples a number of nodes (num_seeds) as labeled seeds and predicts the labels of the remaining ones. The experiment is repeated a predefined number of times (num_iters), and the mean Micro-F1 and Macro-F1 scores are reported.
  • predict: The operational mode. Takes as input a graph and a file with labels for a subset of its nodes. The selected method is run, and the predicted labels for all nodes of the graph are written to a predefined output file (outfile).
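
Test mode boils down to two small computations: drawing num_seeds distinct seed nodes, and scoring the predictions on the remaining nodes. The C sketch below shows one way to do both; it is not taken from the SSL sources. It assumes multiclass labels have already been remapped to 0..num_classes-1 and uses rand() for sampling; the names sample_seeds, multiclass_f1 and f1_scores are hypothetical. Per-class F1 is 2TP / (2TP + FP + FN); Micro-F1 pools the counts over all classes, while Macro-F1 averages the per-class scores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Partial Fisher-Yates shuffle: afterwards idx[0..num_seeds-1] holds
 * num_seeds distinct node indices drawn from 0..n-1 (idx has length n).
 * rand() % range has a slight modulo bias, ignored in this sketch. */
void sample_seeds(size_t *idx, size_t n, size_t num_seeds)
{
    assert(num_seeds <= n);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = 0; i < num_seeds; i++) {
        size_t j = i + (size_t)rand() % (n - i);
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
}

typedef struct {
    double micro_f1;
    double macro_f1;
} f1_scores;

/* Micro- and macro-averaged F1 for single-label predictions.
 * Hypothetical helper: assumes labels are remapped to 0..num_classes-1. */
f1_scores multiclass_f1(const int *truth, const int *pred, size_t n, int num_classes)
{
    assert(num_classes > 0);
    size_t *tp = calloc((size_t)num_classes, sizeof *tp);
    size_t *fp = calloc((size_t)num_classes, sizeof *fp);
    size_t *fn = calloc((size_t)num_classes, sizeof *fn);
    assert(tp && fp && fn);

    for (size_t i = 0; i < n; i++) {
        if (truth[i] == pred[i]) {
            tp[truth[i]]++;
        } else {
            fp[pred[i]]++;  /* wrong class predicted: false positive for it */
            fn[truth[i]]++; /* true class missed: false negative for it */
        }
    }

    size_t tp_sum = 0, fp_sum = 0, fn_sum = 0;
    double f1_sum = 0.0;
    int counted = 0;
    for (int c = 0; c < num_classes; c++) {
        tp_sum += tp[c];
        fp_sum += fp[c];
        fn_sum += fn[c];
        size_t denom = 2 * tp[c] + fp[c] + fn[c];
        if (denom > 0) { /* skip classes absent from both truth and predictions */
            f1_sum += 2.0 * (double)tp[c] / (double)denom;
            counted++;
        }
    }

    size_t micro_denom = 2 * tp_sum + fp_sum + fn_sum;
    f1_scores s;
    s.micro_f1 = micro_denom ? 2.0 * (double)tp_sum / (double)micro_denom : 0.0;
    s.macro_f1 = counted ? f1_sum / counted : 0.0;
    free(tp);
    free(fp);
    free(fn);
    return s;
}
```

For single-label predictions every error counts as exactly one false positive and one false negative, so Micro-F1 equals plain accuracy, while Macro-F1 weights rare classes as heavily as frequent ones.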

Methods included:

  • PPR: Personalized PageRank
  • TunedRwR: Tuned random walk with restarts (see here)
  • AdaDIF: Adaptive Diffusions (see here)
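
As a reference point for PPR, here is a generic truncated personalized PageRank computed by power iteration. This is the textbook formulation, not necessarily the exact normalization SSL uses: the restart parameter alpha, the function name ppr_truncated, and the reading of walk_length as the number of steps K are assumptions. Nodes are 0-based and the graph is an undirected edge list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Truncated personalized PageRank by power iteration:
 *   p = (1 - alpha) * sum_{k=0}^{K} alpha^k * (A D^-1)^k * s
 * The graph is an undirected edge list (src[e], dst[e]) over 0-based nodes;
 * s is the restart (seed) distribution; alpha is the probability of taking
 * another step instead of restarting. Writes n scores into p. */
void ppr_truncated(size_t n, size_t m, const size_t *src, const size_t *dst,
                   const double *s, double alpha, int K, double *p)
{
    double *deg = calloc(n, sizeof *deg);
    double *x = malloc(n * sizeof *x); /* current term alpha^k (A D^-1)^k s */
    double *y = malloc(n * sizeof *y);
    assert(deg && x && y);

    for (size_t e = 0; e < m; e++) {
        deg[src[e]] += 1.0;
        deg[dst[e]] += 1.0;
    }

    memcpy(x, s, n * sizeof *x);
    for (size_t i = 0; i < n; i++)
        p[i] = (1.0 - alpha) * x[i];

    for (int k = 1; k <= K; k++) {
        for (size_t i = 0; i < n; i++)
            y[i] = 0.0;
        for (size_t e = 0; e < m; e++) {
            size_t u = src[e], v = dst[e];
            y[v] += alpha * x[u] / deg[u]; /* mass flowing u -> v */
            y[u] += alpha * x[v] / deg[v]; /* mass flowing v -> u */
        }
        for (size_t i = 0; i < n; i++)
            p[i] += (1.0 - alpha) * y[i];
        double *t = x; /* the new term becomes the current one */
        x = y;
        y = t;
    }
    free(deg);
    free(x);
    free(y);
}
```

For semi-supervised classification, a common recipe is to run the diffusion once per class with s uniform over that class's seed nodes, then assign each unlabeled node the class with the highest score.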

INPUT FILES FORMAT

SSL loads the graph in adjacency list format from a .txt file in which each line is an edge given as a tab-separated pair of node indexes: node1_index \tab node2_index. Node indexes should be in the range [1, 2^64].
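
A minimal sketch of reading one edge line in this format with strtoull from the C standard library. It is not SSL's own loader; read_index and parse_edge_line are hypothetical names. Each index is stored in a uint64_t (largest value 2^64 - 1), and 0 is rejected because indexes start at 1.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Reads one positive decimal index at *p (after spaces/tabs) and advances *p.
 * Returns 1 on success, 0 on a missing, zero, or out-of-range value. */
static int read_index(const char **p, uint64_t *out)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
    if (!isdigit((unsigned char)**p)) /* also rejects a leading '-' */
        return 0;
    char *end;
    errno = 0;
    unsigned long long val = strtoull(*p, &end, 10);
    if (errno == ERANGE || val == 0) /* overflow, or 0 (indexes start at 1) */
        return 0;
    *out = (uint64_t)val;
    *p = end;
    return 1;
}

/* Parses one "node1_index<TAB>node2_index" line of the graph file.
 * Returns 1 on success, 0 otherwise. */
int parse_edge_line(const char *line, uint64_t *u, uint64_t *v)
{
    assert(line && u && v);
    const char *p = line;
    return read_index(&p, u) && read_index(&p, v);
}
```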

For multiclass graphs, labels are loaded from a .txt file in which each line has the format node_index \tab label. Labels must be integers in [-127, 127].
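
Similarly, a multiclass label line can be parsed and range-checked with strtoull and strtol. This is an illustrative sketch, not SSL's parser: parse_label_line is a hypothetical name, and storing the label in an int8_t is an assumption that happens to fit the [-127, 127] range.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses one "node_index<TAB>label" line of a multiclass label file.
 * Returns 1 on success, 0 if the index is missing or 0, or the label is
 * missing or outside [-127, 127]. */
int parse_label_line(const char *line, uint64_t *node, int8_t *label)
{
    assert(line && node && label);
    const char *p = line;
    while (*p == ' ' || *p == '\t')
        p++;
    if (!isdigit((unsigned char)*p)) /* node index must be a positive integer */
        return 0;

    char *end;
    errno = 0;
    unsigned long long idx = strtoull(p, &end, 10);
    if (errno == ERANGE || idx == 0)
        return 0;

    const char *rest = end;
    errno = 0;
    long lab = strtol(rest, &end, 10); /* strtol skips the tab separator */
    if (end == rest || errno == ERANGE || lab < -127 || lab > 127)
        return 0;

    *node = (uint64_t)idx;
    *label = (int8_t)lab;
    return 1;
}
```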

For multilabel graphs, labels are loaded from a .txt file in compressed one-hot-matrix form (see graphs/HomoSapiens/class.txt for example).

In test mode, all nodes must be labeled (i.e., present in the label file).

In predict mode, any subset of nodes can be labeled.

OUTPUT FILES FORMAT

  • Multiclass: As in the input, each line has the format node_index \tab predicted_label
  • Multilabel: The output for multilabel graphs is a ranking for every node. Each line follows the format node_index: \tab pred_1 pred_2 ... pred_c, where pred_i is the i-th most probable label for this node.
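
To make the multilabel output format concrete, here is a sketch that turns one node's per-label scores into a ranking and writes one output line. How SSL numbers its labels is not stated here, so the sketch simply uses each label's position in the score array (0-based); rank_labels, write_ranking and scored_label are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double score;
    int label;
} scored_label;

/* Descending by score; ties broken by the smaller label id. */
static int by_score_desc(const void *a, const void *b)
{
    const scored_label *x = a, *y = b;
    if (x->score > y->score)
        return -1;
    if (x->score < y->score)
        return 1;
    return (x->label > y->label) - (x->label < y->label);
}

/* Fills order[0..c-1] with label ids sorted from most to least probable. */
void rank_labels(const double *scores, int c, int *order)
{
    assert(c > 0);
    scored_label *tmp = malloc((size_t)c * sizeof *tmp);
    assert(tmp);
    for (int i = 0; i < c; i++) {
        tmp[i].score = scores[i];
        tmp[i].label = i;
    }
    qsort(tmp, (size_t)c, sizeof *tmp, by_score_desc);
    for (int i = 0; i < c; i++)
        order[i] = tmp[i].label;
    free(tmp);
}

/* Writes one line: "node_index:<TAB>pred_1 pred_2 ... pred_c". */
void write_ranking(FILE *out, uint64_t node, const int *order, int c)
{
    fprintf(out, "%" PRIu64 ":\t", node);
    for (int i = 0; i < c; i++)
        fprintf(out, i + 1 < c ? "%d " : "%d\n", order[i]);
}
```

The explicit tie-break keeps the output reproducible, since qsort itself is not a stable sort.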

COMPILATION

Dependencies: BLAS and pthread must be installed.

Command line: make clean, then make

EXECUTION

Command line: ./SSL [OPTIONS]

OPTIONS

Command line optional arguments with values:

ARGUMENT       | VALUES                      | DEFAULT                       | DESCRIPTION
--mode         | test, predict               | test                          | Operational mode (see Overview)
--method       | Tuned_RwR, AdaDIF, PPR      | AdaDIF                        | Selection of prediction method (see Overview)
--graph_file   | (adjacency list).txt        | graphs/BlogCatalog/adj.txt    | See Input Files Format
--label_file   | (label list or one-hot).txt | graphs/BlogCatalog/class.txt  | See Input Files Format
--outfile      | (predicted labels).txt      | out/label_predictions.txt     | File where predictions are stored when --mode = predict (see Output Files Format)
--num_seeds    | [1, 2^16]                   | 1030                          | Number of nodes sampled as labeled seeds (only used when --mode = test)
--walk_length  | [1, 2^16]                   | 10                            | Length of the AdaDIF (and/or PPR) random walk
--lambda_trwr  | >= 0.0                      | 1.0                           | Regularization parameter for the Tuned RwR method
--lambda_addf  | >= 0.0                      | 5.0                           | Graph-smoothness regularization parameter for the AdaDIF method
--num_iters    | [1, 2^16]                   | 1                             | Number of experiments performed (only used when --mode = test)

Default values can be changed by editing defs.h and recompiling.

Command line optional arguments without values:

ARGUMENT          | RESULT
--unconstrained   | Switches AdaDIF to unconstrained mode
--single_thread   | Forces single-threaded execution
--multiclass      | Specifies multiclass input/output (default is multilabel)
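
The option tables can be mirrored with getopt_long (available on glibc and the BSDs, not part of ISO C). This is a hypothetical parser for a few of the documented flags, not SSL's actual argument handling; ssl_options and parse_options are invented for illustration, and the defaults and ranges are copied from the tables.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a subset of the documented options. */
typedef struct {
    const char *mode;   /* "test" or "predict" */
    const char *method; /* "Tuned_RwR", "AdaDIF" or "PPR" */
    long num_seeds;     /* [1, 2^16] */
    long walk_length;   /* [1, 2^16] */
    int multiclass;     /* set by --multiclass */
} ssl_options;

/* Returns 0 on success, -1 on an unknown option or out-of-range value. */
int parse_options(int argc, char **argv, ssl_options *o)
{
    static const struct option longopts[] = {
        {"mode", required_argument, NULL, 'm'},
        {"method", required_argument, NULL, 'M'},
        {"num_seeds", required_argument, NULL, 's'},
        {"walk_length", required_argument, NULL, 'w'},
        {"multiclass", no_argument, NULL, 'c'},
        {NULL, 0, NULL, 0}
    };
    assert(o != NULL);

    /* Defaults taken from the OPTIONS table. */
    o->mode = "test";
    o->method = "AdaDIF";
    o->num_seeds = 1030;
    o->walk_length = 10;
    o->multiclass = 0;

    int opt;
    while ((opt = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (opt) {
        case 'm': o->mode = optarg; break;
        case 'M': o->method = optarg; break;
        case 's': o->num_seeds = strtol(optarg, NULL, 10); break;
        case 'w': o->walk_length = strtol(optarg, NULL, 10); break;
        case 'c': o->multiclass = 1; break;
        default: return -1; /* '?': unknown option or missing value */
        }
    }
    if (strcmp(o->mode, "test") != 0 && strcmp(o->mode, "predict") != 0)
        return -1;
    if (o->num_seeds < 1 || o->num_seeds > 65536)
        return -1;
    if (o->walk_length < 1 || o->walk_length > 65536)
        return -1;
    return 0;
}
```

Non-numeric values for --num_seeds or --walk_length parse as 0 here and are then rejected by the range checks.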