SSL_lib

Commits

List of commits on branch master.
  • 8ca9ca6f595a5da718850f3dcf3607c9ebf51c92 (unverified): updated defaults. DDimBer committed 6 years ago.
  • 96726553e387ad3b94827967538d16f343d89548 (unverified): changed defs. DDimBer committed 6 years ago.
  • 4ed70318061d2289bb744e721f484a7ef05e664b (unverified): Added license. DDimBer committed 7 years ago.
  • 5fc69e3cfa43f02ae731a3c0ddc86022c7a0912f (unverified): Added customizable integer types. DDimBer committed 7 years ago.
  • bcfffea584e5310acd791d73c6776ae5079afe49 (unverified): Added customizable integer types. DDimBer committed 7 years ago.
  • 6d50ebda60b75e4dbf9c4ae1c7e36ab46c34d066 (unverified): Added customizable integer types. DDimBer committed 7 years ago.

README

OVERVIEW

The SSL program implements and tests several semi-supervised learning methods on multiclass or multilabel graphs for which ground-truth labels are available.

Two modes are available:

  • test: Takes as input a graph and labels for all nodes. Randomly samples a number of nodes (num_seeds) as labeled seeds and predicts the labels of the remaining ones. The experiment is repeated a predefined number of times (num_iters), and the mean Micro F1 and Macro F1 scores are reported.
  • predict: This is the operational mode. The input is a graph and a file listing a subset of nodes with their labels. The selected method is run, and the predicted labels for all nodes of the graph are written to a predefined output file (outfile).
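
The test-mode loop described above can be sketched as follows. This is a minimal illustration, not the program's actual implementation: the names `f1_scores`, `run_test_mode`, and `predict_fn` are hypothetical, the sketch covers only the single-label (multiclass) case, and it assumes Macro F1 is averaged over the classes that appear among the held-out nodes.

```python
import random


def f1_scores(true_labels, pred_labels):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    classes = sorted(set(true_labels) | set(pred_labels))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the truth but was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro F1 pools the counts; Macro F1 averages the per-class scores.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


def run_test_mode(labels, predict_fn, num_seeds, num_iters, rng=None):
    """Average Micro/Macro F1 over num_iters random seed selections.

    labels:     dict node -> ground-truth label (all nodes labeled)
    predict_fn: hypothetical callable (seed_labels, rest) -> dict node -> label
    """
    rng = rng or random.Random(0)
    nodes = sorted(labels)
    micro_sum = macro_sum = 0.0
    for _ in range(num_iters):
        seeds = set(rng.sample(nodes, num_seeds))
        seed_labels = {n: labels[n] for n in seeds}
        rest = [n for n in nodes if n not in seeds]
        predicted = predict_fn(seed_labels, rest)
        micro, macro = f1_scores(
            [labels[n] for n in rest], [predicted[n] for n in rest]
        )
        micro_sum += micro
        macro_sum += macro
    return micro_sum / num_iters, macro_sum / num_iters
```

Any of the included methods could play the role of `predict_fn` here; the sketch only shows how seeds are drawn and how the two scores are averaged.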

Methods included:

  • PPR: Personalized PageRank
  • TunedRwR: Tuned random walk with restarts ( see here )
  • AdaDIF: Adaptive Diffusions ( see here )
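
To give a feel for the simplest of these methods, here is a rough sketch of truncated Personalized PageRank classification. It uses the textbook formulation (landing probabilities of a random walk started at the seeds, weighted geometrically by step) and is not taken from this repository's code. The function names, the decay value `alpha=0.9`, and the treatment of the graph as undirected are all assumptions; `walk_length` mirrors the --walk_length option.

```python
def ppr_scores(adj, seeds, walk_length=10, alpha=0.9):
    """Truncated PPR scores for one class.

    adj:   dict node -> set of neighbours (undirected graph assumed)
    seeds: labeled nodes of the class; the walk starts uniformly on them
    """
    p = {n: 0.0 for n in adj}
    for s in seeds:
        p[s] = 1.0 / len(seeds)
    scores = {n: 0.0 for n in adj}
    for k in range(1, walk_length + 1):
        # One random-walk step: each node splits its mass among neighbours.
        nxt = {n: 0.0 for n in adj}
        for u, mass in p.items():
            if mass == 0.0 or not adj[u]:
                continue
            share = mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
        # Geometric PPR weight for the k-th landing probability.
        weight = (1 - alpha) * alpha ** k
        for n in adj:
            scores[n] += weight * p[n]
    return scores


def classify(adj, seed_labels, walk_length=10, alpha=0.9):
    """Assign every node the class whose PPR score is highest."""
    by_class = {}
    for node, label in seed_labels.items():
        by_class.setdefault(label, []).append(node)
    class_scores = {
        c: ppr_scores(adj, nodes, walk_length, alpha)
        for c, nodes in by_class.items()
    }
    return {
        n: max(sorted(class_scores), key=lambda c: class_scores[c][n])
        for n in adj
    }
```

Roughly speaking, adaptive methods such as AdaDIF replace the fixed geometric weights with weights fitted per class; that fitting step is not shown here.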

INPUT FILES FORMAT

SSL loads the graph in adjacency list format from a .txt file that contains the edges as tab-separated pairs of node indexes, one edge per line, in the format: node1_index \tab node2_index. Node indexes should be in the range [1, 2^64].
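
As an illustration of this format, the following sketch parses such lines into an edge list and a neighbour map. It is not the program's own loader: `load_edges` and `build_adjacency` are hypothetical names, and treating each edge as undirected is an assumption the format description does not confirm.

```python
def load_edges(lines):
    """Parse 'node1_index<TAB>node2_index' lines into (u, v) tuples."""
    edges = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two tab-separated fields")
        u, v = int(fields[0]), int(fields[1])
        if u < 1 or v < 1:
            raise ValueError(f"line {line_no}: node indexes start at 1")
        edges.append((u, v))
    return edges


def build_adjacency(edges):
    """Build node -> set of neighbours, assuming undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj
```

For a file on disk, `load_edges(open(path))` works because a file object iterates over its lines.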

For multiclass graphs, the labels are loaded from a .txt file where each line has the format: node_index \tab label. Labels must be integers in [-127, 127].
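
A small validation sketch for the multiclass label format, useful for checking a file before passing it to SSL. The function name `load_multiclass_labels` is hypothetical, and rejecting out-of-range labels with an exception is a choice made for this example rather than documented program behavior.

```python
def load_multiclass_labels(lines):
    """Parse 'node_index<TAB>label' lines into a dict node -> label."""
    labels = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two tab-separated fields")
        node, label = int(fields[0]), int(fields[1])
        if not -127 <= label <= 127:
            raise ValueError(f"line {line_no}: label {label} outside [-127, 127]")
        labels[node] = label
    return labels
```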

For multilabel graphs, labels are loaded from a .txt file in compressed one-hot-matrix form (see graphs/HomoSapiens/class.txt for example).

In test mode, all nodes must be labeled (present in the label file).

In predict mode, any subset of nodes can be labeled.

OUTPUT FILES FORMAT

  • Multiclass: Similar to input, each line is node_index \tab predicted_label
  • Multilabel: The output for multilabel graphs is a ranking for every node. Each line follows the format node_index: \tab pred_1 pred_2 ... pred_c, where pred_i is the i-th most probable label for this node.
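
The two output formats can be illustrated with a short formatting sketch. The helper names are hypothetical, and two details are assumptions: that the ranked labels on a multilabel line are separated by single spaces, and that lines appear in ascending node order.

```python
def format_multiclass(predictions):
    """predictions: dict node -> predicted label."""
    return [f"{node}\t{label}" for node, label in sorted(predictions.items())]


def format_multilabel(scores_by_node):
    """scores_by_node: dict node -> {label: score}.

    Each line ranks the labels of one node from most to least probable;
    ties are broken by the smaller label.
    """
    lines = []
    for node in sorted(scores_by_node):
        scores = scores_by_node[node]
        ranked = sorted(scores, key=lambda label: (-scores[label], label))
        lines.append(f"{node}:\t" + " ".join(str(label) for label in ranked))
    return lines
```

For example, a node 3 with scores 0.7 for label 2, 0.2 for label 1, and 0.1 for label 5 would yield the line `3:` followed by a tab and `2 1 5`.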

COMPILATION

Dependencies: BLAS and pthread must be installed.

Command line: run make clean, then make.

EXECUTION

Command line: ./SSL [OPTIONS]

OPTIONS

Command line optional arguments with values:

| ARGUMENT | VALUES | DEFAULT | DESCRIPTION |
|---|---|---|---|
| --mode | test, predict | test | Operational mode (see Overview) |
| --method | Tuned_RwR, AdaDIF, PPR | AdaDIF | Selection of prediction method (see Overview) |
| --graph_file | (adjacency list).txt | graphs/BlogCatalog/adj.txt | See Input Files Format |
| --label_file | (label list or one-hot).txt | graphs/BlogCatalog/class.txt | See Input Files Format |
| --outfile | (predicted labels).txt | out/label_predictions.txt | File where predictions are stored when --mode is predict (see Output Files Format) |
| --num_seeds | [1, 2^16] | 1030 | Number of nodes that are labeled (only used when --mode is test) |
| --walk_length | [1, 2^16] | 10 | Length of the AdaDIF (and/or PPR) random walk |
| --lambda_trwr | >=0.0 | 1.0 | Regularization parameter for the Tuned RwR method |
| --lambda_addf | >=0.0 | 5.0 | Regularization parameter for smoothness over the graph in the AdaDIF method |
| --num_iters | [1, 2^16] | 1 | Number of experiments performed (only used when --mode is test) |

Default values can be changed by editing defs.h

Command line optional arguments without values:

| ARGUMENT | RESULT |
|---|---|
| --unconstrained | Switches AdaDIF to unconstrained mode |
| --single_thread | Forces single-thread execution |
| --multiclass | Specifies multiclass input / output (default is multilabel) |
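
When driving SSL from a script, it can help to assemble the argument list programmatically. The sketch below only uses option names listed above; the helper `build_ssl_command` is hypothetical, and passing each value as a separate token after its flag (for example `--mode predict`) is an assumption about the argument syntax that should be checked against the program itself.

```python
VALUE_OPTIONS = {
    "mode", "method", "graph_file", "label_file", "outfile",
    "num_seeds", "walk_length", "lambda_trwr", "lambda_addf", "num_iters",
}
SWITCHES = {"unconstrained", "single_thread", "multiclass"}


def build_ssl_command(binary="./SSL", **options):
    """Return an argv list for SSL built from keyword options.

    Assumes 'flag value' syntax for options that take a value.
    """
    argv = [binary]
    for name, value in options.items():
        if name in SWITCHES:
            if value:  # switches are emitted only when truthy
                argv.append(f"--{name}")
        elif name in VALUE_OPTIONS:
            argv.extend([f"--{name}", str(value)])
        else:
            raise ValueError(f"unknown option: {name}")
    return argv
```

The resulting list can be passed to `subprocess.run` once the binary has been built with make.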