DECADES Datasets
----------------

This folder contains three groups of datasets for graph/sparse applications:

1. small - datasets that contain fewer than 1 million nodes
    * Kronecker_15.el (synthetic)
    * Amazon.tsv
    * Sinkhorn.tsv
    * YouTube.tsv

2. big - datasets that contain at least 1 million nodes
    * Kronecker_21/ (synthetic)
    * Kronecker_25/ (synthetic)
    * LiveJournal/
    * Orkut/
    * Pokec/
    * Wiki/
    * Sd1_Arc/
    * Twitter/
    * Wikipedia/

3. bipartite - datasets that are bipartite networks (partitions contain fewer than 1 million nodes)
    * Amazon.tsv
    * Dbpedia.tsv
    * Power.tsv (synthetic)
    * YouTube.tsv

The "small" and "bipartite" folders contain a single edgelist file per dataset.

The "big" folder contains four binary files for each dataset (which can be parsed using Compressed Sparse Row format):

1. num_nodes_edges.txt - contains information about the size (nodes, edges) of the dataset
2. node_array.bin - binary representation of the node pointers in the dataset
3. edge_array.bin - binary representation of the edge pointers in the dataset
4. edge_values.bin - binary representation of the edge values in the dataset

We have provided a C++ script "parse_bin_files.cpp" to parse these binary files. It can be compiled and run as follows:

    g++ -std=c++11 -o parse_bin_files parse_bin_files.cpp
    ./parse_bin_files [DATASET_DIRECTORY]

Permissions and Licenses
------------------------

We have obtained real-world datasets from KONECT (the Koblenz Network Collection): http://konect.uni-koblenz.de/. KONECT networks are licensed under the Creative Commons Attribution-ShareAlike 2.0 Germany License.
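
Example: Compressed Sparse Row Layout
-------------------------------------

The "big" datasets are described above as parseable in Compressed Sparse Row (CSR) format. The sketch below illustrates how the three arrays of a CSR graph typically relate to each other, by building them in memory from a small edge list. It does not read the .bin files. The struct and function names (CsrGraph, Edge, build_csr, out_degree, neighbors) are hypothetical, and the element types, the num_nodes + 1 length of node_array, and the reading of "edge pointers" as destination node IDs are assumptions based on the usual CSR convention. The provided "parse_bin_files.cpp" remains the reference for the actual on-disk layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory CSR container. Field names mirror the file names in
// the "big" folder, but the element types are assumptions, not the file format.
struct CsrGraph {
    std::vector<std::uint64_t> node_array;  // num_nodes + 1 offsets into edge_array
    std::vector<std::uint32_t> edge_array;  // destination node of each edge
    std::vector<float> edge_values;         // value (weight) of each edge
};

// One directed edge of an edge list (illustrative type).
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
    float value;
};

// Build a CSR graph from an edge list with a counting sort on the source node.
CsrGraph build_csr(std::uint32_t num_nodes, const std::vector<Edge>& edges) {
    CsrGraph g;
    g.node_array.assign(static_cast<std::size_t>(num_nodes) + 1, 0);
    g.edge_array.resize(edges.size());
    g.edge_values.resize(edges.size());

    // Count the out-degree of each node, stored one slot to the right.
    for (const Edge& e : edges) {
        assert(e.src < num_nodes && e.dst < num_nodes);
        ++g.node_array[static_cast<std::size_t>(e.src) + 1];
    }
    // A prefix sum turns the counts into start offsets for each node.
    for (std::size_t v = 0; v < num_nodes; ++v) {
        g.node_array[v + 1] += g.node_array[v];
    }
    // Scatter the edges; next[v] is the next free slot in node v's range.
    std::vector<std::uint64_t> next(g.node_array.begin(), g.node_array.end() - 1);
    for (const Edge& e : edges) {
        const std::uint64_t slot = next[e.src]++;
        g.edge_array[slot] = e.dst;
        g.edge_values[slot] = e.value;
    }
    return g;
}

// Number of outgoing edges of node v.
std::uint64_t out_degree(const CsrGraph& g, std::uint32_t v) {
    assert(static_cast<std::size_t>(v) + 1 < g.node_array.size());
    return g.node_array[v + 1] - g.node_array[v];
}

// Copy of the destination IDs stored in node v's range of edge_array.
std::vector<std::uint32_t> neighbors(const CsrGraph& g, std::uint32_t v) {
    assert(static_cast<std::size_t>(v) + 1 < g.node_array.size());
    const auto first = g.edge_array.begin() + static_cast<std::ptrdiff_t>(g.node_array[v]);
    const auto last = g.edge_array.begin() + static_cast<std::ptrdiff_t>(g.node_array[v + 1]);
    return std::vector<std::uint32_t>(first, last);
}
```

Under these assumptions, the edges of node v occupy positions node_array[v] up to, but not including, node_array[v + 1] in both edge_array and edge_values, so iterating over one node's neighbors touches a single contiguous range of memory.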