This folder contains three groups of datasets for graph and sparse-matrix applications:
small - datasets that contain fewer than 1 million nodes
big - datasets that contain at least 1 million nodes
bipartite - datasets that are bipartite networks (each of the two partitions contains fewer than 1 million nodes)
The "small" and "bipartite" folders contain a single edge list file per dataset. The "big" folder contains four files for each dataset (one text file and three binary files); the binary files together store the graph in Compressed Sparse Row (CSR) format:
num_nodes_edges.txt
- plain-text file containing the number of nodes and the number of edges in the dataset
node_array.bin
- binary array of node pointers (in CSR, the offset into the edge array at which each node's edges begin)
edge_array.bin
- binary array of edge entries (in CSR, the neighbor, i.e. destination node, of each edge)
edge_values.bin
- binary array of edge values (for example, edge weights), one per edge
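As a rough illustration of the CSR layout described above, the sketch below builds the three arrays from an in-memory edge list (the kind of data stored in the "small" and "bipartite" folders) and then scans one node's outgoing edges. The `Edge` and `Csr` structs and the `build_csr` and `out_weight` functions are hypothetical names for this example only. It assumes directed edges, 0-based node IDs, 32-bit unsigned indices, and float edge values; none of these choices is confirmed by the dataset files. If an edge list uses 1-based IDs, subtract one from each ID first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edge record: a directed edge src -> dst with a value.
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
    float weight;
};

// Hypothetical CSR container. The neighbors of node u occupy
// targets[offsets[u]] .. targets[offsets[u + 1] - 1].
struct Csr {
    std::vector<std::uint32_t> offsets;  // role of node_array.bin (num_nodes + 1 entries)
    std::vector<std::uint32_t> targets;  // role of edge_array.bin (one entry per edge)
    std::vector<float> values;           // role of edge_values.bin (one entry per edge)
};

// Builds CSR arrays from an edge list; assumes 0-based IDs below num_nodes.
Csr build_csr(std::uint32_t num_nodes, const std::vector<Edge>& edges) {
    Csr g;
    g.offsets.assign(num_nodes + 1, 0);
    // Count each node's out-degree in the slot after it.
    for (const Edge& e : edges) {
        assert(e.src < num_nodes && e.dst < num_nodes);
        ++g.offsets[e.src + 1];
    }
    // An inclusive prefix sum turns the shifted degrees into start offsets.
    for (std::uint32_t u = 0; u < num_nodes; ++u) {
        g.offsets[u + 1] += g.offsets[u];
    }
    g.targets.resize(edges.size());
    g.values.resize(edges.size());
    // next[u] is the next free slot in node u's range.
    std::vector<std::uint32_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const Edge& e : edges) {
        const std::uint32_t pos = next[e.src]++;
        g.targets[pos] = e.dst;
        g.values[pos] = e.weight;
    }
    return g;
}

// Sums the values on node u's outgoing edges by scanning its CSR range.
float out_weight(const Csr& g, std::uint32_t u) {
    float total = 0.0f;
    for (std::uint32_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
        total += g.values[i];
    }
    return total;
}
```

For example, `build_csr(3, {{0, 1, 1.5f}, {2, 0, 4.0f}, {0, 2, 2.0f}})` yields offsets `{0, 2, 2, 3}`, so node 0's neighbors are in slots 0 and 1 of the target array, and node 1 has no outgoing edges.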
We have provided a C++ program, "parse_bin_files.cpp", to parse these binary files. It can be compiled and run as follows:
g++ -std=c++11 -o parse_bin_files parse_bin_files.cpp
./parse_bin_files [DATASET_DIRECTORY]
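To load the files directly into your own code, a minimal reader might look like the sketch below. It is not derived from parse_bin_files.cpp: the element types (the caller picks `T`), the order of the two counts in num_nodes_edges.txt, native byte order, and a node array of num_nodes + 1 entries are all assumptions, so check them against parse_bin_files.cpp before relying on this. `GraphSize`, `read_sizes`, `read_binary_array`, and `check_offsets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the two counts in num_nodes_edges.txt.
struct GraphSize {
    std::uint64_t num_nodes = 0;
    std::uint64_t num_edges = 0;
};

// ASSUMPTION: the text file holds two whitespace-separated integers,
// the node count followed by the edge count.
GraphSize read_sizes(const std::string& path) {
    std::ifstream in(path);
    GraphSize s;
    if (!(in >> s.num_nodes >> s.num_edges)) {
        throw std::runtime_error("could not read node/edge counts from " + path);
    }
    return s;
}

// Reads `count` raw elements of type T in native byte order.
// ASSUMPTION: T matches the element type actually written to the .bin file.
template <typename T>
std::vector<T> read_binary_array(const std::string& path, std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<T> data(count);
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(count * sizeof(T)));
    if (static_cast<std::size_t>(in.gcount()) != count * sizeof(T)) {
        throw std::runtime_error(path + " is shorter than expected");
    }
    return data;
}

// Debug-only sanity check under the standard CSR assumption: offsets start
// at 0, never decrease, and end at the edge count.
template <typename T>
void check_offsets(const std::vector<T>& offsets, const GraphSize& s) {
    assert(offsets.size() == s.num_nodes + 1);
    assert(offsets.front() == 0);
    for (std::size_t i = 1; i < offsets.size(); ++i) {
        assert(offsets[i - 1] <= offsets[i]);
    }
    assert(static_cast<std::uint64_t>(offsets.back()) == s.num_edges);
}
```

A caller would read the counts first, then load node_array.bin with num_nodes + 1 elements and edge_array.bin and edge_values.bin with num_edges elements each, using the element types that parse_bin_files.cpp uses.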
We have obtained real-world datasets from KONECT (the Koblenz Network Collection, http://konect.uni-koblenz.de/) and SNAP (the Stanford Network Analysis Project, http://snap.stanford.edu/index.html). KONECT networks are licensed under the Creative Commons Attribution-ShareAlike 2.0 Germany License.