datasets

GATNE dataset

class cogdl.datasets.gatne.AmazonDataset(data_path='data')[source]

Bases: cogdl.datasets.gatne.GatneDataset

class cogdl.datasets.gatne.GatneDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

The network datasets “Amazon”, “Twitter” and “YouTube” from the “Representation Learning for Attributed Multiplex Heterogeneous Network” paper.

Args:

root (string): Root directory where the dataset should be saved.

name (string): The name of the dataset ("Amazon", "Twitter", "YouTube").

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://github.com/THUDM/GATNE/raw/master/data'
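The raw_file_names and processed_file_names descriptions above imply a "skip if already present" check. The sketch below illustrates that pattern only and is not taken from the cogdl source: files_exist, ensure_downloaded, fake_download and the file names are hypothetical, and the real class may organise this differently.

```python
import os
import tempfile


def files_exist(folder, file_names):
    """Return True only if every expected file is already present in folder."""
    return all(os.path.exists(os.path.join(folder, name)) for name in file_names)


def ensure_downloaded(raw_dir, raw_file_names, download):
    """Call download() only when a raw file is missing (hypothetical helper)."""
    if files_exist(raw_dir, raw_file_names):
        return False  # all files found, download skipped
    os.makedirs(raw_dir, exist_ok=True)
    download()
    return True


# Demo in a temporary directory with a stand-in download function.
with tempfile.TemporaryDirectory() as root:
    raw_dir = os.path.join(root, "raw")
    names = ["train.txt", "test.txt"]  # made-up file names

    def fake_download():
        # Stand-in for a real download: create empty files.
        for name in names:
            with open(os.path.join(raw_dir, name), "w") as f:
                f.write("")

    first = ensure_downloaded(raw_dir, names, fake_download)   # files missing
    second = ensure_downloaded(raw_dir, names, fake_download)  # files present
    print(first, second)
```

The same check, applied to processed_dir and processed_file_names, would decide whether process() needs to run.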
class cogdl.datasets.gatne.TwitterDataset(data_path='data')[source]

Bases: cogdl.datasets.gatne.GatneDataset

class cogdl.datasets.gatne.YouTubeDataset(data_path='data')[source]

Bases: cogdl.datasets.gatne.GatneDataset

cogdl.datasets.gatne.read_gatne_data(folder)[source]

GCC dataset

class cogdl.datasets.gcc_data.Edgelist(root, name)[source]

Bases: cogdl.data.dataset.Dataset

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://github.com/cenyk1230/gcc-data/raw/master'
class cogdl.datasets.gcc_data.GCCDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

preprocess(root, name)[source]
processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://github.com/cenyk1230/gcc-data/raw/master'
class cogdl.datasets.gcc_data.KDD_ICDM_GCCDataset(data_path='data')[source]

Bases: cogdl.datasets.gcc_data.GCCDataset

class cogdl.datasets.gcc_data.SIGIR_CIKM_GCCDataset(data_path='data')[source]

Bases: cogdl.datasets.gcc_data.GCCDataset

class cogdl.datasets.gcc_data.SIGMOD_ICDE_GCCDataset(data_path='data')[source]

Bases: cogdl.datasets.gcc_data.GCCDataset

class cogdl.datasets.gcc_data.USAAirportDataset(data_path='data')[source]

Bases: cogdl.datasets.gcc_data.Edgelist

GTN dataset

class cogdl.datasets.gtn_data.ACM_GTNDataset(data_path='data')[source]

Bases: cogdl.datasets.gtn_data.GTNDataset

class cogdl.datasets.gtn_data.DBLP_GTNDataset(data_path='data')[source]

Bases: cogdl.datasets.gtn_data.GTNDataset

class cogdl.datasets.gtn_data.GTNDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

The network datasets “ACM”, “DBLP” and “IMDB” from the “Graph Transformer Networks” paper.

Args:

root (string): Root directory where the dataset should be saved.

name (string): The name of the dataset ("gtn-acm", "gtn-dblp", "gtn-imdb").

apply_to_device(device)[source]
download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

read_gtn_data(folder)[source]
class cogdl.datasets.gtn_data.IMDB_GTNDataset(data_path='data')[source]

Bases: cogdl.datasets.gtn_data.GTNDataset

HAN dataset

class cogdl.datasets.han_data.ACM_HANDataset(data_path='data')[source]

Bases: cogdl.datasets.han_data.HANDataset

class cogdl.datasets.han_data.DBLP_HANDataset(data_path='data')[source]

Bases: cogdl.datasets.han_data.HANDataset

class cogdl.datasets.han_data.HANDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

The network datasets “ACM”, “DBLP” and “IMDB” from the “Heterogeneous Graph Attention Network” paper.

Args:

root (string): Root directory where the dataset should be saved.

name (string): The name of the dataset ("han-acm", "han-dblp", "han-imdb").

apply_to_device(device)[source]
download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

read_gtn_data(folder)[source]
class cogdl.datasets.han_data.IMDB_HANDataset(data_path='data')[source]

Bases: cogdl.datasets.han_data.HANDataset

cogdl.datasets.han_data.sample_mask(idx, length)[source]

Create mask.
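As an illustration of what "create mask" usually means for node splits, here is a minimal sketch that assumes idx is a collection of node indices and length is the total number of nodes. sample_mask_sketch is a hypothetical stand-in written with plain lists; it is not the cogdl implementation, which may return an array type instead.

```python
def sample_mask_sketch(idx, length):
    """Return a list of `length` booleans that is True at the positions in idx."""
    mask = [False] * length
    for i in idx:
        mask[i] = True
    return mask


# Mark nodes 0, 2 and 3 of a 6-node graph as training nodes.
train_mask = sample_mask_sketch([0, 2, 3], 6)
print(train_mask)  # [True, False, True, True, False, False]
```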

KG dataset

class cogdl.datasets.kg_data.FB13Datset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

class cogdl.datasets.kg_data.FB13SDatset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

class cogdl.datasets.kg_data.FB15k237Datset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

class cogdl.datasets.kg_data.FB15kDatset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

class cogdl.datasets.kg_data.KnowledgeGraphDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_entities
num_relations
process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

test_start_idx
train_start_idx
url = 'https://cloud.tsinghua.edu.cn/d/b567292338f2488699b7/files/?p=%2F{}%2F{}&dl=1'
valid_start_idx
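The url attribute above is a template with two {} placeholders. The sketch below shows how such a template can be filled with str.format. It assumes the first placeholder is the dataset name and the second a raw file name; that reading, and the example values "FB13" and "train2id.txt", are assumptions rather than something this page confirms.

```python
URL_TEMPLATE = (
    "https://cloud.tsinghua.edu.cn/d/b567292338f2488699b7/files/"
    "?p=%2F{}%2F{}&dl=1"
)

# Assumed meaning of the placeholders: dataset name, then file name.
dataset_name = "FB13"        # illustrative value
file_name = "train2id.txt"   # illustrative value

# str.format only replaces the {} fields; the %2F escapes are left untouched.
file_url = URL_TEMPLATE.format(dataset_name, file_name)
print(file_url)
```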
class cogdl.datasets.kg_data.WN18Datset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

class cogdl.datasets.kg_data.WN18RRDataset(data_path='data')[source]

Bases: cogdl.datasets.kg_data.KnowledgeGraphDataset

cogdl.datasets.kg_data.read_triplet_data(folder)[source]

Matlab matrix dataset

class cogdl.datasets.matlab_matrix.BlogcatalogDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.MatlabMatrix

class cogdl.datasets.matlab_matrix.DblpNEDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset

class cogdl.datasets.matlab_matrix.FlickrDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.MatlabMatrix

class cogdl.datasets.matlab_matrix.MatlabMatrix(root, name, url)[source]

Bases: cogdl.data.dataset.Dataset

Network datasets from http://leitang.net/code/social-dimension/data/ or http://snap.stanford.edu/node2vec/.

Args:

root (string): Root directory where the dataset should be saved.

name (string): The name of the dataset ("Blogcatalog").

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_classes

The number of classes in the dataset.

num_nodes
process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

class cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset(root, name, url)[source]

Bases: cogdl.data.dataset.Dataset

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

num_classes

The number of classes in the dataset.

num_nodes
process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

class cogdl.datasets.matlab_matrix.PPIDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.MatlabMatrix

class cogdl.datasets.matlab_matrix.WikipediaDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.MatlabMatrix

class cogdl.datasets.matlab_matrix.YoutubeNEDataset(data_path='data')[source]

Bases: cogdl.datasets.matlab_matrix.NetworkEmbeddingCMTYDataset

PyG OGB dataset

class cogdl.datasets.ogb.OGBArxivDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBNDataset

get_evaluator()[source]
class cogdl.datasets.ogb.OGBCodeDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBGDataset

class cogdl.datasets.ogb.OGBGDataset(root, name)[source]

Bases: cogdl.data.dataset.Dataset

get(idx)[source]

Gets the data object at index idx.

get_loader(args)[source]
get_subset(subset)[source]
num_classes

The number of classes in the dataset.

class cogdl.datasets.ogb.OGBMolbaceDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBGDataset

class cogdl.datasets.ogb.OGBMolhivDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBGDataset

class cogdl.datasets.ogb.OGBMolpcbaDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBGDataset

class cogdl.datasets.ogb.OGBNDataset(root, name, transform=None)[source]

Bases: cogdl.data.dataset.Dataset

get(idx)[source]

Gets the data object at index idx.

get_evaluator()[source]
get_loss_fn()[source]
process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

class cogdl.datasets.ogb.OGBPapers100MDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBNDataset

class cogdl.datasets.ogb.OGBPpaDataset[source]

Bases: cogdl.datasets.ogb.OGBGDataset

class cogdl.datasets.ogb.OGBProductsDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBNDataset

class cogdl.datasets.ogb.OGBProteinsDataset(data_path='data')[source]

Bases: cogdl.datasets.ogb.OGBNDataset

edge_attr_size
get_evaluator()[source]
get_loss_fn()[source]
process()[source]

Processes the dataset to the self.processed_dir folder.

TU dataset

class cogdl.datasets.tu_data.CollabDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.ENZYMES(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.ImdbBinaryDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.ImdbMultiDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.MUTAGDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.NCI109Dataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.NCI1Dataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.PTCMRDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.ProteinsDataset(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.RedditBinary(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.RedditMulti12K(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.RedditMulti5K(data_path='data')[source]

Bases: cogdl.datasets.tu_data.TUDataset

class cogdl.datasets.tu_data.TUDataset(root, name)[source]

Bases: cogdl.data.dataset.MultiGraphDataset

download()[source]

Downloads the dataset to the self.raw_dir folder.

num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

url = 'https://www.chrsmrrs.com/graphkerneldatasets'
cogdl.datasets.tu_data.cat(seq)[source]
cogdl.datasets.tu_data.coalesce(index, value, m, n)[source]
cogdl.datasets.tu_data.normalize_feature(data)[source]
cogdl.datasets.tu_data.num_edge_attributes(edge_attr=None)[source]
cogdl.datasets.tu_data.num_edge_labels(edge_attr=None)[source]
cogdl.datasets.tu_data.num_node_attributes(x=None)[source]
cogdl.datasets.tu_data.num_node_labels(x=None)[source]
cogdl.datasets.tu_data.parse_txt_array(src, sep=None, start=0, end=None, dtype=None, device=None)[source]
cogdl.datasets.tu_data.read_file(folder, prefix, name, dtype=None)[source]
cogdl.datasets.tu_data.read_tu_data(folder, prefix)[source]
cogdl.datasets.tu_data.read_txt_array(path, sep=None, start=0, end=None, dtype=None, device=None)[source]
cogdl.datasets.tu_data.segment(src, indptr)[source]

Module contents

cogdl.datasets.build_dataset(args)[source]
cogdl.datasets.build_dataset_from_name(dataset)[source]
cogdl.datasets.build_dataset_from_path(data_path, dataset=None)[source]
cogdl.datasets.register_dataset(name)[source]

New dataset types can be added to cogdl with the register_dataset() function decorator.

For example:

from cogdl.datasets import register_dataset

@register_dataset('my_dataset')
class MyDataset():
    (...)

Args:

name (str): The name of the dataset.

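For readers unfamiliar with the pattern, a name-based registration decorator can be sketched as follows. This is a self-contained illustration and not the cogdl implementation: DATASET_REGISTRY and register_dataset_sketch are hypothetical names, and the duplicate-name check is an assumption about how such a registry might behave.

```python
DATASET_REGISTRY = {}  # hypothetical registry mapping a name to a dataset class


def register_dataset_sketch(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in DATASET_REGISTRY:
            raise ValueError(f"Cannot register duplicate dataset ({name})")
        DATASET_REGISTRY[name] = cls
        return cls  # hand the class back unchanged
    return decorator


@register_dataset_sketch("my_dataset")
class MyDataset:
    def __init__(self, data_path="data"):
        self.data_path = data_path


# Look the class up by name and instantiate it.
dataset = DATASET_REGISTRY["my_dataset"](data_path="data")
print(type(dataset).__name__)  # MyDataset
```

Looking a class up by its registered name is what lets a string such as a command-line argument select the dataset to build.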
cogdl.datasets.try_adding_dataset_args(dataset, parser)[source]