I'm conducting research with temporal graph data using PyTorch Geometric (PyG).
I'm running into memory-usage issues when converting PyG data to dense format (with to_dense_batch() and to_dense_adj()).
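For context, here is a minimal toy version of the conversion I mean; the snapshot node counts, feature dimension, and edge counts are made up:

```python
import torch
from torch_geometric.data import Data, Batch
from torch_geometric.utils import to_dense_batch, to_dense_adj

# Toy snapshots with made-up sizes; node counts differ per snapshot.
snapshots = [
    Data(x=torch.randn(n, 16),
         edge_index=torch.randint(0, n, (2, 4 * n)))
    for n in (50, 80, 120)
]
batch = Batch.from_data_list(snapshots)

# Dense node features: [num_graphs, max_nodes, feat], zero-padded.
x_dense, mask = to_dense_batch(batch.x, batch.batch)

# Dense adjacency: [num_graphs, max_nodes, max_nodes] -- this is the
# tensor that dominates memory, since it scales with max_nodes^2
# regardless of how sparse the graphs actually are.
adj_dense = to_dense_adj(batch.edge_index, batch.batch)
print(x_dense.shape, adj_dense.shape)
```

The [num_graphs, max_nodes, max_nodes] adjacency from to_dense_adj() is what blows up for me once max_nodes gets large.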
I have tried three batching approaches, but got stuck on 1) memory usage, 2) inconsistent tensor sizes, and 3) overly sparse snapshots (a rough sketch of how I build each one follows the list):
- each batch contains a fixed number of edges (e.g., 400 edges per batch)
- each batch contains multiple snapshots (one snapshot per timestamp)
- each batch contains multiple graph sequences (e.g., each sequence containing 5 snapshots)
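Concretely, this is roughly how I build each kind of batch from an edge stream (toy random data; the node count, edge count, and number of timestamps are made up):

```python
import torch
from torch_geometric.data import Data

# Toy event stream: one row per edge, sorted by timestamp.
src = torch.randint(0, 100, (2000,))
dst = torch.randint(0, 100, (2000,))
t = torch.sort(torch.randint(0, 50, (2000,))).values

# 1) Fixed number of edges per batch (e.g., 400).
edge_batches = [
    Data(edge_index=torch.stack([src[i:i + 400], dst[i:i + 400]]))
    for i in range(0, src.size(0), 400)
]

# 2) One snapshot per timestamp.
snapshots = []
for ts in t.unique():
    m = t == ts
    snapshots.append(Data(edge_index=torch.stack([src[m], dst[m]]), t=ts))

# 3) Sequences of consecutive snapshots (e.g., 5 per sequence).
seq_len = 5
sequences = [snapshots[i:i + seq_len]
             for i in range(0, len(snapshots) - seq_len + 1, seq_len)]
```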
I'm wondering:
Is it possible to treat a batch of snapshots (with different numbers of nodes) as a sequence? If yes, how can one feed such a batch to an LSTM- or Transformer-based architecture?
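To make this concrete, here is a rough sketch of one direction I have considered: encode each snapshot with a shared GNN, pool every snapshot down to one fixed-size vector (so the varying node counts disappear), and run an LSTM over the pooled vectors. The GCN encoder, dimensions, and mean pooling are placeholder choices of mine, not a working solution:

```python
import torch
from torch import nn
from torch_geometric.data import Batch
from torch_geometric.nn import GCNConv, global_mean_pool

class SnapshotLSTM(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.conv = GCNConv(in_dim, hid_dim)
        self.lstm = nn.LSTM(hid_dim, hid_dim, batch_first=True)

    def forward(self, snapshots):
        # `snapshots`: a time-ordered list of Data objects, each assumed
        # to carry node features `x` and an `edge_index`.
        batch = Batch.from_data_list(snapshots)
        h = self.conv(batch.x, batch.edge_index).relu()
        g = global_mean_pool(h, batch.batch)  # [T, hid_dim], one row per snapshot
        out, _ = self.lstm(g.unsqueeze(0))    # [1, T, hid_dim]
        return out
```

But pooling discards all node-level structure, which is why I am asking whether a dense node-level representation is feasible instead.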
Is it possible to batch multiple graph sequences (e.g., a batch of 4 sequences, each containing 5 snapshots) and feed their dense node/edge embeddings into models with LSTM or Transformer architectures? Or is it recommended to use sparse matrices instead?
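What I mean by dense batching of sequences, sketched under the assumption that every sequence has the same length, every snapshot carries node features x, and a global max_nodes is known (the helper name is mine):

```python
import torch
from torch_geometric.data import Batch
from torch_geometric.utils import to_dense_batch

def densify(sequences, max_nodes):
    """Pad each snapshot to `max_nodes` so sequences stack into one tensor."""
    out = []
    for seq in sequences:                # seq: list of T Data snapshots
        batch = Batch.from_data_list(seq)
        x, mask = to_dense_batch(batch.x, batch.batch,
                                 max_num_nodes=max_nodes)
        out.append(x)                    # [T, max_nodes, feat]
    return torch.stack(out)              # [B, T, max_nodes, feat]
```

The resulting [batch, seq_len, max_nodes, feat] tensor would be straightforward to feed to nn.LSTM or nn.TransformerEncoder after flattening or pooling the node dimension, but the padding is exactly where my memory cost comes from.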
How should one split CSV data whose rows are interaction records (with columns ['source', 'target', 'interaction_type', 'timestamp']) so that 1) the density of each snapshot is sufficiently high (e.g., above 0.5), and 2) the number of nodes stays consistent across snapshots?
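My current splitting attempt looks roughly like this, bucketing timestamps so each snapshot accumulates enough edges over a fixed global node set (the file name and number of bins are placeholders):

```python
import pandas as pd

df = pd.read_csv("interactions.csv")  # placeholder file name
nodes = pd.unique(df[["source", "target"]].values.ravel())
n = len(nodes)

# Coarser time buckets raise snapshot density but lose temporal resolution.
df["bucket"] = pd.cut(df["timestamp"], bins=20, labels=False)

for b, g in df.groupby("bucket"):
    e = g.drop_duplicates(["source", "target"])  # count each edge once
    density = len(e) / (n * (n - 1))  # directed density over the fixed node set
    print(b, len(e), round(density, 3))
```

With the full node set fixed, a density above 0.5 seems hard to reach unless the buckets are very coarse, so I may be misunderstanding how density is usually defined for snapshots.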
Looking forward to suggestions from anyone with experience handling temporal graph data. Any suggestion would be greatly appreciated. Thanks.