For this lecture, we are going to use the Kaggle, TuriCreate, Networkx, and igraph packages. Let's set them up:
# Installing the Kaggle package
!pip install kaggle
# Important Note: complete this with your own key - after running this for the first time, remember to **remove** your API key
api_token = {"username":"<Insert Your Kaggle User Name>","key":"<Insert Your Kaggle API key>"}
# creating kaggle.json file with the personal API-Key details
# You can also put this file on your Google Drive
import json, os
os.makedirs(os.path.expanduser('~/.kaggle'), exist_ok=True)  # open() does not expand '~' by itself
with open(os.path.expanduser('~/.kaggle/kaggle.json'), 'w') as file:
    json.dump(api_token, file)
!chmod 600 ~/.kaggle/kaggle.json
!pip install turicreate
!pip install networkx
!pip install python-igraph
In this example, we will learn how to work with graphs using the Marvel Universe Social Network dataset. First, let's download the dataset, and use it to construct an undirected graph:
# Creating a dataset directory
!mkdir -p ./datasets/the-marvel-universe-social-network
# download the dataset from Kaggle and unzip it
!kaggle datasets download csanhueza/the-marvel-universe-social-network -p ./datasets/the-marvel-universe-social-network
!unzip ./datasets/the-marvel-universe-social-network/*.zip -d ./datasets/the-marvel-universe-social-network/
import networkx as nx
import turicreate as tc
n_sf = tc.SFrame.read_csv("./datasets/the-marvel-universe-social-network/nodes.csv")
e_sf = tc.SFrame.read_csv("./datasets/the-marvel-universe-social-network/hero-network.csv")
n_sf
e_sf
Now let's load the nodes (vertices) and edges (links) data into a graph object. We can create the graph either by inserting each node and edge one after the other, or by inserting all the nodes and edges at once. Let's time both approaches:
%%timeit
g = nx.Graph() # Creating Undirected Graph
# adding each node and edge one after the other
for n in n_sf['node']:
    g.add_node(n)
for r in e_sf:
    g.add_edge(r['hero1'], r['hero2'])
%%timeit
g = nx.Graph() # Creating Undirected Graph
# adding all nodes and edges at once
g.add_nodes_from(n_sf['node'])
g.add_edges_from([(r['hero1'],r['hero2']) for r in e_sf])
# %%timeit runs in its own scope, so we rebuild the graph to keep using it
g = nx.Graph() # Creating Undirected Graph
g.add_nodes_from(n_sf['node'])
g.add_edges_from([(r['hero1'],r['hero2']) for r in e_sf])
print(nx.info(g))
We can see that the constructed graph has over 19,000 nodes and over 167,000 edges. Let's use the graph structure to answer several questions.
Question: Who is the most friendly superhero?
Note: If we wanted to answer this question using a DataFrame, it wouldn't be trivial: for each hero, we would need to count the number of distinct friends both when the hero appears in the hero1 column and when it appears in the hero2 column. Answering this question using a graph object, however, is relatively easy; we simply need to find the node with the maximal degree.
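For comparison, here is a rough sketch (not part of the original lecture) of the DataFrame-style computation, using the e_sf SFrame loaded above; we first make the relation symmetric and then count distinct friends per hero:
# Sketch: count each hero's distinct friends directly on the edges SFrame
# by appending a reversed copy, so every hero appears in the hero1 column
rev_sf = tc.SFrame({'hero1': e_sf['hero2'], 'hero2': e_sf['hero1']})
sym_sf = e_sf.append(rev_sf)
friend_counts = sym_sf.groupby('hero1',
                               {'n_friends': tc.aggregate.COUNT_DISTINCT('hero2')})
friend_counts.sort('n_friends', ascending=False)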
Let's calculate the degree of each vertex:
d = g.degree()
list(dict(d).items())[:20]
print("There are %s superheroes connected to Black Panter" %
d["BLACK PANTHER/T'CHAL"])
Let's find the vertex with the highest degree:
import operator
max(dict(d).items(), key=operator.itemgetter(1))
So, using the degree, we discovered that the "most friendly" superhero is Captain America, who is connected to 1,908 other heroes. Let's use seaborn to plot the graph's degree distribution:
import seaborn as sns
%matplotlib inline
sns.set()
sns.distplot([v for v in dict(d).values()])
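Degree distributions of social networks are typically heavy-tailed, so a log-scaled histogram (a small addition, not in the original notebook) makes the tail easier to see:
import matplotlib.pyplot as plt
# Same degree values as above, with a log-scaled y-axis to expose the heavy tail
plt.hist(list(dict(d).values()), bins=100, log=True)
plt.xlabel("degree")
plt.ylabel("number of heroes (log scale)")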
From these degree-distribution plots, we can see that many nodes have degree 0 or 1, i.e., these heroes are either not connected to any other hero or connected to only a single hero. Let's create a subgraph without these nodes:
# let's create a list with nodes that have degree > 1
selected_nodes_list = [n for n, deg in dict(d).items() if deg > 1]
# create a subgraph with only nodes from the above list
h = g.subgraph(selected_nodes_list)
print(nx.info(h))
We are left with only 6,373 heroes out of 19,232. One of the wonderful things about using graphs as data structures is the ability to separate them into communities, i.e., disjoint subgraphs. Let's use Clauset-Newman-Moore greedy modularity maximization to separate the graph into communities and answer the following question:
Question: What is the largest community in the graph?
from networkx.algorithms.community import greedy_modularity_communities
cc = greedy_modularity_communities(h) # this can take some time
len(cc)
list(cc[0])[:20]
Using the community detection algorithm, we detected 66 communities of different sizes. Let's view the distribution of the community sizes:
import matplotlib.pyplot as plt
community_size_list = [len(c) for c in cc]
plt.hist(community_size_list)
We can see that most communities are relatively small. Let's find the communities that are larger than 100 but smaller than 500 nodes:
selected_community_list = [c for c in cc if 500 > len(c) > 100]
len(selected_community_list)
Let's draw both communities:
plt.figure(figsize=(20,20))
c1 = h.subgraph(selected_community_list[0])
nx.draw_kamada_kawai(c1, with_labels=True)
plt.figure(figsize=(20,20))
c2 = h.subgraph(selected_community_list[1])
nx.draw_kamada_kawai(c2, with_labels=True)
There are many centrality measures that can help to identify the most central heroes. Let's use PageRank to find key heroes in each community:
# According to PageRank, who is the most central hero?
d = nx.pagerank(g)
max(dict(d).items(), key=operator.itemgetter(1))
# According to Closeness Centrality, who is the most central hero?
d = nx.closeness_centrality(g) # can take some time to run
max(dict(d).items(), key=operator.itemgetter(1))
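Another common measure, which this lecture does not use, is betweenness centrality; computing it exactly on a graph this size is slow, so the sketch below uses Networkx's sampled approximation (k=100 and seed=42 are illustrative choices):
# Approximate betweenness centrality from k=100 sampled pivot nodes
d = nx.betweenness_centrality(g, k=100, seed=42)
max(dict(d).items(), key=operator.itemgetter(1))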
Now, let's wrap these measures in a helper function and find the most central hero in each sufficiently large community:
def find_central_node(graph):
    print("-" * 100)
    print(nx.info(graph))
    d = nx.degree_centrality(graph)
    hero = max(dict(d).items(), key=operator.itemgetter(1))[0]
    print("The most central hero according to Degree Centrality is %s" % hero)
    d = nx.pagerank(graph)
    hero = max(dict(d).items(), key=operator.itemgetter(1))[0]
    print("The most central hero according to PageRank is %s" % hero)
    d = nx.closeness_centrality(graph)
    hero = max(dict(d).items(), key=operator.itemgetter(1))[0]
    print("The most central hero according to Closeness Centrality is %s" % hero)
for c in cc:
    if len(c) < 10:  # skip small communities with only a few nodes
        continue
    h = g.subgraph(c)
    find_central_node(h)
We can also use Networkx to find the shortest path between vertices. Let's use the shortest path algorithm to find the distance between the Black Panther and the Vulture II:
nx.shortest_path(g, "BLACK PANTHER/T'CHAL", "VULTURE II/BLACKIE D")
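If we only need the number of hops rather than the path itself, a minimal variant (not shown in the original) is:
# Number of hops between the two heroes
nx.shortest_path_length(g, "BLACK PANTHER/T'CHAL", "VULTURE II/BLACKIE D")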
The shortest path from Black Panther to Vulture II goes through Spider-Man. We can also use Networkx to find the maximal clique of superheroes:
#%%timeit
# Will run for a very very long time
#max_clique_graph = nx.make_max_clique_graph(g)
Finding the maximal clique can take a very long time with Networkx, so let's use igraph instead. First, we create the Marvel superheroes network as an igraph object:
import igraph
def create_igraph_object(vertices_list, edges_list, is_directed):
    ig = igraph.Graph(directed=is_directed)
    ig.add_vertices(len(vertices_list))
    ig.vs["name"] = vertices_list
    v_dict = {vertices_list[i]: i for i in range(len(vertices_list))}
    # Be careful! If edges_list contains both (a, b) and (b, a), they will be
    # inserted as two different edges
    edges_list = [(v_dict[e[0]], v_dict[e[1]]) for e in edges_list]
    ig.add_edges(edges_list)
    return ig
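As a toy illustration of that warning (not from the lecture), igraph's simplify() can collapse such duplicate edges after insertion:
# Inserting both (0, 1) and (1, 0) yields two parallel edges in an
# undirected igraph graph; simplify() collapses them into one
demo = igraph.Graph(directed=False)
demo.add_vertices(2)
demo.add_edges([(0, 1), (1, 0)])
demo.simplify(multiple=True, loops=True)
print(demo.ecount())  # 1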
ig = create_igraph_object(list(g.nodes()), list(g.edges()), False)
print(f"Verticies {ig.vcount()} and Links {ig.ecount()}")
%%timeit
largest_c = ig.largest_cliques()
largest_c = ig.largest_cliques() # re-run outside %%timeit to keep the result
print("Largest clique with %s vertcies" % len(largest_c[0]))
h = ig.subgraph(largest_c[0])
h.vs["name"]
And we can go back to using Networkx:
plt.figure(figsize=(20,20))
h = g.subgraph(h.vs["name"])
nx.draw_circular(h, with_labels=True)
In the next example, we will use networks created from the subtitles of The Lord of the Rings movie trilogy. Let's start by loading each movie's network from our sub2network project and joining them into a single network:
# Creating a dataset directory
!mkdir -p ./datasets/LTOR-networks
!wget https://www.dropbox.com/s/qk36gdgh1lmrdea/LTOR-networks.zip -O ./datasets/LTOR-networks/LTOR-networks.zip
!unzip ./datasets/LTOR-networks/*.zip -d ./datasets/LTOR-networks/
!ls ./datasets/LTOR-networks/
import networkx as nx
from networkx.readwrite import json_graph
import json
import turicreate as tc
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
j = json.load(open("./datasets/LTOR-networks/(2001) - The Lord of the Rings: The Fellowship of the Ring.json"))
g1 = json_graph.node_link_graph(j)
plt.figure(figsize=(10,10))
nx.draw_kamada_kawai(g1, with_labels=True)
In these networks, each edge has attributes, such as weight:
g1['Samwise "Sam" Gamgee']
g1['Samwise "Sam" Gamgee']['Frodo Baggins']
g1['Frodo Baggins']['Samwise "Sam" Gamgee']
Let's load the two other networks and join all the networks into a single large network:
j = json.load(open("./datasets/LTOR-networks/(2002) - The Lord of the Rings: The Two Towers.json"))
g2 = json_graph.node_link_graph(j)
plt.figure(figsize=(10,10))
nx.draw_kamada_kawai(g2, with_labels=True)
j = json.load(open("./datasets/LTOR-networks/(2003) - The Lord of the Rings: The Return of the King.json"))
g3 = json_graph.node_link_graph(j)
plt.figure(figsize=(10,10))
nx.draw_kamada_kawai(g3, with_labels=True)
Let's create the new large network:
lotr_graph = nx.Graph()
l = [g1,g2,g3]
nodes = set()
edges = set()
for g in l:
    nodes |= g.nodes()
    edges |= g.edges()
lotr_graph.add_nodes_from(nodes)
lotr_graph.add_edges_from(edges)
# let's sum the edge weights across the three movies
for e in lotr_graph.edges():
    lotr_graph[e[0]][e[1]]['weight'] = 0
for g in l:
    for e in g.edges():
        lotr_graph[e[0]][e[1]]['weight'] += g[e[0]][e[1]]['weight']
print(nx.info(lotr_graph))
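As a side note, Networkx can also merge the graphs in a single call with compose_all; a sketch is below, but for edges that appear in several movies it keeps the attributes of the last graph instead of summing the weights, which is why we merged manually above:
# Hedged alternative: one-call merge; overlapping edge attributes come from
# the last graph in the list, so weights are NOT summed
merged = nx.compose_all([g1, g2, g3])
print(nx.info(merged))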
plt.figure(figsize=(30,30))
nx.draw_kamada_kawai(lotr_graph, with_labels=True)
Let's clean the network data by removing nodes from the "extended edition":
remove_list = [n for n in lotr_graph.nodes() if "(extended edition)" in n]
lotr_graph.remove_nodes_from(remove_list)
plt.figure(figsize=(20,20))
nx.draw_kamada_kawai(lotr_graph, with_labels=True)
Next, let's detect the network's communities using the label propagation algorithm:
from networkx.algorithms.community.label_propagation import label_propagation_communities
cc = list(label_propagation_communities(lotr_graph))
cc
Let's save the network to files so we can load it with Cytoscape and Gephi.
nx.write_gexf(lotr_graph, "./datasets/LTOR-networks/lotr_network_full.gexf")
nx.write_gml(lotr_graph, "./datasets/LTOR-networks/lotr_network_full.gml")
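As a quick sanity check (an extra step, not in the original notebook), we can read the GEXF file back into Networkx:
# Reload the exported network and verify the node and edge counts
reloaded = nx.read_gexf("./datasets/LTOR-networks/lotr_network_full.gexf")
print(reloaded.number_of_nodes(), reloaded.number_of_edges())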
For this example, we will use The Bitcoin Transactions Network. Let's load the directed network into an SGraph object:
Note: An SGraph is always directed. To represent an undirected graph using an SGraph, we can use double links, i.e., an undirected link (u,v) can be represented by the two directed links (u,v) and (v,u).
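Here is a minimal sketch of this trick on a hypothetical two-vertex toy graph (not from the lecture):
import turicreate as tc
# The undirected link (u, v) is stored as the two directed links (u, v) and (v, u)
toy_edges = tc.SFrame({'src': ['u', 'v'], 'dst': ['v', 'u']})
toy_sg = tc.SGraph().add_edges(toy_edges, src_field='src', dst_field='dst')
toy_sg.summary()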
!mkdir -p ./datasets/bitcoin
!wget http://dynamics.cs.washington.edu/nobackup/bitcoin/bitcoin.tar.gz -O ./datasets/bitcoin/bitcoin.tar.gz
!tar -xf ./datasets/bitcoin/bitcoin.tar.gz -C ./datasets/bitcoin/
!ls ./datasets/bitcoin/
import turicreate as tc
import networkx as nx
import igraph
import matplotlib.pyplot as plt
%matplotlib inline
v_sf = tc.load_sframe("./datasets/bitcoin/bitcoin/bitcoin.vertices.sframe")
v_sf
l_sf = tc.load_sframe("./datasets/bitcoin/bitcoin/bitcoin.links.sframe")
l_sf
sg = tc.SGraph(vertices=v_sf, edges=l_sf, vid_field="vid", src_field="src_id", dst_field="dst_id")
sg.summary()
Using SGraph, we can run the following algorithms: connected components, degree counting, graph coloring, k-Core, Label Propagation, PageRank, shortest path, and triangle counting.
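For instance, triangle counting takes a single call (a sketch, not executed in the lecture):
# Count, for each vertex, the number of triangles that pass through it
tri = tc.triangle_counting.create(sg)
tri['triangle_count']  # SFrame with a per-vertex triangle_count column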
Let's start by calculating vertices' degrees and PageRank:
pr = tc.pagerank.create(sg)
pr
pr['pagerank']
sg.vertices['pagerank'] = pr['graph'].vertices['pagerank'] #pr['graph'] is a graph in which each vertex has pagerank value
sg.vertices
degree = tc.degree_counting.create(sg)
degree['graph']
# Adding in, out, and total degree to the vertices' attributes
sg.vertices['total_degree'] = degree['graph'].vertices['total_degree']
sg.vertices['in_degree'] = degree['graph'].vertices['in_degree']
sg.vertices['out_degree'] = degree['graph'].vertices['out_degree']
sg.vertices.sort("total_degree", ascending=False)
As can be seen, some accounts have extremely high degrees (over 100,000). Let's compare SGraph's performance to that of Networkx and igraph:
def sgraph2nxgraph(sgraph, is_directed=True, add_vertices_attributes=True, add_edges_attributes=True):
    if is_directed:
        nx_g = nx.DiGraph()
    else:
        nx_g = nx.Graph()
    if add_vertices_attributes:
        vertices = [(r['__id'], r) for r in sgraph.vertices]
    else:
        vertices = list(sgraph.get_vertices()['__id'])
    if add_edges_attributes:
        edges = [(r['__src_id'], r['__dst_id'], r) for r in sgraph.edges]
    else:
        edges = [(e['__src_id'], e['__dst_id']) for e in sgraph.get_edges()]
    nx_g.add_nodes_from(vertices)
    nx_g.add_edges_from(edges)
    return nx_g
ng = sgraph2nxgraph(sg)
print("Networkx: %s" % nx.info(ng))
import igraph
def sgraph2igraph(sgraph, is_directed=True):
    g = igraph.Graph(directed=is_directed)
    vertices = list(sgraph.vertices['__id'])
    g.add_vertices(len(vertices))
    g.vs["name"] = vertices
    v_dict = {vertices[i]: i for i in range(len(vertices))}
    edges = [(v_dict[e['__src_id']], v_dict[e['__dst_id']]) for e in sgraph.edges]
    g.add_edges(edges)  # edges is already a list of index pairs
    return g
ig = sgraph2igraph(sg)
print("iGraph: Vertices %s and Links %s" % (ig.vcount(), ig.ecount()))
# there may be differences in the input parameters between the two implementations
%timeit ig.pagerank(niter=1000)
%timeit tc.pagerank.create(sg,verbose=False, max_iterations=1000)
# %timeit nx.pagerank(ng) # will take very long time
The Bitcoin Transaction network is too large to be visualized. Let's split the network into weakly connected components:
wcc = sorted(nx.weakly_connected_components(ng),key=len, reverse=True)
[len(c) for c in wcc][:20]
As we can see from the above, we have one large weakly connected component and many components with only one vertex. Let's draw the third-largest component, which has 51 vertices:
h = ng.subgraph(wcc[2])
plt.figure(figsize=(10,10))
nx.draw_kamada_kawai(h, with_labels=True)
Let's now find the strongly connected components, i.e., the maximal sets of vertices in which every vertex can reach every other vertex via a directed path:
scc = sorted(nx.strongly_connected_components(ng), key=len, reverse=True)
[len(c) for c in scc][:20]
Let's draw the component with 321 vertices:
h = ng.subgraph(scc[8])
plt.figure(figsize=(20,20))
nx.draw_kamada_kawai(h)
Another way to visualize large networks is to use the K-Core decomposition algorithm. Here, we approximate it by keeping only the vertices with a total degree above 200:
# we could also use tc.kcore.create(sg); however, it requires more computational power and time
v_list = sg.vertices[sg.vertices['total_degree'] > 200]['__id']
len(v_list)
h = ng.subgraph(v_list)
plt.figure(figsize=(20,20))
nx.draw_kamada_kawai(h)
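For reference, here is a sketch (not in the original notebook) of a true k-core with Networkx; k_core requires a graph without self-loops, and k=10 is an arbitrary illustrative value:
# Keep the maximal subgraph in which every vertex has degree >= k
ng_noloops = ng.copy()
ng_noloops.remove_edges_from(list(nx.selfloop_edges(ng_noloops)))
core = nx.k_core(ng_noloops, k=10)
print(nx.info(core))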