How to use Python to figure out why your friends have more friends than you

Originally posted on thenextweb.

Allow me to demonstrate: The Friendship Paradox

Do your friends, on average, have more friends than you? If you are an average person, there is a high chance that you have fewer friends than your friends do on average.

This is called the friendship paradox. It states that most people have fewer friends than their friends have, on average.

In this article, I will demonstrate why such a paradox exists, and whether we can find the paradox in Facebook data.

GIF by author

Minimal example

To understand why the friendship paradox exists, let’s start with a minimal example. We will create a network of people, where two people are friends if they are listed in the same Python tuple.

import numpy as np
import pandas as pd

# Each tuple is one friendship; the order of the two names does not matter
network = [
    ("Ben", "Khuyen"),
    ("Ben", "Thinh"),
    ("Ben", "Michael"),
    ("Ben", "Lauren"),
    ("Ben", "Josh"),
    ("Lauren", "Khuyen"),
    ("Thinh", "Michael"),
    ("Khuyen", "Josh"),
]
friends = pd.DataFrame(network, columns=["person1", "person2"])
friends
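To make the "same tuple" rule concrete, here is a minimal sketch that turns the edge list into a lookup of each person's friends. It assumes friendships are mutual, so every tuple is recorded in both directions; the name `adjacency` is introduced here for illustration and is not used elsewhere in the article.

```python
from collections import defaultdict

network = [
    ("Ben", "Khuyen"),
    ("Ben", "Thinh"),
    ("Ben", "Michael"),
    ("Ben", "Lauren"),
    ("Ben", "Josh"),
    ("Lauren", "Khuyen"),
    ("Thinh", "Michael"),
    ("Khuyen", "Josh"),
]

# Hypothetical helper structure: person -> set of that person's friends
adjacency = defaultdict(set)
for a, b in network:
    # A friendship counts for both people, whichever side of the tuple they are on
    adjacency[a].add(b)
    adjacency[b].add(a)

print(sorted(adjacency["Lauren"]))  # ['Ben', 'Khuyen']
print(len(adjacency["Ben"]))        # 5
```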

 

Now we will use Pyvis to visualize the network.

pip install pyvis

 

from pyvis.network import Network

net = Network(notebook=True)

# Get a unique list of people
people = list(set(friends.person1).union(set(friends.person2)))

# Add nodes and edges
net.add_nodes(people)
net.add_edges(friends.values.tolist())
net.show("minimal_example_with_edges.html")

 

If you move the nodes around, you can see that Ben is the center of the circle of friends.

We are interested in finding what percentage of people in this network have fewer friends than their friends do on average. We will create multiple functions that help us answer this question. The functions are:

  • Get friends of a specific person:

 

def get_friends(data: pd.DataFrame, person_id):
    """Get friends of a person with specified id"""
    return (
        data[data["person1"] == person_id]["person2"].values.tolist()
        + data[data["person2"] == person_id]["person1"].values.tolist()
    )

 

For example, Lauren’s friends are:

>>> get_friends(friends, "Lauren")
['Khuyen', 'Ben']

  • Get the number of friends a specific person has, and the number of friends each of that person’s friends has:

 

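def get_num_friends(data: pd.DataFrame, person_id):
    """Get the number of friends a person has"""
    # Not shown in the original listing; assumed to be the length of get_friends()
    return len(get_friends(data, person_id))

def get_num_friends_map(data: pd.DataFrame):
    """Get a dictionary of people and their number of friends"""
    all_people = list(set(data["person1"]).union(set(data["person2"])))
    # Use the `data` argument rather than the global `friends` DataFrame
    return {name: get_num_friends(data, name) for name in all_people}

def get_num_friends_of_a_person_friends(
    data: pd.DataFrame, person_id, num_friends_map: dict
):
    friend_ids = get_friends(data, person_id)
    return [num_friends_map[friend_id] for friend_id in friend_ids]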

 

>>> num_friends_map = get_num_friends_map(friends)
>>> get_num_friends_of_a_person_friends(friends, "Lauren",
...                                     num_friends_map)
[3, 5]

The result shows that Khuyen has 3 friends and Ben has 5 friends.

  • Get the number of friends a person’s friends have on average:

 

import numpy as np

def get_average_friends_of_a_person_friends(data: pd.DataFrame, person_id):
    """Get the average number of friends a person's friends have"""
    # Build the lookup from `data`, not from the global `friends` DataFrame
    num_friends_map = get_num_friends_map(data)
    num_friends_of_friends = get_num_friends_of_a_person_friends(
        data, person_id, num_friends_map
    )
    return np.mean(num_friends_of_friends)

 

>>> get_average_friends_of_a_person_friends(friends, "Lauren")
4.0

The output shows that the number of friends Lauren’s friends have on average is 4, which is higher than the number of friends she has.

Image by author

If we observe the picture above carefully, we can see that Ben’s number of friends gives a boost to the number of friends Lauren’s friends have on average.

Since Ben has many friends, many of his friends will be in a similar situation as Lauren. In other words, the number of friends their friends have on average is higher than their number of friends because one or two of their friends are the influencers.
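The boost from well-connected friends can be checked with a little arithmetic. The sketch below assumes the same eight friendships as the minimal example; the names `degree`, `mean_degree`, and `mean_friend_degree` are introduced here for illustration. It compares the plain average number of friends with the average taken over friendship ends, where a person with d friends is counted d times because they appear in d friend lists.

```python
from collections import Counter

network = [
    ("Ben", "Khuyen"),
    ("Ben", "Thinh"),
    ("Ben", "Michael"),
    ("Ben", "Lauren"),
    ("Ben", "Josh"),
    ("Lauren", "Khuyen"),
    ("Thinh", "Michael"),
    ("Khuyen", "Josh"),
]

# Number of friends per person: count how often each name appears in the edge list
degree = Counter(name for pair in network for name in pair)

# Plain average: every person counts once
mean_degree = sum(degree.values()) / len(degree)

# Average over friendship ends: a person with d friends is counted d times,
# which gives sum(d^2) / sum(d)
mean_friend_degree = sum(d * d for d in degree.values()) / sum(degree.values())

print(round(mean_degree, 3))         # 2.667
print(round(mean_friend_degree, 3))  # 3.125
```

The second number equals the plain mean plus the variance of the friend counts divided by that mean, so it can never fall below the plain mean. Note that this is a network-level average; the functions in this article instead compute one average per person and then compare it with that person's own friend count.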

  • Get the number of friends for all people in the network

 

def get_friends_df(data: pd.DataFrame):
    """Get the number of friends for all people in the network"""
    all_people = list(set(data["person1"]).union(set(data["person2"])))
    num_friends = [
        {
            "person_id": person_id,
            "num_friends": get_num_friends(data, person_id),
            "avg_friends_of_friends": round(
                get_average_friends_of_a_person_friends(data, person_id), 2
            ),
        }
        for person_id in all_people
    ]
    return pd.DataFrame(num_friends)

 

 

num_friends_sample = get_friends_df(friends)

# Find whether a person's friends have more friends than him/her on average
num_friends_sample = num_friends_sample.assign(
    friends_have_more_friends=lambda df_: df_.avg_friends_of_friends > df_.num_friends
)
num_friends_sample

 

In the table above,

  • The column num_friends shows the number of friends a person has.
  • The column avg_friends_of_friends shows the number of friends a person’s friends have on average.
  • The column friends_have_more_friends indicates whether a person’s friends have, on average, more friends than that person.

Let’s find out what percentage of people in the network have fewer friends than their friends have on average.

 

num_friends_sample.friends_have_more_friends.sum() / len(num_friends_sample)

 

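0.6666666666666666

Four of the six people have fewer friends than their friends do on average. The exceptions are Ben (5 friends against an average of 2.2) and Khuyen (3 friends against an average of exactly 3).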

Analyze Facebook network

The Facebook dataset consists of friend lists collected from survey participants. The users in this data have been anonymized.

You can download the data from here. After downloading the data, unzip it and save it as facebook_combined.txt.

 

data = pd.read_csv("facebook_combined.txt", sep=" ", header=None)
data.columns = ["person1", "person2"]
data

 

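We will use the previous functions to get the number of friends people in the network have.

# Same steps as for the minimal example, applied to the Facebook edge list
num_friends = get_friends_df(data)
num_friends = num_friends.assign(
    friends_have_more_friends=lambda df_: df_.avg_friends_of_friends > df_.num_friends
)

Let’s find out what percentage of people in the network have fewer friends than their friends have on average.

num_friends.friends_have_more_friends.sum() / len(num_friends)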

 

0.874721465709334

87% of people in the network have fewer friends than their friends have on average!
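The helper functions above rebuild the friend-count dictionary for every person, which can be slow on a network of this size. Below is a possible faster sketch that makes two passes over the edge list instead. It assumes the edge list contains no duplicate friendships and no self-links; the function name `share_with_fewer_friends` is hypothetical and not part of the original code. It also compares exact sums rather than averages rounded to two decimals, so borderline cases may be classified differently from the table-based approach.

```python
from collections import Counter, defaultdict


def share_with_fewer_friends(edges):
    """Fraction of people whose friends have, on average, more friends than they do."""
    edges = list(edges)

    # First pass: number of friends per person
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Second pass: for each person, add up the friend counts of their friends
    friend_degree_sum = defaultdict(int)
    for a, b in edges:
        friend_degree_sum[a] += degree[b]
        friend_degree_sum[b] += degree[a]

    # average of friends' counts > own count  <=>  sum of friends' counts > own count squared
    fewer = sum(1 for person, d in degree.items() if friend_degree_sum[person] > d * d)
    return fewer / len(degree)


# A tiny star-shaped network: one hub with three friends who only know the hub
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(share_with_fewer_friends(star))  # 0.75
```

For the Facebook data, the edge list could be passed as `data[["person1", "person2"]].itertuples(index=False, name=None)`, which yields plain tuples.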

Visualize the influencers

Which nodes are the influencers in the network? Let’s visualize them using Pyvis.
Start with adding nodes to the network.

 

net = Network("1000px", "1000px")
all_people = list(map(str, num_friends.person_id.values.tolist()))
net.add_nodes(all_people)

 

We will define influencers as the people whose friends do not have more friends than they do on average, that is, the rows where friends_have_more_friends is False. We will mark the nodes that are defined as influencers in red.

 

# Define influencers; node ids were added as strings, so convert the ids to match
influencers = set(
    map(str, num_friends[~num_friends["friends_have_more_friends"]].person_id.tolist())
)

# Mark the influencers red
net.nodes = [
    {"id": node["id"], "label": node["id"], "shape": "dot", "color": "#eb4034"}
    if node["id"] in influencers
    else node
    for node in net.nodes
]

 

Add edges and show the network graph:

 

# Node ids are strings, so convert the edge endpoints to strings as well
edges = data[["person1", "person2"]].astype(str).values.tolist()
net.add_edges(edges)
net.show("all_people_with_edges.html")

 

GIF by author

As we can see, the red nodes (people who have more friends than their friends have on average) tend to be in the center of the graph. If we move a red node, many blue nodes will move with it. This indicates that these red nodes are the influencers of one or two subgroups in the network.

GIF by author

Conclusion

Congratulations! You have just learned what the friendship paradox is, and how to observe this paradox in Facebook data using Python. I hope this article will give you the motivation to observe other paradoxes around you using visualization and data science techniques.

Feel free to fork and play with the code for this article in this repo.

This article was originally published on Towards Data Science; you can find it here. Khuyen writes about basic data science concepts and enjoys playing with different algorithms and data science tools. You can connect with them on LinkedIn and Twitter.

 
