Network Mapping with Gephi and CrowdTangle

Instructions for formatting CrowdTangle data for Gephi

Written by Christina Fan

Gephi is an open-source network mapping tool that allows you to track the spread of URLs or hashtags within groups or pages. You can download Gephi from the Gephi website (gephi.org).

This document explains how to map the spread of links in groups or pages that CrowdTangle tracks. We can see which links got the most interactions, and which groups/pages shared links the most frequently. We can also break down the interactions by type, or add additional metrics to look at -- for example, which links got the most shares, or views (if the post is a Facebook video).

Note: This is NOT a comprehensive overview of Gephi. It is a tutorial for one method of formatting CrowdTangle data for Gephi. Gephi is a powerful tool with many other uses, so please look to Gephi tutorials on YouTube to fully understand it.

Network graphs made using this methodology can tell you:

  • Which accounts shared certain URLs the most frequently

  • Which URLs are shared the most often between those accounts

  • Which accounts most frequently shared the same content as each other (i.e. are clustered together)

The graphs cannot tell you:

  • Whether the interactions were organic, or done by bots

  • Which users posted links to groups

Definitions:

  • Nodes are the circles/dots in a network map. In our case, nodes are both accounts and links.

  • Edges are the lines that connect the nodes. In our case, each edge connects an account to a URL that it shared.

How to format CrowdTangle data for Gephi:

Download a CSV of data for a set of accounts for a given time period. This could include:

  • A search for accounts using specific keywords

  • Accounts sharing specific URLs

  • A set of accounts you’re investigating, without any pre-set common keywords or URLs.

You may want to start with a smaller number of posts at first (a few thousand, as opposed to tens of thousands), because Gephi uses a lot of memory on your computer.
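If your download is large, one way to start small is to trim it before you open it in a spreadsheet. The Python sketch below uses only the standard library. The column names and sample rows are made up for illustration, and `first_rows` is a hypothetical helper, not part of CrowdTangle or Gephi.

```python
import csv
import io
from itertools import islice


def first_rows(handle, limit):
    """Read at most `limit` data rows from an open CSV file (header excluded)."""
    return list(islice(csv.DictReader(handle), limit))


# Stand-in for an exported file. With a real download you would pass
# open("your_export.csv", newline="", encoding="utf-8") instead.
sample_export = io.StringIO(
    "Group Name,Link,Total Interactions\n"
    "Group A,https://example.com/a,12\n"
    "Group B,https://example.com/a,5\n"
    "Group A,https://example.com/b,7\n"
)

subset = first_rows(sample_export, limit=2)
print(len(subset))
```

For a real map you would raise `limit` to a few thousand rows and grow it once you know how Gephi copes on your computer.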

You will be creating two different CSVs from this CSV -- one for nodes (the dots in the map), and one for edges (the lines connecting the dots). This example uses a search for #WalkAway in Groups, but you can do this for Pages as well, and use a URL, domain, or set of keywords instead.

NODES

A. Open a new spreadsheet. Name it NodesPrep, and save. From the CrowdTangle CSV, copy over the Group Name column, as well as any metrics that you want to take into account with the mapping (likes, comments, shares, total interactions, etc). Here, I've pasted Group Name in column A, and Total Interactions in column C.

B. Now, we need to dedupe the entities. Copy the Group Name column into a new column (column G). Highlight the copied column, click “Data” in the toolbar, and then click “Remove duplicates”. Name this column "ID". Copy column G into column H, and name that column "Label".
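If you would rather script this step, the dedupe can be sketched in Python as below. This is only an illustration of the same idea as “Remove duplicates”: the group names are invented, and `dedupe_preserving_order` is a hypothetical helper name.

```python
def dedupe_preserving_order(values):
    """Return each distinct value once, in the order it first appears."""
    return list(dict.fromkeys(values))


# Made-up Group Name column: one entry per post in the download.
group_names = ["Group A", "Group B", "Group A", "Group C", "Group B"]

node_ids = dedupe_preserving_order(group_names)

# In this walkthrough the ID and Label columns hold the same text.
nodes = [{"ID": name, "Label": name} for name in node_ids]
print(node_ids)
```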

C. We need to calculate the total number of times each node appears, as well as the total metrics for each. Label column I “Appearance”. Then, in cell I2, use the Excel formula COUNTIF to count the number of times the first group appears in column A:

=COUNTIF(A:A, H2)

Extend this formula to the whole column to calculate the number of times each node appears. You can do this by double clicking on the lower-right-hand corner of the cell.
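For comparison, here is a rough Python equivalent of the COUNTIF column, assuming the Group Name column has been read into a list. The names are invented for illustration.

```python
from collections import Counter

# Made-up Group Name column: one entry per post in the download.
group_names = ["Group A", "Group B", "Group A", "Group C", "Group B", "Group A"]

# Counter plays the role of COUNTIF(A:A, H2) for every group at once.
appearance = Counter(group_names)

for name, count in appearance.items():
    print(name, count)
```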

D. Now, sum each metric for each node. To do this, use the Excel formula SUMIF with column A as the range, cell H2 as the criteria, and column C (or whichever column holds your metric of interest) as the sum range:

=SUMIF(A:A, H2, C:C)

Repeat this process for the other metrics of interest by replacing C:C with the letter of the column where you pasted that metric in step A. For example, if a metric sits in column E, the sum range becomes E:E.
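The SUMIF step can be sketched the same way in Python. This assumes each post is a dictionary keyed by column name, as `csv.DictReader` would produce. The column names, the numbers, and the helper `sum_if` are illustrative assumptions.

```python
from collections import defaultdict


def sum_if(rows, key_field, value_field):
    """Total `value_field` for each distinct `key_field`, like SUMIF per node."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_field]] += int(row[value_field])
    return dict(totals)


# Made-up posts; values are strings because that is how a CSV reader returns them.
posts = [
    {"Group Name": "Group A", "Total Interactions": "12", "Shares": "3"},
    {"Group Name": "Group B", "Total Interactions": "5", "Shares": "1"},
    {"Group Name": "Group A", "Total Interactions": "7", "Shares": "2"},
]

total_interactions = sum_if(posts, "Group Name", "Total Interactions")
total_shares = sum_if(posts, "Group Name", "Shares")
print(total_interactions)
```

Calling `sum_if` again with a different `value_field` corresponds to swapping the sum range in the spreadsheet formula.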

Lastly, add a column for the type of node. In this example, we’ll give the groups the type “fb group”. Again, extend this value to the entire column.

E. Now that we’ve added the group nodes, we need to add the URL nodes. You can do this either with the domains (isolate them using the Links columns and the Text to Columns tool), or with the raw URLs. We recommend simplifying the URLs by removing any queries from the end. Create a new sheet in your NodesPrep spreadsheet, and name it links. Paste in the list of links from the original CSV, and then repeat steps B through D to create a deduped list of links with the number of appearances and the sum of each metric of interest. Name the type “website”.
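As an illustration of what simplifying a URL means here, the Python sketch below drops the query string (and any fragment) and, alternatively, reduces a link to its domain. The helper names `strip_query` and `domain_of` and the example URL are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the ?query and #fragment parts, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def domain_of(url):
    """Return the host part of the URL in lower case."""
    return urlsplit(url).netloc.lower()


link = "https://Example.com/story?utm_source=facebook#comments"
print(strip_query(link))
print(domain_of(link))
```

Whichever form you choose, use it consistently, since two spellings of the same link would otherwise become two separate nodes.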

F. Lastly, it's time to create your final list of nodes. Create a new spreadsheet (not a new sheet in the same workbook) and name it NodesFinal. Here, use Paste Special > Values Only to paste the deduped nodes (both groups and links) along with their appearance counts and metric totals, stacked on top of each other. Make sure to save this as a CSV.

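The top of your spreadsheet should hold the column labels (ID, Label, Appearance, your metrics, and the node type), followed by the group rows.

There’s no need to re-paste the column labels for the links. Just paste the link rows in the same columns directly below the groups.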
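To make the stacked layout concrete, here is a minimal Python sketch that writes group rows and link rows under a single header row. The rows are invented, and the "Type" header is an assumed name for the node type column. The output goes to an in-memory buffer so the example runs anywhere. For a real file you would open NodesFinal.csv for writing instead.

```python
import csv
import io

FIELDS = ["ID", "Label", "Appearance", "Total Interactions", "Type"]


def write_nodes(handle, group_rows, link_rows):
    """Write one header row, then the groups, then the links directly below."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(group_rows)
    writer.writerows(link_rows)


# Made-up rows standing in for the deduped lists built in the steps above.
groups = [
    {"ID": "Group A", "Label": "Group A", "Appearance": 2,
     "Total Interactions": 19, "Type": "fb group"},
]
links = [
    {"ID": "https://example.com/a", "Label": "https://example.com/a",
     "Appearance": 2, "Total Interactions": 17, "Type": "website"},
]

buffer = io.StringIO()
write_nodes(buffer, groups, links)
nodes_csv_lines = buffer.getvalue().splitlines()
print("\n".join(nodes_csv_lines))
```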

EDGES

a. Now, we need to create a spreadsheet for the edges. Create a new spreadsheet and name it Edges. Make sure to save it as a CSV. In column A, paste the list of group names from the original CSV download - not the deduped list. Title this column “source”.

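b. In column B, paste the list of URLs from the original CSV download - not the deduped list. Title this column “target”. If you simplified the URLs or reduced them to domains for your nodes, use that same form here so that each target matches a node ID.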

c. Save the spreadsheet. Our CSV prep is done!
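The edges sheet can also be sketched in Python: one source and target pair per post, plus a quick check that every name used in an edge also appears in the node list. The column names and posts are made up, `build_edges` and `unmatched_endpoints` are hypothetical helpers, and skipping posts with no link is a choice made for this sketch rather than something the steps above require.

```python
import csv
import io


def build_edges(rows, source_field="Group Name", target_field="Link"):
    """One (source, target) pair per post; posts with no link are skipped."""
    return [
        (row[source_field], row[target_field])
        for row in rows
        if row.get(target_field)
    ]


def unmatched_endpoints(edges, node_ids):
    """List any source or target that is missing from the node IDs."""
    known = set(node_ids)
    return sorted({end for edge in edges for end in edge if end not in known})


posts = [
    {"Group Name": "Group A", "Link": "https://example.com/a"},
    {"Group Name": "Group B", "Link": "https://example.com/a"},
    {"Group Name": "Group A", "Link": ""},
]
edges = build_edges(posts)

# In-memory stand-in for the Edges CSV, with the same two column titles.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "target"])
writer.writerows(edges)

node_ids = ["Group A", "Group B", "https://example.com/a"]
print(unmatched_endpoints(edges, node_ids))
```

An empty list from `unmatched_endpoints` suggests the two files line up. Anything it returns is a name to check for typos or for URLs that were simplified in one file but not the other.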

MAPPING IN GEPHI

Now, we’re ready to import the nodes and the edges CSVs into Gephi. After installing Gephi on your computer, click File > Import spreadsheet. Select your NodesFinal CSV, and make sure the graph type is directed. You can import to a new workspace.

Now, you should see your nodes in the graph window, with no lines connecting them yet.

Let’s add the edges in. Again, select Import Spreadsheet, and this time, import the Edges CSV. Again, make sure the graph is directed, but this time, append the spreadsheet to the existing workspace.

Now, we’re ready to map.

Run Modularity in the Statistics tab. Don’t worry about the settings; just click "OK." Close the window that pops up afterward.

Now, click on the color palette icon, and click partition. Select Modularity Class from the dropdown, and click “apply”. Your map will now have colors.

Let’s adjust the node size. Click on the concentric circles (Size), and set your preferred node size. We’ve chosen 10. Then click Apply.

Next, let’s change the size ranking. Still under Size, click “Ranking.” Start by selecting In-Degree from the dropdown -- this will allow us to see the URLs that were most frequently shared in this set. Set your minimum size to 10 (to match the size you chose earlier), and the max to 100, in this case. You can play with these settings later.

Click Apply.
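To get a feel for what this ranking does, the Python sketch below counts in-degree (the number of edges pointing at each node) and maps it onto the 10 to 100 size range. It assumes a plain linear scale and counts every row of the edges sheet as one edge. Gephi's own numbers may differ, for example if repeated source and target pairs are merged on import. The edge list and helper names are invented.

```python
from collections import Counter


def in_degree(edges, node_ids):
    """Count how many edges point at each node (0 for nodes never targeted)."""
    counts = Counter(target for _source, target in edges)
    return {node: counts.get(node, 0) for node in node_ids}


def scale_size(value, lowest, highest, min_size=10.0, max_size=100.0):
    """Linearly map `value` from [lowest, highest] onto [min_size, max_size]."""
    if highest == lowest:
        return min_size
    return min_size + (value - lowest) * (max_size - min_size) / (highest - lowest)


edges = [
    ("Group A", "https://example.com/a"),
    ("Group B", "https://example.com/a"),
    ("Group C", "https://example.com/a"),
    ("Group A", "https://example.com/b"),
]
node_ids = ["Group A", "Group B", "Group C",
            "https://example.com/a", "https://example.com/b"]

degrees = in_degree(edges, node_ids)
lowest, highest = min(degrees.values()), max(degrees.values())
sizes = {node: scale_size(d, lowest, highest) for node, d in degrees.items()}
print(sizes)
```

The groups end up at the minimum size because nothing points at them, which is why this ranking makes the most-shared links stand out.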

Now, choose your layout. Select Force Atlas 2, and click Run. This layout will bring accounts that frequently shared the same links closer together.

Click Stop after about 10 seconds, as most of the meaningful mapping will have happened by then, and it takes up a lot of memory.

Now, let’s add labels. Click the black T at the bottom of your workspace. You may need to zoom out by two-finger scrolling away from yourself on your trackpad.

Then, select Node size to make the labels proportionate to the nodes.

Slide the label size slider all the way to the left so you can see the labels more clearly.

And now, you can see the links that were most shared!

To see the groups that shared the most links, switch your size ranking to Out-Degree, and click Apply.

Now, you can see the groups that shared the most links. You can change the label size if necessary using the same slider.
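Out-degree is the mirror image of in-degree: it counts the edges leaving each node, which here means the links each group posted. A small Python sketch, using an invented edge list and counting every row of the edges sheet as one edge:

```python
from collections import Counter


def out_degree(edges):
    """Count how many edges start at each source node."""
    return Counter(source for source, _target in edges)


edges = [
    ("Group A", "https://example.com/a"),
    ("Group B", "https://example.com/a"),
    ("Group A", "https://example.com/b"),
    ("Group A", "https://example.com/c"),
]

shares_per_group = out_degree(edges)
top_group, top_count = shares_per_group.most_common(1)[0]
print(top_group, top_count)
```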

Groups that are clustered closely together shared many of the same links. If a group is farther away, that means it had few links in common with the other clusters of groups.
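As a rough cross-check on what the clusters suggest, you can count how many distinct links each pair of groups has in common. The Python sketch below does that for an invented edge list. It is only a proxy, not how the Force Atlas 2 layout itself positions nodes, and `shared_link_counts` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_link_counts(edges):
    """Count, for each pair of groups, how many distinct links both shared."""
    groups_by_link = defaultdict(set)
    for group, link in edges:
        groups_by_link[link].add(group)

    pair_counts = Counter()
    for groups in groups_by_link.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(groups), 2):
            pair_counts[pair] += 1
    return pair_counts


edges = [
    ("Group A", "https://example.com/a"),
    ("Group B", "https://example.com/a"),
    ("Group A", "https://example.com/b"),
    ("Group B", "https://example.com/b"),
    ("Group C", "https://example.com/c"),
]

pairs = shared_link_counts(edges)
print(pairs.most_common())
```

Pairs with high counts are the ones you would expect to sit close together in the map, while a group that shares no links with the others, like Group C here, appears in no pair at all.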

If you’re interested in seeing which groups got the most of a certain metric, you can change your size ranking to that metric (e.g. total interactions, views, shares, etc.). These metrics need to be in the NodesFinal CSV before you can use them (make sure to use SUMIF as we did for Total Interactions), so you may need to go back, add them to the CSV, and recreate your map.

Happy mapping!

Thanks to Adi Cohen of Storyful for helping to create this walkthrough.
