Introduction to Graph Computing Training Course

Overview

Many real-world problems can be described in terms of graphs: for example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large, and processing them requires a specialized set of tools and processes. Together, these tools and processes are referred to as Graph Computing (also known as Graph Analytics).

In this instructor-led, live training, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing approach (also known as Distributed Graph Processing). We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

Understand how graph data is persisted and traversed.
Select the best framework for a given task (from graph databases to batch processing frameworks).
Use Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
View real-world big data problems in terms of graphs, processes and traversals.

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Requirements

An understanding of Java programming and frameworks
A general understanding of Python is helpful but not required
A general understanding of database concepts

Audience

Developers

Course Outline

Introduction

Graph databases and libraries

Understanding Graph Data

The graph as a data structure
Using vertices (dots) and edges (lines) to model real-world scenarios
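
To make the vertex-and-edge idea concrete, here is a minimal sketch of a graph stored as an adjacency list. The station names and the build_adjacency helper are hypothetical, chosen only to echo the train network example from the overview; the course itself may model this differently.

```python
from collections import defaultdict

# Hypothetical train network: stations are vertices, direct connections are edges.
edges = [
    ("Amsterdam", "Utrecht"),
    ("Utrecht", "Arnhem"),
    ("Amsterdam", "Rotterdam"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency list: vertex -> set of neighbouring vertices."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)  # store the edge in both directions,
        adjacency[b].add(a)  # because train connections are two-way here
    return dict(adjacency)


graph = build_adjacency(edges)
print(sorted(graph["Amsterdam"]))
```

An adjacency list keeps neighbour lookups cheap, which is what traversals rely on; an edge list (the plain list of pairs) is often the more convenient form for storage and bulk loading.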

Using Graph Databases to Model, Persist and Process Graph Data

Local graph algorithms/traversals
Neo4j, OrientDB and Titan

Exercise: Modeling Graph Data with Neo4j

Whiteboard data modeling

Beyond Graph Databases: Graph Computing

Understanding the property graph
Graph modeling different scenarios (software graph, discussion graph, concept graph)

Solving Real-World Problems with Traversals

Algorithmic/directed walk over the graph
Determining circular dependencies
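
One common way to detect circular dependencies is a depth-first traversal that tracks which nodes are on the current path. The sketch below assumes the dependencies are given as a plain dictionary; the find_cycle function and the module names are hypothetical and not taken from the course material.

```python
def find_cycle(dependencies):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    `dependencies` maps each node to the nodes it depends on (directed edges).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in dependencies.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we walked in a circle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in dependencies:
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical software graph: app -> auth -> db -> app is circular.
modules = {"app": ["auth", "db"], "auth": ["db"], "db": ["app"]}
cycle = find_cycle(modules)
print(cycle)
```

The recursive form is kept for readability; on very deep graphs an explicit stack would avoid Python's recursion limit.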

Case Study: Ranking Discussion Contributors

Ranking by number and depth of contributed discussions
A note on sentiment and concept analysis

Graph Computing: Local, In-Memory Graph toolkits

Graph analysis and visualization
JUNG, NetworkX and igraph

Exercise: Modeling Graph Data with NetworkX

Using NetworkX to model a complex system

Graph Computing: Batch Processing Graph Frameworks

Leveraging Hadoop for storage (HDFS) and processing (MapReduce)
Overview of iterative algorithms
Hama, Giraph, and GraphLab

Graph Computing: Graph-Parallel Computation

Unifying ETL, exploratory analysis, and iterative graph computation within a single system
GraphX

Setup and Installation

Hadoop and Spark

GraphX Operators

Property, structural, join, neighborhood aggregation, caching and uncaching

Iterating with Pregel API

Passing arguments for sending, receiving and computing
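
The Pregel model runs in supersteps: each vertex receives merged messages, updates its value, then sends messages along its edges. The sketch below is a single-machine simulation in plain Python, not the GraphX API. The argument names vprog, send_msg and merge_msg mirror the three functions GraphX's Pregel operator takes, but the pregel function itself, its signature and the sample graph are assumptions made for illustration. As a simplification, only edges whose source vertex received a message in the previous superstep are considered.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    """Simulate Pregel supersteps on one machine (illustrative only).

    vertices: dict of vertex id -> value
    edges:    list of (src, dst, attr) triples
    """
    # Superstep 0: every vertex processes the initial message.
    values = {v: vprog(v, val, initial_msg) for v, val in vertices.items()}
    active = set(values)
    for _ in range(max_iterations):
        inbox = {}
        for src, dst, attr in edges:
            if src not in active:
                continue  # simplification: only active sources send
            for target, msg in send_msg(src, values[src], dst, values[dst], attr):
                # Combine messages addressed to the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        for v, msg in inbox.items():
            values[v] = vprog(v, values[v], msg)
        active = set(inbox)
    return values


# Single-source shortest paths from vertex "a" on a hypothetical weighted graph.
vertices = {"a": 0.0, "b": math.inf, "c": math.inf, "d": math.inf}
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "d", 1.0)]


def vprog(vertex_id, dist, msg):
    return min(dist, msg)  # receiving: keep the shortest distance seen so far


def send_msg(src, src_dist, dst, dst_dist, weight):
    # Sending: only notify the neighbour if this path improves its distance.
    return [(dst, src_dist + weight)] if src_dist + weight < dst_dist else []


distances = pregel(vertices, edges, math.inf, vprog, send_msg, min)
print(distances)
```

Because merge_msg is applied pairwise, it should be commutative and associative (min, sum and max all qualify), which is what lets a distributed engine combine messages in any order.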

Building a Graph

Using vertices and edges in an RDD or on disk

Designing Scalable Algorithms

GraphX Optimization

Accessing Additional Algorithms

PageRank, Connected Components, Triangle Counting
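
As a rough illustration of what the first of these algorithms computes, here is a textbook power-iteration PageRank in plain Python. It is a sketch under stated assumptions, not the GraphX implementation: the pagerank function, its parameters and the sample links are hypothetical, and library implementations may normalize ranks differently.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank; returned ranks sum to 1.

    out_links maps each node to the list of nodes it links to.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "who replies to whom" graph.
links = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"], "dave": ["carol"]}
ranks = pagerank(links)
top_user = max(ranks, key=ranks.get)
print(top_user)
```

A fixed iteration count keeps the sketch short; a convergence threshold on the change in rank between iterations is the usual alternative.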

Exercise: PageRank and Top Users

Building and processing graph data using text files as input
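
A text-file input for this kind of exercise is often a whitespace-separated edge list, one "source target" pair per line. The sketch below assumes that format, with lines starting with # treated as comments; the sample data, the read_edge_list helper and the use of in-degree as a stand-in for "top users" are all assumptions, not the exercise's actual input or method.

```python
import io
from collections import Counter

# Hypothetical follower data: each line reads "follower followed".
SAMPLE = """\
# follower followed
1 2
3 2
2 1
4 2
"""


def read_edge_list(handle):
    """Parse 'src dst' integer pairs, skipping blank lines and # comments."""
    edges = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges


# io.StringIO stands in for an open text file so the sketch is self-contained.
edges = read_edge_list(io.StringIO(SAMPLE))
in_degree = Counter(dst for _, dst in edges)
top_users = in_degree.most_common(2)
print(top_users)
```

In-degree is the simplest possible ranking; the same edge list could instead be fed to a PageRank computation to weight followers by their own importance.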

Deploying to Production

Closing Remarks
