The Final hack of the Gephi Streaming Plugin

We saw that the Gephi Streaming Plugin can be used to push data into Gephi. It works well, but it brings several constraints:

  • You need to know the address of the Gephi instance, which makes communication over the internet tricky.
  • It’s HTTP push based, which technically isn’t optimal, especially for high-throughput streams.

It would be nice to have it work the other way around: Gephi connects to an entry point and generates the graph from it.

Gephi as consumer of an entry point

I managed to find a way to do that: HTTP streaming. I won’t go into the details because, spoiler alert, it’s nice but not that much.

The trick is to create an HTTP server that streams the data from an entry point, which you access with the client part of the Gephi Streaming plugin. That works, and it’s interesting because you can then expose an entry point on your web server on the internet and share it with other Gephi users so that they all get the same stream.

It might be interesting for some use cases, so I’m sharing the code of a simple web server here: https://github.com/totetmatt/graphStreamServer

But with this method, Gephi behaves weirdly when you use filters, and I found it more unstable than when you use it as a « Master ».

So here is the deal: we would like to share a graph stream across the internet, but with Gephi set as Master. It’s theoretically feasible, but it involves networking operations on your home network. Not acceptable: we are targeting a solution that (almost) everybody can set up.

Websocket strikes back

Websocket is a bidirectional communication protocol, which means you can receive and send data within the same « connection », asynchronously (you still have a client / server relationship, but the client can be as talkative as needed). If you follow this blog, I’ve already talked about Websocket and the Gephi Streaming plugin, where I explained that Gephi is able to push the data of its own graph to clients over the websocket protocol.

Now we want the opposite. It’s possible… and it works smoothly!

All we need to test it is a simple HTML web page that connects to Gephi and sends data to generate a graph.

Gist file : https://gist.github.com/totetmatt/49d2c5dd243918068320

<html>
<head>
    <script type="text/javascript">
        window.onload = function () {

            // Gephi Streaming Master must be running on your computer
            var websocket = new WebSocket("ws://localhost:8080/workspace0?action=updateGraph")
            websocket.onopen = function (event) {

                    // Send events following the message structure defined by the plugin API
                    websocket.send('{"an":{"a":{"label":"a"}}}')
                    websocket.send('{"an":{"b":{"label":"b"}}}')
                    websocket.send('{"ae":{"ab":{"source":"a","target":"b"}}}')

                    randomGenerate()

            };

            // A quick and dirty example to see the "real-time" graph:
            // every 2 seconds, send 10 random nodes, then add or delete 10 random edges
            function randomGenerate() {
                setTimeout(function () {
                    // 0: add edges during this round, 1: delete edges
                    var test = (Math.floor(Math.random() * 50) + 1) % 2;
                    for (var i = 0; i < 10; i++) {
                        var id = Math.floor(Math.random() * 50) + 1;
                        websocket.send('{"an":{"'+id+'":{"label":"'+id+'"}}}')
                    }
                    for (var j = 0; j < 10; j++) {
                        var source = Math.floor(Math.random() * 50) + 1;
                        var target = Math.floor(Math.random() * 50) + 1;

                        if (test === 0) {
                            websocket.send('{"ae":{"'+source+'-'+target+'":{"source":"'+source+'","target":"'+target+'"}}}')
                        } else {
                            websocket.send('{"de":{"'+source+'-'+target+'":{"source":"'+source+'","target":"'+target+'"}}}')
                        }
                    }
                    randomGenerate()
                }, 2000)
            }
            // Triggered when we receive a message.
            // Gephi propagates every change to all clients (including yourself).
            // It can be useful, but it can also be ignored.
            
            /*
            websocket.onmessage = function(message){
                console.log(message)
            } 
            */
      
        }
    </script>
    <title>Websocket Gephi</title>
</head>
<body>
    Run Gephi and Gephi Streaming Master, then refresh this page.
</body>
</html>

(Pro tip: Websocket isn’t bound to JavaScript within an HTML page. If you try another language with a good websocket library, it will work the same way.)

The graph is quite trivial, but you just need to extend that with a backend or another JavaScript process within your page to generate complex graph data.
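
As a starting point for that, here is a minimal sketch of small helpers that build the event messages with JSON.stringify instead of concatenating strings by hand, so ids or labels containing quotes stay valid JSON. The function names (addNode, addEdge, deleteEdge) are mine, not part of the plugin, and the message shapes simply mirror the ones sent by the page above.

```javascript
// Hypothetical helpers (not part of the Gephi Streaming plugin): each one
// returns the JSON text of a single event, shaped like the messages sent
// by the page above.
function addNode(id, attributes) {
  var event = { an: {} };
  // The label defaults to the id; extra attributes are optional.
  event.an[String(id)] = Object.assign({ label: String(id) }, attributes);
  return JSON.stringify(event);
}

function edgeEvent(type, source, target) {
  // type is "ae" (add edge) or "de" (delete edge), as in the page above.
  var event = {};
  event[type] = {};
  event[type][source + "-" + target] = {
    source: String(source),
    target: String(target)
  };
  return JSON.stringify(event);
}

function addEdge(source, target) {
  return edgeEvent("ae", source, target);
}

function deleteEdge(source, target) {
  return edgeEvent("de", source, target);
}
```

With an open connection, `websocket.send(addNode("a"))` followed by `websocket.send(addEdge("a", "b"))` would then replace the hand-written strings.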

We still need to connect to the master, so why is it an improvement?

It’s not out of the box, but your HTML page can actually be a « proxy » page that bridges a server and a Gephi instance.

Which means you can create an application with this architecture:

[Figure: streamWebSocket]

With this, you can generate a graph on your remote server and send it to all clients; each client then decides where to send the data. No overly complex operation on either side, and no big issue with Gephi, as it runs as a master.
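
To make the « proxy » idea concrete, here is a rough sketch of what the page could do: listen to one websocket and forward every message to another. The bridge function is hypothetical, and so is the remote URL in the usage comment; messages received before the Gephi connection is open are simply queued.

```javascript
// Forward every message received from `source` to `sink`.
// Both arguments are WebSocket-like objects: `source` gets an `onmessage`
// handler, `sink` must expose `send` and `readyState`.
// This is an illustrative sketch, not code from the plugin.
function bridge(source, sink) {
  var OPEN = 1; // numeric value of WebSocket.OPEN
  var pending = [];

  function flush() {
    while (pending.length > 0 && sink.readyState === OPEN) {
      sink.send(pending.shift());
    }
  }

  source.onmessage = function (message) {
    pending.push(message.data); // keep the order of arrival
    flush();
  };
  sink.onopen = flush; // send whatever was queued while connecting

  return pending;
}

// In a browser page (the first URL is made up for the example):
// bridge(
//   new WebSocket("wss://example.org/graph-stream"),
//   new WebSocket("ws://localhost:8080/workspace0?action=updateGraph")
// );
```

The remote server only has to emit messages in the same format as the ones sent to Gephi; the page doesn’t need to understand them to pass them along.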

We can even think about processing data within the HTML page, with the advantage that the web application could run a custom computation before generating the graph.
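
As an illustration of such a computation, the sketch below drops self-loops (edges whose source and target are the same node) from an event before it would be sent to Gephi. The function name dropSelfLoops is made up for the example, and it assumes messages shaped like the ones in the page above.

```javascript
// Returns the event text to forward to Gephi, or null when nothing is left.
// Assumption: `rawMessage` is the JSON text of one event, with "add edge"
// entries stored under the "ae" key as in the page above.
function dropSelfLoops(rawMessage) {
  var event = JSON.parse(rawMessage);
  if (!event.ae) {
    return rawMessage; // no edge to add, forward unchanged
  }
  Object.keys(event.ae).forEach(function (id) {
    var edge = event.ae[id];
    if (edge.source === edge.target) {
      delete event.ae[id]; // self-loop: skip it
    }
  });
  if (Object.keys(event.ae).length === 0) {
    delete event.ae;
  }
  return Object.keys(event).length === 0 ? null : JSON.stringify(event);
}
```

The page would call it on each incoming message and only send the result to Gephi when it is not null. Any other per-event computation (relabeling, sampling, aggregation) could take the same place.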

[Figure: streamWebSocket2]
This is madness ? THIS IS LIVE GRAPH !

Websocket vs REST call

What is the advantage of this technique over the old one, which consists of making one call per action (used in Naoyun, for example)?

The main advantage is technical: it creates a single connection in Gephi that stays open until it is stopped. I didn’t run proper tests, but I expect it helps the JVM a lot in terms of memory and garbage collection.
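
The difference between the two styles can be sketched like this. The transports are injected as plain functions so the snippet runs without Gephi; `post` and `connect` are hypothetical names standing for "perform one HTTP call" and "open one websocket".

```javascript
// Old approach: one HTTP call per event.
// `post` is a hypothetical function that performs the request.
function pushWithRest(events, post) {
  events.forEach(function (event) {
    post(JSON.stringify(event)); // a new request for every single event
  });
}

// Websocket approach: the connection is opened once and reused.
// `connect` is hypothetical too, and is assumed to return a socket
// that is already open.
function pushWithWebSocket(events, connect) {
  var socket = connect();
  events.forEach(function (event) {
    socket.send(JSON.stringify(event)); // same connection every time
  });
  return socket;
}
```

For N events, the first function triggers N requests while the second opens a single connection and sends N messages through it. That is the whole point, even though I haven’t measured the actual gain.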

It doesn’t really solve the fact that you need a Gephi Master that listens for events rather than a consumer, but the solution proposed here is quite convenient and doesn’t require touching the source of the plugin.

Next hack

With the different posts about the Gephi Streaming Plugin, I think we are reaching the limit of hacking around the plugin. Most of the improvements need to be coded in the plugin itself. I’ve tried several times to contribute to the code, but I never managed to get used to the Gephi Java API and the way the plugin is coded.
With the release of Gephi 0.9 approaching, and the various changes it will bring, I think spending time on it wouldn’t be a good bet. I prefer to wait for the release of Gephi 0.9, check the compatibility, and try to release a completely new Streaming Plugin. Mainly, it would be nice to have a default websocket client / server, plus other broker systems like Kafka or Redis pub/sub. The internal data management also needs to be reviewed to be more flexible and stable than it is today.

It doesn’t mean no more fancy live graphs! Several technologies evolving today fit exactly into the subject of live graph analysis and will be interesting to dive into, like Spark with its Streaming and GraphX modules.

One of the next challenges is to « get rid » of Gephi by going full web, which sounds promising with libraries like Linkurious and Vivagraph.js.

Stay tuned 🙂