We saw that we can use the Gephi Streaming Plugin to push data into Gephi. It works well, but it brings a lot of constraints, such as:
- You need to know the address of the Gephi instance, which makes it tricky when you want to communicate over the internet.
- It’s HTTP push based, which technically isn’t optimal, especially for a high-throughput stream.
It would be nice to have it work the other way round: Gephi connects to an entry point and generates the graph from it.
Gephi as consumer of an entry point
I managed to find a way to do that: HTTP streaming. I won’t go into the details because, spoiler alert, it’s nice but not that much.
The trick is to create an HTTP server that streams the data from an entry point, which you access with the client part of the Gephi Streaming plugin. That works, and it’s interesting because you can then expose an entry point on your web server on the internet and share it with other Gephi users so that they all get the same stream.
It might be interesting for some use cases, so I’m sharing the code of a simple web server here: https://github.com/totetmatt/graphStreamServer
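To give an idea of the principle, here is a minimal Python sketch of such a streaming entry point. It is not the code from the repository above, only an illustration: the port, the sample events and the helper names (`graph_events`, `encode_event`, `serve`) are made up for the example, and I assume the client expects one JSON event per line in the plugin's event format (`an` to add a node, `ae` to add an edge), each line ending with `\r\n`.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY_SECONDS = 1  # assumption: pause between events to mimic a live stream


def graph_events():
    """Yield a few hypothetical events in the streaming JSON event format."""
    yield {"an": {"A": {"label": "Node A"}}}  # "an" = add node
    yield {"an": {"B": {"label": "Node B"}}}
    # "ae" = add edge between two nodes that were already sent
    yield {"ae": {"AB": {"source": "A", "target": "B", "directed": False}}}


def encode_event(event):
    """Serialize one event as a single JSON line ending with CRLF."""
    return (json.dumps(event) + "\r\n").encode("utf-8")


class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # No Content-Length: the body is written piece by piece and the
        # connection is closed once the generator is exhausted.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        for event in graph_events():
            self.wfile.write(encode_event(event))
            self.wfile.flush()  # push the event to the client right away
            time.sleep(DELAY_SECONDS)


def serve(port=8000):
    """Start the streaming server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), StreamHandler).serve_forever()
```

Calling `serve()` starts the server; the client part of the streaming plugin would then be pointed at `http://<your-host>:8000/`. A real server would of course replace `graph_events` with its actual data source.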
But with this method, Gephi behaves weirdly when you use filters, and I found it more unstable than when you use it as a « Master ».
So here is the deal: we would like to share a graph stream across the internet, but with Gephi set as Master. It’s theoretically feasible, but it involves networking operations on your home network. Not acceptable: we are targeting a solution that (almost) everybody can set up.
Websocket strikes back
WebSocket is a bidirectional communication protocol, which means you can receive and send data asynchronously within the same « connection » (you still have a client/server relationship, but the client can be as talkative as needed). If you follow this blog, I’ve already talked about WebSocket and the Gephi Streaming plugin, where I explained that Gephi is able to push the data of its own graph to clients over the WebSocket protocol.
Now we want the opposite. It’s possible… and it works smoothly!
All we need to test it is a simple HTML page that connects to Gephi and sends data to generate a graph.
Gist file: https://gist.github.com/totetmatt/49d2c5dd243918068320
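For readers who prefer to see the idea outside of a browser, here is a rough Python sketch of what gets sent over the connection. It is not the content of the gist: the helper names (`add_node`, `add_edge`, `push`) are invented for the example, the URL is an assumption about a default local setup, and `send` stands for the send method of whatever WebSocket client you use, so the sketch itself does not open any connection.

```python
import json

# Assumption: address of the Gephi master server; adjust to your own setup.
GEPHI_WS_URL = "ws://localhost:8080/workspace0"


def add_node(node_id, **attributes):
    """Build an "add node" event ("an") for the given node id."""
    return {"an": {node_id: attributes}}


def add_edge(edge_id, source, target, directed=False):
    """Build an "add edge" event ("ae") linking two nodes."""
    return {"ae": {edge_id: {"source": source, "target": target, "directed": directed}}}


def push(send, events):
    """Send each event as one JSON text message through `send`.

    `send` is any callable that takes a string, for example the send
    method of an already opened WebSocket client connected to GEPHI_WS_URL.
    Returns the number of messages sent.
    """
    count = 0
    for event in events:
        send(json.dumps(event))
        count += 1
    return count
```

For a dry run without Gephi, `push(print, [add_node("A", label="Node A"), add_node("B"), add_edge("AB", "A", "B")])` simply prints the three messages. The point is that the whole graph goes through one open connection, one small message per change.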
We still need to connect to the Master, so why is it an improvement?
It’s not out of the box, but your HTML page can actually be a « proxy » page that bridges a server and a Gephi instance.
This means you can build an application with this architecture: a remote server sends data to a web page on each client, and that page forwards it to a Gephi instance.
With this, you can generate a graph on your remote server, send it to all clients, and then each client decides where to send the data. No overly complex operation on either side, and no big issue with Gephi, as it runs as Master.
We can even think about doing data processing within the HTML page, with the advantage that the web application could run a custom computation before generating the graph.
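The bridge logic is small enough to sketch. The Python below is only an illustration of the « proxy » idea, written under several assumptions: the server sends JSON records shaped like `{"from": "alice", "to": "bob"}` (a format invented for this example), `send_to_gephi` is the send method of a connection to the Gephi Master, and `keep` plays the role of the custom computation done on the client side.

```python
import json


def to_graph_events(record):
    """Turn one hypothetical server record into graph streaming events.

    Assumed record shape (invented for this sketch):
    {"from": "alice", "to": "bob"}
    """
    src, dst = record["from"], record["to"]
    return [
        {"an": {src: {"label": src}}},
        {"an": {dst: {"label": dst}}},
        {"ae": {f"{src}->{dst}": {"source": src, "target": dst, "directed": True}}},
    ]


def relay(incoming, send_to_gephi, keep=lambda record: True):
    """Forward records received from the server to Gephi as graph events.

    incoming: iterable of JSON strings received from the remote server
    send_to_gephi: callable taking a string (e.g. a WebSocket send method)
    keep: optional client-side filter, applied before building the graph
    Returns the number of events forwarded.
    """
    forwarded = 0
    for message in incoming:
        record = json.loads(message)
        if not keep(record):
            continue  # record dropped by the client-side computation
        for event in to_graph_events(record):
            send_to_gephi(json.dumps(event))
            forwarded += 1
    return forwarded
```

For example, `relay(messages, print, keep=lambda r: r["from"] != r["to"])` would drop self-loops before anything reaches Gephi. A real bridge would probably also remember which nodes were already sent instead of adding them again for every record.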
Websocket vs REST call
What is the advantage of this technique over the old one, which consists of making one call per action (used in Naoyun, for example)?
The main advantage is technical: it creates a single connection in Gephi that stays open until it is stopped. I didn’t run proper tests, but I expect it helps the JVM a lot in terms of memory and garbage collection.
It doesn’t really solve the fact that you need a Gephi Master that listens to events rather than a consumer, but the solution proposed here is quite convenient and doesn’t require touching the source of the plugin.
With the different posts about the Gephi Streaming Plugin, I think we are reaching the limit of hacking around the plugin. Most of the improvements need to be coded in the plugin itself. I’ve tried several times to contribute to the code, but I never managed to get used to the Gephi Java API and the way the plugin is coded.
With the release of Gephi 0.9 approaching, and the various changes it will bring, I think spending time on it now won’t be a good bet. I prefer to wait for the release of Gephi 0.9, check the compatibility, and try to release a completely new Streaming Plugin. Mainly, it would be nice to have a default WebSocket client/server, plus support for other broker systems like Kafka or Redis pub/sub. The internal data management also needs to be reviewed to be more flexible and stable than it is today.
It doesn’t mean no more fancy live graphs! Several technologies are evolving today that fit exactly into the subject of live graph analysis and will be interesting to dive deep into, like Spark with its Streaming and GraphX modules.
One of the next challenges is to « get rid » of Gephi by going full web, which sounds promising with libraries like Linkurious and Vivagraph.js.
Stay tuned 🙂