Version | Comments | Download
1.0 | | Download
ArrowV is a quick, manual command-line crawler. It is heavily inspired by the Firefox plug-in “Navicrawler”.
Basically, you “jump” from website to website, and the program collects all the hyperlinks and builds a graph from them.
OK, great, but how does it work?
First, you need Python (I tested on v2.7.2; I’m not sure it will work on v3, sorry).
Open your terminal (cmd or a Linux shell).
Launch ArrowV like this:
>> python ArrowV.py
This starts the program, and a very shiny ASCII art banner welcomes you.
Then launch the ‘navcom’ component:
>> Arrow V : navcom
You should now be at the navcom prompt.
Navcom
Navcom is the component developed to crawl the web manually.
Jump command
The first step is to jump to a website. For example, let’s jump to www.utc.fr (my university 😉):
Arrow V [Navcom] > : jump http://www.utc.Fr
http://www.utc.Fr
> Downloading http://www.utc.Fr
> Downloaded http://www.utc.Fr
> Analyzing Page
** You Are at http://www.utc.Fr **
** Here are links founded **
> [0] : u'http://abc-innovation.utc.fr'
> [1] : u'http://interactions.utc.fr'
> [2] : u'http://utcenligne.utc.fr'
> [3] : u'http://www.tremplin-utc.asso.fr'
> [4] : u'http://bibliotheque.utc.fr'
> [5] : u'http://wwwassos.utc.fr'
> [6] : u'http://ent.utc.fr'
> [7] : u'http://www.facebook.com'
> [8] : u'http://twitter.com'
> [9] : u'http://www.youtube.com'
Don’t run away, I’ll explain.
The web page has been analysed to find all the hyperlinks (<a>) in the page. Internally, the program has built the graph.
The hyperlinks found are displayed, and you can jump to one just by using the number in front of it.
For example, if I now want to jump to http://twitter.com, I can type:
Arrow V [Navcom] > : jump 8
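If you are curious what the analysis step could look like, here is a minimal sketch. It is not ArrowV’s actual code: the names LinkCollector, site_of, extract_sites and add_page are made up for illustration, and reducing every link to its site root is only a guess based on the listing above. It is written for Python 3, whereas ArrowV itself was tested on 2.7.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to scheme://host (assumption: ArrowV works per site)."""
    parts = urlsplit(url)
    return "{}://{}".format(parts.scheme, parts.netloc.lower())


def extract_sites(page_url, html):
    """Return the distinct external sites linked from html, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    here = site_of(page_url)
    sites = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and so on
        site = site_of(absolute)
        if site != here and site not in sites:
            sites.append(site)
    return sites


def add_page(graph, page_url, html):
    """Record page -> linked-site edges in a dict-of-sets graph."""
    source = site_of(page_url)
    links = extract_sites(page_url, html)
    graph.setdefault(source, set()).update(links)
    for site in links:
        graph.setdefault(site, set())  # make sure every target is a node
    return links
```

With a list like the one returned by add_page, a command such as jump 8 only has to look up links[8] and start again from that URL.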
Scan command
With the scan command, navcom analyses all the neighboring hyperlinks to build the network.
For example, if you jump to http://www.utc.fr and perform a scan, the program automatically “jumps” to every neighbor, analyses it, and comes back to the starting point.
Arrow V [Navcom] > @www.utc.Fr : scan
http://abc-innovation.utc.fr
> Downloading http://abc-innovation.utc.fr
> Downloaded http://abc-innovation.utc.fr
> Analyzing Page
http://interactions.utc.fr
> Downloading http://interactions.utc.fr
> Downloaded http://interactions.utc.fr
> Analyzing Page
http://utcenligne.utc.fr
> Downloading http://utcenligne.utc.fr
> Downloaded http://utcenligne.utc.fr
> Analyzing Page
http://www.tremplin-utc.asso.fr
> Downloading http://www.tremplin-utc.asso.fr
> Downloaded http://www.tremplin-utc.asso.fr
> Analyzing Page
http://bibliotheque.utc.fr
> Downloading http://bibliotheque.utc.fr
> Downloaded http://bibliotheque.utc.fr
> Analyzing Page
http://wwwassos.utc.fr
> Downloading http://wwwassos.utc.fr
> Downloaded http://wwwassos.utc.fr
> Analyzing Page
http://ent.utc.fr
> Downloading http://ent.utc.fr
> Downloaded http://ent.utc.fr
> Analyzing Page
http://www.facebook.com
> Downloading http://www.facebook.com
> Downloaded http://www.facebook.com
> Analyzing Page
http://twitter.com
> Downloading http://twitter.com
> Downloaded http://twitter.com
> Analyzing Page
http://www.youtube.com
> Downloading http://www.youtube.com
> Downloaded http://www.youtube.com
> Analyzing Page
The -d option sets the distance. By default, scan only looks at direct neighbors (distance = 1). If you use
Arrow V [Navcom] > @www.utc.Fr : scan -d 2
the scan will look at the neighbors and the neighbors’ neighbors (understand?).
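The distance idea is basically a breadth-first walk with a depth limit. Here is a rough sketch of it, not the real navcom code: the scan function below and its analyse callback (something that downloads one page and returns its links) are hypothetical names, and the graph is a plain dict of lists.

```python
from collections import deque


def scan(graph, start, analyse, distance=1):
    """Visit every page within `distance` hops of start, breadth first.

    graph maps a URL to the list of links found on it; start is assumed
    to be in it already (that is what a previous jump did).
    analyse(url) is a hypothetical callback returning the links of url.
    Returns the URLs that were analysed, in visiting order.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        node, depth = queue.popleft()
        if depth > 0:
            # The starting page was analysed by jump; only neighbors are fetched.
            graph[node] = list(analyse(node))
            visited.append(node)
        if depth == distance:
            continue  # do not look further than the requested distance
        for link in graph.get(node, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

The seen set is what keeps the walk from downloading the same site twice when two neighbors link to each other.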
Gephi Stream
I’m a Gephi lover, and particularly of Gephi’s streaming function (<3).
So you can activate the connection with Gephi by typing upgephi.
Make sure that Gephi is running and the Streaming Master Server is started.
If you want to stop the streaming, just use downgephi.
Arrow V [Navcom] > @www.utc.Fr : upgephi
Gephi connector is ON
Arrow V [Navcom] > @www.utc.Fr : downgephi
Gephi connector is OFF
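For the curious, here is roughly what talking to the Streaming Master Server can look like. This is a sketch under assumptions, not what ArrowV really sends: it uses the JSON events of Gephi’s Graph Streaming plugin as I understand them (“an” to add a node, “ae” to add an edge), and the URL in GEPHI_URL is only the usual default, so check it against your own Gephi workspace. The function names are invented for the example.

```python
import json
import urllib.request

# Assumed default address of the Streaming Master Server; adjust to your setup.
GEPHI_URL = "http://localhost:8080/workspace0?operation=updateGraph"


def node_event(url):
    """JSON event that adds one node, labelled with its URL."""
    return json.dumps({"an": {url: {"label": url}}})


def edge_event(source, target):
    """JSON event that adds one directed edge between two known nodes."""
    edge_id = source + "->" + target
    return json.dumps(
        {"ae": {edge_id: {"source": source, "target": target, "directed": True}}}
    )


def events_for(graph):
    """Turn a dict-of-lists graph into events, nodes always before their edges."""
    lines = []
    declared = set()
    for source in graph:
        for node in [source] + list(graph[source]):
            if node not in declared:
                declared.add(node)
                lines.append(node_event(node))
        for target in graph[source]:
            lines.append(edge_event(source, target))
    return lines


def push(lines, url=GEPHI_URL):
    """POST the events to Gephi, one event per line (needs Gephi running)."""
    body = "\r\n".join(lines).encode("utf-8")
    request = urllib.request.Request(url, data=body)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the events is kept separate from sending them, so you can print them and check them even when Gephi is not running.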
Other commands
info : to check where you are now
history : to see your jump history
map : to see the current map
save <name> : to save your session (as a GDF file)
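In case you have never opened a GDF file: it is a small text format with a node section and an edge section. The sketch below shows how a session could be written that way. It is my illustration, not the code behind save: to_gdf and save_session are made-up names, adding “.gdf” to the name is an assumption, and it simply refuses names with commas instead of quoting them.

```python
def to_gdf(graph):
    """Serialise a dict-of-lists graph (URL -> linked URLs) as GDF text."""
    nodes = []
    for source, targets in graph.items():
        for name in [source] + list(targets):
            if "," in name:
                # Commas separate GDF columns; quoting is left out of this sketch.
                raise ValueError("comma in node name: " + name)
            if name not in nodes:
                nodes.append(name)
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += ["{0},{0}".format(name) for name in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN")
    for source, targets in graph.items():
        for target in targets:
            lines.append("{},{},true".format(source, target))
    return "\n".join(lines) + "\n"


def save_session(graph, name):
    """Write the graph to <name>.gdf and return the path that was written."""
    path = name + ".gdf"
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_gdf(graph))
    return path
```

A file like this can then be opened in Gephi like any other graph file.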
What the FAQ
Hey! There are some bugs / Hey, I want to change something
I know there are bugs and I apologize in advance; I’ll try to fix them ASAP.
But please, feel free to change the code. It’s in Python and it’s open source!
If a lot of people are interested, I’ll try to put it on GitHub / SourceForge / Google Code.
Why the name ArrowV?
Because I love Wing Commander III (a good old game that shows lolcats will rule the universe).