

Started on 7-2-1998

Mission: To write a program in Java to download entire sites to your hard drive (possibly the entire Internet, YIKES!!!).

Well, that's the aim, although it isn't what I use it for.

I wrote SiteGrabber because I was sick to death of having to click my way meaninglessly through tens of links to download mp3s and other useful and, err, useless junk, especially when half the links were broken or the sites didn't exist. It is also a painful process: either you click on 30 files one by one, or you have 30 download windows open at once with very slow and unreliable download rates.

What SiteGrabber will do is list all the files you can download from a given URL and then give you the option of copying them to your hard drive one by one in a very fast and more reliable fashion. If not all the files are linked from that URL, you can ask SiteGrabber to do further flexible searches through the site. Everything is multi-threaded, so you can copy, search, browse directories and view images at the same time.

Now it's this flexible searching that allows you to download entire sites and, if you turn off host-name restrictions and wait a couple of lifetimes, the entire linked Internet. (Not that you would have the space to save it either.)
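A rough sketch of what a host-name-restricted search could look like. This is not SiteGrabber's actual code: the class `SiteSearch`, its method names and the in-memory `linksOnPage` map (which stands in for fetching a page and parsing out its links) are all made up for illustration.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not SiteGrabber's real search code.
class SiteSearch {

    // Lower-cased host part of a URL, or "" if it is missing or unparsable.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // Breadth-first walk over the link graph, starting at 'start'.
    // linksOnPage stands in for "download the page and extract its links".
    static List<String> search(String start,
                               Map<String, List<String>> linksOnPage,
                               boolean restrictToHost) {
        String startHost = hostOf(start);
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            found.add(page);
            List<String> links =
                    linksOnPage.getOrDefault(page, Collections.<String>emptyList());
            for (String link : links) {
                if (restrictToHost && !hostOf(link).equals(startHost)) {
                    continue; // link leaves the starting host, skip it
                }
                if (seen.add(link)) {
                    queue.add(link); // first time this URL has been seen
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = new HashMap<>();
        web.put("http://a.example/index.html",
                Arrays.asList("http://a.example/x.html", "http://b.example/y.html"));
        web.put("http://b.example/y.html",
                Arrays.asList("http://b.example/z.html"));
        System.out.println(search("http://a.example/index.html", web, true));
        System.out.println(search("http://a.example/index.html", web, false));
    }
}
```

With the restriction on, the walk stays on the starting host; with it off, it keeps following links for as long as there are new ones, which is where the couple of lifetimes come in.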

I am also developing a flexible FileViewer class as a sideline, to allow you to play audio files and movies, and view graphics files and HTML, after or during your downloads (maybe using the Java Media thingo, but I don't really want to...).

Download Version 1.14(Beta) Here

Don't have Java installed? Download the latest Sun JVM here for Windows 95/NT
Check out the screenshots and features here
Run SiteGrabber here (Currently not working?!?!?)
Email me here :)

Friday 26th July
Not sitegrabber..

Uni officially over :) Passed all my subjects, and I'm now going to graduate, yippee!!!

As for SiteGrabber, I've been too busy doing other stuff. I'm currently making an effort to kill the searching bug that's stopping me from releasing it.
 
Friday 7th July
SiteGrabber 1.15, day 2

OK, today I wasted time on some SiteGrabber coding instead of studying...

Finally put in my completely new HTML parsing routines. The new routine fixes the loading bugs and greatly improves the performance and efficiency of the parser. It also lets me easily add an option to show the attributed text in the list instead of the actual links, and could even go as far as a basic web browser. It also makes it easy to add multithreaded searching operations, which I'm currently looking at.

I've also removed all the restrictions on which files will be searched. Previously it would only search html, shtml, asp and text files; now it tries to determine what type of file it is and whether or not it can be searched.
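One way this kind of type detection might be done, as a minimal sketch only. It leans on the standard `URLConnection.guessContentTypeFromStream` and `guessContentTypeFromName` helpers; the class `SearchableCheck` and the rule "anything that looks like text/* is searchable" are assumptions for the example, not necessarily what SiteGrabber does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decide whether a file is worth searching for links.
class SearchableCheck {

    // Look at the first bytes of the content, then fall back to the file name.
    static boolean isSearchable(String name, byte[] firstBytes) {
        String type = null;
        try {
            // ByteArrayInputStream supports mark/reset, which the guesser needs.
            type = URLConnection.guessContentTypeFromStream(
                    new ByteArrayInputStream(firstBytes));
        } catch (IOException e) {
            // fall through to the name-based guess below
        }
        if (type == null) {
            type = URLConnection.guessContentTypeFromName(name);
        }
        // Assumption: only text types (HTML, plain text, ...) can contain links.
        return type != null && type.startsWith("text/");
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>hi</body></html>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0, 0x10, 'J', 'F', 'I', 'F', 0};
        System.out.println("page.asp searchable: " + isSearchable("page.asp", page));
        System.out.println("photo.jpg searchable: " + isSearchable("photo.jpg", jpeg));
    }
}
```

Checking the content before the name is what lets a page with an unfamiliar extension still be searched.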

Before I release it I'll be doing some rigorous testing on some interesting sites :)
 
 
 
 
Monday 22nd June
New News

Well, I finally look to be finishing up at uni after 3 and a half years… I've taken up a position doing NT Admin in the City (Sydney) with pretty good pay and a car.

With that worry of mine (being unemployed) out of the way, I'm going to continue on with SiteGrabber, adding more features and whatnot.

Cost:

SiteGrabber will remain free always. However, what I do require is that you send me a postcard from wherever you live to my address:
 

SiteGrabber

P.O Box 23

Boronia Park 2111

Sydney, Australia
 

Wednesday 27th May 1998
1.14 released

Ok, I've decided to release 1.14 as is.

There is a known problem with loading files on Windows/DOS machines. I'm currently fixing this and will hopefully get 1.15 out shortly. It was either delay the release for another week or get it out with all the groovy new features now.

*update*

Someone asked me if this was an example of how Java isn't platform independent. That is not quite true; the problem is that Java doesn't understand drive letters in URLs. So instead of using my existing code structure, I'm going to respec the entire parsing system and rewrite it so that it works on streams rather than URLs. In other words, to get loading to work I've had to hack the existing code to do something I never originally intended it to do. Unfortunately, opening file://C/directory doesn't work in Java as it does in Netscape.

A workaround for the loading bug is to type in the filename using the file:// URL style.
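To illustrate the "streams rather than URLs" idea, here is a small sketch under my own assumptions. The class `StreamSource` and its `open` and `readAll` helpers are hypothetical names: `open` decides whether a location is a local file (drive letters included) or a URL, and everything after that only ever sees an `InputStream`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the parser is fed a stream and never sees a URL.
class StreamSource {

    // Local paths such as C:\pages\index.html go through FileInputStream;
    // anything else is treated as a URL and opened with openStream().
    static InputStream open(String location) throws IOException {
        File file = new File(location);
        if (file.isFile()) {
            return new FileInputStream(file);
        }
        return new URL(location).openStream();
    }

    // Reads the whole stream as text and closes it; a parser would start here.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java StreamSource <file-or-url>");
            return;
        }
        System.out.println(readAll(open(args[0])));
    }
}
```

Because the parsing side takes a stream, it no longer matters whether Java's URL class can cope with the drive letter.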

If you come up with any new bugs or features you want, let me know!
 
 
Tuesday 19th May 1998
1.14 release soon

I've just got to clean up the file loading code and test it, then it's release time. I think you'll find the wait worth it :)
 
 
Sunday 10th May 1998
Screenshots

I spent some time doing some screenshots today so that you can see what it does, and to provide some help on how to do things.

As you will see, I've started hacking and changing the UI to my kind of style, where the emphasis is on simplicity but everything can still be configured in detail through the menubar.

I've still got to think of a way to make the search options more seamless and intuitive. It's always a problem weighing screen real-estate penalties against better intuitiveness.

ImageViewer has been completely rewritten from the original crappy version that was distributed up till now. It's got panning, screen centering and full-screen options, all of which are configurable from the menu.
 
 
Friday 8th May 1998
Sitegrabber

Version 1.1.6 of the Sun JDK has been released along with Swing version 1.0.2. 1.1.6 fixes the slow list item addition problem that I have mentioned before. Hooray, it's now as fast as Microsoft's JVM, which is cool. I've also at long last fixed up the sort routines. I'll probably release my quicksort and cruddy bubblesort code for the Vector sorting stuff, since Java won't support sorting until 1.2 with its Collections class. That's the one thing I couldn't believe they forgot or didn't put in in the first place, because sorting is the one bit that almost all applications use.
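For anyone stuck in the same pre-1.2 boat, a quicksort over a Vector of Strings can be sketched roughly like this. It is a generic textbook version written for this page, not the code from SiteGrabber, and `VectorSort` is just a name picked for the example (the generics are modern Java; in 1.1 you would cast each element to String).

```java
import java.util.Vector;

// Hypothetical sketch: in-place quicksort for a Vector of Strings.
class VectorSort {

    static void quickSort(Vector<String> v) {
        quickSort(v, 0, v.size() - 1);
    }

    private static void quickSort(Vector<String> v, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element, nothing to do
        }
        String pivot = v.elementAt(lo + (hi - lo) / 2);
        int i = lo;
        int j = hi;
        while (i <= j) {
            // Skip elements that are already on the correct side of the pivot.
            while (v.elementAt(i).compareTo(pivot) < 0) i++;
            while (v.elementAt(j).compareTo(pivot) > 0) j--;
            if (i <= j) {
                String tmp = v.elementAt(i);
                v.setElementAt(v.elementAt(j), i);
                v.setElementAt(tmp, j);
                i++;
                j--;
            }
        }
        quickSort(v, lo, j); // left partition
        quickSort(v, i, hi); // right partition
    }

    public static void main(String[] args) {
        Vector<String> files = new Vector<>();
        files.addElement("song.mp3");
        files.addElement("index.html");
        files.addElement("photo.jpg");
        quickSort(files);
        System.out.println(files);
    }
}
```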

Saving your links list is now in; it is saved as an HTML file which is loadable by your browser. I've turned off save paths by default.
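Roughly what saving a links list as a browser-loadable page could look like. This is a sketch with invented names (`LinkListWriter`, `toHtml`, `save`), and the page layout is my own guess rather than the format SiteGrabber writes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: turn a list of links into a simple HTML page.
class LinkListWriter {

    // Escape the characters that would break the markup or the href attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One list item per link, with the link itself as the visible text.
    static String toHtml(List<String> links) {
        StringBuilder page = new StringBuilder("<html><body>\n<ul>\n");
        for (String link : links) {
            String safe = escape(link);
            page.append("<li><a href=\"").append(safe).append("\">")
                .append(safe).append("</a></li>\n");
        }
        return page.append("</ul>\n</body></html>\n").toString();
    }

    static void save(List<String> links, File target) throws IOException {
        Files.write(target.toPath(), toHtml(links).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.print(toHtml(Arrays.asList(
                "http://example.com/music/a.mp3",
                "http://example.com/search?x=1&y=2")));
    }
}
```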

Heads Up
Err, seems the SiteGrabber applet runner is temperamental and doesn't always work?!? I'll have a look when I get a chance; it's not on my major priority list. Netscape complains about running out of memory? IE4 runs it fine, then doesn't... To be honest, run SiteGrabber on your system and not through the browser. You can't do much with it in the browser until I get a commercial certificate and start signing the archives. I'll put up a screenshot and Help page too, to give you an idea of how everything looks and works for the time being, including the Swing version.
 
 
Wednesday 29th April 1998
SiteGrabber Swing

Well, the Swing version looks good, although there are some issues which will require a lot of work to figure out. The problems don't appear to make any sense. E.g. I do FromList.setListData(VectorList); and it doesn't automatically refresh, but do a ToList.setListData(VectorDirectoryList) and it does. Is there something I'm doing wrong here? Maybe it comes down to using GridBagLayout with Swing components. Possible, I guess; it may even be a bug or two.
 
 
Saturday 25th April 1998 Anzac Day
Doing Stuff

Just made the Microsoft Development studio understand Swing... Now I can write my Swing code in my favorite environment; pity the Visual Java stuff is useless. I wonder if Microsoft will incorporate Swing when Java 1.2 is released? Bit like shooting themselves in the foot when they are promoting AFC. Personally, I think AFC shows that their programmers rely on the principle that the more complicated something is to use, the better it is. At least Swing keeps some sort of compatibility with AWT, whereas AFC is platform specific, hard to convert to from AWT, and difficult to understand. It took me around half a day to convert SiteGrabber over to Swing, with most of the time spent converting the List handling code over to JList, and all I had was the online specs that come with it.

Oh well... For once I think Microsoft has gone down completely the wrong track with its AFC components. I mean, the main reason people program in Java is platform independence. If you want to target a specific platform, then why not use C++? If you want better performance but still want to use Java, then develop a true platform-specific executable code compiler (not a JVM) for Java. There are apparently a couple of good ones around; Symantec's is good, so I've heard, with speed increases of up to 3 times over running the same program under the JVM.
 
 
Sunday 19th April 1998
Swinging along

Converted SiteGrabber to Swing today. Looks really cool; the Metal look is pretty nifty, although I'm going to put in a menu to allow you to change the L&F on the fly, so to speak. I've only just scratched the surface of what you can do with it, and so far I've really just converted the AWT stuff to its Swing equivalent.
 
 
Friday 17th April 1998
Groan, car is in the workshop. Buses make me get home sleepy

More weird and wonderful experiences with the Microsoft Java compiler. This time it was being screwy about inner classes or interfaces accessing private members. Of course the error it spews out has nothing to do with the problem, but hey, at least I knew what the problem was. Yet another bug the Sun JDK has fixed but Microsoft somehow overlooked. Now everything wears protection ;o)

Bugs:
Erm, well, save paths is broken, and so is the file overwrite. I've fixed both problems; I'll release another version when the save and mkdirs are done.
 
 
Tuesday 14th April 1998
OK, I'm releasing version 1.10 today.

Known issues:

This version will work with all JVMs. I haven't got round to implementing a save search option or a mkdir function yet; I'll do that once I finish testing all the new code. Everything is done in separate threads, so you can have multiple copy sessions going at once, and you can select and copy stuff whilst the search is going on.
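The "everything in separate threads" idea, sketched very roughly. `CopyTask` is a hypothetical class for this example (SiteGrabber's own class is called CopyBox and will differ), and byte arrays stand in for the network source and the destination file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: one copy session per thread.
class CopyTask implements Runnable {
    private final InputStream in;
    private final OutputStream out;
    private volatile IOException error; // set if the copy failed

    CopyTask(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            error = e; // remember the failure so the UI can report it later
        } finally {
            closeQuietly(in);
            closeQuietly(out);
        }
    }

    IOException error() {
        return error;
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing sensible to do if close fails
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        Thread t1 = new Thread(new CopyTask(new ByteArrayInputStream("first file".getBytes()), first));
        Thread t2 = new Thread(new CopyTask(new ByteArrayInputStream("second file".getBytes()), second));
        t1.start(); // both copy sessions now run at the same time
        t2.start();
        t1.join();
        t2.join();
        System.out.println(first + " / " + second);
    }
}
```

In the real program the input would come from a URL connection and the output would be a file, with the search running in yet another thread.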

I don't know of any bugs yet; I think I've squished all the ones I knew about. The user interface is still a bit unintuitive; the main problem is finding a way to put all the searching stuff together. I've got a few ideas though.
 
 
Easter Sunday April 12th 1998
Finally crunched out a gross hack to fix the list/directory traversal under Microsoft's JVM. I had a really good look at the problem today, and came away more mystified than ever. For some reason, a List.removeAll operation inside the ItemListener (1.1) or, for that matter, handleEvent (1.0) functions causes an invalid component exception when it tries to delete the clicked item internally (i.e. inside the JVM), but only if the index of that item is greater than 0. Even if you code in a deselect, or delete that item before the removeAll operation, it still comes up with the same exception. In the end, I had to remove the ToList component from the layout and then re-add a new one, which looks rather gross. Considering Sun's JVM works fine, it has to be an obscure bug inside Microsoft's JVM.

Anyway, that aside all the Copy stuff is now done in a separate thread again.
 
 
Good Friday April 10th 1998
Been busy lately... First day of decent coding for weeks.

Upgraded all the event code to Java 1.1. It only took an hour or so, and everything looks groovy. At long last no more deprecation compile errors!!! In fact, I wish I had done it much sooner; the model is far more powerful in some ways. I'm in the middle of adding a save option, which is sorely needed I think. More people are becoming interested in what I'm doing, so at least that is something. Crunched out quite a few bugs, and all the search stuff is now done in threads.

Swing:
I'm quite impressed by Swing. Looks heaps better than AWT, and is much more flexible. In fact I have decided to convert SiteGrabber to Swing just for the better looks. From my very brief read of the docs, you can change the look and feel on the fly, just like X Windows. Anything from a Win95 look to a Motif look can be done with just a few commands. KOOL!!!!

Anyway, I'm first going to try some test layout examples to get a feel for how everything works and fits together before doing anything to SiteGrabber. One of my major rules of coding: do some research FIRST before fiddling around with your code. It's a lesson I learnt very early on when I was doing the EzyESKY C++ project for Ezycom BBS software. Otherwise you end up wasting more time hacking and making mistakes with your code to get things to work, instead of having a clear and fresh understanding of what needs to be changed and why.

Pop3Manager:
Coming soon... Watch this space. Basically an applet to manage your POP3 accounts better than the mainstream mail programs do. By the way, I really don't care if someone has done this before. I write this stuff because I would rather pay myself and learn something in the process than pay someone else in another part of the world and get no support.
 
 
Sunday 29th March 1998
Ok, we are up to version 1.06 (Beta)

Known issues:

Haven't implemented .cab or .jar signing yet, so you need to have a JVM installed on your machine other than the one in your browser.

Sun's JVM:
7-6-98 Update... 1.1.6 fixes the slow List bug. Sun's JVM is now faster than the Microsoft one.
SiteGrabber currently runs on Sun's JVM with NO PROBLEMS; it's just moderately slow with its AWT implementation compared with Microsoft's. The performance pack does help somewhat.

Sun's List item addition is slow as hell on Windows machines. It looks like they have an X^2 algorithm, as opposed to Microsoft's linear version, when adding or removing items from Lists.

Animated Gifs are displayed very slowly (Looking into this).

Microsoft JVM:

12-4-98 Update... I managed to code in a gross hack for this problem. It is definitely a bug in the JVM.
Microsoft's JVM causes problems when traversing the destination directory tree. This does not appear to be directly a programming fault on my part; it seems more to be some obscure bug in the List implementation and the use of the GridBagLayout manager. The problem comes down to a single ToList.removeAll() operation, which for some reason gives weird exception errors. Mind you, even the latest v 3.0 Miccysoft beta Java SDK does the exact same thing, so I'll keep looking.

Animated Gifs are displayed very slowly (Looking into this ).

Fixed Problems:

Finally got around to fixing the root directory bug under the Microsoft JVM. Actually it isn't Microsoft's fault, it's something Sun fixed but Microsoft didn't...
Added DOS drive path support
Added Status bar.
Added Searching options...
Added Path save to option.
Added Countless Status messages when searching.
Added Hostname search restrictions
Added better messages when you have insufficient privileges to list directories
 
 
Tuesday 17th March 1998
Ok I've been really slack in updating this page. The Project is coming on very nicely and is nearing completion.

Things still to do:
Add Status Dialog box when searching through http files..
Implement Exhaustive searching (This was the main aim originally)
Add Path save to option... So that you can get an exact copy of the site locally complete with path correctness
Migrate Event code to 1.1.
Clean up the user interface a bit. It looks a bit messy and unintuitive
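The "Path save to" item in the list above boils down to mapping each URL onto a local path that mirrors the site's layout. A minimal sketch, assuming a made-up `SavePath` class and the convention (my assumption, not necessarily SiteGrabber's) that directory URLs are stored as index.html:

```java
import java.io.File;
import java.net.URI;

// Hypothetical sketch: mirror a URL's host and path under a local root.
class SavePath {

    static File localFileFor(File root, String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "unknown-host" : uri.getHost();
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path += "index.html"; // assumed name for directory URLs
        }
        File out = new File(root, host);
        for (String part : path.split("/")) {
            // Skip empty segments and anything that could climb out of the root.
            if (part.isEmpty() || part.equals(".") || part.equals("..")) {
                continue;
            }
            out = new File(out, part);
        }
        return out;
    }

    public static void main(String[] args) {
        File target = localFileFor(new File("mirror"), "http://example.com/music/a.mp3");
        System.out.println(target.getPath());
        // A copy routine would call target.getParentFile().mkdirs() before writing.
    }
}
```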

Major things have been added lately:

Migrated the code to Java 1.1.5
Added Destination Directory browsing (that was a pain in the arse)
Added Enhanced Html searching including hostname restrictions
Added ImageViewer (soon to become FileViewer) system
Added Proxy Setup.
Tested everything on Win95 and Solaris

Countless bugs have been hunted down.

I've looked at the AFC in Microsoft JV 1.1+ and decided that it's not worth the trouble.
Going to look at Java Beans/Swing.
I've looked at adding Security Manager code to allow SiteGrabber to be run in browsers (no decision yet)
 
 
Sunday 15th February 1998
= Added mp3 filelist filtering tags

= Added fileoverwrite checkbox
= Added Movie file filtering tags
= Rewrote Filtering routines to make them more flexible
= redid the list item deletion properly. Now works FAST !!!!
= Fixed up the Copying status box...
= Fixed up 0 length file problems,
= Reduced the number of unnecessary FromList updates...
= Fixed Cancel button
? Problem with select all then copy ? Fixed
= Filtering with large lists very slow ? I made it much better
= Groan, programming directory browsing in Java is a pain in the arse!!!! Why can't you just have a Select Directory option in the FileDialog class????? HMMMMMMm WELL!!!!!
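The mp3 and movie filtering tags in the worklog above come down to filtering the file list by extension. A small sketch of one way to do that; `FileListFilter` and its methods are invented for the example and the real filtering routines are more flexible than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: keep only the names whose extension is in a given set.
class FileListFilter {

    // Lower-cased text after the last dot, or "" if there is no extension.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Single pass over the list, with a hash lookup per name.
    static List<String> filter(List<String> names, Set<String> extensions) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (extensions.contains(extensionOf(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> audio = new HashSet<>(Arrays.asList("mp3"));
        List<String> names = Arrays.asList("a.MP3", "b.jpg", "c.mp3", "README");
        System.out.println(filter(names, audio));
    }
}
```

Building a new list in one pass, rather than deleting items from the visible list one at a time, is also a cheap way to keep filtering quick on large lists.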
 
Friday 13th February 1998
Ok, I've worked on testing everything and getting it right. I still haven't finished the destination directory tree browsing routines, so you will just have to remember path names for now. To try it out (note it probably won't do much due to browser security settings), just click here. PLEASE PLEASE NOTE, this is still an Alpha version: the core code is bug-free whilst the user interface is still very, very basic. I tested the program overnight, downloading something like 80 megs of graphics stuff from around 5 sites, all in one session.

You can download the class files here

To run, just do a "java SiteGrabber"

worklog..
= Added filelist filtering
= Added Error Dialog boxes
= Changed clear button to a delete button
= Added Selectall, deselectall options
= Added File Overwriting check...
= Fixed a parsing bug, causing an index out of bounds error
= Widened the temporary destination path box
= Added list to browser window..
= Added CopyBox class
= Optimized delete FromList elements routine (still lots of room to make it better)
= Added threading to Copybox class
 

8th February 1998
Finished most of the basic routines.....

Currently only downloads image files, but umm, that's what I'll mainly use it for.....

Here is my current work-log
? Haven't decided about the Destination browse problem
Either I ask for a directory via a textfield (Easy option)
or I make the List field tree browsable (Hard option)
or I use the FileDialog class, with the problems that incurs (Easy option)
but with nasty side-effects (Bloody java, why can't they have a select directory dialogbox ?????)
Fixed URL anchor problem.
- Started work on copy
= setup copy to copy just to a file called image.jpg in the current directory
for testing....
= Copy routine works !!!!!! Hey hey hey
= Paths now work..... hey hey hey !!!!
 
 
7th February 1998
Primary note... Started work 7-2-98

I'm trying to make this work without assuming anything about the format of the tags. Any formatting that has been assumed for temporary simplicity is noted with comments, as no doubt it will cause problems with some HTML files.
As to why I'm writing this.... Umm, to revise my Java skills :)
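To give an idea of what "not assuming anything about the format of the tags" means in practice, here is a sketch of a link extractor that copes with upper or lower case tags, extra whitespace, and double-quoted, single-quoted or unquoted values. It uses a regular expression, which is a modern shortcut and certainly not how the 1998 parser works; the class name `LinkExtractor` and the set of tags it looks at are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pull href/src values out of loosely formatted HTML.
class LinkExtractor {

    // Group 1: "double quoted", group 2: 'single quoted', group 3: unquoted.
    private static final Pattern LINK = Pattern.compile(
            "<\\s*(?:a|img|frame|area)\\b[^>]*?\\b(?:href|src)\\s*=\\s*"
                    + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Exactly one of the three groups is set for each match.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    links.add(m.group(g));
                    break;
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<A HREF=\"a.html\">one</A> <img src = 'b.jpg'> <a href=c.mp3>three</a>";
        System.out.println(extract(html));
    }
}
```

A regular expression like this will still be fooled by some pages (for example an attribute value that itself contains "href="), which is exactly the kind of assumption worth noting in comments.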