A proposal to convert ODP/DMOZ to use Free Software has surfaced again and, in order to best present a case for it, I have found it helpful to do some research on how the current ODP hardware and software work. I'm collecting links here to any information I find so that others can find it easily.
1999 ODP using P450s and Linux
A circa-1999 story describing ODP as running on dual P450s with Red Hat Linux.
Advogato's trust metric
Paper describing the trust metric system used on Advogato and other mod_virgule sites. This system would lend itself very easily to an ODP editor community trust metric system.
Aggregating Recommendations using RDF
Research paper describing methods of aggregating URL metadata using XML and RDF. The paper compares various methods to ODP and discusses weaknesses of ODP.
Another PHP ODP RDF parser.
Forum thread: Dendograms and ODP data
Proposal to use machine learning to cluster and classify ODP categories. There is also a bit of info on the content of the terms.rdf.u8 file.
Forum thread: discussion about RDF format
Includes info about how "cooled" sites are encoded in the ODP RDF.
Forum thread: dmoz backend - what is it like?
Forum discussion on the nature of the ODP/DMOZ backend, with some software and hardware info. The implication is that the backend is a giant, centralized system running on a single computer rather than the distributed system one would expect. The combination of a non-distributed system and usage that has far exceeded the original design probably accounts for many of the current problems.
Forum thread: Dmoz Mirrors (bandwidth usage)
This thread contains the only clue I've found to bandwidth usage by dmoz/ODP. The dmoz mirror site, ch.dmoz.org, reports 12.5GB/month for December 2002.
Forum thread: DMOZ Moved to Sun E4500
Forum discussion announcing the transition to a donated Sun E4500 server running Solaris and Apache.
Forum thread: Is dmoz running on a DBMS?
Forum discussion about the backend data storage. The conclusion of the thread seems to be that ODP uses a combination of flat files, Berkeley DB for hash pointers, and Perl. skrenta points out that the current design would make distributing ODP across multiple servers hard.
Forum thread: ODP Search MPL'd
Forum discussion about the one small piece of the ODP backend that has been released as open source.
Forum thread: RDF speculation
Forum discussion of ODP outgrowing its original design. Mostly speculation by other editors.
Forum thread: Request for source
Most recent forum discussion in which ODP editors request the opening of the ODP backend source so that they can help with development and bug fixes.
Forum thread: Why so slow?
A forum discussion about why ODP is so slow. Lots of editor speculation but few facts from the staff.
Free the ODP Software
tnt's bookmarks list forum discussions and other links to previous attempts at freeing the ODP software. It appears editors have been trying to get the source opened up since at least 1999, but a pretty big clue stick is going to be needed to make an impression on AOL/Netscape.
jmoz - the java ontology
An attempt to build a Java-based ODP-like system capable of importing the ODP RDF files. The project appears inactive but CVS contains functional code.
List of the ODP Staff
This is the only list I've been able to find of people who appear to be the staff of ODP. Both the "Chief Engineer" and the "Editor in Chief" claim not to have the power to open source the backend. I take this to mean there are more staff higher up still to be discovered.
Nurey's list of ODP RDF bugs
Links to forum threads about specific RDF bugs that have been reported.
ODP RDF cleaner
Perl script to fix most of the out-of-spec RDF/XML in the ODP RDF dumps. Also attempts to remove the illegal characters that are present in most dumps.
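To illustrate what removing illegal characters can involve, here is a minimal Python sketch that drops every character outside the XML 1.0 "Char" production. It is not the linked Perl script, whose exact approach I have not verified; the function name strip_illegal_xml_chars is my own.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Return text with characters that are illegal in XML 1.0 removed.

    Hypothetical helper for illustration only; the real cleaner may
    replace or escape such characters instead of dropping them.
    """
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Applied line by line to a dump, this only deals with stray control characters; it does nothing about the out-of-spec markup itself.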
ODP RDF cleaners
Unix shell scripts that use sed to convert the ODP RDF output into legal RDF format.
Sourceforge ODP Tools project. Most of the interesting stuff can be found in CVS, including scattered notes on the inner workings of the ODP database and Perl code intended to parse the RDF files.
C source code for ODPSearch. Also includes some Perl scripts that offer clues about the flat-file database formats used internally by the ODP backend.
PHP DMOZ RDF parser into MySQL
PHP package that parses the ODP RDF files into MySQL tables.
C source for a library that parses both real RDF and the RDF-like XML used by ODP.
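As a rough idea of what loading the dump into SQL tables looks like, here is a small Python sketch. It is not the linked PHP package: it uses sqlite3 as a stand-in for MySQL, scans line by line instead of using an XML parser (the dumps are often out of spec), and assumes that each listing is an ExternalPage element with d:Title and topic children, each child on a single line. The table layout and the name load_pages are my own.

```python
import re
import sqlite3
from typing import Iterable

# Assumed shape of a listing in the content dump (one child per line).
PAGE_OPEN = re.compile(r'<ExternalPage\s+about="([^"]*)"')
TITLE = re.compile(r"<d:Title>(.*?)</d:Title>")
TOPIC = re.compile(r"<topic>(.*?)</topic>")


def load_pages(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Insert one row per ExternalPage block; return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT, topic TEXT)"
    )
    url = title = topic = None
    count = 0
    for line in lines:
        opened = PAGE_OPEN.search(line)
        if opened:
            # Start of a new listing: reset the per-page fields.
            url, title, topic = opened.group(1), None, None
            continue
        if url is None:
            continue  # not inside an ExternalPage block
        found = TITLE.search(line)
        if found:
            title = found.group(1)
        found = TOPIC.search(line)
        if found:
            topic = found.group(1)
        if "</ExternalPage>" in line:
            conn.execute(
                "INSERT INTO pages VALUES (?, ?, ?)", (url, title, topic)
            )
            count += 1
            url = None
    conn.commit()
    return count
```

Passing an open file object as lines keeps memory use flat, which matters for dumps of this size.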
sfromis ODP RDF bookmarks
A collection of links related to the ODP RDF dumps.
Sun Microsystems - 4500 Server Specifications
Specs for the E4500 server used by ODP.
Using ODP RDF to generate blacklists
Perl script that reads the ODP RDF and generates blacklists and whitelists for use in child-safe filters. Interesting in that it does not actually parse RDF or XML; it just greps out the URLs.
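The grep-style approach can be sketched in a few lines of Python. This is not the linked Perl script; it assumes that listed sites appear in the dump as ExternalPage elements with the URL in an about attribute, and the name grep_urls is my own.

```python
import re
from typing import Iterable, Iterator
from xml.sax.saxutils import unescape

# Assumed form of a listing's opening tag in the dump.
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def grep_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield the URL of every ExternalPage line, without parsing any XML."""
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if match:
            # Undo &amp;, &lt; and &gt; so the list holds plain URLs.
            yield unescape(match.group(1))
```

Because no XML parser is involved, malformed markup elsewhere in the dump does not stop the scan, which is presumably why the original script takes this shortcut.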
Last update: January 2, 2007 at 17:25:24 UTC