Yes, you will be extracting categories and URLs from DMOZ, but you won’t just be scraping the exact same information. You’ll be extracting new meta titles and descriptions for each website, so your content will be unique.
This single tool was the answer to my prayers when I needed lots of unique content and needed it fast. With a single extraction you can spider thousands of websites, all relevant to your main directory’s theme, which helps you seed your directory with relevant listings.
So here’s the software I use. It’s called DMOZ Extractor 2 from PJLtechnology.com and it costs a measly $35. I think it is worth so much more than that.
When you see this software in action I think you will be impressed. You can literally watch it as it jumps through each subcategory extracting each page for the categories and links. And once it’s finished you have a database of links which the software will begin to spider at the press of a button. Simply press the spider button and let it do its thing.
When you first open up the dmoz extractor it looks like the screenshot above. You’ll see what looks like the homepage of dmoz. It contains all the main top categories in dmoz. You can click on any category which will take you to the subcategories. From here you can select the category you want to extract the links from. Here I’ve clicked on the Fitness category…
Once you are at the subcategory you want to extract the links from all you have to do is click on the “Auto Extract Deep” button (see below). This will start the extraction process. The time it takes to extract depends on how many categories and links there are in your selected category.

The extraction process can take anywhere from a few minutes to a few hours, depending on how many categories and links you want to extract.
Once it’s finished you’ll see a screen like the one below. You just have to click on “View Database” at the top of the left navbar.
You’ll see all the URLs it pulled from DMOZ, exactly as they are listed in DMOZ. You don’t want to use this exact content because it will be duplicated, and you want unique content for your directory. That is why it is important to now spider each URL for new information.

Once the dmoz extractor has finished extracting all the categories and links you can view all your data by clicking on the “View Database” button at the top of the left hand navbar.
This data is exactly what is listed at dmoz. To get around being penalized for duplicate content you can spider each URL for a new title, description, and keywords. The software will even pull an email address off each site if it finds one.
To start spidering the URLs you need to click on “Spider For Tags” at the bottom of the left hand navbar. Check all the spidering options and click OK. This will start the spidering process. The time it takes to spider all the URLs in your new database depends on your internet connection. If you have DSL or cable this will go fairly fast.
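To give a rough idea of what “spidering for tags” means, here is a minimal Python sketch that pulls the title, description, and keywords out of a page’s HTML. This is not how DMOZ Extractor works internally; it is only an illustration using Python’s standard `html.parser`, and the names `TagSpider` and `extract_tags` are made up for this example. It parses a sample HTML string, so you would need to download each page yourself before feeding it in.

```python
from html.parser import HTMLParser


class TagSpider(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            # Meta names appear in mixed case on real sites, so normalize them.
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_tags(html):
    """Return the title, description, and keywords found in an HTML string."""
    parser = TagSpider()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }


# Hypothetical sample page used only to demonstrate the function.
sample = """<html><head><title> Acme Fitness </title>
<meta name="Description" content="Home gym equipment reviews.">
<meta name="keywords" content="fitness, gym"></head><body></body></html>"""
tags = extract_tags(sample)
print(tags)
```

A site with no title or no meta description simply comes back with empty strings, which is exactly the kind of listing you will want to clean up later.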
Once the spider process has finished you’ll have something similar to the below graphic.

Once the spidering process is complete you can now export the data into files you can easily import into Indexu.
To export the data go to “Output” and you’ll see a few options. To import the files successfully into Indexu you need to select “Gossamer Threads Links 2”
When you click on “Gossamer Threads Links 2” a dialog box will pop up asking you where you want to export the data. Select a folder either under your site’s main folder or wherever you want to keep all your databases. I’ve created a folder solely for my DMOZ extraction files.
When you export the files you’ll have several but you only need two of them, the categories.db and the links.db files (see below).
These two files are your categories and your links (URLs).
As you can see, this tool is very easy to use. You can set it up, hit run, come back in an hour or so, and have a complete database of categories and URLs ready to import into your Indexu directory.
You can then edit the URLs as you see fit and delete the ones that are bad, i.e. they don’t have a title, have the wrong title, have no description, or contain errors. You can do all this from Indexu’s link search function.
To import the database you need to log into your Indexu directory’s admin section and go to Import, which is in the left hand column down towards the bottom.
Through doing this I’ve found that the import only works with about 6,000 records at a time. So if you have a database with 30,000 links, you have to break your links.db file into pieces of 6,000 records each using a simple text editor like Notepad. Your links.db file will then be split into links1.db containing links 1 – 6,000, links2.db containing links 6,001 – 12,000, links3.db containing links 12,001 – 18,000, and so on.
If you have more than 6,000 categories you’ll have to do the same thing with the categories file, although it’s unlikely you will ever have that many. Most of the time you’ll only need to split the links database.
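If cutting a big file apart by hand in Notepad sounds tedious, a short script can do the same job. The Python sketch below is only an illustration and assumes each link is stored as one line in links.db; check your own export before relying on that. The function name `split_db` is made up for this example, and the demo at the bottom runs on a small throwaway file rather than a real export.

```python
import tempfile
from pathlib import Path


def split_db(path, chunk_size=6000):
    """Write links1.db, links2.db, ... next to `path`, chunk_size lines each.

    Assumes one record per line. Works on raw bytes so the original
    character encoding and line endings are left untouched.
    """
    src = Path(path)
    lines = src.read_bytes().splitlines(keepends=True)
    out_paths = []
    for start in range(0, len(lines), chunk_size):
        number = start // chunk_size + 1
        out = src.with_name(f"{src.stem}{number}{src.suffix}")
        out.write_bytes(b"".join(lines[start:start + chunk_size]))
        out_paths.append(out)
    return out_paths


# Demo on a throwaway 13-line file, split into chunks of 6 lines.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "links.db"
    demo.write_bytes(b"".join(b"record %d\n" % n for n in range(1, 14)))
    parts = split_db(demo, chunk_size=6)
    part_names = [p.name for p in parts]
    part_sizes = [len(p.read_bytes().splitlines()) for p in parts]

print(part_names, part_sizes)
```

For a real export you would call `split_db("links.db")` from the folder you exported to and leave the chunk size at 6,000. Keep a backup of the original file either way.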
Once you have your links.db files broken down you can start importing them. Start with the categories.db file, because your categories need to be set up before you add the links. Simply log in to your Indexu admin section and click on Import under the Database area of the left hand nav bar.
You will see a field to upload your files and a database selection dropdown menu. Select GT2.0 as the database type and browse your computer for the categories.db file. You also need to make sure the “category” field is selected. Select your category file and import it. It should take a few seconds to import.
Once you do that it’s good to make sure they were added correctly. So click under the category section “rebuild category structure” and that will build all the categories for your directory. Navigate to the home page and you can now see that all your categories are there.
Now it’s time to import all your links. You do this the same way as the categories except you select the “links” field and you select the first links1.db file you edited. You then select “starting from 0” because this is the first file and it starts with 0. It will take a few seconds or minutes then you can continue with the second file.
You do this the same way but your starting number will be 6,001 instead of 0 because you already have the first 6,000 listings in your database. You do this down the line until you have imported every links.db file.
Now that your categories and links are imported, it’s time to do some house cleaning. A lot of DMOZ’s links are bad, or the sites listed have no titles or titles with meaningless names, or worse, you get 404 errors, Denied, or Error in the title. So you have to search for those words and delete those listings, or edit them if you want. I usually delete them because it’s faster, but you might want to edit them if you want to keep them in your database.
These are the types of words that I search for to clean up the listings:
- Error
- 404
- Declined
- Denied
- Moved
- Title
- Exists
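If you would rather check your titles before importing them, the same word list can be applied with a few lines of Python. This is only a sketch: the `listings` data is made up, and the match is a simple case-insensitive substring check, which is an assumption about how a search like this behaves. It will also flag some legitimate sites (a title insurance company, for example), so look over the results before deleting anything.

```python
# Words from the list above that usually signal a broken or placeholder title.
BAD_WORDS = ["Error", "404", "Declined", "Denied", "Moved", "Title", "Exists"]


def is_suspect(title):
    """True if the title is empty or contains any of the bad words."""
    lowered = title.strip().lower()
    return not lowered or any(word.lower() in lowered for word in BAD_WORDS)


# Hypothetical (title, URL) pairs used only for illustration.
listings = [
    ("Acme Fitness", "http://example.com/"),
    ("404 Not Found", "http://example.org/"),
    ("", "http://example.net/"),
    ("Access Denied", "http://example.com/private"),
    ("Untitled Document", "http://example.org/home"),
]

suspects = [item for item in listings if is_suspect(item[0])]
for title, url in suspects:
    print(repr(title), url)
```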
It helps to browse your directory now that all the links have been imported and check whether any weird listings are showing up. A lot of the time you’ll have sites listed that are not that great, or that are no longer the same site, i.e. someone bought the expired domain and started a whole other site under it. This is when you want to go through and delete any peculiarities.
Voila, you now have a directory completely seeded with listings!





