- SEJ – Search Engine Journal
- http://blog.promptcloud.com – a blog on the theory and technicalities of web crawling and data extraction
- http://linkeddata.org – resources on Linked Data on the Web and the Semantic Web. Contains tutorials and guides, e.g., this Intro to the Semantic Web
- Top 33 Free Data Mining Software – with a brief description of each. My personal take: RapidMiner has the shortest learning curve (a few hours) and is the best choice for quickly learning and applying various ML algorithms without going into much detail. R has a clear logic and is a pleasure to work with, although it is harder to learn, especially for non-programmers; to be really flexible in R you have to master programming in it, and that cannot be done in a couple of weeks. I struggled with the Rattle GUI for R and gave up after failing to get it running on my Mac.
- Top 25 Free Data Analysis Software – with a brief description of each
- Top 30 Software (a third is free) for Text Mining – both RapidMiner and R have free modules for text analysis. STATISTICA also has a text analysis module; it is not free (and is expensive), but it is solidly built software. I used it a lot in the past and liked it.