Technologies

  • SEJ – Search Engine Journal
  • http://blog.promptcloud.com – blog on the theories and other technicalities surrounding web crawling and data extraction
  • http://linkeddata.org – resources on Linked Data on the Web and the Semantic Web. Contains tutorials and guides, e.g., this Intro to the Semantic Web
  • Top 33 Free Data Mining Software – with a brief description of each. My personal take: RapidMiner has the shortest learning curve (a few hours) and is the best choice for quickly learning and applying various ML algorithms without going into much detail. R has a clear logic and is a pleasure to work with, although it is harder to learn, especially for non-programmers; to be really flexible in R you have to master programming in it, and that cannot be done in a couple of weeks. I struggled with the Rattle GUI for R and gave up without managing to run it on my Mac.
  • Top 25 Free Data Analysis Software – with a brief description of each
  • Top 30 Software (a third is free) for Text Mining – both RapidMiner and R have free modules for text analysis. STATISTICA also has a text analysis module; it is not free (it is expensive), but it is solidly built software. I used it a lot in the past and liked it.
