Below is a list of articles I would put in the must-read category for everyone (never complete, obviously):
- Biology: The big challenges of big data. Nature 498, 255–260 (13 June 2013) – Discusses the state of Big Data in molecular biology and gives some numbers on how much data the field holds. For example, the European Bioinformatics Institute (EBI) in Hinxton, UK, stores 20 petabytes of data and back-ups about genes, proteins and small molecules, with genomic data accounting for 2 petabytes of that. EBI hosts its most heavily used resource for genome analysis, the Ensembl Genome Browser, on Amazon Web Services’ Elastic Compute Cloud (EC2), and is building a cloud-based infrastructure called Helix Nebula – The Science Cloud. Commercial cloud providers to choose from include Rackspace, VMware, IBM and Microsoft. The article also discusses the problems of packaging and transferring big files with sequence information (a single file with the sequence data of one human genome can exceed 100 GB).