On March 12, I attended “Big Data at Work: A Conversation with the Experts,” an event organized by UCSD Extension. It featured presentations from:
- Larry Smarr, Ph.D., Founding Director, CALIT2
- Mike Norman, Ph.D., Director, San Diego Supercomputer Center
- Stefan Savage, Ph.D., Professor, Computer Science & Engineering, UC San Diego
- Michael Zeller, Ph.D., Chief Executive Officer, Zementis
Natasha Balac, Ph.D., Director, Predictive Analytics Center of Excellence, moderated the discussion panel.
Larry Smarr sounded excited and optimistic. To illustrate the tsunami of data, he started with the old tale about rice and a chessboard (Wikipedia lists it as the “Wheat and chessboard problem”). If you start with one grain and double the number of grains on each subsequent square (1 + 2 + 4 + 8 + 16 + 32 + 64 + …), the 64th square alone will hold 2^63 = 9,223,372,036,854,775,808 grains of rice.
“On the entire chessboard there would be 2^64 − 1 = 18,446,744,073,709,551,615 grains of rice, weighing 461,168,602,000 metric tons, which would be a heap of rice larger than Mount Everest. This is around 1,000 times the global production of rice in 2010 (464,000,000 metric tons).”
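The doubling arithmetic is easy to check. The sketch below sums the geometric series square by square and converts the total to tonnage. The grain mass of about 25 mg is an assumption (the post does not state it); it is simply the value that reproduces the quoted 461,168,602,000 metric tons.

```python
# Wheat and chessboard arithmetic: square k (k = 1..64) holds 2**(k - 1) grains.
GRAIN_MASS_G = 0.025        # assumption: about 25 mg per grain of rice
RICE_2010_T = 464_000_000   # 2010 global rice production (metric tons), from the quote


def grains_on_square(k: int) -> int:
    """Grains on square k when square 1 holds one grain and each square doubles."""
    return 2 ** (k - 1)


def total_grains(n_squares: int = 64) -> int:
    """Sum of the geometric series 1 + 2 + ... + 2**(n - 1), which equals 2**n - 1."""
    return sum(grains_on_square(k) for k in range(1, n_squares + 1))


last = grains_on_square(64)
total = total_grains()
mass_t = total * GRAIN_MASS_G / 1_000_000  # grams -> metric tons

print(f"{last:,}")    # 9,223,372,036,854,775,808
print(f"{total:,}")   # 18,446,744,073,709,551,615
print(f"{mass_t:,.0f} t, about {mass_t / RICE_2010_T:.0f}x the 2010 harvest")
```

The ratio to the 2010 harvest comes out just under 1,000, which matches the “around 1,000 times” in the quote.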
Larry Smarr is one of the early adopters of monitoring personal health with genome sequencing technologies; he sequences his gut microbiome as often as possible:
“In the past, just a few variables from a blood test and my weight defined me. Now, billions of numbers define me! …
Healthcare and education are still pre-digital.”
Regarding efforts to harvest human genomic and microbiome data, Larry mentioned the recent launch of Human Longevity Inc. (here is more news on HLI from PR Newswire) and similar initiatives by Leroy Hood and George Church.
Mike Norman talked about Big Data initiatives at SDSC. To demonstrate the amount of data available in various domains, he kindly asked me to show the slide I made last summer (above). I borrowed the concept of circles, and all the data except the biological figures, from Wired magazine; the biological figure is my own estimate, based exclusively on data available in registered public databases. He also mentioned IntegromeDB as one of four Big Data projects running at SDSC. Mike named two major challenges in the Big Data field: education, and providing a computing environment for data storage, sharing, and analytics.
Stefan Savage gave a fascinating talk about his research on Internet security, abusive advertising, web spam, and Bitcoin operations, and about his live, very fast URL classification system, which uses millions of features and is trained online (I would like to research this system further and write more about it):
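The post does not describe the system's internals, so the following is only a generic sketch of the two ideas mentioned, a very large feature space and online training. It assumes lexical URL tokens mapped into a hashed feature space and a logistic regression model updated one labeled URL at a time. Every name here (`url_features`, `OnlineLogisticRegression`, `DIM`) and the toy URLs are hypothetical; this is not Savage's method.

```python
import math
import re
import zlib

DIM = 2 ** 20  # hypothetical size of the hashed feature space


def url_features(url: str) -> list[int]:
    """Split a URL into lexical tokens and hash each one into [0, DIM)."""
    tokens = re.split(r"[/?.=&:\-_]+", url.lower())
    # crc32 is stable across runs, unlike Python's salted built-in hash()
    return [zlib.crc32(t.encode()) % DIM for t in tokens if t]


class OnlineLogisticRegression:
    """Logistic regression trained with one SGD step per labeled example."""

    def __init__(self, lr: float = 0.5) -> None:
        self.w: dict[int, float] = {}  # sparse weights: only seen features are stored
        self.lr = lr

    def predict_proba(self, feats: list[int]) -> float:
        z = sum(self.w.get(i, 0.0) for i in feats)
        z = max(-35.0, min(35.0, z))  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats: list[int], label: int) -> None:
        # For a binary feature, d(log loss)/dw_i = (p - y)
        err = self.predict_proba(feats) - label
        for i in feats:
            self.w[i] = self.w.get(i, 0.0) - self.lr * err


# Toy labeled stream: 1 = malicious, 0 = benign (made-up examples)
stream = [
    ("http://example.com/news/today", 0),
    ("http://free-pills.example.biz/buy?cheap=1", 1),
    ("https://docs.example.org/guide", 0),
    ("http://cheap-pills.win/buy-now", 1),
]

model = OnlineLogisticRegression()
for _ in range(20):  # replay the tiny stream; a live system would see each URL once
    for url, label in stream:
        model.update(url_features(url), label)

print(model.predict_proba(url_features("http://cheap-pills.biz/buy")))
```

Feature hashing keeps memory bounded no matter how many distinct tokens appear, and the per-example update lets the model absorb newly labeled URLs without retraining from scratch. Both are common choices for this kind of problem, not confirmed details of the system discussed in the talk.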
“Security is becoming a data-driven discipline. …
Security today is about understanding the environment. …
The data won’t be in personal possession.”
Michael Zeller talked about two groups of Big Data analytics applications (people & behavior, and sensors & devices):
“Big Data buzz creates new business opportunities to disrupt existing markets and to develop new platforms with new capabilities. … The challenge on the industry side is cutting through the noise of many existing solutions. … The future will bring a lot of data-driven applications: agents that will make decisions on your behalf. It will be seen on every level of life.”
At the end of the panel (remarks from which are included above), Larry encouraged everyone to watch the movie “Her”:
“It is going to be beyond science fiction in our lifetime. Everyone will have an intelligent system knowing much more about us than we do ourselves.”
Disclaimer: The quotations above are not exact; they are based on the notes I took during the event.