Robert Grossman's Home Page
This web site contains some technical publications and talks about computing with data written by Robert Grossman. There are over 120 publications available online.
Robert Grossman's blog is available at: blog.rgrossman.com.
Recent News
- Here are some recent talks and papers on cloud computing.
- The Open Cloud Consortium is developing standards for open clouds and for frameworks for interoperating different clouds.
- Sector version 1.19 was released on February 23, 2009.
- The site was last updated on April 10, 2009.
About Robert Grossman
Robert Grossman is the Managing Partner of Open Data Group. Open Data helps companies develop and improve their analytic strategies and provides outsourced analytic services so that companies can increase revenues, decrease costs, and improve business processes. Please contact him at info at opendatagroup dot com if you would like to work with him.
He is also the Director of the Laboratory for Advanced Computing (LAC) and the National Center for Data Mining (NCDM) at the University of Illinois at Chicago. LAC and NCDM develop open source technology for internet-scale and cloud computing, such as UDT and Sector, and host the development of standards, such as the Predictive Model Markup Language (PMML). Please contact info at lac.uic.edu if you would like more information about the Laboratory or Center.
Robert Grossman provides services to support litigation and as an expert witness in the general areas of Internet technology, predictive modeling and data mining, risk modeling, e-business, e-marketing, analytic architectures, and high performance computing and networking. He has been active in these areas since the mid-1980s and can address these areas from his personal experience, from a business perspective, or from a research perspective. Please contact him at info at opendatagroup dot com if you would like to work with him.
He is a Member of the Board of Directors of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) for the term 2005-2009.
Biographical material can be found here.
News
- December, 2008. NCDM selected as one of ten cool networking labs. The National Center for Data Mining was one of "10 really cool university networking labs" described in the December 16, 2008 issue of Network World in an article by Bob Brown. See slide 11 of the 12 slides or see page 4 of the article.
- December, 2008. Sterling Commerce adopts UDT. Sterling Commerce, an AT&T Inc (NYSE: T) company, announced Sterling File Accelerator (SFA). SFA combines the power of the company's Connect:Direct point-to-point file transfer software optimised for high-volume, secure, assured delivery of files with a new UDP Data Transfer-based file transport (UDT) - an application-level data transport protocol that overcomes the latency issues associated with transmission control protocol (TCP)-based transmissions. Source: iTWire.com. UDT is an open source application that was developed by the NCDM and is available via Source Forge.
- December, 2008. Some recent work on clouds. For some of my recent talks and papers on cloud computing, see the following page on cloud computing.
- November, 2008, Sector/Sphere and the Open Cloud Testbed win SC 08 Bandwidth Challenge. The NCDM and Open Cloud Consortium entry, which highlighted several applications developed with the Sector storage cloud and Sphere compute cloud and running on the Open Cloud Testbed, won the SC 08 Bandwidth Challenge.
- November, 2008, a HIPAA-compliant version of the Sector Cloud. Sector/Sphere was demonstrated at SC 08 in Austin from November 15-20, 2008. A HIPAA-compliant version of Sector/Sphere was announced at SC 08, which should be available for general release in early December, 2008.
- November, 2008, Interoperability of storage clouds. The Open Cloud Consortium completed the first part of a study looking at how different storage clouds could interoperate with a common API. Thrift was used for the first part of the study. Based upon measurements using the Open Cloud Testbed, the overhead of using Thrift to access Hadoop and Sector was minimal. The study is on-going and results should be released in January, 2009.
- November, 2008, Cistrack and Flynet released. For the past few years, we have been developing a lightweight, but scalable, infrastructure for data intensive bioinformatics called the Chicago Utilities for Biological Sciences (CUBioS), which is pronounced like the name for young bears. CUBioS combines a lightweight database for managing data, a cloud for archiving data and performing data intensive computations, and Web 2.0 widgets for interacting with the system. CUBioS was used to develop the Cistrack database, which contains the ModENCODE fly data produced by the White Lab at the University of Chicago. Flynet integrates publicly available Cistrack data with third party data and uses a visual genome browser from the Holmes Lab as the user interface.
- November, 2008, Recent cloud computing talks.
Recent talks on cloud computing include:
- A talk at a small workshop on November 5, 2008 in Sunnyvale on Cloud Computing. As part of the talk, I stressed the need for standards for clouds and for interoperability between clouds.
- A talk at Cloud Computing and its Applications (CCA 08) in Chicago on Oct 23, 2008 that looked at cloud computing from a viewpoint in which the data center is the unit of computation.
- A Plenary Talk on cloud computing at the UK e-Science All Hands Meeting 2008 in Edinburgh, UK on September 9, 2008.
- A talk on August 26, 2008 at KDD 08 in Las Vegas comparing Sector/Sphere to Hadoop. On the clusters used, Sector was about twice as fast as Hadoop on the Terasort Benchmark.
- A Keynote Talk on cloud computing at the 2008 IEEE Congress on Services (Services 2008) on July 10, 2008 in Hawaii.
- October, 2008, Open Cloud Testbed upgraded. We upgraded the nodes on the Open Cloud Testbed. The testbed now contains 480 cores, distributed in four locations, with all locations connected with 10 Gb/s networks. This completes Phase 1 of the roll out. As part of Phase 2, we will be adding a fifth and sixth location and upgrading the capacity of the testbed in Chicago. Phase 2 is expected to be completed by February, 2009. See the Open Cloud Consortium's web site www.opencloudconsortium.org for more information or to join the consortium.
- September, 2008, Sector Demonstration. Sector/Sphere was demonstrated at the CENIC CalREN HPC/XD Workshop in San Diego from September 15-16, 2008.
- July, 2008, Sector 1.11 is released. This version has improved performance and several bugs have been fixed. A paper describing the design and implementation of Sector can be found on arxiv.
- June, 2008, Open Cloud Testbed launched. The Open Cloud Consortium has set up a distributed 120 node (240 core) testbed, with racks at UIC (Chicago), StarLight (Chicago), JHU (Baltimore) and Calit2 (San Diego). The racks are all connected with 10 Gb/s (or higher) networks. As far as I know, this is the first wide area cloud testbed that makes use of 10 Gb/s networks. See the Open Cloud Consortium's web site www.opencloudconsortium.org.
- June, 2008: Sector Version 1.8 Released. Version 1.8 of Sector was released on June 27, 2008. It can be obtained from Source Forge at the project site sector.sf.net. Sector is a wide area high performance storage and compute cloud. For the past couple of years, Sector has been used to distribute the Sloan Digital Sky Survey (SDSS) via the web site sdss.ncdm.uic.edu. The current version of Sector also includes high performance distributed computing services (a generalization of MapReduce) and security services.
- Sector is approximately twice as fast as Hadoop as measured by the Terasort Benchmark. In a recent paper, we describe the design and architecture of Sector. The paper also describes some preliminary experimental studies comparing the performance of Sector and Hadoop. On the clusters and distributed clusters used, Sector was about twice as fast as Hadoop on the Terasort Benchmark. Sector is designed to be used on clusters within a data center, as well as on distributed clusters across data centers that are connected by wide area high performance 10 Gbps networks. The paper can be found here and will be presented at KDD 2008.
- Open Cloud Consortium. Robert Grossman is the initial Chair of the Open Cloud Consortium (OCC). The OCC supports the development of open source software for cloud based computing; develops standards and standard based interfaces for interoperating different software supporting cloud based computing; and manages a wide area, distributed testbed for cloud computing called the Open Cloud Testbed.
- March, 2008: UDT, Version 4.2 released. UDT is an application layer high performance network transport protocol that is available from Source Forge at udt.sf.net. Version 4 of UDT was released on March 19, 2008.
- UDT is part of Globus. Beginning with Globus Version 4.2, one can choose an option in GridFTP so that TCP is replaced with UDT, which will speed up large data transfers.
- Angle Project wins SC 07 Analytics Challenge. On November 15, 2007, The Angle Project won First Place in the 2007 Analytics Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2007 (SC07). The title of the project was "Angle: Detecting Anomalies and Emergent Behavior from Distributed Data in Near Real Time."
- A new product on heap-ordered trees. Consider the set of rooted trees with n+1 nodes whose non-root nodes are labeled with the integers 1, ..., n (we don't label the root). We say that a tree is heap-ordered if the label of each node is larger than the label of its parent. It is easy to see by induction that the number of heap-ordered trees on n+1 nodes is n!. Let kS[n] denote the group algebra of the symmetric group S[n] over a field k. Consider the vector space kHOT[n] over k whose basis is the set of heap-ordered trees on n+1 nodes. In a recent paper, we describe a product on heap-ordered trees and a homomorphism from kHOT[n] to kS[n]. For details, see the paper arxiv.org/abs/0706.1327, which will be published as: Robert L. Grossman and Richard G. Larson, Hopf Algebras of Heap Ordered Trees and Permutations, Communications in Algebra, 2008. Note that this product is different from a product on heap-ordered trees that we described in 1989 (arxiv.org/abs/0711.3877) and that is used, for example, to derive numerical algorithms.
- 2007 SIGKDD Service Award. In July 2007, I was awarded the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Service Award for my "... role in the development of open and scalable architectures and standards for the SIGKDD and Global KDD Communities."
- Winner of Second Data Mining Practice Prize. The paper "Data Quality Models for High Volume Transaction Streams: A Case Study" by Joseph Bugajski, Robert Grossman, Chris Curry, David Locke and Steve Vejcik won the second annual Data Mining Practice Prize at KDD 2007. The prize is awarded each year "for work that has had a significant and quantitative impact in the application in which it was applied."
- New book. I have just finished writing a book called Digital Beauty. There is more information here.
- UDT wins SC 2006 Bandwidth Challenge. Our team won first place in the Bandwidth Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2006 (SC 06) with the project: Distributing the Sloan Digital Sky Survey Using Sector. To win the challenge, we transported over 1.2TB of SDSS data from Chicago to Tampa disk to disk at a sustained bandwidth of over 8.1 Gbps and a peak bandwidth of 9.18 Gbps. For a trace of the transfer, see the SC 06 Bandwidth Challenge web site.
- CDCM wins First Place in the SC 05 Analytics Challenge. The project Real Time Change Detection of Highway Sensor Data won First Place in the Analytics Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2005 (SC05). The project used a technique we introduced called Change Detection Using Cubes of Models or CDCM.
- More news. For more news, see the News section.
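The count of heap-ordered trees mentioned in the news item above (n! trees on n+1 nodes) can be checked by brute force for small n. The following is a minimal sketch, not code from the paper. It assumes the trees are non-planar (children of a node are not ordered), and it treats the unlabeled root as label 0 so that the heap condition can be written uniformly. The function names are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import factorial


def heap_ordered_trees(n):
    """Yield each heap-ordered tree on n+1 nodes as a tuple of parents.

    Entry i-1 of the tuple is the parent of the node labeled i.
    The unlabeled root is represented by 0 (an assumption of this sketch).
    """
    # Try every way of assigning a parent in {0, ..., n} to each label 1..n.
    for parents in product(range(n + 1), repeat=n):
        # Heap condition: every node's label exceeds its parent's label.
        # This also forces the structure to be a tree rooted at 0, because
        # following parent links strictly decreases the label until 0.
        if all(parent < label for label, parent in enumerate(parents, start=1)):
            yield parents


def count_heap_ordered_trees(n):
    """Count heap-ordered trees on n+1 nodes by exhaustive enumeration."""
    return sum(1 for _ in heap_ordered_trees(n))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_heap_ordered_trees(n), factorial(n))
```

The enumeration mirrors the induction argument: the node labeled i can attach to the root or to any of the nodes 1, ..., i-1, which gives i choices and hence n! trees in total.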