Robert Grossman's Home Page
This web site contains some technical publications and talks about computing with data written by Robert Grossman. There are over 100 publications available online.
The site was last updated on July 1, 2008.
Recent News
- June, 2008: Sector Version 1.8 Released. Version 1.8 of Sector was released on June 27, 2008. It can be obtained from Source Forge at the project site sector.sf.net. Sector is a wide area high performance storage and compute cloud. For the past couple of years, Sector has been used to distribute the Sloan Digital Sky Survey (SDSS) via the web site sdss.ncdm.uic.edu. The current version of Sector also includes high performance distributed computing services (a generalization of MapReduce) and security services.
- Sector is approximately twice as fast as Hadoop as measured by the Terasort Benchmark. In a recent paper, we describe the design and architecture of Sector. The paper also describes some preliminary experimental studies comparing the performance of Sector and Hadoop. On the clusters and distributed clusters used, Sector was about twice as fast as Hadoop on the Terasort Benchmark. Sector is designed to be used on clusters within a data center, as well as on distributed clusters across data centers that are connected by wide area high performance 10 Gbps networks. The paper can be found here and will be presented at KDD 2008.
- Open Cloud Consortium. Robert Grossman is the initial Chair of the Open Cloud Consortium (OCC). The OCC supports the development of open source software for cloud-based computing; develops standards and standards-based interfaces so that different software supporting cloud-based computing can interoperate; and manages a wide area, distributed testbed for cloud computing called the Open Cloud Testbed.
- March, 2008: UDT, Version 4.2 released. UDT is an application layer high performance network transport protocol that is available from Source Forge at udt.sf.net. Version 4 of UDT was released on March 19, 2008.
- UDT is part of Globus. Beginning with Globus Version 4.2, one can choose an option in GridFTP so that TCP is replaced with UDT, which will speed up large data transfers.
- Angle Project wins SC 07 Analytics Challenge. On November 15, 2007, the Angle Project won First Place in the 2007 Analytics Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2007 (SC 07). The title of the project was "Angle: Detecting Anomalies and Emergent Behavior from Distributed Data in Near Real Time."
- A new product on heap-ordered trees. Consider the set of rooted trees with n+1 nodes whose non-root nodes are labeled with the integers 1, ..., n (we don't label the root). We say that a tree is heap-ordered if the label of each node is larger than the label of its parent. It is easy to see by induction that the number of heap-ordered trees on n+1 nodes is n!. Let kS[n] denote the group algebra of the symmetric group S[n] over a field k, and let kHOT[n] denote the vector space over k whose basis is the set of heap-ordered trees on n+1 nodes. In a recent paper, we describe a product on heap-ordered trees and a homomorphism from kHOT[n] to kS[n]. See the paper arxiv.org/abs/0706.1327 for details, which will be published as: Robert L. Grossman and Richard G. Larson, Hopf Algebras of Heap Ordered Trees and Permutations, Communications in Algebra, 2008. Note that this product is different from a product on heap-ordered trees that we described in 1989 (arxiv.org/abs/0711.3877) and that is used, for example, to derive numerical algorithms.
- 2007 SIGKDD Service Award. In July 2007, I was awarded the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Service Award for my "... role in the development of open and scalable architectures and standards for the SIGKDD and Global KDD Communities."
- Winner of Second Data Mining Practice Prize. The paper "Data Quality Models for High Volume Transaction Streams: A Case Study" by Joseph Bugajski, Robert Grossman, Chris Curry, David Locke and Steve Vejcik won the second annual Data Mining Practice Prize at KDD 2007. The prize is awarded each year "for work that has had a significant and quantitative impact in the application in which it was applied."
- New book. I have just finished writing a book called Digital Beauty. There is more information here.
- UDT wins SC 2006 Bandwidth Challenge. Our team won first place in the Bandwidth Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2006 (SC 06) with the project: Distributing the Sloan Digital Sky Survey Using Sector. To win the challenge, we transported over 1.2 TB of SDSS data from Chicago to Tampa, disk to disk, at a sustained bandwidth of over 8.1 Gbps and a peak bandwidth of 9.18 Gbps. For a trace of the transfer, see the SC 06 Bandwidth Challenge web site.
- CDCM wins First Place in the SC 05 Analytics Challenge. The project Real Time Change Detection of Highway Sensor Data won First Place in the Analytics Challenge at the ACM/IEEE International Conference for High Performance Computing and Communications 2005 (SC05). The project used a technique we introduced called Change Detection Using Cubes of Models or CDCM.
- More news. For more news, see the News section.
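The count of heap-ordered trees quoted in the news item above can be checked by brute force for small n. The following is only a minimal sketch, not code from the paper: it assumes the trees are non-planar (the children of a node are unordered), so that a tree is determined by the parent of each labeled node, and it uses 0 as a stand-in for the unlabeled root. The function names are hypothetical.

```python
from itertools import product
from math import factorial


def heap_ordered_trees(n):
    """Yield every heap-ordered tree on n+1 nodes as a tuple of parents.

    Assumption: trees are non-planar, so a tree is fixed by its parent map.
    Node 0 stands in for the unlabeled root. Entry k-1 of each tuple is the
    parent of node k, chosen from 0, ..., k-1 so that every node has a
    larger label than its parent.
    """
    parent_choices = [range(k) for k in range(1, n + 1)]
    return product(*parent_choices)


def count_heap_ordered_trees(n):
    """Count heap-ordered trees on n+1 nodes by explicit enumeration."""
    return sum(1 for _ in heap_ordered_trees(n))


if __name__ == "__main__":
    for n in range(7):
        print(n, count_heap_ordered_trees(n), factorial(n))
```

Under this encoding, node k has exactly k possible parents, so the number of trees is 1 x 2 x ... x n = n!, which is the induction step mentioned above.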
About the Author
Robert Grossman is the Managing Partner of Open Data Group. Open Data helps companies develop and improve their analytic strategies and provides outsourced analytic services so that companies can increase revenues, decrease costs, and improve business processes. Please contact him at info at opendatagroup dot com if you would like to work with him.
He is also the Director of the Laboratory for Advanced Computing (LAC) and the National Center for Data Mining (NCDM) at the University of Illinois at Chicago. LAC and NCDM develop open source technology for internet-scale and cloud computing, such as UDT and Sector, and host the development of standards, such as the Predictive Model Markup Language (PMML). Please contact info at lac.uic.edu if you would like more information about the Laboratory or Center.
Biographical material can be found here.