Data Mining and Alert Management Systems FAQ
Q. What is data mining?
Data mining is the semi-automatic extraction of patterns, changes, associations, anomalies, and other statistically significant structures from large data sets. A separate FAQ about data mining is also available.
Q. What are alert management systems?
Alert management systems use data mining technology to process real-time events and create alerts. Alert management systems are important for a variety of applications, including detecting fraudulent credit card transactions, network intrusions, fraudulent insurance claims, and threats of various types, as well as uncovering insurance fraud rings. To say it another way, alert management systems are used to catch bad guys doing bad things.
Q. Does this mean that computers and data mining can be used to "connect the dots" and find bad guys?
In my opinion, no, although I read all the time about the need to "connect the dots". Simply put: computers don't connect the dots, humans do. In more detail: the role of data mining and related technologies is to prioritize information that human analysts can then review in detail. Given the correct information, analysts can connect the dots.
To understand this distinction in more detail, consider a system whose goal is to identify fraudulent credit card transactions or fraudulent insurance claims. Systems like these typically assign a score, say from 1 to 1000, to each credit card transaction or insurance claim. Transactions or claims scoring above a certain threshold are examined by analysts in detail. The more analysts, the more cases that can be examined, and the more bad guys that can be caught. To summarize, the role of data mining is simply to send alerts that humans can analyze in detail. A more accurate way to think of these types of systems is as alert management systems.
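A minimal Python sketch of the triage step just described, assuming each transaction or claim has already been given an integer score from 1 to 1000 by some model. The `Scored` class and both function names are hypothetical. It shows two natural ways to cut the analysts' queue: by a fixed score threshold, or by the number of items the available analysts can review.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    """A transaction or claim with a model score (hypothetical record type)."""
    item_id: str
    score: int  # assumed scale 1..1000; higher means more suspicious


def alerts_above_threshold(items, threshold):
    """Items scoring above `threshold`, most suspicious first."""
    flagged = [item for item in items if item.score > threshold]
    return sorted(flagged, key=lambda item: item.score, reverse=True)


def alerts_for_capacity(items, capacity):
    """The `capacity` highest-scoring items: what a fixed analyst team can review."""
    return sorted(items, key=lambda item: item.score, reverse=True)[:capacity]


items = [Scored("t1", 950), Scored("t2", 400), Scored("t3", 910), Scored("t4", 870)]
print([i.item_id for i in alerts_above_threshold(items, 900)])  # ['t1', 't3']
print([i.item_id for i in alerts_above_threshold(items, 850)])  # ['t1', 't3', 't4']
```

Lowering the threshold from 900 to 850 adds `t4` to the queue: more work for the analysts, and more good guys bothered, in exchange for more chances to catch a bad guy.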
Q. How is the success of a data mining model measured?
The quality of data mining models of this type is measured along two dimensions. Say one million transactions are scored as described above and the 1000 transactions with a score above 900 are examined in detail. Say that these 1000 transactions contain 100 bad ones, and that the full one million transactions contain 200 bad ones. Then 100/200, or 50%, of the bad ones are detected, so we say that the detection rate is 50%. In addition, the 1000 examined transactions include 900 good ones, so for each bad one found, 9 good transactions are examined (10 transactions examined in all); we say that the false positive rate is 9 to 1. Notice that the incidence of fraudulent transactions is 200/1,000,000 = 0.0002 = 0.02%. Catching more bad guys requires more analysts and bothering more good guys. If more analysts were available, one might lower the threshold for examining transactions from 900 to 850.
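The arithmetic in this example can be written as a small Python function (`alert_metrics` is a hypothetical name). The false positive figure here counts good transactions examined per bad one found: the 1000 examined transactions contain 900 good ones against 100 bad ones, i.e. 9 to 1, or 10 transactions examined for each bad one found.

```python
def alert_metrics(total, total_bad, examined, bad_examined):
    """Detection rate, false positive ratio, and incidence for one threshold."""
    detection_rate = bad_examined / total_bad            # share of all bad ones caught
    good_examined = examined - bad_examined
    false_positive_ratio = good_examined / bad_examined  # good examined per bad found
    incidence = total_bad / total                        # base rate of bad ones
    return detection_rate, false_positive_ratio, incidence


dr, fp, inc = alert_metrics(total=1_000_000, total_bad=200, examined=1000, bad_examined=100)
print(f"detection rate {dr:.0%}, false positives {fp:.0f} to 1, incidence {inc:.2%}")
# detection rate 50%, false positives 9 to 1, incidence 0.02%
```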
Q. Is this the same general process used to identify credit card fraud rings or insurance fraud rings?
Sometimes yes, but generally other techniques are used. The basic idea is that bad guys tend to hang out with other bad guys. For example, members of a fraud ring faking accidents might all use the same doctor, outpatient clinic, or automobile body shop. Sometimes the drivers have different names but share a common phone number or address. By searching for common phone numbers, similar or nearby addresses, and related types of patterns, suspicious accounts or claims can be identified for further investigation. Comparing the number of claims of various types processed by each doctor, clinic, and body shop also singles out suspicious accounts or claims. This type of analysis is sometimes called ring analysis.
As part of a criminal investigation of a fraud ring, telephone calls may be analyzed. For example, the accident victim may have called the fraud ring leader prior to the accident, and the fraud ring leader may have called the clinic and automobile body shop prior to the accident. This type of analysis is called contact chaining.
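A minimal Python sketch of both ideas, using made-up claim and call records (all field names, identifiers, and function names are hypothetical). Ring analysis here is only a grouping of claims on an exact shared phone number, flagging groups with several different claimant names; contact chaining is a breadth-first search over a call graph, limited to a fixed number of hops. The sketch ignores call timing, so it cannot check that the calls happened prior to the accident.

```python
from collections import defaultdict, deque


def shared_attribute_rings(claims, attribute, min_names=2):
    """Group claims on an exact shared value (e.g. a phone number) and keep the
    groups in which at least `min_names` different claimant names appear."""
    groups = defaultdict(list)
    for claim in claims:
        groups[claim[attribute]].append(claim)
    return {value: members for value, members in groups.items()
            if len({m["name"] for m in members}) >= min_names}


def contact_chain(calls, seed, max_hops):
    """Breadth-first search over (caller, callee) pairs: every party reachable
    from `seed` within `max_hops` calls, mapped to its hop distance."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)  # treat a call as a contact in both directions
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        party = queue.popleft()
        if distance[party] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[party]:
            if neighbor not in distance:
                distance[neighbor] = distance[party] + 1
                queue.append(neighbor)
    return distance


claims = [
    {"claim_id": "c1", "name": "A. Smith", "phone": "555-0101"},
    {"claim_id": "c2", "name": "B. Jones", "phone": "555-0101"},
    {"claim_id": "c3", "name": "C. Brown", "phone": "555-0199"},
]
calls = [("victim", "leader"), ("leader", "clinic"),
         ("leader", "body_shop"), ("clinic", "pharmacy")]

rings = shared_attribute_rings(claims, "phone")
print({phone: [c["claim_id"] for c in members] for phone, members in rings.items()})
# {'555-0101': ['c1', 'c2']}
print(sorted(contact_chain(calls, "victim", max_hops=2).items()))
# [('body_shop', 2), ('clinic', 2), ('leader', 1), ('victim', 0)]
```

With a two-hop limit the search reaches the ring leader, clinic, and body shop but stops before `pharmacy`, which is three calls away from the victim.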
Q. What are profiles?
Alert management systems usually work with two types of data: transactional data and summary level data. A simple example is provided by credit card transactions. Each credit card transaction is associated with an account owner. Given a stream of credit card transactions, statistical characterizations of the corresponding account holders can be assembled by aggregating and analyzing information from the different transactions associated with the same account holder. This is a very common situation. In other words, one can think of a stream of transactions and imagine building summaries which statistically pull together information from those multiple transactions. More generally, transactions are examples of what are called events and the statistical summaries are examples of what are called profiles.
Here are some examples:
a. Credit Cards. First, let's look at credit card transactions in a bit more detail. The number of credit card transactions, the total dollar amount, the number of credit card transactions related to travel and entertainment, etc. are some of the features that would be part of a profile. With event-based processing, the profile is updated each time a new credit card transaction is processed. With batch processing, the credit card transactions for a day, a week, or a month are processed together and then the profiles are updated.
b. Telephone Calls. Information about individual telephone calls is stored in records called call detail records and those records are examples of events. Summary level data about the number of phone calls a person makes each day, the time of day the calls are made, the percentage that are local, regional, long distance, and international, etc. are part of the profile. Again, the profiles may be updated with each new event, or in batches at regular intervals.
c. Airline Reservations. Information about airline trips is stored in records called Passenger Name Records, or PNRs. PNRs are examples of events. PNRs can be processed to create profiles of good travelers and of travelers who present risk.
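A minimal Python sketch of profiles built from events, assuming simplified record layouts (the class, function, and category names are hypothetical). `CardProfile.update` is event-based processing, one transaction at a time; `batch_update` applies a batch of transactions at once and arrives at the same profile. `call_profile` summarizes one subscriber's call detail records into calls per day, the share of each call type, and the share of calls made at night (assumed here to mean 10 p.m. to 6 a.m.).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CardProfile:
    """Running summary of one account's credit card transactions."""
    count: int = 0
    total_amount: float = 0.0
    travel_count: int = 0

    def update(self, amount: float, category: str) -> None:
        # Event-based processing: fold one new transaction into the profile.
        self.count += 1
        self.total_amount += amount
        if category == "travel_entertainment":  # hypothetical category label
            self.travel_count += 1


def batch_update(profiles, transactions):
    """Batch processing: apply a period's (account, amount, category) records at once."""
    for account, amount, category in transactions:
        profiles.setdefault(account, CardProfile()).update(amount, category)
    return profiles


def call_profile(cdrs, days):
    """Summarize one subscriber's call detail records, given as (hour, call_type) pairs."""
    n = len(cdrs)
    if n == 0:
        return {"calls_per_day": 0.0, "type_share": {}, "night_share": 0.0}
    type_counts = Counter(call_type for _, call_type in cdrs)
    night_calls = sum(1 for hour, _ in cdrs if hour < 6 or hour >= 22)
    return {
        "calls_per_day": n / days,
        "type_share": {t: c / n for t, c in type_counts.items()},
        "night_share": night_calls / n,
    }


profiles = batch_update({}, [("acct1", 120.0, "travel_entertainment"),
                             ("acct1", 30.0, "grocery"),
                             ("acct2", 50.0, "grocery")])
print(profiles["acct1"])
# CardProfile(count=2, total_amount=150.0, travel_count=1)
print(call_profile([(9, "local"), (23, "international"), (14, "local"), (10, "local")], days=2))
# {'calls_per_day': 2.0, 'type_share': {'local': 0.75, 'international': 0.25}, 'night_share': 0.25}
```

Only counts and sums are kept, so a profile can be updated in constant time per event without rereading past transactions.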
Q. This doesn't sound correct. I thought profiles were bad.
Profiles are not the same as what is sometimes called profiling. Profiling can be thought of as the use, in decision making, of a) inappropriate data or b) data that is specifically excluded by policy. For example, using zip codes in a model for determining whether to offer bank loans is a specific violation of federal regulations, which were developed to end inappropriate lending practices based upon redlining. Federal and corporate policies also prohibit using certain types of information or combining certain types of information.
Decisions based upon models derived from data are usually expected to be empirically derived and statistically sound. That is, first, they must be derived from the data itself, and not the biases of the person building the model. Second, they must be based upon generally acceptable statistical procedures. For example, the arbitrary exclusion of data can result in models that are biased in some fashion.
Because profiles can be easily confused with profiling, some people prefer to call profiles by alternate names such as feature vectors or summary vectors.
Q. I read on an airplane that neural networks can catch bad guys. Why can't we just build a big neural network?
Although most things one reads from magazines and newspapers while flying on airplanes turn out to be true, this particular factoid is more complex. A neural network is one particular type of data mining algorithm. There are many different types of data mining algorithms, and more generally, many different types of statistical algorithms. Some of these are discussed in the accompanying FAQ.
Given the appropriate data sources, the appropriate processes, the appropriate systems, and the appropriate algorithms, data mining systems with high detection rates and low false positive rates can be developed. The detection rates and false positive rates will be primarily a function of the quality of the data and the quality of the business processes and methodologies used to process the data and build the models, and only secondarily a function of the particular algorithm used. There are at least half a dozen common types of algorithms, and all can be used successfully; each has its own trade-offs and benefits.
The good news is that sometimes certain vendors favor certain algorithms. So if you know which vendor you like, it is easy to know which algorithm to like.
Q. Can profiles include information derived from multiple transactional data sources?
Yes, it has been common for many years to build profiles for marketing purposes by merging multiple sources of information. For example, profiles for direct marketing can be created by aggregating in-house purchases with third party information collected by a credit bureau.
Aggregating information from multiple sources requires determining whether two records from two different sources refer to the same person, business, or other entity. Sometimes this is called householding. For small numbers of transactions, and for solutions which are not required to be completely accurate, there is commercial software which does an adequate job.
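A deliberately naive Python sketch of this kind of record matching, assuming each record carries a phone number and a street address (the field names, source IDs, and abbreviation table are hypothetical). Records are normalized and then clustered with a union-find structure whenever they share an exact normalized phone number or address. Commercial matching software uses fuzzy comparisons and many more attributes; exact keys like these are what break down as the number of sources and the accuracy requirement grow.

```python
import re
from collections import defaultdict

ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}  # illustrative only


def normalize_phone(phone):
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]  # assumes 10-digit North American numbers


def normalize_address(address):
    text = re.sub(r"[^a-z0-9 ]", "", address.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())


def link_records(records):
    """Cluster records that share a normalized phone number or address."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # normalized key -> index of first record with that key
    for i, rec in enumerate(records):
        for key in (("phone", normalize_phone(rec["phone"])),
                    ("address", normalize_address(rec["address"]))):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two clusters
            else:
                first_seen[key] = i

    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[find(i)].append(rec["source_id"])
    return list(clusters.values())


records = [
    {"source_id": "bank:17", "phone": "(312) 555-0142", "address": "12 Main Street"},
    {"source_id": "bureau:9", "phone": "312.555.0142", "address": "12 Main St."},
    {"source_id": "bank:18", "phone": "773-555-0100", "address": "40 Oak Avenue"},
]
print(link_records(records))  # [['bank:17', 'bureau:9'], ['bank:18']]
```

Because a shared address alone is enough to merge records, this sketch groups everyone living at one address into a single cluster, which is the "householding" sense of the term rather than a claim that they are the same person.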
The difficulty is that as the number of data sources grows, as the total number of records grows, and as the expectation for accuracy grows, the problem becomes harder and harder. For example, 90% accuracy might be just fine for marketing purposes and 95% accuracy might be just fine for credit-related purposes, while for homeland security purposes many people might not consider even 98% accurate enough. The higher the requirement for accuracy, the harder the problem. Improving accuracy from 80% to 90% can be twice as hard as improving accuracy from 70% to 80%; improving accuracy from 90% to 95% can be twice as hard as improving accuracy from 80% to 90%; improving accuracy from 95% to 98% can be twice as hard as improving accuracy from 90% to 95%; and so on.
Just a one percent error rate for 100 million profiles means that 1 million profiles are inaccurate. Some people consider this to be a large number. On the other hand, a system in which 25% of the profiles have some type of inaccuracy might still accurately identify fraud 50% of the time. Given these trade-offs, striking the proper balance requires the wisdom of King Solomon, the public relations skills of Ronald Reagan, and the luck of a lottery winner.
Q. Isn't it a bad idea to combine data from different transactional sources? Won't this destroy privacy?
Combining data from multiple transactional sources doesn't destroy privacy; bad policies and bad designs do, as do errors and misuse.
Good policies which protect privacy to the extent possible and good designs which guarantee privacy to the extent possible are the two most important factors affecting privacy. For example, most statistical algorithms which score events and profiles do not require personally identifying information (PII), such as names, addresses, or phone numbers. For this reason, good design dictates that PII should be stored separately from profiles, and that linking profiles with PII should not be possible without audit trails.
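A minimal Python sketch of that design, assuming profiles are keyed by a pseudonym derived from the account ID with a keyed hash (HMAC-SHA-256 from the standard library), while names and phone numbers live in a separate store that logs every re-identification. The `PiiVault` class and its methods are hypothetical; a real deployment would also need access control, key management, and tamper-resistant logs.

```python
import hashlib
import hmac
from datetime import datetime, timezone


class PiiVault:
    """Keeps PII apart from profiles; every re-identification is audited."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._pii = {}       # pseudonym -> PII record
        self.audit_log = []  # list of re-identification events

    def pseudonym(self, account_id: str) -> str:
        # Keyed hash: without the key, the pseudonym cannot be recomputed from the ID.
        digest = hmac.new(self._key, account_id.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    def register(self, account_id: str, name: str, phone: str) -> str:
        pid = self.pseudonym(account_id)
        self._pii[pid] = {"account_id": account_id, "name": name, "phone": phone}
        return pid

    def reidentify(self, pid: str, analyst: str, reason: str) -> dict:
        if not reason:
            raise PermissionError("re-identification requires a documented reason")
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "analyst": analyst,
            "pseudonym": pid,
            "reason": reason,
        })
        return self._pii[pid]


vault = PiiVault(secret_key=b"replace-with-a-managed-key")
pid = vault.register("acct-1001", "Jane Doe", "555-0100")
profiles = {pid: {"txn_count": 42, "score": 912}}  # scoring works without any PII
record = vault.reidentify(pid, analyst="analyst-7", reason="alert review")
```

Scoring code only ever sees `profiles`; linking a high-scoring profile back to a person goes through `reidentify`, which leaves an audit entry each time.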
Unfortunately, as the number of transactions and profiles grows so does the number of errors. As the number of people involved with the system grows, so does the chance of misuse.
In general, a system which combines data from two transactional sources is more accurate than one which uses a single source. On the other hand, a system which combines data from all available transactional sources is clearly a bad idea. Unfortunately, knowing where in the middle one wants to be is a very difficult, and largely empirical, decision.
Q. How do alert management systems work?
Alert management systems (AMS) provide six main functions.
- Scoring. An AMS must score both events and profiles. This is done by using data mining or statistical models to assign scores to events and profiles. Generally, the scores range from 1 to 1000. The higher the score, the more suspicious the event or profile.
- Linking. An AMS must follow links to understand, for example, whether an unexpected number of transactions share the same address or phone number. Linking is also used to follow the contacts and associations of suspicious entities.
- Matching. An AMS must match accounts with lists provided by regulatory agencies and other internal and third party sources.
- Routing. An AMS must be able to aggregate the various scoring and checking activities into actionable groups and facilitate taking the appropriate steps when suspicious activities are identified.
- Checking. An AMS must check each transaction and account against rules and policies.
- Reporting. An AMS must supply reports summarizing the results of the scoring, linking, matching and checking.
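A toy Python sketch of how scoring, checking, matching, routing, and reporting might fit together in one pass over a batch of transactions. The scoring function, rules, watch list, field names, and the choice to route alerts into one case per account are all assumptions for illustration; linking is left out for brevity, and matching is an exact, case-insensitive name comparison rather than the fuzzier matching real list screening would need.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    account: str
    kind: str    # "score", "rule", or "watch_list"
    detail: str


def run_ams(transactions, score_fn, rules, watch_list, threshold=900):
    alerts = []
    for tx in transactions:
        score = score_fn(tx)                                # Scoring
        if score > threshold:
            alerts.append(Alert(tx["account"], "score", f"score {score}"))
        for rule_name, rule in rules.items():               # Checking
            if not rule(tx):
                alerts.append(Alert(tx["account"], "rule", rule_name))
        if tx["name"].lower() in watch_list:                # Matching
            alerts.append(Alert(tx["account"], "watch_list", tx["name"]))

    cases = defaultdict(list)                               # Routing: one case per account
    for alert in alerts:
        cases[alert.account].append(alert)

    report = {kind: sum(a.kind == kind for a in alerts)     # Reporting
              for kind in ("score", "rule", "watch_list")}
    return dict(cases), report


transactions = [
    {"account": "A1", "name": "Jane Doe", "amount": 50, "country": "US"},
    {"account": "A2", "name": "John Roe", "amount": 15000, "country": "US"},
    {"account": "A3", "name": "Sam Poe", "amount": 20, "country": "XX"},
]
cases, report = run_ams(
    transactions,
    score_fn=lambda tx: min(1000, 1 + tx["amount"] // 10),  # stand-in for a real model
    rules={"amount_limit": lambda tx: tx["amount"] <= 10_000,
           "allowed_country": lambda tx: tx["country"] != "XX"},
    watch_list={"sam poe"},
)
print(sorted(cases))  # ['A2', 'A3']
print(report)         # {'score': 1, 'rule': 2, 'watch_list': 1}
```

Grouping alerts by account means an analyst sees the high score and the rule violation for `A2` as one case rather than two unrelated alerts.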
Q. Are these systems expensive to build?
The expense of these systems depends upon the number of different data sources, the amount of data, the complexity of the data, the detection rates and false positive rates required, the complexity of the routing and reporting, and the quality of service or uptime required.
For example, a system for detecting telephone fraud with two data sources, operating 24 hours a day, 7 days a week, 365 days a year, with a 30% detection rate and a 25 to 1 false positive rate, might require $2M to $4M to build and $3M to $5M per year to operate.
Q. Where can I get more technical information about alert management systems?
A high-level overview of alert management systems can be found in the following publication: Robert L. Grossman, Alert Management Systems: A Quick Introduction, in Managing Cyber Threats: Issues, Approaches and Challenges, edited by Vipin Kumar, Jaideep Srivastava, and Aleksandar Lazarevic, Kluwer Academic Publishers, 2004, to appear.
Copyright Robert L. Grossman, 1999-2003, revised December 24, 2003.