Total Number of Subscribers: 451   

 



Powered by Prime Academy  
In pursuit of excellence    

    Date: 8th August 2008   

Compiled by Mr. M. Sathya Kumar  

 

 

Identification of Losses Due to Electronic Frauds 

Frauds or scams, euphemistically called economic offences, are a dominant form of white-collar crime in today’s business environment. It is an unfortunate but well-known fact that many businesses and government organisations, particularly in financial and related services, suffer from frauds of various kinds.

 

Frauds bleed businesses to the tune of hundreds of billions of dollars worldwide every year. If this malpractice continues on a large scale, it can have disastrous long-term consequences not only for the businesses involved but also for investors, financial institutions, governments, and the economy in general.

 

Today’s highly automated business systems collect vast amounts of data about almost all kinds of business transactions and activities. With the advent of data warehousing and corporate memory systems, both current and historical business data can be accessed. Evidence of fraud and fraudulent activities is partly hidden in these enormous quantities of data, and data analysis techniques can help businesses perform effective fraud management to prevent losses and bring the culprits to justice.

 

Fraud management involves a whole gamut of activities: early warnings and alarms; telltale symptoms and patterns of various types of fraud; profiles of users and activities; fraud detection, prevention, and avoidance; minimising false alarms and avoiding customer dissatisfaction; estimating losses; risk analysis; surveillance and monitoring; security (of computers, data, networks, and physical facilities); data and records management; collection of evidence from data and other sources; reports; summaries; data visualization; links to management information systems and operational systems (such as billing and accounting); and control actions (such as prosecution, employee education and ethics programs, hotlines, and cooperation with partners and law enforcement agencies). Several critical issues make building fraud management systems a challenging task.

 

These include enormous volumes of data with complex structure; the changing behaviour of users, business activities, and fraudsters; the continuous evolution of new frauds, particularly ones designed to bypass existing detection techniques; the need for fast and accurate fraud detection without undue burden on business operations; the risk of false alarms; and social issues such as privacy and discrimination.

 

There are a number of means and processes, in particular software-based techniques, that can be used to detect, investigate, and prevent frauds.

 

What is Fraud?

 

The Oxford Advanced Learner’s Dictionary defines fraud as “an act of deceiving illegally in order to make money or obtain goods”. Indeed, in fraud, groups of unscrupulous (or “morally challenged”) individuals manipulate or influence the activities of a target business with the intention of making money or obtaining goods through illegal or unfair means. Fraud cheats the target organisation of its legitimate income and results in a loss of goods, money, and even goodwill and reputation. Fraud often employs illegal means, and always immoral or unfair ones. Outright criminal activities (typically involving violence or other physical means) such as break-in thefts, industrial espionage, sabotage, attacks, and robberies are usually excluded from the scope of fraud.

 

But even within a particular organisation, the full scope of what exactly constitutes fraud isn’t always clear. A particular difficulty is distinguishing fraud from losses due to incompetence, procedural lapses, accidents, mismanagement, wrong decisions, or business risks. General economic offences also include criminal acts other than fraud: money laundering, financing of criminal or anti-national activities, corruption, bribery, kickbacks, and so on. Nevertheless, because of its potential for significant negative impact, fraud has been studied in depth as a phenomenon. Fortunately, frauds fall into a number of typical types that share common characteristics, means, and methods.

 

Just as a house theft can occur in only a few specific ways (break-in, lock picking, gaining entry and confidence by misrepresenting identity), frauds share similar modus operandi. Consequently, an organisation can take advantage of these commonalities to establish business practices that protect it from fraud and the resulting losses. Of course, any particular fraud in an organisation need not exhibit all of these characteristics. Fraud often consists of many instances or incidents involving repeated transgressions using the same method. Fraud instances can be similar in content and appearance but usually aren’t identical.

 

Fraud investigation is a complex, time-consuming, and tedious activity that requires a great deal of knowledge of finance, economics, business practices, market analysis and business conditions, investigative skills, and law. A comprehensive investigative and surveillance business process for fraud management (often set up in the form of a fraud control center within an organisation) typically includes a number of steps, activities, and deliverables. The core of this business process is data analysis.

 

Data Analysis Techniques for Fraud Detection

 

The techniques used for fraud detection fall into two primary categories: statistical techniques and artificial intelligence (AI) techniques. Many commercial fraud detection tools provide a variety of techniques from either of these areas, although usually no single tool integrates them all. Important statistical data analysis techniques for fraud detection are:

 

  • Data preprocessing techniques for detecting, validating, and correcting errors, and for filling in (estimating) missing or incorrect data.
  • Calculation of various statistical parameters such as averages, quartiles, performance metrics, and probability distributions. For example, the averages may include the average length of a call, the average number of calls per month (or per day), and the average delay in bill payment.
  • Models of various business activities, expressed either in terms of parameters or as probability distributions.
  • Computing user profiles (classifying users, customers, and orders into various categories) and characterising these profiles statistically (in terms of parameters, probability distributions, and so forth).
  • Time-series analysis of time-dependent data.
  • Clustering and classification to find patterns and associations among groups of data.
  • Matching algorithms to detect anomalies in the behaviour of transactions or users compared with previously known models and profiles. Techniques are also needed to eliminate false alarms, estimate risks, and predict the future behaviour of current transactions or users.
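
As a rough illustration of computing a profile and matching new behaviour against it, the sketch below summarises one customer's past call lengths as a profile (mean, standard deviation, quartiles) and flags a new value that deviates strongly from it. The z-score rule, the threshold of 3 standard deviations, and the data are assumptions chosen for illustration, not a method prescribed by the article.

```python
import statistics

def build_profile(history):
    """Summarise past values (e.g. call lengths in minutes) as a simple statistical profile."""
    q1, median, q3 = statistics.quantiles(history, n=4)
    return {
        "mean": statistics.mean(history),
        "stdev": statistics.stdev(history),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the profile mean.

    The z-score rule and the default threshold are illustrative assumptions.
    """
    if profile["stdev"] == 0:
        return value != profile["mean"]
    z = abs(value - profile["mean"]) / profile["stdev"]
    return z > threshold

# Hypothetical history of one customer's call lengths, in minutes.
call_minutes = [4.0, 5.5, 3.2, 6.1, 4.8, 5.0, 4.4, 5.9]
profile = build_profile(call_minutes)

print(profile)
print(is_anomalous(5.2, profile))   # a typical call
print(is_anomalous(45.0, profile))  # far outside this customer's usual behaviour
```

Raising the threshold produces fewer false alarms at the cost of missing subtler anomalies, which is the trade-off the false-alarm techniques above have to manage.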

 

In addition, a number of auxiliary tools can help surveillance personnel quickly grasp the nature of business data and activities. These include canned queries, summary reports, data visualization in various forms, software filters in the form of early warning indicators, alarm conditions, and so on.

Usually, these techniques require considerable human expertise and active participation. They are also used iteratively: suspicious transactions are first identified and then investigated further to locate the victims, suspects, and their methods, which are in turn investigated to enable prevention or gather evidence.

 

As already remarked, fraud management is a knowledge-intensive activity, so applying knowledge-based techniques from AI is a natural idea. Important AI techniques used for fraud management include:

 

  • Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.
  • Expert systems to encode expertise for detecting fraud in the form of rules.
  • Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behaviour, either automatically (unsupervised) or by matching given inputs.
  • Machine learning techniques to automatically identify characteristics of fraud.
  • Neural networks that can learn suspicious patterns from samples and later use them to detect such patterns in the production data.

Other techniques, such as Bayesian networks, decision theory, and sequence matching, are also used for fraud detection.

Data analysis can be a strategic weapon in the management and control of fraud.

 

Medical Fraud

 

An illustration of some of these techniques to handle the problem of fraud detection in a hypothetical and highly simplified medical insurance claims database is shown below. The database (as maintained by the insurance company and populated from the claim documents submitted by patients) consists of a single table and has the following format:

 

1. Patient ID (SSN)

2. Sex (M/F)

3. Age (0 to 120)

4. Address

5. Claim Date

6. Illness Category, Illness ID, and Illness Description (may be more than one illness)

7. Illness Duration Start Date - End Date

8. Hospital ID(s)

9. Doctor ID(s)

10. IDs of diagnostic tests performed

11. Names of medicines given

12. Other treatments (for example, physiotherapy)

13. Diagnostic tests bills

14. Medicine bills

15. Other treatment bills

16. Hospital bills

17. Doctors’ charges

18. Misc. amount (all other costs)

19. Net Amount.

 

Let us evaluate whether a specific new claim is “suspicious” in some way. If it is, the claim can be handled differently: cancel the claim payment, proceed with the claim payment, recall the claim, reduce the payment amount, or seek clarification from the hospital or patient.

 

For the purpose of evaluating a new claim, one can often define various criteria or indices for suspiciousness.

 

For each criterion or index, the claim gets a score; typically, a high score on a specific index indicates greater suspiciousness. Thus, a claim that scores high on many criteria is more suspicious.

 

Examples of such criteria include:  

  • The net amount is too large compared with the average amount in similar claims.
  • The cost of one or more diagnostic tests is too large compared with the average amount in similar claims.
  • The share of the net amount taken up by one diagnostic test is too high compared with the average share in similar claims.
  • The previous two scores can be adapted for medicine costs, doctors’ charges, hospital bills, and other costs.
  • The claim is a duplicate (a very similar claim by the same patient was paid in the recent past).
  • The address of the patient, hospital, or doctor is suspicious (missing ZIP Code, address includes a P.O. box number, or errors in address components such as an incorrect phone number, ZIP Code, town name, or email address).

One can define many more such indices. All such indices have to be defined rigorously; the previous descriptions are merely indicative. Ideally, the fraud control system should provide a facility to define such indices dynamically, outside the system itself, so that new indices can be added easily.

 

Because the indices represent knowledge about fraud detection in claims data, a rule language can capture them in a knowledge base. The system can also provide a facility that lists claims similar to a given claim (for example, using a k-nearest-neighbour algorithm), along with a similarity score. This facility would let the end user evaluate the given claim against similar claims. From a pool of already known fraudulent claims, machine learning algorithms can construct a classifier (such as a decision tree) that can help evaluate a new claim.
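
A minimal sketch of the similar-claims facility follows. It assumes each claim is reduced to a few numeric fields (hypothetical keys `age`, `duration_days`, and `net_amount`), divides each field by its population standard deviation so that rupee amounts do not swamp ages and days, uses Euclidean distance, and converts a distance d into a similarity score of 1 / (1 + d). The article does not specify a distance or scoring formula; these are illustrative choices.

```python
import math
import statistics

def k_nearest_claims(claim, history, keys, k=3):
    """Return the k historical claims most similar to `claim` as (similarity, claim) pairs.

    Each field is divided by its population standard deviation so that fields measured in
    different units contribute comparably; similarity = 1 / (1 + Euclidean distance).
    """
    scales = {key: statistics.pstdev([h[key] for h in history]) or 1.0 for key in keys}
    scored = []
    for past in history:
        distance = math.sqrt(sum(((claim[key] - past[key]) / scales[key]) ** 2 for key in keys))
        scored.append((1.0 / (1.0 + distance), past))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Made-up historical claims for a single illness ID.
history = [
    {"claim_id": "C1", "age": 40, "duration_days": 3, "net_amount": 12000},
    {"claim_id": "C2", "age": 42, "duration_days": 4, "net_amount": 13000},
    {"claim_id": "C3", "age": 70, "duration_days": 15, "net_amount": 60000},
    {"claim_id": "C4", "age": 35, "duration_days": 2, "net_amount": 9000},
]
new_claim = {"age": 41, "duration_days": 3, "net_amount": 12500}

for similarity, past in k_nearest_claims(new_claim, history, ["age", "duration_days", "net_amount"]):
    print(past["claim_id"], round(similarity, 3))
```

An end user could then compare the new claim's amounts and duration with the listed neighbours before deciding how to process it.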

 

As a simple example, you can check the disease (illness) ID against the duration and costs. Using the historical claims database, you can easily get a histogram of the hospital duration bins (0 to 2 days, 3 to 5 days, and so on) against the number of claims (this histogram will be for a specific illness ID, sex, and age group). You can then compare the claim duration against this histogram. If it falls in a sparsely populated bin, then it’s at least a bit suspicious.
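
The histogram check can be sketched as follows. The bin edges (0 to 2, 3 to 5, 6 to 10, 11 to 20, and 21 or more days) and the 5% sparseness threshold are assumptions, and the durations are made-up stand-ins for historical claims already filtered to one illness ID, sex, and age group.

```python
from collections import Counter

# Assumed duration bins in days; None means "no upper limit".
BINS = [(0, 2), (3, 5), (6, 10), (11, 20), (21, None)]

def bin_of(days):
    """Return the (low, high) bin that a duration in days falls into."""
    for low, high in BINS:
        if days >= low and (high is None or days <= high):
            return (low, high)
    raise ValueError(f"invalid duration: {days}")

def duration_histogram(durations):
    """Count historical claims per duration bin."""
    return Counter(bin_of(d) for d in durations)

def bin_share(days, histogram):
    """Fraction of historical claims that fall in the same bin as `days`."""
    total = sum(histogram.values())
    return histogram[bin_of(days)] / total if total else 0.0

def is_sparse(days, histogram, min_share=0.05):
    """Treat a duration as suspicious if its bin holds under `min_share` of past claims."""
    return bin_share(days, histogram) < min_share

# Made-up durations of past claims for one illness ID, sex, and age group.
past_durations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2, 1, 3, 4, 6, 7, 3, 2, 4, 3, 5]
hist = duration_histogram(past_durations)

print(sorted(hist.items()))
print(is_sparse(3, hist), is_sparse(25, hist))
```

A `Counter` returns 0 for a bin that never occurred, so an unseen bin is naturally treated as the sparsest possible.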

 

Clustering of historical data can be used to automatically detect such outliers. Several types of calculations can be performed for fraud detection, such as regression analysis and time-series analysis. In time-series analysis, the time-stamped data is analysed for trends, seasonal patterns, and outliers. The series is first transformed, if necessary, so that the variance is constant. Additional assumptions may be needed because the observations in claims data aren’t necessarily at regular time intervals.

 

Several time series in the claims data can be analysed using time-series analysis techniques. For example, NO_OF_CLAIMS and NET_AMOUNT (or any other component of the claim amount) can be tracked for any specific hospital or all hospitals, for any specific illness ID or all illness IDs, and for any specific patient or all patients.

 

Suppose X is a time-varying quantity in a time series, such as NO_OF_CLAIMS, and let X(I) denote the value of X at time I (here, week I). A graph of the week-to-week changes X(I+1) - X(I) in this quantity can be used to quickly identify outliers. Another graph plots the week-to-week percentage change, 100 * (X(I+1) - X(I)) / X(I). Auto-correlation and other techniques can be used to study these time series.
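
The week-to-week quantities are straightforward to compute. The sketch below takes a made-up weekly NO_OF_CLAIMS series, computes the changes X(I+1) - X(I) and the percentage changes 100 * (X(I+1) - X(I)) / X(I), picks the week with the largest absolute change as a candidate outlier, and computes a lag-1 sample autocorrelation. The function names are illustrative.

```python
def weekly_changes(x):
    """X(I+1) - X(I) for each pair of consecutive weeks."""
    return [b - a for a, b in zip(x, x[1:])]

def weekly_pct_changes(x):
    """100 * (X(I+1) - X(I)) / X(I); None where X(I) is zero."""
    return [None if a == 0 else 100.0 * (b - a) / a for a, b in zip(x, x[1:])]

def autocorrelation(x, lag=1):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag))
    return num / denom

# Made-up weekly NO_OF_CLAIMS for one hospital.
no_of_claims = [40, 42, 39, 41, 95, 43, 40]

changes = weekly_changes(no_of_claims)
pct = weekly_pct_changes(no_of_claims)
biggest = max(range(len(changes)), key=lambda i: abs(changes[i]))

print(changes)
print([None if p is None else round(p, 1) for p in pct])
print(f"largest change between week {biggest + 1} and week {biggest + 2}")
print(round(autocorrelation(no_of_claims), 3))
```

Here the spike from 41 to 95 claims stands out immediately in the list of changes, which is exactly what the plotted graph is meant to reveal.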

 

The following are some variables that are important for fraud detection in the claims data. Multiple regression analysis can be performed on chosen subsets of these variables:

  • Age
  • Sex
  • Hospital ID
  • Illness ID
  • Duration of illness
  • Various cost components
  • Net amount

Statistical analysis can also be performed to identify outliers:

  • Test cost outliers
  • Hospital charges outliers
  • Medicine cost outliers
  • Illness duration outliers
  • Combination outliers (doctors’ charges and net amount)
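
One common way to flag such outliers, though not one the article names, is Tukey's interquartile-range rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The sketch applies it to made-up test costs and, as an assumed way of handling a combination outlier, to the ratio of doctors' charges to net amount.

```python
import statistics

def iqr_outlier_indices(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Made-up diagnostic test costs (Rs.) for one test across similar claims.
test_costs = [800, 950, 900, 1000, 870, 920, 4800, 880]
print(iqr_outlier_indices(test_costs))

# Combination outlier (assumed approach): doctors' charges as a share of net amount.
doctors_charges = [1500, 2300, 1800, 2600, 9000, 1900, 2100, 2400]
net_amounts = [10000, 11000, 9000, 12000, 12000, 9500, 10000, 11500]
shares = [d / n for d, n in zip(doctors_charges, net_amounts)]
print(iqr_outlier_indices(shares))
```

In the second example an unusually low share (the first claim) is flagged alongside the very high one; whether low values matter is a domain decision.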

Important temporal parameters for a claim include CLAIM_DATE and the illness start and end dates (which give the duration of illness). The difference between CLAIM_DATE and ILLNESS_START_DATE, called CLAIM_DELAY, is an important independent variable.
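
Computing CLAIM_DELAY from the stored dates is simple date arithmetic. The sketch below uses Python's `datetime.date`; counting the duration of illness inclusively (both start and end day counted) and rejecting a negative delay as a data error are assumptions.

```python
from datetime import date

def claim_delay(claim_date, illness_start_date):
    """CLAIM_DELAY in days: CLAIM_DATE minus ILLNESS_START_DATE."""
    delay = (claim_date - illness_start_date).days
    if delay < 0:
        # A claim dated before the illness began indicates inconsistent data.
        raise ValueError(f"claim date {claim_date} precedes illness start {illness_start_date}")
    return delay

def illness_duration(start_date, end_date):
    """Duration of illness in days, counting both the start and end dates (an assumption)."""
    return (end_date - start_date).days + 1

print(claim_delay(date(2008, 7, 20), date(2008, 7, 1)))
print(illness_duration(date(2008, 7, 1), date(2008, 7, 5)))
```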

 

The relationship between two independent variables, NET_AMOUNT (on the X-axis) and duration of illness (on the Y-axis), can be shown in a scatter plot (restricted to claims for a specific illness ID). You may find, for example, that most claims above Rs. 20,000 have a long duration of illness. The Pearson correlation coefficient R for these two variables is easy to compute and indicates how strongly they are linearly related. Analysis of variance can be used to check whether the mean duration of illness is the same for, say, all hospitals.

 

Moreover, these comparisons can be made separately for claims in different NET_AMOUNT bins. If the means turn out not to be equal, further tests can be performed to check whether specific hospitals show unusual behaviour.
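
Both the Pearson correlation coefficient and a one-way analysis-of-variance F statistic can be computed from first principles, as sketched below on made-up data. Turning F into a p-value requires the F distribution (for example, from an F table, or via `scipy.stats.f_oneway`, which also computes F); that step is omitted here. The groups could equally be NET_AMOUNT bins instead of hospitals.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def one_way_anova_f(groups):
    """F statistic testing whether all group means are equal (one-way ANOVA)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up claims for one illness ID: NET_AMOUNT (Rs.) and duration of illness (days).
net_amount = [8000, 12000, 15000, 22000, 30000]
duration = [2, 3, 4, 7, 10]
print(round(pearson_r(net_amount, duration), 3))

# Made-up durations of illness grouped by hospital.
by_hospital = {"H1": [3, 4, 3, 5], "H2": [4, 3, 4, 4], "H3": [9, 10, 8, 11]}
print(round(one_way_anova_f(list(by_hospital.values())), 2))
```

A large F relative to the critical value for k - 1 and n - k degrees of freedom suggests that at least one hospital's mean duration differs from the others, which is the cue for the further tests described above.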

 

All such statistical analyses need to be studied in depth and defined for the specific tasks of fraud detection and control in the medical claims domain. A large number of predefined statistical calculations are geared towards detecting suspicious data.

Fraud is an important phenomenon in today’s wired commercial world. Fraud causes huge losses and damages an organisation’s reputation and goodwill. Fraud management is a complex and knowledge-intensive process involving deployment and effective use of tools based on a plethora of statistical and AI techniques.

 

Source: Article by K. Paul Jayakar, a member of the Institute.

 

 


 

Rewards waiting for feedback at
E-mail : smarttrainee@gmail.com

 


 

www.primeonlinetest.com

 


 

Disclaimer: We believe that the information contained in this e-zine is true.