Total Number of Subscribers: 451
Date: 8th August 2008
Compiled by Mr. M. Sathya Kumar

Identification of Losses Due to Electronic Frauds

Frauds or scams, euphemistically called economic offences, are a dominant form of
white-collar crime in today’s business environment. An unfortunate but rather
well-known fact is that many businesses and government organisations,
particularly in financial and related services, suffer from frauds of
various kinds. Frauds bleed businesses to the tune of hundreds
of billions of dollars worldwide, annually. Continued prevalence of this
malpractice on a large scale can have disastrous long-term consequences
not only for the businesses involved but also for investors, financial
institutions, government, and the economy in general.

Today’s highly automated business systems
collect vast amounts of data regarding almost all kinds of business
transactions and activities. With the advent of data warehousing and
corporate memory systems, both current and historical business data can be
accessed. Clearly, evidence of fraud and fraudulent activities is partly
hidden in these enormous quantities of data. Data analysis
techniques can help businesses manage fraud effectively, both to prevent losses
and to bring the culprits to justice.

Fraud management involves a whole gamut of activities: early warnings and alarms; telltale symptoms and patterns of various types of fraud; profiles of users and activities; fraud detection, prevention, and avoidance; minimising false alarms and avoiding customer dissatisfaction; estimating losses; risk analysis; surveillance and monitoring; security (of computers, data, networks, and physical facilities); data and records management; collection of evidence from data and other sources; reports; summaries; data visualization; links to management information systems and operational systems (such as billing and accounting); and control actions (such as prosecution, employee education and ethics programs, hotlines, and cooperation with partners and law enforcement agencies).

Several critical issues make building fraud management systems a challenging and difficult task.
These include enormous volumes of data with
complex structure; changing behaviour of users, business
activities, and fraudsters; continuous evolution
of newer frauds particularly to bypass existing detection techniques; need
for fast and accurate fraud detection without undue burden on business
operations; risks of false alarms; and social issues such as privacy and
discrimination. There are a number of means and processes, in
particular software-based techniques, that can be used to detect,
investigate, and prevent frauds.

What is Fraud?

The Oxford Advanced Learner’s Dictionary defines
fraud as “an act of deceiving illegally in order to make money or obtain
goods”. Indeed, in fraud,
groups of unscrupulous (or “morally challenged”) individuals manipulate
or influence the activities of a target business with the intention of
making money or obtaining goods through illegal or unfair means. Fraud
cheats the target organisation of its legitimate income and results in a loss of goods, money, and even goodwill and reputation. Fraud often employs illegal, and always immoral or unfair, means. Outright criminal activities that typically involve violence or other physical means, such as break-in thefts, industrial espionage, sabotage, attacks, and robberies, are usually excluded from the scope of fraud.

But even within a particular organisation, the full scope of what exactly constitutes fraud isn’t always clear. A particular difficulty is distinguishing fraud from losses due to incompetence, procedural lapses, accidents, mismanagement, wrong decisions, or business risks. General economic offences also include criminal acts other than fraud: money laundering, financing of criminal or anti-national activities, corruption, bribery, kickbacks, and so on. Nevertheless, due to its potential for significant negative impact, fraud has been studied in depth as a phenomenon. Luckily, fraud falls into a limited number of typical types that share common characteristics, means, and methods.
Just as a house theft can occur in only some specific ways (break-in,
lock picking, gaining entry and confidence by misrepresenting identity),
frauds of a given type share a similar modus operandi. Consequently, an organisation
can take advantage of these commonalities to establish
business practices to protect itself from fraud and resultant losses. Of
course, any particular fraud in an organisation need not meet all of these
characteristics.

Fraud often consists of many instances or incidents
involving repeated transgressions using the same method. Fraud instances can be
similar in content and appearance but usually aren’t identical.

Fraud investigation is a complex, time-consuming,
and tedious activity that requires a great deal of knowledge of
finance, economics, business practices, market analysis and business conditions,
investigative skills, and law. A comprehensive
investigative and surveillance business process for
fraud management (often set up in the form of a
fraud control center within an organisation) includes a number of
steps, activities, and deliverables. The core of this business process is
data analysis.

Data Analysis Techniques for Fraud Detection

The techniques used for fraud detection fall into
two primary categories: statistical techniques and artificial intelligence (AI)
techniques. Many commercial tools are available for fraud detection
that provide a variety of techniques from either
of these areas, although usually not in any single integrated tool.
Important statistical data analysis techniques for fraud detection
are:

- Data preprocessing techniques for detection, validation, error correction, and filling in (estimation) of missing or incorrect data.
- Calculation of various statistical parameters such as averages, quartiles, performance metrics, probability distributions, and so on. For example, the averages may include average length of call, average number of calls per month (or per day), and average delays in bill payment.
- Models and probability distributions of various business activities, expressed in terms of various parameters or probability distributions.
- Computing user profiles (classifications of users, customers, and orders into various categories) and statistical characterization of these profiles (in terms of parameters, probability distributions, and so forth).
- Time-series analysis of time-dependent data.
- Clustering and classification to find patterns and associations among groups of data.
- Matching algorithms to detect anomalies in the behavior of transactions or users as compared to previously known models and profiles. Techniques are also needed to eliminate false alarms, estimate risks, and predict the future of current transactions or users.

In addition, a number of auxiliary tools can
help surveillance personnel quickly grasp the nature of business data and
activities. These include canned queries, summary reports, data
visualization in various forms, software filters in the form of early
warning indicators, alarm conditions, and so on.

Usually, these techniques require considerable
human expertise and active participation.
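
As one illustration of matching a transaction against a previously built profile, the sketch below summarises a user's past call lengths as a mean and standard deviation and flags a new call that lies far from that profile. The function names, the sample data, and the threshold of three standard deviations are assumptions made for this example; a real system would choose them from its own data.

```python
from statistics import mean, stdev

def build_profile(call_lengths):
    """Summarise historical call lengths (minutes) as mean and sample standard deviation."""
    return {"mean": mean(call_lengths), "stdev": stdev(call_lengths)}

def anomaly_score(profile, value):
    """Number of standard deviations between `value` and the profile mean."""
    if profile["stdev"] == 0:
        return 0.0 if value == profile["mean"] else float("inf")
    return abs(value - profile["mean"]) / profile["stdev"]

def is_anomalous(profile, value, threshold=3.0):
    """Flag a value whose score exceeds the (assumed) threshold."""
    return anomaly_score(profile, value) > threshold

# Made-up call lengths for one user, in minutes.
history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
profile = build_profile(history)

print(is_anomalous(profile, 5.5))   # a call close to the usual length
print(is_anomalous(profile, 45.0))  # a call far longer than usual
```

A flagged call is only a candidate for investigation; lowering the threshold catches more anomalies at the cost of more false alarms.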
These techniques are also used iteratively:
suspicious transactions are first identified and
then further investigated to locate the victims,
suspects, and their methods, which are in turn investigated to enable
prevention or gather evidence.

As already remarked, fraud management is a
knowledge-intensive activity. Therefore, applying knowledge-based
techniques from AI is a natural idea. Important AI techniques used for
fraud management include:
Other techniques
such as Bayesian networks, decision theory, and sequence matching are also used for fraud detection. Data analysis can be a strategic weapon in the
management and control of fraud.

Medical Fraud

An illustration of some of these techniques to
handle the problem of fraud detection in a hypothetical and highly
simplified medical insurance claims database is shown below. The database
(as maintained by the insurance company and populated from the claim
documents submitted by patients) consists of a single table and has the following format:
1. Patient ID (SSN)
2. Sex (M/F)
3. Age (0 to 120)
4. Address
5. Claim Date
6. Illness Category, Illness ID, and Illness Description (may be more than one illness)
7. Illness Duration Start Date - End Date
8. Hospital ID(s)
9. Doctor ID(s)
10. IDs of diagnostic tests performed
11. Names of medicines given
12. Other treatments (for example, physiotherapy)
13. Diagnostic tests bills
14. Medicine bills
15. Other treatment bills
17. Doctors’ charges
18. Misc. amount (all other costs)
19. Net Amount

Let us evaluate whether a new specific claim is
“suspicious” in some way. If so, the claim can be processed in a different
way: cancel claim payment, proceed with claim payment, recall claim,
reduce payment amount, or seek clarification from hospital or
patient.

For the purpose of evaluating a new claim, one
can define various criteria or indices of
suspiciousness. For each criterion or index, the claim gets a
score; typically, high scores on a specific index indicate greater
suspiciousness. Thus, a claim that has high scores for many criteria is
more suspicious. Examples of such
criteria include:
One can define many more such indices. All such indices have to be defined rigorously; the previous descriptions are merely indicative. Ideally, the fraud control system provides a facility to define such indices dynamically, outside the system itself, so that it can be enhanced easily.
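
A minimal sketch of how such indices might be kept separate from the scoring logic is shown here. Each index is a small function registered by name, and a claim's overall score is the plain average of its index scores. The two indices, their cut-off values, the field names, and the averaging rule are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Registry of suspiciousness indices: name -> function(claim) returning a score in [0, 1].
INDICES = {}

def index(name):
    """Decorator that registers a scoring function under `name`."""
    def register(func):
        INDICES[name] = func
        return func
    return register

@index("claim_delay")
def claim_delay_index(claim):
    # Hypothetical rule: a delay of 90 days or more after illness start scores 1.0.
    delay_days = (claim["claim_date"] - claim["illness_start"]).days
    return min(max(delay_days, 0) / 90.0, 1.0)

@index("amount_mismatch")
def amount_mismatch_index(claim):
    # Hypothetical rule: the net amount should match the sum of the itemised bills.
    itemised = sum(claim["bills"].values())
    if claim["net_amount"] <= 0:
        return 1.0
    return min(abs(claim["net_amount"] - itemised) / claim["net_amount"], 1.0)

def score_claim(claim):
    """Return the per-index scores and their unweighted average."""
    scores = {name: func(claim) for name, func in INDICES.items()}
    overall = sum(scores.values()) / len(scores)
    return scores, overall

# Made-up claim record for illustration.
claim = {
    "claim_date": date(2008, 8, 1),
    "illness_start": date(2008, 6, 17),
    "bills": {"diagnostic": 3000, "medicine": 2000, "doctor": 4000},
    "net_amount": 12000,
}
scores, overall = score_claim(claim)
print(scores, overall)
```

Because new indices are added simply by registering another function, the set of criteria can grow without changing `score_claim`.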
Because the indices represent knowledge about
fraud detection in claims data, a rule
language can capture them in a knowledge base.

The system can provide a facility that lists claims similar to the given claim
(for example, based on k-nearest-neighbour algorithms), along with a
similarity matching score. This facility would enable the
end user to evaluate the given claim with respect to similar claims. From
a pool of already known fraudulent claims, machine learning algorithms can
construct a classifier (such as a decision tree) that can help
evaluate a new claim.

As a simple example, you can check the disease
(illness) ID against the duration and costs. Using the historical claims
database, you can easily get a histogram of the hospital duration bins (0
to 2 days, 3 to 5 days, and so on) against the number of claims (this
histogram will be for a specific illness ID, sex, and age group). You can
then compare the claim duration against this histogram. If it falls in a
sparsely populated bin, then it’s at least a bit suspicious.
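
The histogram check just described can be sketched as follows. The three-day bins follow the text (0 to 2 days, 3 to 5 days, and so on); the historical durations are made up, and the 5% cut-off for a "sparsely populated" bin is an assumption for this example.

```python
from collections import Counter

BIN_WIDTH = 3  # days per bin: 0-2, 3-5, 6-8, ...

def duration_bin(days):
    """Map a hospital stay in days to its bin number."""
    return days // BIN_WIDTH

def build_histogram(durations):
    """Count historical claims per duration bin."""
    return Counter(duration_bin(d) for d in durations)

def bin_share(histogram, days):
    """Fraction of historical claims that fall in the same bin as `days`."""
    total = sum(histogram.values())
    return histogram[duration_bin(days)] / total if total else 0.0

def is_suspicious(histogram, days, min_share=0.05):
    """Flag a duration whose bin holds less than `min_share` of past claims (assumed cut-off)."""
    return bin_share(histogram, days) < min_share

# Made-up hospital stays (days) for one illness ID, sex, and age group.
historical = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 3, 4,
              2, 5, 4, 3, 6, 4, 3, 4, 5, 3, 4, 12]
hist = build_histogram(historical)

print(is_suspicious(hist, 4))   # falls in the most common bin
print(is_suspicious(hist, 13))  # falls in a bin with a single past claim
```

In practice one histogram would be built per combination of illness ID, sex, and age group, as the text notes.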
Clustering of historical data can be used to automatically detect such outliers.

Several types of calculations can be performed for fraud detection, such as regression analysis and time-series analysis. In time-series analysis, the time-stamped data is analysed for trends, seasonal patterns, and outliers. The series is first transformed, if necessary, so that the variance is constant. Additional assumptions may be needed because the observations in claims data aren’t necessarily at regular time intervals.
Several time series in the claims data can be analysed using time-series
analysis techniques: for example, NO_OF_CLAIMS and NET_AMOUNT (or any
other component of the claim amount) for any specific or all hospitals, for
any specific or all illness IDs, and for any
specific or all
patients. Suppose X is the time-varying quantity (NO_OF_CLAIMS) in a
time series; let X(I) denote the value of X at time I. A graph can be used to
plot week-to-week changes (X(I+1) - X(I)) in the time-varying
quantity (NO_OF_CLAIMS).
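
The change series can also be computed directly rather than read off a graph. The sketch below derives the week-to-week changes X(I+1) - X(I) and, as an assumed definition, the percentage change 100 * (X(I+1) - X(I)) / X(I). The outlier rule, based on the median absolute deviation with a multiplier of 5, is one possible choice made for this example and is not prescribed by the article; the weekly counts are made up.

```python
from statistics import median

def weekly_changes(x):
    """Absolute week-to-week changes X(I+1) - X(I)."""
    return [b - a for a, b in zip(x, x[1:])]

def weekly_pct_changes(x):
    """Percentage changes 100 * (X(I+1) - X(I)) / X(I); None where X(I) is zero."""
    return [100.0 * (b - a) / a if a else None for a, b in zip(x, x[1:])]

def outlier_weeks(changes, k=5.0):
    """Indices whose change lies more than k median absolute deviations from the median."""
    med = median(changes)
    mad = median([abs(c - med) for c in changes])
    if mad == 0:
        return [i for i, c in enumerate(changes) if c != med]
    return [i for i, c in enumerate(changes) if abs(c - med) > k * mad]

# Made-up weekly NO_OF_CLAIMS values with one spike.
no_of_claims = [40, 42, 41, 43, 44, 42, 90, 45, 44]
changes = weekly_changes(no_of_claims)

print(changes)
print(outlier_weeks(changes))  # positions of the jump into and out of the spike
```

A median-based rule is used here because a single large spike inflates the standard deviation and can hide itself from a mean-based test.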
Such a graph of week-to-week changes can be used to quickly identify
outliers. Another graph can plot the week-to-week percentage
change, 100 * (X(I+1) - X(I)) / X(I), in the time-varying
quantity. Auto-correlation and other techniques can be used to study these
time series.

The following are some variables that are
important for fraud detection in the claims data. Multiple regression
analysis can be performed on chosen subsets of these
variables:
Some important temporal parameters for a claim
include CLAIM_DATE and the illness start and end dates (duration of
illness). The difference between CLAIM_DATE and ILLNESS_START_DATE, called
CLAIM_DELAY, is an important independent
variable.

The relationship between the two independent
variables NET_AMOUNT (on the X-axis) and duration of illness (on the Y-axis) can
be shown in a scatter plot
(for only those claims for a specific illness ID). You may find, for
example, that most claims above Rs. 20,000 have a long duration of
illness. The Pearson correlation coefficient R for these two variables can
be computed easily and indicates how closely (linearly) related these two variables
are. Analysis of variance can be used to check whether the mean duration of
illness is equal for, say, all hospitals.
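
Both calculations are short enough to sketch with the standard library alone. The code below computes the Pearson coefficient R for NET_AMOUNT against duration of illness, and the one-way analysis-of-variance F statistic for durations grouped by hospital. All figures are made up, and the function names are chosen for this example; a statistics package would normally be used, and the F value still has to be compared against the critical value of the F distribution for (k - 1, n - k) degrees of freedom.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def anova_f(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up claims for one illness ID: NET_AMOUNT (Rs.) and duration of illness (days).
net_amount = [5000, 8000, 12000, 20000, 25000]
duration = [2, 3, 5, 8, 10]
r = pearson_r(net_amount, duration)

# Made-up durations of illness (days) grouped by three hospitals.
by_hospital = [[3, 4, 5], [4, 5, 6], [9, 10, 11]]
f_stat = anova_f(by_hospital)

print(round(r, 4), f_stat)
```

A large F value suggests that at least one hospital's mean duration differs from the others, which is the situation that calls for further tests.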
If the mean durations differ across hospitals, further tests can be
performed to check for special behavior by specific
hospitals. Moreover, these comparisons can be done separately for
claims in different NET_AMOUNT bins.

All such statistical analyses need to be studied
in depth and defined for the specific tasks of fraud detection and control in the medical claims
domain. A large number of predefined statistical calculations are oriented to
detecting suspicious data.

Fraud is an important phenomenon in today’s wired commercial world. Fraud causes huge losses and damages an organisation’s reputation and goodwill. Fraud management is a complex and knowledge-intensive process involving deployment and effective use of tools based on a plethora of statistical and AI techniques.

Source: Article by K. Paul Jayakar. The author is a member of the institute.