Can you use ETL tools to manage sales compensation?
Pranshu Kacholia, April 22, 2014
Step 1 – Input profiling and validations
Talend Data Integration, an open source ETL tool, offers a powerful alternative to Excel for running sales compensation management systems. It features a simple drag-and-drop interface and the flexibility to build complex logic with custom Java code. Talend is simple to learn and easy to use, and it also beats many other tools when it comes to processing performance.
The first step is to load the various inputs into the system. Loading data in Talend is straightforward: drop the component that corresponds to the type of data source, configure it, and Talend does the rest. Talend imports data directly from databases and from a wide variety of file formats such as Excel, CSV, EBCDIC, XML, delimited text, HL7, and JSON. It also integrates with popular third-party applications such as salesforce.com, Marketo, and SugarCRM.
Once data has been imported, the most important task is to ensure the integrity of the inputs. This is where the potential of Talend is realized. Checks ranging from the simple, such as ensuring that the Employee ID is always an 8-digit number, to the complex, such as isolating cases in which an employee's sales exceed 10 times the previous month's sales, can be built easily in the Talend process flow. These checks can be designed once and automated as part of your regular sales compensation process. Some of them are highlighted below so that you can start using such checks more effectively.
Referential integrity/Check for missing data
First, define a master list or identify a master file for each of the key dimensions, such as Roster for Employee_ID, Hierarchy for Geography_ID, etc. The inputs loaded during every run are then compared against these master files to identify any missing or extra data in the input files. For example, if the sales data contains no record of sales from territory x, the kick-out report would indicate that data for territory x has perhaps not been received (see fig 1). Ideally, a master file for each key field should be defined at the start of the process to ensure data completeness and correctness.
Similarly, there are checks for missing data across different types of files. For example, if the roster contains a record for employee 'Adam' and the sales data doesn't, this would be flagged in the validation report. Such checks have to be built only once and then run automatically every month.
Fig 1. Checking if a particular territory is not present in the sales data.
Technical details: Sales data is grouped by region using the tAggregateRow_3 component, and the resulting list is matched against the existing Territories list in tMap_3. Inside tMap_3, an inner join is performed, and the matches are flagged.
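The master-list comparison described above can also be sketched outside Talend. The snippet below is a minimal illustration in plain Python, not the Talend job from fig 1; the function name `compare_with_master`, the column name `territory_id`, and the sample data are all hypothetical.

```python
from typing import Iterable, Mapping


def compare_with_master(master_ids: Iterable[str],
                        records: Iterable[Mapping[str, object]],
                        key: str) -> dict:
    """Compare the key values found in the input with a master list."""
    master = set(master_ids)
    seen = {row[key] for row in records}
    return {
        "missing": sorted(master - seen),  # in the master list, absent from the input
        "extra": sorted(seen - master),    # in the input, unknown to the master list
    }


# Hypothetical master list of territories and one month of sales data.
territories = ["T01", "T02", "T03"]
sales = [
    {"territory_id": "T01", "amount": 1200.0},
    {"territory_id": "T03", "amount": 800.0},
    {"territory_id": "T09", "amount": 50.0},
]

report = compare_with_master(territories, sales, "territory_id")
print(report)  # {'missing': ['T02'], 'extra': ['T09']}
```

The same function can be reused for any key dimension, for example comparing Employee_ID values in the sales data against the roster.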
Correct format and structure of data
You can predefine the schema (format) of all your input files and use Talend to generate errors where the file does not follow the pre-defined format. For example, you might define that the Employee ID will always be a 6-digit number or that the date of the sale will always lie in the month for which data is being processed. Such checks help in detecting manual data entry errors. These checks are extremely simple to build as they require only pre-defining the format.
Fig 2 shows a job that rejects all records with an incorrect format, stores them in a file, and processes the remaining valid data.
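As a rough illustration of the reject-and-continue pattern, here is a small plain-Python sketch. It is not the Talend job from fig 2: the field names `employee_id` and `sale_date`, the YYYY-MM-DD date format, and the 6-digit rule (taken from the example above) are assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed rule from the example above: exactly six ASCII digits.
EMPLOYEE_ID_PATTERN = re.compile(r"[0-9]{6}")


def validate_record(row: dict, year: int, month: int) -> list:
    """Return a list of format errors for one record (empty if valid)."""
    errors = []
    if not EMPLOYEE_ID_PATTERN.fullmatch(str(row.get("employee_id", ""))):
        errors.append("employee_id must be a 6-digit number")
    try:
        sale_date = datetime.strptime(str(row.get("sale_date", "")), "%Y-%m-%d")
    except ValueError:
        errors.append("sale_date is not a valid YYYY-MM-DD date")
    else:
        if (sale_date.year, sale_date.month) != (year, month):
            errors.append("sale_date is outside the processing month")
    return errors


def split_valid_and_rejects(rows, year, month):
    """Route each record to the valid list or the reject list."""
    valid, rejects = [], []
    for row in rows:
        errors = validate_record(row, year, month)
        if errors:
            # Keep the reasons with the record so the reject file is self-explanatory.
            rejects.append({**row, "errors": "; ".join(errors)})
        else:
            valid.append(row)
    return valid, rejects


rows = [
    {"employee_id": "123456", "sale_date": "2014-03-15"},
    {"employee_id": "12345", "sale_date": "2014-03-20"},   # ID too short
    {"employee_id": "654321", "sale_date": "2014-04-02"},  # wrong month
]
valid, rejects = split_valid_and_rejects(rows, 2014, 3)
for reject in rejects:
    print(reject["employee_id"], "->", reject["errors"])
```

In practice the reject list would be written to a file for review while the valid records continue through the rest of the process.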
Check for duplicates
Checking for duplicates is important to ensure the accuracy of data. Depending on the nature of the business and the type of business process, we can define the duplicate check at any predefined combination level (e.g., one record should exist for each Data Month-Employee ID-Role combination). Simple uniqueness checks, such as validating that a single sale record exists per OrderID, can be built in Talend to either remove duplicate records or highlight such instances. Here is a simple Talend job that de-duplicates data:
Fig 3. Checking duplicate records. In the tUniqRow component, we specify the combination of columns that constitute a unique record.
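The idea of a uniqueness key built from a combination of columns can be sketched in a few lines of plain Python. This is an illustration only, not the job shown in fig 3; the column names `data_month`, `employee_id`, and `role` are hypothetical stand-ins for the Data Month-Employee ID-Role combination mentioned above.

```python
def split_duplicates(rows, key_columns):
    """Keep the first record for each key; collect later repeats separately."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


payout_rows = [
    {"data_month": "2014-03", "employee_id": "123456", "role": "Rep", "sales": 100},
    {"data_month": "2014-03", "employee_id": "123456", "role": "Rep", "sales": 100},
    {"data_month": "2014-03", "employee_id": "123456", "role": "Manager", "sales": 40},
]
unique, duplicates = split_duplicates(
    payout_rows, ["data_month", "employee_id", "role"]
)
print(len(unique), "unique,", len(duplicates), "duplicate")  # 2 unique, 1 duplicate
```

Returning the duplicates instead of silently dropping them supports both options mentioned above: removing the repeats or highlighting them for review.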
Checks can also be built to ensure that the number of records loaded matches the number of records actually processed by the upstream system. This ensures that the data is complete and has not been corrupted during transmission.
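A record-count reconciliation can be as small as the sketch below. It assumes the upstream system supplies a control count alongside the file (for example in a trailer record or a separate control file); the function name and the sample numbers are hypothetical.

```python
def reconcile_record_count(loaded_rows, expected_count):
    """Compare the number of loaded records with the upstream control count."""
    loaded = len(loaded_rows)
    return {
        "expected": expected_count,
        "loaded": loaded,
        "difference": loaded - expected_count,
        "ok": loaded == expected_count,
    }


# Hypothetical run: upstream reports 5 records, but only 4 arrived.
loaded_rows = [{"order_id": n} for n in range(4)]
count_check = reconcile_record_count(loaded_rows, expected_count=5)
print(count_check)
```

A negative difference suggests records were lost in transit, while a positive one may point to duplicates or a file that was delivered twice.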
Trend checks highlight specific records whose values defy historical trends. We can define thresholds so that abnormal values are at least highlighted and reviewed for consistency before they are processed. For example, let's say that the average sales per month for the last 12 months is 90k, but the sales data received this month drops to 30k. There is a real possibility that this data is incomplete or that there is some other issue with the input file or the upstream system, and detecting this in the validation stage can help prevent the processing of incorrect data. Such analyses are particularly easy to perform for pharma companies, since they receive historical data every month along with the current month's data.
Row-level validations, such as requiring the revenue of a single order to fall between $100 and $5k, can also be automated. Additionally, each of these inputs is summarized using descriptive statistics and visualizations to identify any other anomalies in the data. A summary report with figures such as the maximum revenue of any order, the longest contract length, and the average order volume per rep can easily be generated after the data is processed, and abnormalities will stand out in the summary.
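To make the thresholds concrete, here is a minimal plain-Python sketch of both kinds of check, using the numbers from the example above. The 50% deviation tolerance, the function names, and the `revenue` field are assumptions chosen for illustration; the right tolerance depends on how volatile the business normally is.

```python
from statistics import mean


def trend_check(history, current, max_deviation=0.5):
    """Flag the current value if it deviates too far from the historical mean."""
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("historical mean is zero; relative deviation is undefined")
    deviation = abs(current - baseline) / baseline
    return {
        "baseline": baseline,
        "deviation": deviation,
        "flagged": deviation > max_deviation,
    }


def out_of_range_orders(orders, low=100.0, high=5000.0):
    """Return the orders whose revenue falls outside the allowed range."""
    return [order for order in orders if not low <= order["revenue"] <= high]


# Twelve months averaging 90k, followed by a month at 30k.
monthly_sales = [90_000] * 12
trend = trend_check(monthly_sales, current=30_000)
print(f"deviation {trend['deviation']:.0%}, flagged: {trend['flagged']}")

orders = [
    {"order_id": "A1", "revenue": 250.0},
    {"order_id": "A2", "revenue": 40.0},     # below $100
    {"order_id": "A3", "revenue": 7500.0},   # above $5k
]
suspicious = out_of_range_orders(orders)
print([order["order_id"] for order in suspicious])  # ['A2', 'A3']
```

Flagged values are surfaced for review rather than rejected outright, since an unusual month or order can still be legitimate.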
Business logic checks
Checks specific to the business process can be built in much the same way as the sanity checks. For example, it is easy to build a check that kicks out a record whenever the roster file shows an employee from one region reporting to a manager from a different region. Similarly, an analysis of the span of control for each manager can highlight managers who are particularly over-leveraged.
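Both roster checks can be sketched in plain Python as follows. The roster layout (`employee_id`, `region`, `manager_id`) and the span-of-control limit are hypothetical; a real roster would carry its own field names and business thresholds.

```python
from collections import Counter


def cross_region_reports(roster):
    """Return (employee_id, manager_id) pairs whose regions differ."""
    region_of = {person["employee_id"]: person["region"] for person in roster}
    issues = []
    for person in roster:
        manager_id = person.get("manager_id")
        if manager_id is None:
            continue  # top of the hierarchy, nothing to compare
        manager_region = region_of.get(manager_id)
        if manager_region is not None and manager_region != person["region"]:
            issues.append((person["employee_id"], manager_id))
    return issues


def over_leveraged_managers(roster, max_reports):
    """Return managers with more direct reports than the allowed limit."""
    span = Counter(
        person["manager_id"] for person in roster
        if person.get("manager_id") is not None
    )
    return sorted(manager for manager, count in span.items() if count > max_reports)


roster = [
    {"employee_id": "M1", "region": "East", "manager_id": None},
    {"employee_id": "M2", "region": "West", "manager_id": None},
    {"employee_id": "E1", "region": "East", "manager_id": "M1"},
    {"employee_id": "E2", "region": "East", "manager_id": "M1"},
    {"employee_id": "E3", "region": "East", "manager_id": "M2"},
]
mismatches = cross_region_reports(roster)
overloaded = over_leveraged_managers(roster, max_reports=1)
print(mismatches)  # [('E3', 'M2')]
print(overloaded)  # ['M1']
```

A manager ID that does not appear in the roster at all is skipped here, since that case is already covered by the referential integrity checks.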
In this post, we focused on Incentius' validation framework and how validations can be designed and implemented in Talend to ensure the accuracy of input data before final sales compensation is processed. This approach helps increase the accuracy of sales compensation, reducing reruns and saving your analysts valuable time.
Watch out for our subsequent blog posts on this topic. They will include techniques to actually build a sales compensation plan in Talend and to build effective output validations and analysis.