How to design and Create an apex job to process a million records

If you are using an enterprise org and have a requirement to process a million records in an automated fashion, apex job provides the solution to process those records. But at the same time, we need to be careful of the governor limit and use the following considerations to design an apex job which would scale and process million records easily.

1. Designing the query locator
For any reports or any processing, we need a soql query . We need to create a soql query and set it at the query locator. Now
a. If you are having multiple objects which need to be queried, you need to make sure that there is a relationship defined between the objects which can be a lookup or master detailed. This would help to query more than one custom object.
b. There could be scenarios where there are 2 custom objects which may not have relationship defined but still some loop up data exists in common between the objects. For eg if we have to process say a count of applications for each state or county and let us say the application is a custom object and there are some check box fields which are flagged for type like change in address, change in birth date etc on the application object and the application object has a look up code for the county custom object which has a id and name and description. Now in this scenario, we might have to query the counties first and then loop through the application object in the execute method. So the design principle is to choose the lookup custom object in your query locator and use the data entity like application, order in the execute method which would help to get the result done quick.

return Database.getQueryLocator([
SELECT LKUP_CD__c, LKUP_GRP_CD__c, SORT_ORD__c FROM CNFG_SELECT_OPTIONS__c
where LKUP_GRP_CD__c =:COUNTY_GRP_CODE
AND LANG_CD__c =: LANG_CD and
LKUP_CD__c!= 'SEL'
ORDER BY LKUP_CD__c )]

2. Implementing the execute batch and bypass governor limits.
If you use the execute method, you would know that salesforce has the capability to create a separate thread for each execution of the batch for a maximum of 2000 records. So salesforce would break the query into 2000 records and run several batches.
Things to watch out for.
1. If you use soql queries inside the loop in the execute method, you would end up with 200 maximum soql limit exception. So to avoid this, what you need to do is limit the scope to 50 or 100. To calculate this, let us say if you have 10 soqls inside the for loop, then your maximum scope limit would be 20. To do this , you would use the following code snippet while executing the batch.

Database.executeBatch(batch, 9);

2. If you are doing counts or using aggregate functions to do some reporting and let us say there is one custom object where you would have to group all the fields and get a count, you would try to use one soql query to get all the counts.

SELECT Count(ASSET_CHG_SW__c) cnt , Count(ADD_CHG_STAT_SW__c) addchg,
FROM RMC_RQST__c
WHERE application__r.APP_CPLT_TMS__C >=: rptStartDate AND
application__r.APP_CPLT_TMS__C <=: rptEndDate AND application__r.cnty_num__c =: countyCode and ( ASSET_CHG_SW__c = true OR ADD_CHG_STAT_SW__c = true) group by application__r.cnty_num__c, ASSET_CHG_SW__c , ADD_CHG_STAT_SW__c;

Design Principle
If you want to use count fields on a custom object, ensure that the field is not nullable. In other words, checkbox, radio button fields will not work on count(Field) queries. So you need to make a separate call like count(id) to get the count values.
3 . Never use dml statements inside the execute batch as you are going to hit the 150 DML LIMIT exception. Use a list and add all the objects.
4. If you do any delete in the execute batch method which is executed as part of the loop, ensure that the deleted record is not added to the list where you would do the upsert or update at the end in the final method.
5. If you want to have the line numbers printed on your stack trace in the execute method which would fail, then always put a try catch block at the start of the execute method and in the catch exception, assign the exception error message to a string and log it to a custom object or email.
6. Create a custom object called BatchMaster which would have job id, start date, end date, exception message and count of records to indicate the number of records processed in the batch.

catch(Exception e)
{
System.debug('Exception in executing Batch BTCH CHG RPT'+e);
isBatchRunSuccesfully = false;
batchErrMessage = e.getMessage();
}

3. Reporting and debugging at the final method.
The final method is the method which would be executed at the end. So here are things which you want to do on the final method.
1. Create an email method to send an email with batch status, number of records processed, input passed etc which would atleast tell you in a quick glance whether the apex job succeeded.
2. Do the dml statements on the final method by using insert , upsert or update batch methods which would take list as input.
3. Update the batch status at the end of the final method if you are using a custom object.

So to summarize, here is a quick list.
a. Use the scope parameter to limit the number of records in the execute batch to avoid soql query limit.
b. The query locator can scale to 50 million records and would process 2000 records in a batch.
c. Do the dmls in the final method and send an email on the job status.
d. Do a try catch on the execute batch get the exception error message on to a custom object.
So please let me know if these principles helped you to design your batch job better.
Do not do list
1. avoid dmls in the execute batch and use a list to populate it.

Leave a Reply

Your email address will not be published. Required fields are marked *

Get free tips on Salesforce
Get free tips on Salesforce
We respect your privacy.