As a spreadsheet application, Microsoft Excel helps organize and analyze vast data sets. However, working with data also exposes us to errors or mistakes, even though they may be unintentional. This article examines these data-related mistakes or inconsistencies and explores various ways to find discrepancies in Excel and resolve them.
Data Discrepancies in Excel: A Brief Overview
Data discrepancy is the lack of evenness or mismatch in data or sets of data. Incorrect data in Excel can be categorized into two broad categories, namely:
- Formatting Errors: Formatting issues like undue inclusion of spaces, improper date or currency formats, or number styles can significantly impact the accuracy and consistency of the data.
- Duplicate, missing, or incorrect entries: Manual input mistakes and repetitive or missing entries can also lead to incorrect data representations, thereby contributing to data discrepancies.
How to Find Discrepancies in Excel (inconsistent Data)
Various methods are available in Excel that can help us detect data inconsistencies, as mentioned below:
- Manual Reviews
- Sort and Filter Option Usage
- Conditional Formatting
- Advanced Excel Functions
Each of the above has its distinct feature that can be used based on the size of the data set to be checked.
1] Manual Reviews
Manually reviewing the data in the Excel sheets or files can also help identify and address data discrepancies. To review the data manually, close attention must be paid to the data contained in the cells to ensure the process is effective. However, this will only be feasible for a smaller set of data.
2] Sort and Filter Option Usage
By utilizing the data filter option in Excel, discrepancies can be detected easily, especially while working on larger data sets. The filter option allows us to see all the unique values in a column, making it easier to isolate the incorrect ones.
It is generally used to detect formatting errors, type mistakes, or missing values to determine or detect data discrepancies.
- Select the header row to apply the filter, and click on the Sort and Filter option under the Home tab.
- Click on the Filter drop-down arrow on the column beside each header (in this case, Name, Gender, Designation).
- The drop-down menu will list all the unique values in that particular column.
- From the above list, the incorrect entries can be identified. Make a note of the incorrect entries that need to be rectified.
Once the errors have been pointed out, the below-mentioned steps can be undertaken to make the necessary corrections:
- Click on the Find and Replace option under the Home Tab
- Enter the incorrect entry under Find What and the correct one under Replace With, and then click on Replace All. Repeat the same process for all incorrect entries.
- Click OK in the dialogue box that appears to confirm the changes.
In the above example, the discrepancies in the Gender column have been corrected using the Sort and Filter option.
3] Conditional Formatting
Conditional Formatting can be an effective way to identify and highlight discrepancies in Excel sheets. This feature allows us to use the in-built conditional options or create customized rules highlighting data inconsistencies.
- Select the header row to which the formatting will be applied.
- Click on the Conditional Formatting option under the Home tab.
- Select the rule type and the rule. In this case, we choose Highlight Cell Rule -> Duplicate Values.
- To highlight the duplicate values, we choose the Duplicate option to mark the values with Light Red Fill with Dark Red Text.
In the above example, we have two columns with the Employee ID’s of Sales and Ops Team. Since an employee cannot be in both departments simultaneously, through Conditional Formatting, we highlight if there have been any such repetitions.
Once the duplicate values have been identified, the necessary corrections can be made accordingly.
4] Advanced Excel Functions
Functions in Excel can be significantly effective in identifying data errors.
IF and IS Functions
The IF and IS functions can be used to compare the values in cells and identify their differences, if any. Also, the IF function can be combined with IS functions like ISNUMBER, ISNA, etc., for analyzing and detecting data-related errors or discrepancies.
The IF function can be used to check and compare values in 2 cells and display an error message if any error is found during the process. For example, if we want to compare the values in cells A1 and B1 and the results accordingly, we can use the below-mentioned formula to do the same:
=IF(A1<>B1, "Mismatch", "Perfect Match")
Select the 1st cell containing the above formula and drag the fill handle across the cells for it to apply to the relevant cells. Based on the findings, the above formula checks the data in cells A1 and B1 and displays a Mismatch or Perfect Match.
The IF and IS functions can be combined to check for data inconsistency through logical tests. For example, if we need to check if the value of a particular cell is a number or not, we can use the below-mentioned formula:
=IF(ISNUMBER(A1), "Valid", "Invalid")
This formula checks if the content of cell A1 is a number or not and returns VALID. If it is so, it returns INVALID. This can be particularly helpful in situations wherein the wrong type of data is entered in a column (for example, text is entered in a column that should have numbers, ideally). The above formula is in its simplest form.
However, this logical test can be customized based on your specific criteria to check for data discrepancies.
Identifying duplicate values between two columns in single or multiple worksheets can be done effectively using Excel’s VLOOKUP and HLOOKUP functions.
VLOOKUP, or Vertical Lookup, allows us to search for values in one column that appear in another, helping us locate duplicate records.
Continuing on the same example used for Conditional Formatting, we can use the VLOOKUP function to check for any repetition of the same ID in both columns to identify any duplication.
The simplest form of the function would be:
VLOOKUP (what to look for, where to look for, column number containing the value, return an approximate or exact match- denoted by TRUE or 1/FALSE or 0)
To build up the formula for the VLOOKUP function, the below-mentioned points need to be considered:
- The lookup value or the value that needs to be searched.
- The data range is where the lookup value is located. The lookup value must always be in the first column of the range for the function to work correctly. For example, if the value is in column B, the range should start with B.
- The column number in the range containing the return value must be mentioned. If we check for duplicate records in Column C and mention the data range as B3:C10, the column number to be mentioned in the formula would be 1 (since column B is the 1st column in the data range).
- Optionally, we can mention TRUE or FALSE, depending on whether we need an approximate or exact match.
- Hence, the formula, in this case, can be written as:
Note: VLOOKUP can look for data and matches only on its right side
Checking for data discrepancies between two rows in an Excel file can also be done using the Match(). The function can be handy for checking both case-sensitive and insensitive data discrepancies.
The MATCH function in Excel is typically used to find the position of a specified value within a range. This function can be handy if we are trying to check inconsistency in text data but have the option to ignore mismatches based on case-sensitive text. The syntax of the function can be simplified to:
Match (what to search, where to search, match type)
The function is similar to VLOOKUP as far as the first two arguments are concerned. However, the third argument or match type determines if we should find the first value exactly equal to what we are searching (0) or the largest one less than or equal to the value being searched (1).
In the example below, we check if any of the data in List 2 is available in List 1. Only in case they are unavailable, Not in List 1 would be displayed as the findings in the Result column, irrespective of the lower or upper case alphabet. The formula, in this case, would be written as:
=IF(ISNA(MATCH(E5,D:D,0)), "Not in List 1", "")
The ISNA function in Excel is an error-handling function that checks for formula errors. In this case, if the data in List 2 is found in List 1, it will display a blank instead of #N/A, unlike the VLOOKUP example above.
5] Spreadsheet Comparison Tool
The Spreadsheet Comparison feature in Excel can be used to check and track changes made on two versions of the same workbook. The option can be handy to track undesired changes made to the data contained in the worksheets to rectify the discrepancies arising from such changes.
Note: Spreadsheet Compare is only available with Office Professional Plus 2013, Office Professional Plus 2016, Office Professional Plus 2019, or Microsoft 365 Apps for enterprise.
- Click on Spreadsheet Compare after searching for the same from the Start Menu.
- Click on Compare Files at the top left corner.
- Select the path of the original or first version of the file by clicking on the Folder icon beside Compare.
- Select the file path to be compared beside the TO option, then click OK.
- Select the options that need to be checked for changes (Entered Values, Formulas, etc.). In this case, we can select the option for Entered Values.
In this example, we create a file Compare1 with details of the employees and Compare2 after making a few changes to the Compare1 file to demonstrate the above feature.
- Two files with the same name can also be compared in different folders.
- While entering the file’s path in the Compare Files window, the website URL where the file is saved can also be entered.
- This option can also compare hidden files or workbooks.
- The Spreadsheet Compare option is available only in Office 2013, 2016, 2019, 2021 Professional Editions and Microsoft 365.