Validity

Contributed by Charlene Castillo on 19th of January 2015 06:55:28 PM

A Discussion on the Dimensions of Validity

Harrison Assessments has researched, documented, and validated the instrument over the past 24 years, and the instrument ranks among the highest in the industry for construct, face, criterion, and test-retest validity. There are several “types” of validity, and all must be seriously considered when choosing an assessment.

Face validity refers to what the test appears to measure. If a test has face validity, it "looks valid" to the applicants or employees who take it, the HR professionals who choose it, and the recruiters or line managers who use it. In practice, this means that the assessment questions are work-related and that the reported results appear to relate to the requirements of the specific job. Questions that are job-related make a much better impression on applicants and employees and are more likely to enable the test to predict job success. Reports that are job-specific and provide an overall score help recruiters make better employment decisions and help coaches guide employees more effectively toward better performance. Behavioral assessments that focus on job-related questions (especially preferences) also transfer more easily across cultures, because generalized personality questions nearly always carry culturally influenced meanings that make answers to such questions quite different across cultures. Strong face validity also helps to protect against lawsuits and greatly reduces the burden of proving that an organization is not unfairly discriminating against specific races, genders, or age groups.

Criterion validity is generally considered to be the most important aspect of validity for employment assessments. It indicates the degree to which a set of test scores relates to job performance. A strong relationship indicates that the assessment is likely to predict success in that job. It is important to note that criterion validity applies only to the test as used to assess for a specific job, not to the test in general. Criterion-related validity for one job does not necessarily indicate criterion-related validity for another job, because a test that can reasonably predict success for one job may not do so for different jobs. As previously discussed, if the test does not produce an overall score related to specific jobs, there is no way to determine whether it predicts job success, and thus there is no way to show (or know) whether the test has criterion validity.

Criterion validity is determined using correlation coefficients, which show the degree of relationship between test results (overall score) and job performance. A correlation coefficient of 1.0 indicates a perfect correlation, in which the overall score of the test matches perfectly with the performance scores of the employees. This is nearly impossible to achieve, since performance scores themselves are not perfect, and there is no need for the test to predict the exact level of performance. For example, a test result of 95% is close enough if the performance is 90%. Being reasonably close in the large majority of cases is considered quite useful and will generally be achieved by a 0.5 correlation. A correlation coefficient of 0 indicates that there is no correlation whatsoever between the assessment score and performance.
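To make the calculation concrete, here is a minimal sketch of how a correlation coefficient between overall assessment scores and performance ratings could be computed. It assumes the standard Pearson correlation, which the article does not name explicitly, and the score lists are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one overall assessment score and one performance
# rating per employee in the same job (both on a 0-100 scale).
assessment_scores = [95, 80, 70, 60, 85, 50]
performance_ratings = [90, 85, 60, 65, 70, 55]

r = pearson_r(assessment_scores, performance_ratings)
print(f"criterion validity coefficient: {r:.2f}")
```

A result near 1.0 would mean the scores and ratings rise and fall together almost perfectly, while a result near 0 would mean the assessment score says nothing about performance in this group.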

Extensive research has shown that structured interviews without any assessment have a correlation of 0.2 and therefore any assessment should have a correlation of at least 0.2 in order to be useful. A correlation coefficient of 0.5 will generally predict performance quite well and thus is considered a strong correlation.

Keep in mind that most assessments are intended to measure only some of the factors related to job success, and thus the most important measure is how all assessment components, including the interview, combine to predict job performance. Knowing the degree to which each assessment segment predicts success (its correlation coefficient) enables you to know how much weight (if any) to put on the various assessment segments. For example, if one uses a behavioral assessment that has an independent 0.5 correlation, greater weight should be placed on the behavioral assessment than on the interview results. However, keep in mind that the sample size is also important.

Criterion-related validity is vitally important because it tells us how well the assessment works. Due to modern computer technology, criterion-related validity is much easier to determine and should be routinely evaluated for jobs in which you have more than thirty employees.
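One way to see why sample size matters is to put an approximate confidence interval around an observed correlation. The sketch below uses the Fisher z-transformation, a standard statistical approximation that is introduced here for illustration and is not described in the article. The function name and the 1.96 multiplier (for a roughly 95% interval) are assumptions of this example.

```python
from math import atanh, sqrt, tanh


def correlation_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation r observed on n people.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly a 95% interval.
    """
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = atanh(r)                    # transform r to an approximately normal scale
    std_err = 1 / sqrt(n - 3)       # standard error shrinks as n grows
    return tanh(z - z_crit * std_err), tanh(z + z_crit * std_err)


# The same observed 0.5 correlation at different (hypothetical) group sizes.
for n in (30, 100, 500):
    low, high = correlation_confidence_interval(0.5, n)
    print(f"n={n}: observed 0.50, plausible range {low:.2f} to {high:.2f}")
```

Under these assumptions, an observed 0.5 correlation from only 30 employees is compatible with a true correlation anywhere from roughly 0.17 to 0.73, and the range tightens considerably as the group grows.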

Construct validity examines the question: Are the assessment method and results consistent with the related theory or concept the assessment intends to measure? Construct validity methods are very complex and technical and are used during test construction. In evaluating employment tests, assessing construct validity is like evaluating the quality of a race car's engine, whereas assessing criterion-related validity is like timing the car in a race. The most important and reliable means of determining the value of an employment assessment is criterion-related validity.

Construct validity does not necessarily indicate that the test is effective for employment purposes. Some of the oldest and most popular personality style tests have extensive construct validity but do not predict job success. In many countries those tests are not legal to use for recruitment but, unfortunately, they are still used in many cases. While they have the benefit of stimulating discussion related to teams, it would be far better to use assessments that stimulate reflection and discussion on real performance issues related to teams and individuals.

Test-retest is a method to determine the reliability of tests. It involves testing a group of people and then retesting them after a period of time (generally three months to one year later) to determine the consistency of the results. This can be a useful measure to confirm that an assessment consistently produces the same results. However, it is not nearly as important as criterion-related validity: a test that has low reliability is extremely unlikely to predict job success, so a test with demonstrated criterion-related validity is very likely to be reliable as well. Many personality tests measure only general personality patterns, or personality types, which tend to change very little over time, and thus they have strong test-retest results. However, many of these tests have no predictive accuracy for specific jobs. A strong test-retest correlation does provide some confidence that the results are measuring something that stays reasonably constant.