Session Overview
 
Date: Thursday, 23/Jul/2015
8:30am - 9:30am K1: Assessing and Changing Cognitive Processes in Addiction
Session Chair: Victor J. Rubio
Reinout W. Wiers (University of Amsterdam, Netherlands)
KO2-F-180 (Ⅵ) 
9:45am - 11:15am IS1: Is “Q-short” a Useful Approach for Psychological Assessment? Pitfalls and Opportunities of Short Questionnaires for the Measurement of Psychological Constructs
Session Chair: Christoph J. Kemper
KO2-F-180 (Ⅵ) 
 

Is “Q-short” a useful approach for psychological assessment? Pitfalls and opportunities of short questionnaires for the measurement of psychological constructs

Chair(s): Christoph J. Kemper (University of Luxembourg, Luxembourg)

In recent years, the development and application of short questionnaires (“Q-short”) for psychological constructs have been gaining pace. At present, short measures are widely used for psychological assessment in diverse domains, e.g. personality, social, or I/O psychology, psychopathology, social and educational science, and behavioral economics, as well as in diverse assessment settings such as research and practice. Their popularity is largely due to the promise of more efficient measurement, lower cost, lower respondent burden, and higher data quality. Besides these obvious advantages, there is also considerable criticism of short questionnaires of psychological constructs, leaving researchers and practitioners in limbo concerning the choice of an appropriate measure for their assessment setting. As the criticism mainly pertains to the methodology of short scale development, the symposium focuses on the construction process. Presenters demonstrate and/or compare construction strategies such as manual and automated approaches (e.g. Ant Colony Optimization) or top-down strategies (starting with a longer version of a scale) and bottom-up strategies (starting with a single item to which further items are gradually added), using empirical as well as simulated data. The aim of the symposium is to make recommendations for the development, validation, and application of short questionnaires in research and applied settings.
 

Presentations of the Symposium

 

Assessing personality and situation perception at the same time, in a short time

Matthias Ziegler1, Kai Horstmann1, Marko Vetter2; zieglema@hu-berlin.de
1Humboldt-Universität zu Berlin, Germany, 2Schuhfried GmbH, Austria

The idea of interactionism suggests that human behavior is caused equally by the situation and by the personality of the actor. However, tests capturing both personality and situation perception are scarce. Here, the B5PS, a test capturing the Big 5 and 42 facets as well as 5 dimensions of situation perception (Situation 5), is used as the starting point for the development of a short test. This short test yields scores for the Big 5 and the Situation 5 and was evaluated using a representative sample of 400 participants. During the talk, the construction strategy, which pays specific attention to the nomological network of the constructs assessed, will be explained. Moreover, evidence for the psychometric quality of the short test will be reported and compared with the original test (criterion, convergent, discriminant, and factorial validity as well as construct and test-retest reliability). The mixed-method approach applied here is generally applicable in test construction and can serve as a best practice example.
 

Following the ants: Pros and cons of Ant Colony Optimization (ACO) for short scale development

Anne B. Janssen1, Martin Schultze2, Adrian Grötsch3; a.janssen@jacobs-university.de
1Jacobs University Bremen, Germany, 2Freie Universität Berlin, Germany, 3Technische Universität Braunschweig, Germany

The present study aimed at constructing usable, reliable, and valid short scales of two measures assessing proactive personality and supervisor support. For this purpose, we compared Ant Colony Optimization (ACO; Leite et al., 2008) with classical item selection procedures. ACO is algorithm-based and selects and compares sets of items according to defined criteria. For proactive personality, the two selection procedures (ACO and classical item selection) provided similar results. Both five-item short forms showed satisfactory reliability and a small, negligible loss of criterion validity. For a two-dimensional supervisor support scale, ACO found a reliable and valid short form whose psychometric properties were in accordance with those of the parent form. A classical short form for supervisor support, in contrast, revealed a rather poor model fit and a serious loss of validity. Benefits and shortcomings of ACO compared to classical item selection procedures and recommendations for ACO application are discussed.
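The ACO item-selection idea can be illustrated with a minimal sketch (this is not the authors' implementation; the item pool, parameter values, and the single quality criterion are invented for illustration). Candidate subsets are sampled with probabilities proportional to pheromone weights, evaluated against a criterion (here simply Cronbach's alpha), and the pheromone of the best subset found so far is reinforced:

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for an (n_persons x n_items) response matrix."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def aco_select(data, n_select, n_ants=20, n_iter=50, evaporation=0.9, seed=None):
    """Pheromone-guided search for an item subset maximizing alpha."""
    rng = np.random.default_rng(seed)
    n_items = data.shape[1]
    pheromone = np.ones(n_items)
    best_subset, best_alpha = None, -np.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            p = pheromone / pheromone.sum()
            subset = rng.choice(n_items, size=n_select, replace=False, p=p)
            alpha = cronbach_alpha(data[:, subset])
            if alpha > best_alpha:
                best_subset, best_alpha = subset, alpha
        pheromone *= evaporation          # evaporate old trails
        pheromone[best_subset] += 1.0     # reinforce the best subset found
    return sorted(best_subset), best_alpha

# Simulated pool: 20 items, the first 5 share a strong common factor
rng = np.random.default_rng(1)
theta = rng.normal(size=(500, 1))
loadings = np.r_[np.full(5, 0.9), np.full(15, 0.3)]
data = theta * loadings + rng.normal(size=(500, 20))

subset, alpha = aco_select(data, n_select=5, seed=2)
```

In real applications the evaluation function typically combines several criteria, e.g. model fit and criterion correlations, rather than reliability alone.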
 

Best practices in short scale development: Comparing state-of-the-art methods using simulated and empirical data

Peter M. Kruyen1, Constanze Beierlein2, Beatrice Rammstedt2; p.m.kruyen@gmail.com
1Radboud University Nijmegen, The Netherlands, 2GESIS Leibniz Institute for the Social Sciences, Germany

Psychological constructs have attracted increasing attention as valuable predictors of social phenomena. However, most psychological measures include too many items to be practically useful in large-scale research. Because of this, researchers often remove items from these long measures. In doing so, many researchers rely on well-known techniques such as maximizing coefficient alpha. Research has shown, however, that these strategies may result in serious deficiencies. Recently, psychometricians have developed sophisticated methods that are believed to result in sound short scales. From the viewpoint of practitioners, however, there is little guidance on how to choose and apply these new techniques to optimally shorten a scale.

Against this background, the aim of our talk is three-fold. First, we explain the limitations of older approaches. Subsequently, we introduce several state-of-the-art procedures. In this context, we compare “top-down” and “bottom-up” strategies: top-down approaches refer to item-selection processes that start with a longer version of a scale, whereas bottom-up approaches start with a single item to which scale components are gradually added. We also distinguish between manual and automated approaches. We use both simulated and empirical data to evaluate these different methods. Finally, we provide recommendations for choosing appropriate procedures for constructing short measures.
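The “bottom-up” strategy described above can be sketched as a greedy forward selection (a simplified illustration; the seed rule, the criterion — coefficient alpha — and the simulated data are assumptions, not the procedures compared in the talk):

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for an (n_persons x n_items) response matrix."""
    k = data.shape[1]
    return k / (k - 1) * (1 - data.var(axis=0, ddof=1).sum()
                          / data.sum(axis=1).var(ddof=1))

def bottom_up(data, target_len):
    """Start from the single most central item; greedily add the item
    that maximizes coefficient alpha of the growing scale."""
    corr = np.corrcoef(data, rowvar=False)
    # seed: item with the highest mean correlation to all other items
    seed = int(np.argsort(-(corr.sum(axis=0) - 1))[0])
    chosen = [seed]
    remaining = [i for i in range(data.shape[1]) if i != seed]
    while len(chosen) < target_len:
        alphas = [cronbach_alpha(data[:, chosen + [i]]) for i in remaining]
        best = remaining[int(np.argmax(alphas))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Simulated pool: 12 items, the first 6 load strongly on a common factor
rng = np.random.default_rng(0)
theta = rng.normal(size=(400, 1))
loadings = np.r_[np.full(6, 0.8), np.full(6, 0.2)]
data = theta * loadings + rng.normal(size=(400, 12))
short_form = bottom_up(data, target_len=4)
```

A top-down procedure would instead start from all twelve items and iteratively drop the item whose removal hurts the criterion least.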
 
11:45am - 1:15pm PD: Potential Impact of the Revised EFPA Review Model for the Description and Evaluation of Psychological and Educational Tests
Chair: Dave Bartram
Discussants: Johnny Fontaine, Mark Schittekatte, Fons van de Vijver
KO2-F-180 (Ⅵ) 
2:30pm - 3:30pm K2: Assessment of Personality Disorders in DSM-5
Session Chair: Daniel Leising
Robert Krueger (University of Minnesota, USA)
KO2-F-180 (Ⅵ) 
4:30pm - 6:00pm IS2: On the Effect of Item Positions in Tests
Session Chair: Karl Schweizer
Session Chair: Siegbert Reiß
KO2-F-180 (Ⅵ) 
 

On the effect of item positions in tests

Chair(s): Karl Schweizer (Goethe University Frankfurt, Germany), Siegbert Reiß (Goethe University Frankfurt, Germany)

The item-position effect is usually observable when test takers complete a homogeneous set of items that constitute a psychological scale, because successively completing a number of items that demand the same ability or trait modifies performance. The repeated call on the same cognitive processes can involve automation, facilitation, clustering, maintenance of information, and learning. The consequence is an increasing degree of dependency among the responses to the successively presented items, that is, an increasing degree of consistency in responding from the first to the last items. Although this effect has been known for quite some time, the major models of measurement do not take it into consideration.

The presentations will provide further evidence of the item-position effect in different psychological scales and report on new developments for improving its representation and investigation. There will be reports of the item-position effect in the Advanced Progressive Matrices, Cattell’s Culture Fair Test, and the Viennese Matrices Test. The new developments encompass IRT and CFA approaches and aim to enable more appropriate representations of the item-position effect and better ways of separating what is due to the effect from what is a pure representation of the construct.
 

Presentations of the Symposium

 

The impact of the position effect on the factorial structure of the Culture Fair Test (CFT)

Stefan J. Troche1, Felicitas L. Wagner2, Karl Schweizer3, Thomas H. Rammsayer2; Stefan.Troche@uni-wh.de
1Private Universität Witten/Herdecke, Germany, 2University of Bern, Switzerland, 3Goethe-University Frankfurt, Germany

The Culture Fair Test (CFT) is a psychometric test of fluid intelligence consisting of four subtests: Series, Classification, Matrices, and Topographies. The four subtests are only moderately intercorrelated, casting doubt on the notion that they assess the same construct (i.e., fluid intelligence). As an explanation of these low correlations, we investigated the position effect. This effect is assumed to reflect implicit learning during testing. By applying fixed-links modeling to the CFT data of 206 participants, we identified position effects as latent variables in the subtests Classification, Matrices, and Topographies. These position effects were disentangled from a second set of latent variables representing the fluid intelligence inherent in the four subtests. After this separation of position effect and basic fluid intelligence, the latent variables representing basic fluid intelligence in the subtests Series, Matrices, and Topographies could be combined into one common latent variable, which was highly correlated with fluid intelligence derived from the subtest Classification (r=.72). Correlations between the three latent variables representing the position effects in the Classification, Matrices, and Topographies subtests ranged from r=.38 to r=.59. The results indicate that all four CFT subtests measure the same construct (i.e., fluid intelligence) but that the position effect confounds the factorial structure.
 

The position effect in a Rasch-homogenous test: A fixed-links modeling approach

Philipp Thomas1, Thomas H. Rammsayer1, Karl Schweizer2, Stefan J. Troche3; philipp.thomas@psy.unibe.ch
1University of Bern, Switzerland, 2Goethe University Frankfurt, Germany, 3Private Universität Witten/Herdecke, Germany

The position effect describes the influence of just-completed items in a psychological scale on subsequent items. This effect has been repeatedly reported for psychometric reasoning scales and is assumed to reflect implicit learning during testing. One way to identify the position effect is fixed-links modeling. With this approach, two latent variables are derived from the test items. Factor loadings on one latent variable are fixed to 1 for all items to represent ability-related variance. Factor loadings on the second latent variable increase from the first to the last item, describing the position effect. Previous studies using fixed-links modeling of the position effect investigated reasoning scales constructed in accordance with classical test theory (e.g., Raven’s Progressive Matrices) but, to the best of our knowledge, no Rasch-scaled tests. These tests, however, meet stronger requirements on item homogeneity. In the present study, therefore, we analyze data from 239 participants who completed the Rasch-scaled Viennese Matrices Test (VMT). Applying a fixed-links modeling approach, we test whether a position effect can be depicted as a latent variable and separated from a latent variable representing basic reasoning ability. The results have implications for the assumption of homogeneity in Rasch-homogeneous tests.
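The fixed-links decomposition described above can be written compactly; the linear scaling of the position loadings shown here is one common choice, assumed for illustration rather than taken from this study:

```latex
x_i = 1 \cdot \eta_{\text{ability}} + \lambda_i \, \eta_{\text{position}} + \varepsilon_i,
\qquad \lambda_i = \frac{i-1}{n-1}, \quad i = 1, \dots, n
```

All items load on the ability factor with a fixed weight of 1, while the loadings on the position factor increase from 0 for the first item to 1 for the last, so that the second latent variable can only absorb variance that grows over the course of the test.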
 

Predictors of an individual decrease in test performance during the PISA assessments

Johannes Hartig1, Janine Buchholz1, Dries Debeer2, Rianne Janssen2; hartig@dipf.de
1DIPF, Germany, 2KU Leuven, Belgium

Item position effects have been shown repeatedly in large-scale assessments of student achievement. In addition to a fixed effect of items becoming more difficult during the test, there are individual differences related to this effect, meaning that students differ in the extent to which their performance declines during the test. These interindividual differences have been labelled “persistence” in previous studies. The present study aims at gaining a better understanding of the nature of these differences by relating them to student characteristics. The analyses make use of the PISA 2006 and 2009 assessments on science and reading, respectively, using data from several European countries. Gender, the language spoken at home, socio-economic status, the motivational scale “effort thermometer” (2006 assessment), and the “joy of reading” (2009 assessment) were used as predictors of persistence. Position effects and persistence are modelled by a logistic multilevel regression model which is equivalent to an extension of the Rasch model. Effects of gender, language, and reported test effort are inconsistent across countries; e.g., girls show higher persistence only in some countries. The effect of reported joy of reading is small but consistent across all countries, indicating that at least part of the individual differences is caused by individual differences in subject-specific motivation.
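One plausible parameterization of such a Rasch extension (the exact specification used in the study may differ; the notation here is an assumption) augments the model with a fixed position effect and a person-specific persistence term:

```latex
\operatorname{logit} \Pr(X_{pi} = 1) = \theta_p - \beta_i - (\gamma + \zeta_p)\,\mathrm{pos}_i,
\qquad \zeta_p \sim \mathcal{N}(0, \sigma^2_{\zeta})
```

Here $\theta_p$ is person ability, $\beta_i$ item difficulty, $\mathrm{pos}_i$ the (rescaled) position of item $i$ in the booklet, $\gamma$ the average position effect, and $\zeta_p$ the individual deviation from it, i.e. the "persistence" that is regressed on the student characteristics.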
 

Modeling response omissions in tests using a tree-based IRT approach

Dries Debeer, Rianne Janssen; Rianne.Janssen@ppw.kuleuven.be
KU Leuven, Belgium

Reported item position effects in large-scale assessments often pertain to increased item difficulty towards the end of the test and to respondents differing in their level of persistence in completing the test. Both phenomena may be partly due to the increased occurrence of missing responses towards the end of the test and individual differences therein. In fact, two types of missing responses are possible: respondents may omit certain items well before reaching their last answered item, leading to “skipped items”, or they may not complete the entire test and drop out before the end of the assessment, leading to “not-reached items”. Both types of missing responses may be related to the proficiency of the respondent and therefore cause non-ignorable missingness. Several studies have proposed ways to deal with these missing responses. In the present paper, an IRTree-based approach is presented in which both types of missing responses are modeled together with the proficiency process. The IRTree models can be applied to both power and speed tests and are estimated fairly easily. Apart from results of several simulation studies, the analysis of a speed test on mental arithmetic from a Flemish national assessment will be discussed.
 

On the search for the best possible representation of the item-position effect: A simulation study based on APM

Florian Zeller, Siegbert Reiss, Karl Schweizer; Florian.zeller@outlook.com
Goethe University Frankfurt, Germany

The item-position effect describes the impact of previously completed items on the following items. In previous studies the item-position effect was represented by constraints reflecting functions, for example a linear function. This kind of representation is inflexible regarding the specificities of the items, raising the question of whether it is the best possible way of representing the effect. Accordingly, our aim was to optimize the representation of the item-position effect for the items of Raven’s Advanced Progressive Matrices (APM). We disassembled the 36 APM items into two, three, four, and six same-sized subsets of neighboring items for separate investigations. Analyses were conducted on data simulated according to the covariance matrix of the APM items, which was based on the data of 530 participants. As in former studies, we used fixed-links models for testing different representations of the item-position effect. Besides the standard model with only one latent variable, we analyzed linear, quadratic, and logarithmic trends of the item-position effect. The results revealed an increase in true variance from the first to the last items, just as expected, but the slope of this increase varied.
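The three trends can be represented by fixing the loadings of the position factor to values of the respective function of the item position; the scaling to the unit interval below is a convention assumed here for illustration:

```latex
\lambda_i^{\mathrm{lin}} = \frac{i-1}{n-1}, \qquad
\lambda_i^{\mathrm{quad}} = \left( \frac{i-1}{n-1} \right)^{2}, \qquad
\lambda_i^{\mathrm{log}} = \frac{\ln i}{\ln n}, \qquad i = 1, \dots, n
```

Because all loadings are fixed rather than estimated, the competing models have the same number of free parameters and can be compared directly on fit.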
 
6:15pm - 7:45pm Members Meeting
KO2-F-180 (Ⅵ) 

 
Date: Friday, 24/Jul/2015
8:30am - 9:30am K3: Ambulatory Assessment: Promises and Challenges
Session Chair: Tuulia M. Ortner
Ulrich Ebner-Priemer (Karlsruhe Institute of Technology, Germany)
KO2-F-180 (Ⅵ) 
9:45am - 11:15am IS3: Recent Methodological Developments for Testing Measurement Invariance
Session Chair: Carolin Strobl
KO2-F-180 (Ⅵ) 
 

Recent methodological developments for testing measurement invariance

Chair(s): Carolin Strobl (Universität Zürich, Switzerland)

This symposium gives an overview of recent methodological developments for testing measurement invariance in item response theory, factor analysis, and cognitive diagnosis models.
 

Presentations of the Symposium

 

Detecting violations of measurement invariance in item response theory

Carolin Strobl1, Julia Kopf2, Basil Abou El-Komboz2, Achim Zeileis3; carolin.strobl@uzh.ch
1Universität Zürich, Switzerland, 2LMU München, Germany, 3Universität Innsbruck, Austria

The main aim of educational and psychological testing is to provide a means for objective and fair comparisons between different test takers by establishing measurement invariance. However, in practical test development, measurement invariance is often violated by differential item functioning (DIF), which can lead to an unfair advantage or disadvantage for certain groups of test takers. A variety of statistical methods has been suggested for detecting DIF in item response theory (IRT) models, such as the Rasch model, that are increasingly used in educational and psychological testing. However, most of these methods are designed for the comparison of pre-specified focal and reference groups, such as females vs. males, whereas in reality the group of disadvantaged test takers may be formed by a complex combination of several covariates, such as females only up to a certain age. In this talk, a new framework for DIF detection based on model-based recursive partitioning is presented that can detect groups of test takers exhibiting DIF in a data-driven way. The talk outlines the statistical methodology behind the new approach as well as its practical application for binary and polytomous IRT models.
 

Score-based tests of measurement invariance with respect to continuous and ordinal variables

Achim Zeileis1, Edgar C. Merkle2, Ting Wang2; Achim.Zeileis@uibk.ac.at
1Universität Innsbruck, Austria, 2University of Missouri, USA

The issue of measurement invariance commonly arises in psychometric models and is typically assessed via likelihood ratio tests, Lagrange multiplier tests, and Wald tests, all of which require advance definition of the number of groups, group membership, and offending model parameters. We present a family of recently proposed measurement invariance tests that are based on the scores of a fitted model (i.e., observation-wise derivatives of the log-likelihood with respect to the model parameters). This family can be used to test for measurement invariance w.r.t. a continuous auxiliary variable, without pre-specification of subgroups. Moreover, the family can be used when one wishes to test for measurement invariance w.r.t. an ordinal auxiliary variable, yielding test statistics that are sensitive to violations that are monotonically related to the ordinal variable (and less sensitive to non-monotonic violations). The tests can be viewed as generalizations of the Lagrange multiplier (or score) test, and they are especially useful for identifying subgroups of individuals that violate measurement invariance (without prespecified thresholds) as well as for identifying specific parameters impacted by measurement invariance violations. We illustrate how the tests can be applied in practice in factor-analytic contexts using the R packages "lavaan" for model estimation and "strucchange" for carrying out the tests and visualizing the results.
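The common building block of this test family is the empirical cumulative score process, sketched here in a standard form (the notation is assumed for illustration, not quoted from the talk):

```latex
B(t; \hat{\theta}) = \hat{I}^{-1/2} \, n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} s(\hat{\theta};\, x_{(i)}), \qquad t \in [0, 1]
```

where the $x_{(i)}$ are the observations ordered by the auxiliary variable, $s(\cdot)$ is the observation-wise score contribution, and $\hat{I}$ an estimate of the information matrix. Different test statistics aggregate this process in different ways, e.g. taking a maximum over components and $t$; under measurement invariance the scores fluctuate around zero, whereas a parameter change along the auxiliary variable produces a systematic peak.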
 

Exact versus approximate measurement invariance. Theoretical overview and empirical examples

Jan Cieciuch, Eldad Davidov, René Algesheimer; jancieciuch@gazeta.pl
Universität Zürich, Switzerland

Measurement invariance is a necessary condition for conducting meaningful comparisons of means and relationships between variables across groups (Vandenberg & Lance, 2000). Measurement invariance implies that the parameters of a measurement model (factor loadings, intercepts) are equal across groups. One of the most frequently used procedures for measurement invariance testing is multigroup confirmatory factor analysis (MGCFA), which compares the fit indices of models with parameters constrained to be equal across groups and models with freely estimated parameters. Three levels of measurement invariance are usually distinguished: configural (the same items load on the same factors in each group), metric (factor loadings are constrained to be exactly equal across groups), and scalar (factor loadings and intercepts are constrained to be exactly equal across groups). Establishing measurement invariance in this approach is very difficult, and the method has been criticized as unrealistic and too strict. Muthén and Asparouhov (2013) recently proposed a new approach to test for approximate rather than exact measurement invariance using Bayesian MGCFA. Approximate measurement invariance permits small differences between parameters (loadings and intercepts) that are constrained to be exactly equal in the classical approach. In the presentation we will discuss the main differences between the exact and approximate approaches to testing for measurement invariance. Furthermore, we will compare results obtained with both approaches while testing for the measurement invariance of the Portrait Value Questionnaire developed by Schwartz and colleagues (2001, 2012) to measure values. The results suggest that the approximate approach is more likely than the exact approach to establish measurement invariance, enabling meaningful cross-group comparisons.
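Schematically, the approximate approach replaces the exact equality constraints of metric/scalar invariance with small-variance priors on the group differences of loadings and intercepts (the difference parameterization and the symbol $\sigma^2$ are assumptions for illustration; $\sigma^2$ is a small, user-chosen prior variance):

```latex
\lambda_{jg} - \lambda_{jg'} \sim \mathcal{N}(0, \sigma^{2}), \qquad
\nu_{jg} - \nu_{jg'} \sim \mathcal{N}(0, \sigma^{2}) \qquad \text{for all groups } g \neq g'
```

Exact invariance is the limiting case $\sigma^{2} \to 0$; allowing $\sigma^{2} > 0$ tolerates small, substantively negligible deviations instead of rejecting invariance outright.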
 

Differential item functioning in cognitive diagnosis models

Michel Philipp1, Carolin Strobl1, Achim Zeileis2; m.philipp@psychologie.uzh.ch
1Universität Zürich, Switzerland, 2Universität Innsbruck, Austria

Cognitive diagnosis models (CDMs) are a family of psychometric models for analyzing dichotomous response data. They provide detailed information about mastery or non-mastery of predefined skills that are required to solve the tested items and can thus reflect the strengths and weaknesses of the examinees in the form of a skills profile. In the context of educational testing, this means that students can be given detailed feedback on which particular skills they need to practice more, rather than only receiving a report of their overall test performance. However, for reliable interpretation and fair comparisons these models also rely on measurement invariance, which may be violated in practice by differential item functioning (DIF). Taking the simplest version of a CDM, the non-compensatory DINA model, as an example, the talk introduces the general principles of CDMs, explains what DIF means in this context, and presents an overview of recent approaches for detecting DIF in CDMs.
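For reference, the DINA model combines the skill profile $\boldsymbol{\alpha}_p$ of person $p$ with the Q-matrix entries $q_{jk}$ of item $j$ into a latent response $\eta_{pj}$ (1 if all required skills are mastered, 0 otherwise), which is then perturbed by item-specific slip ($s_j$) and guessing ($g_j$) parameters:

```latex
\eta_{pj} = \prod_{k=1}^{K} \alpha_{pk}^{\,q_{jk}}, \qquad
\Pr(X_{pj} = 1 \mid \boldsymbol{\alpha}_p) = (1 - s_j)^{\eta_{pj}} \, g_j^{\,1 - \eta_{pj}}
```

DIF in this context means that $s_j$ or $g_j$ (or the Q-matrix entries) differ between groups of examinees with the same skill profile.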
 
11:45am - 1:15pm IS4: Cross-Cultural Assessment
Session Chair: Fons van de Vijver
KO2-F-180 (Ⅵ) 
 

Cross-cultural assessment

Chair(s): Fons van de Vijver (Tilburg University, The Netherlands)

This symposium brings together modern developments in the area of cross-cultural assessments. The emphasis will go beyond traditional psychometric invariance testing. Papers will be presented on response styles, the use of ipsatization to address response styles, qualitative methods to assess bias, and the structure of emotions.
 

Presentations of the Symposium

 

Controlling for culture-specific response bias using ipsatization and response style indicators: Family orientation in fourteen cultures and two generations

Boris Mayer; boris.mayer@psy.unibe.ch
University of Bern, Switzerland

Within-subject standardization (ipsatization) has been advocated as a possible means to control for culture-specific responding (e.g., Fisher, 2004). However, the consequences of different kinds of ipsatization procedures for the interpretation of mean differences remain unclear. The current study compared several ipsatization procedures with ANCOVA-style procedures using response style indicators for the construct of family orientation, drawing on data from 14 cultures and two generations from the Value-of-Children-(VOC)-Study (4135 dyads). Results showed that within-subject centering/standardizing across all Likert-scale items of the comprehensive VOC questionnaire removed most of the original cross-cultural variation in family orientation and led to a non-interpretable pattern of means in both generations. Within-subject centering/standardizing using a subset of 19 unrelated items led to a decrease to about half of the original effect size and produced a theoretically meaningful pattern of means. A similar effect size and similar mean differences were obtained when using a measure of acquiescent responding based on the same set of items in an ANCOVA-style analysis. Additional models controlling for extremity and modesty performed worse, and combinations did not differ from the acquiescence-only model. The usefulness of different approaches to controlling for uniform response styles (when scalar equivalence is not given) in cross-cultural comparisons is discussed.
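Within-subject centering/standardizing as described above is straightforward to compute; a minimal sketch with invented Likert-scale data (the function and sample values are illustrative, not the study's materials):

```python
import numpy as np

def ipsatize(data, standardize=True):
    """Within-subject centering (and optional scaling) across the
    Likert items of each respondent (rows = persons, cols = items)."""
    centered = data - data.mean(axis=1, keepdims=True)
    if standardize:
        sd = data.std(axis=1, ddof=1, keepdims=True)
        centered = centered / sd
    return centered

# Hypothetical 1-5 Likert responses for three respondents who differ
# mainly in their overall (acquiescent) response level
raw = np.array([[5, 4, 5, 4],
                [3, 2, 3, 2],
                [4, 3, 4, 3]], dtype=float)
ips = ipsatize(raw)
```

After ipsatization each respondent's row has mean 0 (and, if standardized, unit variance), so uniform level differences between persons are removed; this is also why the choice of the item set to ipsatize over matters so much for how much substantive cross-cultural variation survives.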
 

The qualitative assessment of bias: Contributions of cognitive interviewing methodology to the bias definition

Isabel Benítez Baena1, Fons van de Vijver2, José-Luis Padilla García3; ibenitez@ugr.es
1University of Granada, Spain, and Tilburg University, The Netherlands, 2Tilburg University, The Netherlands, 3University of Granada, Spain

Defining and assessing bias have been two of the main methodological topics in the cross-cultural field. Most attention has been paid to the development of statistical procedures to detect several kinds of bias and to the interpretation of results in quantitative terms. However, qualitative procedures can also be useful for understanding the presence of bias when comparing different cultural or linguistic groups. The aim of this study is to illustrate potential contributions of Cognitive Interviews (CI) to investigating bias. On the one hand, conclusions from integrating CI findings with quantitative item-bias analyses will be presented, highlighting the advantages for understanding sources of bias. On the other hand, the utility of CI for extracting information on different levels of bias (item, method, and construct) will be described. The approach will be illustrated by studying the responses and response processes of Dutch and Spanish participants to “Quality of Life” items from five international studies. The qualitative perspective on bias will be discussed, as well as the potential of qualitative procedures for investigating bias, either alone or as part of mixed methods studies.
 

The internal structure of the guilt and shame domain across cultures

Johnny Fontaine; Johnny.Fontaine@UGent.be
Ghent University, Belgium

The cross-cultural as well as the Western scientific literature is plagued by inconsistent theory development on the nature and the role of guilt and shame. We present a large cross-cultural study that assesses these emotions on the basis of the componential emotion approach using two different methods, namely an episode and a frequency method. In total, 3684 participants from 20 countries across the world rated appraisals, action tendencies, bodily reactions, expressions, and feelings in the three most recent episodes in which they experienced a self-conscious emotion (episode method), and they also rated the frequencies of these emotional reactions in general (frequency method). Cultural stability of the internal structure was investigated by comparing classical principal component analysis with simultaneous principal component analysis. Both for the episodes and the frequencies, a five-componential structure emerged stably across cultural groups. Four of the five components had the same meaning across the two methods, namely guilt, embarrassment, negative esteem of the self, and anger. The fifth component in the episode structure referred to the seriousness of the situation, whereas the fifth factor in the frequency structure could be interpreted as general distress. These cross-culturally stable internal structures allow for more consistent theorizing both in Western and in cross-cultural research.
 

Extreme response style in attitudinal and behavioral questions

Jia He1, Isabel Benítez Baena2, Byron Adams1, Fons van de Vijver1; Fons.vandeVijver@uvt.nl
1Tilburg University, The Netherlands, 2University of Granada, Spain, and Tilburg University, The Netherlands

This paper investigated cross-cultural similarities and differences in extreme response style (ERS) extracted from self-report data on attitudinal and behavioral questions. Data from a subsample of 3,255 young adults with different immigration and racial backgrounds in the third wave of the UK household survey were analyzed. For each participant, responses to items concerning general mental health and identity were used to extract an ERS index for attitudinal questions, and responses to items concerning family and school activities were used to extract an ERS index for behavioral questions. The two indexes were positively correlated, indicating a similar response style preference across different types of questions. The pattern of cross-cultural mean differences was similar for both indexes, with minority groups (immigrants and Asian, African, Black, Caribbean, and mixed-race respondents in the UK) showing a higher extreme response style than the majority group (nonimmigrants and whites in the UK). However, there were more cross-cultural variations in attitudinal ERS than in behavioral ERS. Implications are discussed.
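A simple way to compute an ERS index of the kind used here is the per-person proportion of extreme category endorsements (a sketch with invented data; the study's exact operationalization may differ):

```python
import numpy as np

def ers_index(responses, scale_min=1, scale_max=5):
    """Proportion of responses in the extreme categories of a Likert
    scale -- a per-person extreme-response-style (ERS) index."""
    responses = np.asarray(responses, dtype=float)
    extreme = (responses == scale_min) | (responses == scale_max)
    return extreme.mean(axis=1)

# Hypothetical 1-5 Likert responses for two respondents
attitudinal = np.array([[1, 5, 5, 2, 1],
                        [3, 2, 4, 3, 3]])
ers = ers_index(attitudinal)
# first respondent endorses extremes in 4 of 5 items, the second in none
```

Computing one such index from attitudinal items and another from behavioral items, then correlating them across persons, mirrors the cross-domain comparison reported in the abstract.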
 
1:45pm - 2:25pm Meet the Editor - Q&A with the Editor-in-Chief of the European Journal of Psychological Assessment
Session Chair: René Proyer
Matthias Ziegler (Humboldt University Berlin, Germany)
KO2-F-180 (Ⅵ) 
2:30pm - 3:30pm K4: Computer Adaptive Assessment of Personality
Session Chair: Matthias Ziegler
Fritz Drasgow (University of Illinois at Urbana-Champaign, USA)
KO2-F-180 (Ⅵ) 
4:30pm - 6:00pm IS5: On the Validity of Objective Personality Tests: What Do They Measure?
Session Chair: Tuulia M. Ortner
KO2-F-180 (Ⅵ) 
 

On the validity of objective personality tests: What do they measure?

Chair(s): Tuulia M. Ortner (University of Salzburg, Austria)

Behavior-based measures, also called Objective Personality Tests (OPTs), have a long history in psychology. During the last decade, their use and development have been notably boosted in different fields of psychology, such as social psychology, differential psychology, psychological assessment, and a number of applied fields. OPTs aim to capture behavior in highly standardized miniature situations; they lack transparency and do not require introspection. Therefore, they are supposed to avoid two well-known weaknesses of self-reports: limited self-knowledge and impression management. Nevertheless, do current concepts of OPTs fulfil psychometric properties and standards in a way that allows for their application beyond their use in research? Within this symposium we present a mixture of established and new developments, aim for further insight into the psychometric properties of OPTs with special regard to their validity, and discuss how they can contribute to the advancement of personality research and assessment.
 

Presentations of the Symposium

 

Economic games as objective personality measures – Stability, reliability, and validity

Simona Maltese1, Anna Baumert1, Thomas Schlösser2, Manfred Schmitt1; maltese@uni-landau.de
1University of Koblenz-Landau, Germany, 2University of Cologne, Germany

Study 1 (n = 615) tested the stability, reliability, and validity of behavioral reactions in economic games as indicators of altruistic and fairness dispositions. We assessed financial decisions in three independent rounds of a dictator game and an ultimatum game. Additionally, we assessed decisions in one round of a mixed game. In this situation, participants were observers of a dictator situation and decided what amount to invest in order to punish Person A and/or to compensate Person B, depending on Person A's allocation to Person B. Six weeks later, behavioral reactions were assessed again. In addition, self-report measures of personality dispositions were administered. Latent state-trait models revealed high relative stability of behavioral reactions and high reliability of the economic games. In Study 2 (n = 518), a longitudinal design with three measurement occasions across six months, behavioral reactions in a dictator game and an ultimatum game were repeatedly measured together with self-reported personality dispositions. Importantly, this design is informative about the relationship between changes in behavioral reactions and personality measures over time. Results and implications will be discussed.
 

An objective task-based personality test for assessing risk propensity: Analyzing feedback and convergent validity of the PTR

Victor Rubio, David Aguado, Marta Antúnez, José Santacreu; victor.rubio@uam.es
Universidad Autónoma de Madrid, Spain

Risk propensity refers to the individual tendency to choose highly rewarded alternatives even if they have a lower probability of occurrence (or even a high probability of losses). Traditionally, the assessment of this construct has shifted from self-reports devoted to assessing related constructs, such as sensation seeking or impulsivity, to more or less domain-specific self-reports about concrete risk-taking behaviors. The last decade has seen the development of several objective task-based personality tests (OPTs) with promising results, such as the BART (Lejuez et al., 2002), the GDT (Brand et al., 2005), the RT (Rubio et al., 2010), and the PTR (Aguado et al., 2011). Nevertheless, certain aspects remain to be explored. One is the role of task performance feedback in risk propensity assessment: one of the most reputable OPTs (the BART) usually gives feedback on performance, while another (the RT) gives no feedback at all. The present contribution aims to show the effect of controlled feedback on task performance. Ordinarily, OPTs have failed to show convergent validity with domain-specific risk-taking self-reports. Here, the convergent validity of the PTR with the general personality dimensions supposedly related to risk-taking behavior is presented.
 

Is it a "test"? Psychometric criteria of the Balloon Analogue Risk Task

Tuulia M. Ortner1, Michael Eid2, Tobias Koch2; tuulia.ortner@sbg.ac.at
1University of Salzburg, Austria, 2Free University of Berlin, Germany

The Balloon Analogue Risk Task (BART) represents an OPT of the new generation and has been widely and successfully used as a research tool for the assessment of risk taking for more than a decade. The literature reports scores that are positively associated with self-reported risk-related behaviors such as smoking, gambling, drug and alcohol consumption, and risky sexual behaviors. Although the BART has been established as a research tool, it has not been used as a measure for single-case assessment or in clinical consulting so far. The following contribution further analyzes psychometric properties of the BART with special regard to its convergent and discriminant correlations with OPTs, rating scales, and IATs, its criterion validity, and its temporal stability, based on data from 370 participants who completed the BART on three measurement occasions with 1-2 weeks between occasions. The data show that the BART assesses the stable rather than the occasion-specific aspect of the construct. Furthermore, the data confirm that the analysis of construct validity based on simple MTMM approaches remains a crucial aspect in the evaluation of OPTs.
 

Neuroimaging implicit and explicit assessment

Belinda Pletzer, Tuulia M. Ortner; belinda.pletzer@sbg.ac.at
University of Salzburg, Austria

Dual-process theories have often been invoked to explain the fact that implicit (via IATs) or behavioral (via OPTs) measures of personality are not, or only weakly, correlated with scores achieved on explicit rating scales. However, only few neuroimaging studies have tested whether these modes are represented by separate neuronal systems. A functional imaging study assessed differences in brain activations in a group of 60 healthy adult participants. We chose two OPTs, the Balloon Analogue Risk Task (BART) and the Game of Dice Task (GDT); the BART has been suggested to measure risk taking more spontaneously, and the GDT to measure it more reflectively. In the BART, risky decisions yielded significantly stronger activations than safe decisions in the bilateral caudate as well as the bilateral insula. In the GDT, risky decisions also yielded significantly stronger activations than safe decisions in the bilateral caudate and insula, but additionally in the ACC, the left dorsolateral prefrontal cortex, and the inferior parietal cortex, regions previously associated with cognitive control and number processing. Thus, implicit processing was associated with subcortical activations, while more explicit processing activated similar areas but was additionally associated with activation in cortical, particularly prefrontal, regions.
 

 
Date: Saturday, 25/Jul/2015
9:00am - 10:00amK5: Do Countries and Organizations Have Personalities?
Session Chair: Fons van de Vijver
Dave Bartram (CEB’s SHL Talent Management Solutions, UK, and University of Pretoria, Department of Human Resource Management, South Africa)
KO2-F-180 (Ⅵ) 
10:15am - 11:45amIS6: The Assessment of 21st Century Skills
Session Chair: Samuel Greiff
Discussant: Arthur C. Graesser
KO2-F-180 (Ⅵ) 
 

The assessment of 21st century skills

Chair(s): Samuel Greiff (University of Luxembourg, Luxembourg)

Discussant(s): Arthur C. Graesser (University of Memphis, USA)

The 21st century challenges individuals to deal with demands that they previously faced either not at all or to a much lesser extent. The skills needed to successfully deal with these challenges are often collated under the term 21st century skills. They include broad concepts such as digital reading, information and communication technology (ICT), and complex problem solving. Even though these skills have recently attracted a lot of interest and have been included in international large-scale assessments such as PISA or PIAAC, many questions about the conceptual and empirical role of 21st century skills remain. For instance, how these skills relate to other conceptions of cognition such as the Cattell-Horn-Carroll theory, or whether 21st century skills (incrementally) predict important life outcomes, still needs more rigorous empirical research. It is the goal of this symposium to present current, state-of-the-art empirical research that aims at providing a comprehensive picture of 21st century skills and their assessment. To this end, the symposium is composed of three contributions on three different 21st century skills: ICT literacy (Frank Goldhammer), digital reading (Johannes Naumann), and complex problem solving (Matthias Stadler). These contributions are followed by a discussion from a cognitive science and computer-technology perspective (Art Graesser).
 

Presentations of the Symposium

 

Simulation-based assessment of ICT skills

Frank Goldhammer1, Lena Engelhardt2, Johannes Naumann3, Andreas Frey4, Katja Hartig3, Holger Horz3, Kathrin Kuchta3, Franziska Wenzel4; goldhammer@dipf.de
1DIPF and ZIB, Germany, 2DIPF, Germany, 3Goethe-University Frankfurt, Germany, 4University of Jena, Germany

Given the ubiquity of information and communication technology (ICT) in daily life, ICT skills have become a key competence enabling successful participation in educational, professional, social, cultural, and civic life. Thus, there is ample need for valid measures of these skills for purposes of educational policy, research, intervention, and instruction. This presentation will address the major developmental steps of a new computer-based ICT skills measure. First, a multidimensional theoretical framework is presented defining the targeted construct of ICT skills. Second, the development of interactive ICT tasks is described. We used simulations to design authentic task environments, including several simulated software applications that need to be operated to solve the given task. Third, the psychometric properties of the scale are presented. The scale proved to be one-dimensional with a reliability of .72. To establish validity, we show that systematically varied item properties in items' instructions and stimuli affect item difficulty and tap into individual differences as expected. Finally, we show that relations to reading and problem solving skills, general cognitive ability, and computer knowledge match expectations derived from the theoretical framework. Overall, our findings demonstrate how computer-based simulations can be used to develop a sound measure of ICT skills.
 

Processes and predictors of digital reading literacy: What we can and cannot learn from large-scale assessments

Johannes Naumann1, Frank Goldhammer2, Ladislao Salmerón3; j.naumann@em.uni-frankfurt.de
1Goethe-University Frankfurt, Germany, 2DIPF and ZIB, Germany, 3University of Valencia, Spain

With the Internet having grown into a major resource for the dissemination of knowledge, opinion, and debate, a person lacking digital reading literacy cannot fully participate in online discourse and is thus cut off from major information resources and channels of debate. Besides traditional literacy skills such as decoding and coherence processes, digital text frequently requires readers to select and order textual materials ("navigation"), a process that draws on cognitive resources in addition to text processing. Using PISA data, we first show that digital reading performance is predicted by two indicators of navigation quality, "precision" and "task-adaptive processing", which also mediate effects of print reading skill on digital reading performance. Second, we show that time-on-task is more positively predictive of task performance in hard digital reading tasks and in tasks requiring complex navigation. Likewise, we show that time-on-task is more positively predictive of digital reading performance in weak readers. Our results confirm the assumption that navigation is a multifaceted process that consumes cognitive resources and impacts performance in different ways. Finally, we discuss prospects and limitations of using large-scale data to explore a latent variable's cognitive structure.
 

The role of complex problem solving in university success

Matthias Stadler1, Nicolas Becker2, Christoph Niepel1, Samuel Greiff1; matthias.stadler@uni.lu
1University of Luxembourg, Luxembourg, 2University of Saarbrücken, Germany

The university years represent a critical phase in the life of many students that comes along with various complex opportunities and challenges. Based on this premise, the aim of this study was to investigate the role of complex problem solving (CPS) skills in predicting university success. 150 German students worked on a measure of reasoning as well as a set of complex problem solving tasks. In addition, the students were asked for their current university grade point average (GPA) and their subjective evaluation of their university success. CPS was significantly related to university GPA (R2 = .18), even after controlling for reasoning (ΔR2 = .09). In addition, CPS was related to the students' subjective evaluation of their university success (R2 = .10), with incremental value over and above reasoning (ΔR2 = .09). The results suggest that complex problem solving skills help students successfully navigate a university program even beyond reasoning skills.
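The incremental-validity logic used in this abstract (does CPS add predictive value over reasoning, quantified as ΔR²?) can be sketched as a hierarchical regression. The following is an illustrative sketch only, run on simulated data with made-up effect sizes, not the study's actual dataset or analysis code:

```python
# Illustrative sketch of incremental validity via hierarchical regression:
# compare R^2 of a baseline model (reasoning only) with a full model
# (reasoning + CPS); the difference is Delta R^2. Simulated data, not the study's.
import numpy as np

rng = np.random.default_rng(0)
n = 150                                          # sample size as in the abstract
reasoning = rng.normal(size=n)
cps = 0.5 * reasoning + rng.normal(size=n)       # CPS correlates with reasoning
gpa = 0.3 * reasoning + 0.3 * cps + rng.normal(size=n)

def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([reasoning], gpa)            # Step 1: reasoning only
r2_full = r_squared([reasoning, cps], gpa)       # Step 2: reasoning + CPS
delta_r2 = r2_full - r2_base                     # incremental validity of CPS
print(f"R2 base = {r2_base:.3f}, R2 full = {r2_full:.3f}, Delta R2 = {delta_r2:.3f}")
```

Because the full model nests the baseline model, ΔR² is never negative; whether it is statistically significant would be tested with an F-test on the R² change, which the sketch omits.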
 
12:15pm - 1:15pmK6: Measuring Adaptive and Maladaptive Personality for Workplace Applications
Session Chair: Johnny Fontaine
Deniz S. Ones (University of Minnesota, USA)
KO2-F-180 (Ⅵ) 
2:30pm - 3:30pmClosing Ceremony
KO2-F-180 (Ⅵ)