13th European Conference on Psychological Assessment

Original Photo adapted from Hansueli Kramer / CC BY

Date: Thursday, 23/Jul/2015
8:30am - 9:30am	K1: Assessing and Changing Cognitive Processes in Addiction Session Chair: Victor J. Rubio Reinout W. Wiers (University of Amsterdam, Netherlands)
KO2-F-180 (Ⅵ)
9:45am - 11:15am	IS1: Is “Q-short” a Useful Approach for Psychological Assessment? Pitfalls and Opportunities of Short Questionnaires for the Measurement of Psychological Constructs Session Chair: Christoph J. Kemper
KO2-F-180 (Ⅵ)
	Is “Q-short” a useful approach for psychological assessment? Pitfalls and opportunities of short questionnaires for the measurement of psychological constructs Chair(s): Christoph J. Kemper (University of Luxembourg, Luxembourg) In recent years, the development and application of short questionnaires (“Q-short”) for psychological constructs has been gaining pace. At present, short measures are widely-used for psychological assessment in diverse domains, e.g. personality, social or I/O psychology, psychopathology, social and educational science, and behavioral economics, as well as diverse assessment settings such as research and practice. Their popularity is largely due to the promise of higher efficiency of measurement, lower cost, lower respondent burden, and higher data quality. Besides these obvious advantages, there is also considerable criticism for using short questionnaires of psychological constructs leaving researchers and practitioners in limbo concerning the choice of an appropriate measure for their assessment setting. As the criticism mainly pertains to the methodology of short scale development, the symposium focuses on the construction process. Presenters demonstrate and/or compare construction strategies such as manual and automated approaches (e.g. Ant Colony Optimization) or top-down strategies (starting with a longer version of a scale) and bottom-up strategies (starting with a single item to which further items are gradually added) using empirical as well as simulated data. Aim of the symposium is to make recommendations for the development, validation, and application of short questionnaires in research and applied settings. Presentations of the Symposium Assessing personality and situation perception at the same time, in a short time Matthias Ziegler¹, Kai Horstmann¹, Marko Vetter²; zieglema@hu-berlin.de zieglema@hu-berlin.de ¹Humboldt-Universität zu Berlin, Germany, ²Schuhfried GmbH, Austria The idea of interactionism suggests that human behavior is caused equally by the situation and the personality of the actor. However, personality tests and situation perception are scarce. Here, the B5PS, a test capturing the Big 5 and 42 facets as well as 5 dimensions of situation perception (Situation 5) is used as starting point for the development of a short test. This short test yields scores for the Big 5 and the Situation 5 and was evaluated using a representative sample of 400. During the talk, the construction strategy paying specific attention to the nomological network of the constructs assessed will be explained. Moreover, evidence for the psychometric quality of the short test will be reported and compared with the original test (criterion, convergent, discriminant, and factorial validity as well as construct and test-retest reliability). The mixed-method approach applied here can be generally applied in test construction and can serve as a best practice example. Following the ants: Pros and cons of Ant Colony Optimization (ACO) for short scale development Anne B. Janssen¹, Martin Schultze², Adrian Grötsch³; a.janssen@jacobs-university.de a.janssen@jacobs-university.de ¹Jacobs University Bremen, Germany, ²Freie Universität Berlin, Germany, ³Technische Universität Braunschweig, Germany The present study was aimed at constructing useable, reliable, and valid short scales of two measures assessing proactive personality and supervisor support. For this purpose, we compared Ant Colony Optimization (ACO; Leite et al., 2008) and classical item selection procedures. ACO is algorithm-based, and selects and compares sets of items according to defined criteria. For proactive personality, the two selection procedures (ACO and classical item selection) provided similar results. Both five-item short forms showed a satisfactory reliability and a small, however negligible, loss of criterion validity. For a two-dimensional supervisor support scale, ACO found a reliable and valid short form. Psychometric properties of the short version were in accordance with those of the parent form. A classical short form for supervisor support revealed a rather poor model fit and a serious loss of validity. Benefits and shortcomings of ACO compared to classical item selection procedures and recommendations of ACO application are discussed. Best practices in short scale development: Comparing state-of-the-art methods using simulated and empirical data Peter M. Kruyen¹, Constanze Beierlein², Beatrice Rammstedt²; p.m.kruyen@gmail.com p.m.kruyen@gmail.com ¹Radboud University Nijmegen, The Netherlands, ²GESIS Leibniz Institute for the Social Sciences, Germany Psychological constructs have attracted increasing attention as valuable predictors of social phenomena. However, most psychological measures include too many items to be practically useful in large-scale research. Because of this, researchers often remove items from these long measures. By doing so, many researchers rely on well-known techniques such as maximizing coefficient alpha. Research has shown, however, that these strategies may result in serious deficiencies. Recently, psychometricians have developed sophisticated methods that are believed to result in sound short scales. From the viewpoint of practitioners, there seems to be little guideline on how to choose and apply these new techniques to optimally shorten a scale. Against this background, the aim of our talk is three-fold: First, we explain the limitations of old approaches. Subsequently, we will introduce several state-of-the-art procedures. In this context, we will compare “top-down” and “bottom-up” strategies. Top-down approaches refer to item-selection process which start with a longer version of a scale. “Bottom-up” approaches, in contrast, start with a single item. Here, scale components are gradually added. We also distinguish between manual and automated approaches. We use both simulated and empirical data to evaluate these different methods. Finally, we provide recommendations for using appropriate procedures for constructing short measures.
11:45am - 1:15pm	PD: Potential Impact of the Revised EFPA Review Model for the Description and Evaluation of Psychological and Educational Tests Chair: Dave Bartram Discussants: Johnny Fontaine, Mark Schittekatte, Fons van de Vijver
KO2-F-180 (Ⅵ)
2:30pm - 3:30pm	K2: Assessment of Personality Disorders in DSM-5 Session Chair: Daniel Leising Robert Krueger (University of Minnesota, USA)
KO2-F-180 (Ⅵ)
4:30pm - 6:00pm	IS2: On the Effect of Item Positions in Tests Session Chair: Karl Schweizer Session Chair: Siegbert Reiß
KO2-F-180 (Ⅵ)
	On the effect of item positions in tests Chair(s): Karl Schweizer (Goethe University Frankfurt, Germany), Siegbert Reiß (Goethe University Frankfurt, Germany) The item-position effect is usually observable if test takers complete a homogeneous set of items that constitute a psychological scale because successively completing a number of items that are demanding to the same ability or trait is modifying performance. The repeated call of the same cognitive processes can involve automation, facilitation, clustering, maintenance of information and learning. The consequence is an increasing degree of dependency among the responses to the successively presented items. It means an increasing degree of consistency in responding from the first to last items. Although this effect has been known for quite a time, the major models of measurement do not take it into consideration. The presentations will provide further evidence of the item-position effect regarding different psychological scales and inform about new developments in improving the representation and investigation of it. There will be reports of the item-position effect in Advanced Progressive Matrices, Cattell’s Culture Fair Test and Viennese Matrices Test. The new developments will encompass the IRT and CFA approaches. These new developments aim to enable more appropriate representations of the item-position effect and better ways of separating what is due to the effect and what is a pure representation of the construct. Presentations of the Symposium The impact of the position effect on the factorial structure of the Culture Fair Test (CFT) Stefan J. Troche¹, Felicitas L. Wagner², Karl Schweizer³, Thomas H. Rammsayer²; Stefan.Troche@uni-wh.de Stefan.Troche@uni-wh.de ¹Private Universität Witten/Herdecke, Germany, ²University of Bern, Switzerland, ³Goethe-University Frankfurt, Germany The Culture Fair Test (CFT) is a psychometric test of fluid intelligence consisting of four subtests; Series, Classification, Matrices, and Topographies. The four subtests are only moderately intercorrelated, doubting the notion that they assess the same construct (i.e., fluid intelligence). As an explanation of these low correlations, we investigated the position effect. This effect is assumed to reflect implicit learning during testing. By applying fixed-links modeling to analyze the CFT data of 206 participants, we identified position effects as latent variables in the subtests; Classification, Matrices, and Topographies. These position effects were disentangled from a second set of latent variables representing fluid intelligence inherent in the four subtests. After this separation of position effect and basic fluid intelligence, the latent variables representing basic fluid intelligence in the subtests Series, Matrices, and Topographies could be combined to one common latent variable which was highly correlated with fluid intelligence derived from the subtest Classification (r=.72). Correlations between the three latent variables representing the position effects in the Classification, Matrices, and Topographies subtests ranged from r=.38 to r=.59. The results indicate that all four CFT subtests measure the same construct (i.e., fluid intelligence) but that the position effect confounds the factorial structure. The position effect in a Rasch-homogenous test: A fixed-links modeling approach Philipp Thomas¹, Thomas H. Rammsayer¹, Karl Schweizer², Stefan J. Troche³; philipp.thomas@psy.unibe.ch philipp.thomas@psy.unibe.ch ¹University of Bern, Switzerland, ²Goethe University Frankfurt, Germany, ³Private Universität Witten/Herdecke, Germany The position effect describes the influence of just-completed items in a psychological scale on subsequent items. This effect has been repeatedly reported for psychometric reasoning scales and is assumed to reflect implicit learning during testing. One way to identify the position effect is fixed-links modeling. With this approach, two latent variables are derived from the test items. Factor loadings of one latent variable are fixed to 1 for all items to represent ability-related variance. Factor loadings on the second latent variable increase from the first to the last item describing the position effect. Previous studies using fixed-links modeling on the position effect investigated reasoning scales constructed in accordance with classical test theory (e.g., Raven’s Progressive Matrices) but, to the best of our knowledge, no Rasch-scaled tests. These tests, however, meet stronger requirements on item homogeneity. In the present study, therefore, we will analyze data from 239 participants who have completed the Rasch-scaled Viennese Matrices Test (VMT). Applying a fixed-links modeling approach, we will test whether a position effect can be depicted as a latent variable and separated from a latent variable representing basic reasoning ability. The results have implications for the assumption of homogeneity in Rasch-homogeneous tests. Predictors of an individual decrease in test performance during the PISA assessments Johannes Hartig¹, Janine Buchholz¹, Dries Debeer², Rianne Janssen²; hartig@dipf.de hartig@dipf.de ¹DIPF, Germany, ²KU Leuven, Belgium Item position effects have been shown repeatedly in large-scale assessments of student achievement. In addition to a fixed effect of items becoming more difficult during the test, there are individual differences related to this effect, meaning that students differ in the extent to which their performance declines during the test. These interindividual differences have been labelled as “persistence” in previous studies. The present study aims at gaining a better understanding of the nature of these differences by relating them to student characteristics. The analyses make use of the the PISA 2006 and 2009 assessments on science and reading, respectively, using data from several European countries. Gender, the language spoken at home, the socio-economic status, the motivational scales “effort thermometer” (2006 assessment), and the “joy of reading” (2009 assessment) were used as predictors for persistence. Position effects and persistence are modelled by a logistic multilevel regression model which is equivalent to an extension of the Rasch model. Effects of gender, language, and reported test effort are inconsistent across countries, e.g. girls have a higher persistence only in some countries. The effect of the reported joy of reading is small but consistent across all countries, indicating that at least part of the individual differences is caused by individual differences in subject-specific motivation. Modeling response omissions in tests using a tree-based IRT approach Dries Debeer, Rianne Janssen; Rianne.Janssen@ppw.kuleuven.be Rianne.Janssen@ppw.kuleuven.be KU Leuven, Belgium Reported item position effects in large-scale assessments often pertain to an increased item difficulty towards the end of the test and to respondents differing in their level of persistence completing the test. Both phenomena may be partly due to the increased occurrence of missing responses towards the end of the test and individual differences therein. In fact, two types of missing responses are possible, respondents may omit certain items well before reaching their last answered item, leading to “skipped items” and respondents may not complete the entire test and drop out before the end of the assessment, leading to “not-reached items”. Both types of missing responses may be related to the proficiency of the respondent, and therefore, cause non-ignorable missingness. Several studies have proposed ways to deal with these missing responses. In the present paper, an IRTree-based approach will be presented in which both types of missing responses are modeled together with the proficiency process. The IRTree models can be applied to both power and speed tests and are modeled fairly easily. Apart from results of several simulation studies, the analyses of a speed test on mental arithmetic from a Flemish national assessment will be discussed. On the search for the best possible representation of the item-position effect: A simulation study based on APM Florian Zeller, Siegbert Reiss, Karl Schweizer; Florian.zeller@outlook.com Florian.zeller@outlook.com Goethe University Frankfurt, Germany The item-position effect describes the impact of prior completed items on the following items. In previous studies the item-position effect was represented by constraints reflecting functions, for example a linear function. This kind of representation was inflexible regarding the specificities of the items, and therefore, there was the question whether this is the best possible way of representing the effect. Accordingly, our aim was to optimize the representation of the item-position effect in considering the items of Raven’s Advanced Progressive Matrices (APM). We disassembled the 36 APM items into two, three, four, and six same–sized subsets of neighboring items for separate investigations. Analyses were conducted by means of data that were simulated according to the covariance matrix of the APM items based on the data of 530 participants. Similar to former studies we used fixed-links models for testing different representations of the item-position effect. Besides the standard model with only one latent variable we analyzed linear, quadratic and logarithmic trends of the item-position effect. The results revealed an increase of true variance from the first to last items, just as expected. But the course of increase varied in slope.
6:15pm - 7:45pm	Members Meeting
KO2-F-180 (Ⅵ)

Date: Friday, 24/Jul/2015
8:30am - 9:30am	K3: Ambulatory Assessment: Promises and Challenges Session Chair: Tuulia M. Ortner Ulrich Ebner-Priemer (Karlsruhe Institute of Technology, Germany)
KO2-F-180 (Ⅵ)
9:45am - 11:15am	IS3: Recent Methodological Developments for Testing Measurement Invariance Session Chair: Carolin Strobl
KO2-F-180 (Ⅵ)
	Recent methodological developments for testing measurement invariance Chair(s): Carolin Strobl (Universität Zürich, Switzerland) This symposium gives an overview over recent methodological developments for testing measurement invariance in item response theory, factor analysis, and cognitive diagnosis models. Presentations of the Symposium Detecting violations of measurement invariance in item response theory Carolin Strobl¹, Julia Kopf², Basil Abou El-Komboz², Achim Zeileis³; carolin.strobl@uzh.ch carolin.strobl@uzh.ch ¹Universität Zürich, Switzerland, ²LMU München, Germany, ³Universität Innsbruck, Austria The main aim of educational and psychological testing is to provide a means for objective and fair comparisons between different test takers by establishing measurement invariance. However, in practical test development measurement, invariance is often violated by differential item functioning (DIF), which can lead to an unfair advantage or disadvantage for certain groups of test takers. A variety of statistical methods has been suggested for detecting DIF in item response theory (IRT) models, such as the Rasch model, that are increasingly used in educational and psychological testing. However, most of these methods are designed for the comparison of pre-specified focal and reference groups, such as females vs. males, whereas in reality the group of disadvantaged test takers may be formed by a complex combination of several covariates, such as females only up to a certain age. In this talk, a new framework for DIF detection based on model-based recursive partitioning is presented that can detect groups of test takers exhibiting DIF in a data-driven way. The talk outlines the statistical methodology behind the new approach as well as its practical application for binary and polytomous IRT models. Score-based tests of measurement invariance with respect to continuous and ordinal variables Achim Zeileis¹, Edgar C. Merkle², Ting Wang²; Achim.Zeileis@uibk.ac.at Achim.Zeileis@uibk.ac.at ¹Universität Innsbruck, Austria, ²University of Missouri, USA The issue of measurement invariance commonly arises in psychometric models and is typically assessed via likelihood ratio tests, Lagrange multiplier tests, and Wald tests, all of which require advance definition of the number of groups, group membership, and offending model parameters. We present a family of recently-proposed measurement invariance tests that are based on the scores of a fitted model (i.e., observation-wise derivatives of the log-likelihood with respect to the model parameters). This family can be used to test for measurement invariance w.r.t. a continuous auxiliary variable, without pre-specification of subgroups. Moreover, the family can be used when one wishes to test for measurement invariance w.r.t. an ordinal auxiliary variable, yielding test statistics that are sensitive to violations that are monotonically related to the ordinal variable (and less sensitive to non-monotonic violations). The tests can be viewed as generalizations of the Lagrange multiplier (or score) test and they are especially useful for identifying subgroups of individuals that violate measurement invariance (without prespecified thresholds) as well as identifying specific parameters impacted by measurement invariance violations. We illustrate how the tests can be applied in practice in factor-analytic contexts using the R packages "lavaan" for model estimation and "strucchange" for carrying out the tests and visualization of the results. Exact versus approximate measurement invariance. Theoretical overview and empirical examples Jan Cieciuch, Eldad Davidov, René Algesheimer; jancieciuch@gazeta.pl jancieciuch@gazeta.pl Universität Zürich, Switzerland Measurement invariance is a necessary condition for conducting meaningful comparisons of means and relationships between variables across groups (Vandenberg & Lance, 2000). Measurement invariance implies that the parameters of a measurement model (factor loadings, intercepts) are equal across groups. One of the most frequently used procedures for measurement invariance testing is multigroup confirmatory factor analysis (MGCFA) which compares the fit indices between models with parameters constrained to be equal across groups and those with freely estimated parameters. Three levels of measurement invariance are usually distinguished: configural (the same items load on the same factors in each group), metric (factor loadings are constrained to be exactly equal across groups) and scalar (factor loadings and the intercepts are constrained to be exactly equal across groups). Establishing measurement invariance in this approach is very difficult and this method has been criticized as being unrealistic and too strict. Muthén and Asparouhov (2013) recently proposed a new approach to test for approximate rather than exact measurement invariance using Bayesian MGCFA. Approximate measurement invariance permits small differences between parameters (loadings and intercepts) otherwise constrained to be equal in the classical exact approach. In the presentation we will discuss the main differences between the exact and approximate approaches to test for measurement invariance. Furthermore, we will compare results obtained in both approaches while testing for the measurement invariance of the Portrait Value Questionnaire developed by Schwartz and colleagues (2001, 2012) to measure values. The results suggest that the approximate measurement invariance seems to be more likely than the exact approach to establish measurement invariance which enables meaningful cross-group comparisons. Differential item functioning in cognitive diagnosis models Michel Philipp¹, Carolin Strobl¹, Achim Zeileis²; m.philipp@psychologie.uzh.ch m.philipp@psychologie.uzh.ch ¹Universität Zürich, Switzerland, ²Universität Innsbruck, Austria Cognitive diagnosis models (CDMs) are a family of psychometric models for analyzing dichotomous response data. They provide detailed information about mastery or non-mastery of predefined skills, which are required to solve the tested items, and can thus reflect the strengths and weaknesses of the examinees in the form of a skills profile. In the context of educational testing, this means that the students can be given detailed feedback, which particular skills they need to practice more, rather than only being reported their overall test performance. However, for reliable interpretation and fair comparisons these models also rely on measurement invariance, which may be violated in practice by differential item functioning (DIF). Taking the simplest version of a CDM, the non-compensatory DINA model, as an example, the talk introduces the general principles of CDMs, explains what DIF means in this context and presents an overview over recent approaches for detecting DIF in CDMs.
11:45am - 1:15pm	IS4: Cross-Cultural Assessment Session Chair: Fons van de Vijver
KO2-F-180 (Ⅵ)
	Cross-cultural assessment Chair(s): Fons van de Vijver (Tilburg University, The Netherlands) This symposium brings together modern developments in the area of cross-cultural assessments. The emphasis will go beyond traditional psychometric invariance testing. Papers will be presented on response styles, the use of ipsatization to address response styles, qualitative methods to assess bias, and the structure of emotions. Presentations of the Symposium Controlling for culture-specific response bias using ipsatization and response style indicators: Family orientation in fourteen cultures and two generations Boris Mayer; boris.mayer@psy.unibe.ch boris.mayer@psy.unibe.ch University of Bern, Switzerland Within-subject standardization (ipsatization) has been advocated as a possible means to control for culture-specific responding (e.g., Fisher, 2004). However, the consequences of different kinds of ipsatization procedures for the interpretation of mean differences remain unclear. The current study compared several ipsatization procedures with ANCOVA-style procedures using response style indicators for the construct of family orientation with data from 14 cultures and two generations from the Value-of-Children-(VOC)-Study (4135 dyads). Results showed that within-subject centering/standardizing across all Likert-scale items of the comprehensive VOC-questionnaire removed most of the original cross-cultural variation in family orientation and lead to a non-interpretable pattern of means in both generations. Within-subject centering/standardizing using a subset of 19 unrelated items lead to a decrease to about half of the original effect size and produced a theoretically meaningful pattern of means. A similar effect size and similar mean differences were obtained when using a measure of acquiescent responding based on the same set of items in an ANCOVA-style analysis. Additional models controlling for extremity and modesty performed worse, and combinations did not differ from the acquiescence-only model. The usefulness of different approaches to control for uniform response styles (scalar equivalence not given) in cross-cultural comparisons is discussed. The qualitative assessment of bias: Contributions of cognitive interviewing methodology to the bias definition Isabel Benítez Baena¹, Fons van de Vijver², José-Luis Padilla García³; ibenitez@ugr.es ibenitez@ugr.es ¹University of Granada, Spain, Tilburg University, The Netherland, ²Tilburg University, The Netherlands, ³University of Granada, Spain Defining and assessing bias have been two of the main methodological topics in the cross-cultural field. Most of the attention has been paid to the development of statistical procedures to detect several kinds of biases and the interpretation of results in quantitative terms. However, qualitative procedures can be also useful for understanding the presence of bias when comparing different cultural or linguistic groups. The aim of this study is to illustrate potential contributions of Cognitive Interviews (CI) when investigating bias. On one hand, conclusions of integrating CI findings with quantitative data from analysing item bias will be presented by enhancing the advantages for understanding bias sources. On the other hand, utility of CI for extracting information of different levels of bias (item, method, and construct) will be described. The approach will be illustrated by studying responses and response processes of Dutch and Spanish participants to “Quality of Life” items from five international studies. The qualitative perspective of bias will be discussed as well as the potentiality of qualitative procedures for investigating bias, as single or as part of mixed methods studies. The internal structure of the guilt and shame domain across cultures Johnny Fontaine; Johnny.Fontaine@UGent.be Johnny.Fontaine@UGent.be Ghent University, Belgium Cross-cultural as well as in Western scientific literature is plagued with inconsistent theory development on the nature and the role of guilt and shame. We present a large cross-cultural study that assesses these emotions on the basis of the componential emotion approach using two different methods, namely an episode and a frequency method. In total 3684 participants from 20 countries across the world rated appraisals, action tendencies, bodily reactions, expressions, and feelings in the three last episodes where they experienced a self-conscious emotion (episode method) and they also rated the frequencies of these emotional reactions in general (frequency method). Cultural stability of the internal structure was investigated by comparing classical principal component analysis with simultaneous principal component analysis. Both for the episodes and the frequencies a five-componential structure emerged stably across cultural groups. Four of the five components had the same meaning between the two methods, namely guilt, embarrassment, negative esteem of the self, and anger. The fifth component in the episode structure referred to the seriousness of the situation and the fifth factor in frequency structure could be interpreted as general distress. These cross-culturally stable internal structures allow for more consistent theorizing both in Western and in cross-cultural research. Extreme response style in attitudinal and behavioral questions Jia He¹, Isabel Benítez Baena², Byron Adams¹, Fons van de Vijver¹; Fons.vandeVijver@uvt.nl Fons.vandeVijver@uvt.nl ¹Tilburg University, The Netherlands, ²University of Granada, Spain, Tilburg University, The Netherland This paper investigated the cross-cultural similarities and differences of extreme response style (ERS) extracted from self-report data of attitudinal and behavioral questions. Data of a subsample of 3,255 young adults with different immigration and racial backgrounds in the third wave of the UK household survey were analyzed. For each participant, responses to items concerning general mental health and identity were used to extract one ERS index for attitudinal questions and responses of items concerning family and school activities were used to extract the ERS index for behavioral questions. The two indexes were positively correlated, indicating similar response style preference in different types of questions. The pattern of cross-cultural mean differences were similar in both indexes, with minority groups (immigrants, Asian, African, Black, Caribbean, and mixed-race in UK) showing higher extreme response style compared with the majority group (nonimmigrants and whites in UK). However, there were more cross-cultural variations in attitudinal ERS than behavioral ERS. Implications are discussed.
1:45pm - 2:25pm	Meet the Editor - Q&A with the Editor-in-Chief of the European Journal of Psychological Assessment Session Chair: René Proyer Matthias Ziegler (Humboldt University Berlin, Germany)
KO2-F-180 (Ⅵ)
2:30pm - 3:30pm	K4: Computer Adaptive Assessment of Personality Session Chair: Matthias Ziegler Fritz Drasgow (University of Illinois at Urbana-Champaign, USA)
KO2-F-180 (Ⅵ)
4:30pm - 6:00pm	IS5: On the Validity of Objective Personality Tests: What Do They Measure? Session Chair: Tuulia M. Ortner
KO2-F-180 (Ⅵ)
	On the validity of objective personality tests: What do they measure? Chair(s): Tuulia M. Ortner (University of Salzburg, Austria) Behavior-based measures, also called Objective Personality Tests (OPTs), have a long history in Psychology. During the last decade, their use and development was notably boosted in different fields of psychology as in social psychology, differential psychology, psychological assessment and, a number of applied fields. OPTs aim to capture behavior in highly standardized miniature situations; they lack transparency, and do not require introspection. Therefore, they are supposed to avoid two well-known weaknesses of self-reports: limited self-knowledge and impression management. Nevertheless, do current concepts of OPTs fulfil psychometric properties and standards in a way that allow for their application beyond their use in research? Within this symposium we aim to present a mixture of established and new developments and aim for further insight in OPTs psychometric properties with special regard to their validity and discuss how they can contribute to the advancement of personality research and assessment. Presentations of the Symposium Economic games as objective personality measures – Stability, reliability, and validity Simona Maltese¹, Anna Baumert¹, Thomas Schlösser², Manfred Schmitt¹; maltese@uni-landau.de maltese@uni-landau.de ¹University of Koblenz-Landau, Germany, ²University of Cologne, Germany Study 1 (n=615) tested stability, reliability, and validity of behavioral reactions in economic games as indicators of altruistic and fairness dispositions. We assessed financial decisions in three independent rounds of a dictator-game and an ultimatum-game. Additionally, we assessed decisions in one round of a mixed-game. In this situation, participants were observers of a dictator-situation and decided what amount to invest in order to punish Person A and/or to compensate Person B, depending of Person A’s allocation to Person B. Six weeks later, behavioral reactions were assessed again. In addition, self-report measures of personality dispositions were administered. Latent-State-Trait Models revealed high relative stability of behavioral reactions and high reliability of the economic games. In Study 2, (n=518) a longitudinal design with three measurement occasions across 6 months, behavioral reactions in a dictator game and an ultimatum game were repeatedly measured together with self-reported personality dispositions. Importantly, this design informs about the relationship between changes in behavioral reactions and personality measures over time. Results and implications will be discussed. An objective task-based personality test for assessing risk propensity: Analyzing feedback and convergent validity of the PTR Victor Rubio, David Aguado, Marta Antúnez, José Santacreu; victor.rubio@uam.es victor.rubio@uam.es University Autonoma Madrid, Spain Risk propensity referes to the individual tendency to choose highly rewarded alternatives even if they have a lower probability of occurrence (or even high probability of loses). Traditionally, the assessment of such construct has flipped from self-reports devoted to assess related constructs, such as sensation seeking or impulsivity, to a more or less domain-specific self-reports about concrete risk taking behaviors. Last decade has shown the development of several objective task-based personality tests (OPTs) with promising results, such as the BART (Lejuez et al., 2002), the GDT (Brand et al., 2005), the RT (Rubio el at, 2010) or the PTR (Aguado et al., 2011). Nevertheless, there are still certain aspects to explore. On the one hand, the role of task performance feedback in risk propensity assessment; one of the most reputed OPT (BDT) usually gives feedback on performance while other (RT) gives no feedback at all. The present contribution is aimed to show the effect of a controlled feedback on task performance. Ordinarily, OPTs have failed in showing convergent validity with domain-specific risk-taking self-reports. In this case, convergent validity of the PTR with the general personality dimensions supposedly related to risk-taking behavior is presented. Is it a "test"? Psychometric criteria of the Balloon Analogue Risk Task Tuulia M. Ortner¹, Michael Eid², Tobias Koch²; tuulia.ortner@sbg.ac.at tuulia.ortner@sbg.ac.at ¹University of Salzburg, Austria, ²Free University of Berlin, Germany The Balloon Analogue Risk Task (BART) represents a OPT of the new generation and has been widely and successfully used as a research tool for the assessment of risk taking for more than a decade. Literature reported scores tht revealed to be positively associated with self-reported risk-related behaviors such as smoking, gambling, drug and alcohol consumption, and risky sexual behaviors. Although the BART has been established as a research tool, but has not been used a measure for single case assessment or in clinical consulting so far. The following contribution further analyzes psychometric properties of the BART with special regard to its convergent and discriminant correlations with OPTs, rating scales and IATs, its criterion validity, and its temporal stability based on data of 370 participants who completed on the BART on three measurement occasions with 1-2 weeks between trials. Data shows that the BART assement is more stable than occasion specific aspect of the construct. Furthermore, data endorse that analyzes of construct validity based on simple MTMM approaches remains a crucial aspect in evaluation of OPTs. Neuroimaging implicit and explicit assessment Belinda Pletzer, Tuulia M. Ortner; belinda.pletzer@sbg.ac.at belinda.pletzer@sbg.ac.at University of Salzburg, Austria Dual-process theories have often been explained the fact that implicit (via IATs) or behavioral (via OPTs) measures of personality are not or only weakly correlated with scores achieved on explicit rating scales. However, only few neuroimaging studies have tested whether these modes are represented by separate neuronal systems. A functional imaging study assessed differences in brain activations in a group of 60 healthy adult participants. We chose two OPTs, the Balloon Analogue Risk Task (BART) and the Game of Dice Task (GDT), whereby the BART has been suggested to measure risk taking more spontaneously, and the GDT has been suggested to measure risk taking more reflectively. In the BART, risky decisions yielded significantly stronger activations than safe decisions in the bilateral caudate, as well as the bilateral Insula. In the GDT, risky decisions also yielded significantly stronger activations than safe decisions in the bilateral caudate and Insula, but additionally in the ACC and the left dorsolateral prefrontal cortex and inferior parietal cortex, regions previously associated with cognitive control and number processing. Thus, implicit processing was associated with subcortical activations, while more explicit processing activated similar areas, but was additionally associated with activation in cortical, particularly prefrontal regions.

Date: Saturday, 25/Jul/2015
9:00am - 10:00am	K5: Do Countries and Organizations Have Personalities? Session Chair: Fons van de Vijver Dave Bartram (CEB’s SHL Talent Management Solutions, UK, and University of Pretoria, Department of Human Resource Management, South Africa)
KO2-F-180 (Ⅵ)
10:15am - 11:45am	IS6: The Assessment of 21st Century Skills Session Chair: Samuel Greiff Discussant: Arthur C. Graesser
KO2-F-180 (Ⅵ)
	The assessment of 21st century skills Chair(s): Samuel Greiff (University of Luxembourg, Lucembourg) Discussant(s): Arthur C. Graesser (University of Memphis, USA) The 21st century challenges individuals to deal with demands that they previously faced either not at all or to a much lesser extent. The skills needed to successfully deal with these challenges are often collated under the term 21st century skills. They include broad concepts such as digital reading, information computer technology (ICT), and complex problem solving. Even though these skills have recently experienced a lot of interest and have been included in international large-scale assessments such as PISA or PIAAC, many questions on the conceptual and the empirical role of 21st century skills remain. For instance, the question how these skills relate to other conceptions of cognition such as the Cattell-Horn-Carroll theory or whether 21st century skills (incrementally) predict important life outcomes still need more rigorous empirical research. It is the goal of this symposium to present concurrent and state-of-the-art empirical research that aims at providing a comprehensive picture on 21st century skills and their assessment. In this, the symposium is composed of three contributions on three different 21st century skills: ICT literacy (Frank Goldhammer), digital reading (Johannes Naumann), and complex problem solving (Matthias Stadler). These contributions are followed by a discussion from a cognitive science and computer-technology perspective (Art Graesser). Presentations of the Symposium Simulation-based assessment of ICT skills Frank Goldhammer¹, Lena Engelhardt², Johannes Naumann³, Andreas Frey⁴, Katja Hartig³, Holger Horz³, Kathrin Kuchta³, Franziska Wenzel⁴; goldhammer@dipf.de goldhammer@dipf.de ¹DIPF and ZIB, Germany, ²DIPF, Germany, ³Goethe-University Frankfurt, Germany, ⁴University of Jena, Germany Given the ubiquity of information and communication technology (ICT) in daily life, ICT skills have become a key competence enabling successful participation in educational, professional, social, cultural, and civic live. Thus, there is ample need for valid measures of these skills for purposes in educational policy, research, intervention, and instruction. This presentation will address the major developmental steps of a new computer-based ICT skills measure. First, a multidimensional theoretical framework is presented defining the targeted construct of ICT skills. Second, the development of interactive ICT tasks is described. We used simulations to design authentic task environments including several simulated software applications that need to be operated to solve the given task. Third, the psychometric properties of the scale are presented. The scale proved to be one-dimensional with a reliability of .72. To establish validity we show that systematically varied item properties in items’ instructions and stimuli affect item difficulty and tap into individual differences as expected. Finally, we show that relations to reading and problem solving skills, general cognitive ability, and computer knowledge match expectations derived from the theoretical framework. Overall, our findings demonstrate how computer-based simulations can be used to develop a sound measure of ICT skills. Processes and predictors of digital reading literacy: What we can and cannot learn from large-scale assessments Johannes Naumann¹, Frank Goldhammer², Ladislao Salmerón³; j.naumann@em.uni-frankfurt.de j.naumann@em.uni-frankfurt.de ¹Goethe-University Frankfurt, Germany, ²DIPF and ZIB, Germany, ³University of Valencia, Spain With the Internet having grown to be a major resource for the dissemination of knowledge, opinion, and debate, a person lacking digital reading literacy cannot fully participate in online discourse and is, thus, cut off from major information resources and channels of debate. Besides traditional literacy skills such as decoding and coherence processes, digital text frequently requires the readers to select, and order textual materials (“navigation”), a process that draws on cognitive resources in addition to text processing. Using PISA data, we first show that digital reading performance is predicted by two indicators of navigation quality, “precision”, and “task-adaptive processing”, which also mediate effects of print reading skill on digital reading performance. Second, we show that time-on-task in digital reading is more positively predictive of task performance in hard digital reading tasks and in tasks requiring complex navigation. Likewise, we show that time-on-task is more positively predictive of digital reading performance in weak readers. Our results confirm the assumption that navigation is a multifaceted process that is consumptive of cognitive resources impacts performance in different ways. Finally, we discuss prospects and limitations of using large-scale data to explore a latent variable’s cognitive structure. The role of complex problem solving in university success Matthias Stadler¹, Nicolas Becker², Christoph Niepel¹, Samuel Greiff¹; matthias.stadler@uni.lu matthias.stadler@uni.lu ¹University of Luxembourg, Luxembourg, ²University of Saarbrucken, Germany The university years represent a critical phase in the life of many students that comes along with various complex opportunities and challenges. Based on this premise, the aim of this study was to investigate the role of complex problem solving (CPS) skills in predicting university success. 150 German students worked on a measure of reasoning as well as a set of complex problem solving tasks. In addition, the students were asked for their current grade point average at University (GPA) and their subjective evaluation of their university success. CPS was significantly related to university GPA (R2 = .18) even after controlling for reasoning (ΔR2 = .09). In addition, CPS was related to the students’ subjective evaluation of their university success (R2 = .10) with incremental value over and above reasoning (ΔR2 = .09). The results suggested that complex problem solving skills helped students successfully navigating a university program even beyond reasoning skills.
12:15pm - 1:15pm	K6: Measuring Adaptive and Maladaptive Personality for Workplace Applications Session Chair: Johnny Fontaine Deniz S. Ones (University of Minnesota, USA)
KO2-F-180 (Ⅵ)
2:30pm - 3:30pm	Closing Ceremony
KO2-F-180 (Ⅵ)