Validation of the Social Skills Rating System: The Serbian Preschool Teacher Version

Authors: Nataša Buha, Marija Jelić

Journal: International Journal of Cognitive Research in Science, Engineering and Education @ijcrsee

Published in issue: 2, vol. 13, 2025.

Open access

Social competence is a key aspect of early development, with long-term implications for children’s functioning in educational and social environments. Given its importance, it is essential to have reliable and valid instruments for assessing both social skills and challenging behaviors in early childhood. This study aimed to evaluate the psychometric properties, factorial structure, and measurement invariance across developmental status and age groups of the preschool teacher version of the Social Skills Rating System, which includes the Social Skills Scale and the Problem Behaviors Scale, within the Serbian preschool population. A total of 309 teachers provided evaluations for 450 children aged 3 to 7, including both typically developing children and those with developmental challenges. Confirmatory factor analysis supported the original structure of the Social Skills Rating System, with good overall model fit. While the Problem Behaviors Scale was retained in its original form, three items were removed from the Social Skills Scale due to low factor loadings or validity concerns, resulting in a refined 27-item version. Both scales demonstrated high internal consistency, as well as convergent and discriminant validity. The present findings suggest that the instrument holds potential for assessing social skills and behavioral challenges in Serbian preschool children aged 3 to 7 years, including those with and without developmental disabilities, thereby extending its applicability beyond the originally intended age range of 3 to 5 years. Although preliminary psychometric results are promising, further research is needed to more robustly confirm its validity and reliability across broader populations and contexts.


Social skills, problem behaviors, social skills rating system, validation, teacher rating, preschool

Short URL: https://sciup.org/170210281

IDR: 170210281 | UDC: 373.23(497.11), 159.923.5-053.4(497.11) | DOI: 10.23947/2334-8496-2025-13-2-349-363

Text of the article: Validation of the Social Skills Rating System: The Serbian Preschool Teacher Version

Social skills are learned, socially acceptable behaviors that facilitate positive interactions with others. They involve both initiating and responding to social cues, reinforced by appropriate social feedback and shaped by context. Essentially, social skills enable individuals to engage effectively in various social settings (Little et al., 2017), and they are crucial for success in academic environments (e.g., Quílez-Robres et al., 2021). According to teachers, key social skills for thriving in school include active listening, following classroom rules and teacher instructions, seeking help when needed, collaborating with peers, and managing emotions in conflict situations (Gresham et al., 2011).

ing (American Psychiatric Association [APA], 2013; World Health Organization [WHO], 2018). Similarly, autism spectrum disorder is characterized by severe and persistent deficits in reciprocal social interaction, often accompanied by difficulties in forming, sustaining, and understanding relationships (APA, 2013; WHO, 2018). In defining intellectual disability, social skills are recognized as a key component within the broader concept of adaptive behavior (APA, 2013; WHO, 2018). As Brojčin et al. (2011) note, cognitive deficits in children with intellectual disability, reflected in lower IQ scores, delay the development of social skills. This delay often leads to social withdrawal and less favorable peer relationships.

A variety of methods are available for evaluating social skills, including observational assessments, projective techniques, self-report instruments, sociometric evaluations, and behavior rating scales. Among these, behavior rating scales, often used alongside direct behavioral observations, are widely regarded as the most reliable and practical approach for assessing children’s social skills (Merrell, 2001).

The Social Skills Rating System (SSRS; Gresham and Elliott, 1990) is considered the gold standard for evaluating social skills in children aged 3–18 due to its comprehensive design. It is designed for screening and identifying children’s social strengths and potential challenges, as well as supporting specialists in developing targeted interventions to enhance social behavior.

Recognizing that children’s social behavior varies across different settings, the SSRS incorporates input from multiple evaluators (teachers, parents, and children) across diverse environments (preschool, school, home) and developmental stages. Since each group of respondents offers a unique perspective, Gresham and Elliott conducted separate analyses for the parent, teacher, and child versions of the instrument, revealing distinct factor structures. All versions of the SSRS Social Skills Scale assess three core factors: Cooperation, Assertion, and Self-Control. Additionally, the parent version includes an extra factor, Responsibility, while the child version incorporates Empathy. Beyond measuring social skills, the SSRS also evaluates problem behavior and academic performance, recognizing that problem behaviors often interfere with the development and application of social skills. The inclusion of an academic competence measure reflects the frequent co-occurrence of poor social skills, competing problem behaviors, and below-average academic achievement (Gresham et al., 2011).

The SSRS has been widely used in research, both as a whole and through its individual scales, across diverse child populations, including low-income groups (Fantuzzo et al., 1998), post-institutionalized children (Julian and McCall, 2016), and various clinical groups. These clinical groups include individuals with intellectual disabilities (Brojčin and Glumbić, 2012; Jelić and Stojković, 2020; Memisevic and Biscevic, 2020), autism (Rankin et al., 2016), visual impairments (Bilić Prcić et al., 2015; Runjić et al., 2015), language impairments (Pentimonti et al., 2016), and attention-deficit/hyperactivity disorder (Van der Oord et al., 2005). Beyond research, the SSRS is frequently used in applied settings and in evaluating the effectiveness of social skills interventions (e.g., Goh et al., 2020). Additionally, it has served as a benchmark for the development and validation of new assessment tools (e.g., Arnesen et al., 2018; Fink et al., 2013; Gresham and Elliott, 2008; Gresham et al., 2010).

The SSRS demonstrates strong psychometric properties, with numerous studies confirming its validity and reliability. Research has examined various aspects of validity, with some studies focusing on reliability measures, such as internal consistency and test-retest reliability (e.g., Rich et al., 2008; Wang et al., 2011). Others have explored predictive, convergent, and discriminant validity, assessing its relationship with different constructs, criteria, or alternative social skills measures (e.g., Rich et al., 2008; Van der Oord et al., 2005; Wang et al., 2011).

Recognizing the influence of situational and cultural factors on social skills is essential for the development, validation, and meaningful interpretation of psychological measures in this domain. The assessment of social skills and related constructs is deeply shaped by the socio-cultural context in which social behaviors occur. Since assessment instruments reflect the values, beliefs, and communication norms of the culture in which they were developed, their applicability must be carefully considered when used in different cultural settings. Cultural norms, particularly in emotional expression, behavior regulation, and parental beliefs about child socialization, vary across societies (e.g., Cordaro et al., 2018; Deng et al., 2019; Minkov et al., 2018). As a result, behaviors deemed adaptive and appropriate in one culture may not be perceived the same way in another. Thus, an instrument’s effectiveness in its original cultural context does not necessarily guarantee its reliability and validity in a different setting (Jurado et al., 2006). The SSRS was developed for English-speaking informants and standardized on children and adolescents in the United States. However, its psychometric properties have not yet been examined within the Serbian socio-cultural context. To our knowledge, the only validation study conducted in Serbia involved an older population of children (12–18 years) and yielded a different factor structure from the original version, as determined through exploratory factor analysis (Jelić, 2015).

Considering the potential influence of the educational context on the implementation of assessment tools is also essential. In the United States, preschool includes children aged 3 to 5, whereas in Serbia, this stage extends from ages 3 to 6, covering both preschool and kindergarten years. Additionally, some preschool settings in Serbia accommodate 7-year-olds who begin first grade within that calendar year. While having a single tool applicable to all children within a given educational framework would be practical, the SSRS preschool version, designed for 3- to 5-year-olds, may not fully capture the diverse age range of children attending Serbian kindergartens.

Given these considerations, the present study aims to evaluate the reliability and factorial structure of the preschool teacher version of the SSRS, which includes the Social Skills Scale and the Problem Behaviors Scale (Gresham and Elliott, 1990), within the Serbian preschool population. To address this aim, the main research questions are as follows:

1. Does the factor structure of the original SSRS Social Skills Scale, confirmed in a U.S. teacher sample, demonstrate good model fit when applied to a Serbian sample of preschool teachers? If so, does it show adequate internal consistency, convergent validity, and discriminant validity?
2. Does the factor structure of the original SSRS Problem Behaviors Scale, confirmed in the U.S. teacher sample, demonstrate good model fit in a Serbian sample of preschool teachers? If so, does it show adequate internal consistency, convergent validity, and discriminant validity?
3. Does the factor structure of both scales demonstrate measurement invariance across developmental status and age groups?

We hypothesize that the factors derived from the original Social Skills Scale—Cooperation, Assertion, and Self-Control—will be replicated in our sample of preschool teachers, along with the Internalizing and Externalizing problem behaviors factors from the Problem Behaviors Scale. If this structure is not confirmed, exploratory factor analysis will be conducted to identify the latent factor structure and reliability within the Serbian sample. Additionally, we hypothesize that the factor structure of the instrument will demonstrate measurement invariance across age groups (3–5 years vs. 6–7 years) and developmental status groups (children with developmental disabilities vs. typically developing children), thereby supporting its validity for diverse preschool populations. Our goal is to propose a version suitable for children aged 3 to 7 in Serbia, ensuring that a single instrument can effectively cover the entire preschool period.

Materials and Method

Participants and procedure

Participants were selected based on predefined criteria to ensure that the sample included a sufficient number of teachers working with children with developmental difficulties and those at risk, as well as a balanced representation of children across age and gender. Based on these criteria, the sample consisted of 309 preschool teachers reporting on 450 children, including 235 boys (52.2%) and 215 girls (47.8%), aged between 3 and 7 years (M = 5.73, SD = 1.05). Of these, 140 children (31.1%) were in the younger age group (3–5 years), and 310 children (68.9%) were in the older age group (6–7 years). Of the total sample, 61.3% were children with typical development, while 38.7% faced various developmental difficulties or risks, as outlined in Table 1. Identification of children with developmental difficulties and those at risk was based on the presence of an Individualized Education Plan and supporting documentation available for each child within the preschool institution. The two groups were similar in age, t(448) = -1.37, p = .172, and gender distribution, χ² = 2.484, p = .115. All children attended inclusive preschool institutions spread across 14 different sites, encompassing both rural (fewer than 5000 inhabitants) and urban areas in Serbia.

The vast majority of participating preschool teachers were female (96.8%), with a mean age of 41.04 years (SD = 9.83). Most teachers had between 5 and 20 years of professional experience (41.1%), followed by those with over 20 years (32.4%), while 26.5% had less than 5 years of experience. Additionally, 10.03% of the teachers were employed in rural settings.

Table 1. Sample characteristics

Group	N	%	Boys, N (%)	Girls, N (%)	Age in years, M (SD)
Typical development	276	61.3	136 (49.3)	140 (50.7)	5.68 (1.06)
Children with developmental difficulties/risks	174	38.7	99 (56.9)	75 (43.1)	5.82 (1.01)
Health problems	5	1.1
Multiple disabilities	52	11.6
Speech and language disorders	36	8.0
Learning difficulties	17	3.8
Motor difficulties	25	5.6
Visual/ Hearing difficulties	23	5.1
Emotional/behavioral problems	16	3.6
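
The gender comparison between the two groups can be checked directly against the counts in Table 1. The sketch below assumes that the reported statistic is a Pearson chi-square without continuity correction on the 2 x 2 table of group by gender; the function names are illustrative and are not taken from the study.

```python
import math

# Counts taken from Table 1: rows are groups, columns are (boys, girls).
observed = [
    [136, 140],  # typical development
    [99, 75],    # developmental difficulties/risks
]


def pearson_chi_square(table):
    """Pearson chi-square for an r x c table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


chi2 = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(f"chi2(1) = {chi2:.3f}, p = {p:.3f}")
```

Under that assumption, the result agrees with the reported χ² = 2.484, p = .115.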

Prior to data collection, the study was approved by the Ethics Committee for Research of the Faculty of Special Education and Rehabilitation, University of Belgrade. In addition, consent was obtained from the participating preschool and from the parents of the children whose social competencies were assessed by the teachers. Both parents and teachers received detailed information about the purpose of the research and were informed of their right to withdraw at any time without consequences.

Teachers completed the instrument within their preschool settings, under standardized conditions. Special attention was given to establishing good communication with the participants and addressing any initial questions or uncertainties about the process. In cases where participants encountered difficulties, researchers provided individual assistance, ensuring the accuracy and comprehension of the responses throughout the data collection process.

Instrument

The teacher version of the SSRS is a 40-item measure for assessing preschool children’s social competence (Gresham and Elliott, 1990). According to the manual, the standardization was conducted on a heterogeneous sample, encompassing a wide range of educational classifications, such as children with learning disabilities, behavioral disorders, and intellectual disabilities. The SSRS consists of two core components: the Social Skills Scale (30 items) and the Problem Behaviors Scale (10 items), which assess the frequency of children’s social skills and challenging behaviors. Each item is rated on a 3-point Likert scale (0 = never, 1 = sometimes, and 2 = very often).

The Social Skills Scale consists of three statistically derived domains: Self-Control, Assertion, and Cooperation. As outlined by Gresham and Elliott (1990), the Cooperation domain primarily evaluates skills related to a child’s ability to collaborate with the teacher and follow classroom rules. This includes behaviors such as paying attention to the teacher’s instructions, complying with directives, and maintaining focus despite peer distractions. In contrast, the Assertion domain focuses on the child’s assertive behavior during interactions with both peers and adults. It includes behaviors such as initiating conversations, inviting peers to join group activities, offering compliments, and helping others. Furthermore, this domain also assesses the child’s ability to assert themselves appropriately when facing unfair treatment or demands from the teacher. Finally, the Self-Control domain evaluates the child’s ability to manage their emotions and respond effectively to conflicts with peers and teachers. It also examines the child’s capacity to balance their own needs with those of others in social interactions, including behaviors like controlling emotions, responding appropriately to teasing, and taking turns during play.

The Problem Behaviors Scale includes two domains: the Externalizing and Internalizing subscales. The Externalizing subscale includes behaviors like temper tantrums, fidgeting, arguing, and fighting, while the Internalizing subscale focuses on behaviors and emotions such as anxiety, sadness, and social withdrawal.

Procedures for adapting the SSRS

Adaptations to the original SSRS (Gresham and Elliott, 1990) were made to account for sociocultural differences between the U.S. and Serbian contexts during the development of the Serbian version of the scale. The scale was initially translated from English into Serbian by two independent bilingual experts proficient in both languages, and then back-translated by a translator who was unaware of the original instrument. Discrepancies between the original and back-translated versions were resolved by the team of translators to ensure the translation was accurate and clear. Additionally, a panel of experts reviewed the translation to ensure linguistic and contextual appropriateness, aiming to achieve clarity and conceptual equivalence with the original version.

A university-based panel of experts in the fields of early childhood education, children’s social competence, and special education, reviewed the translated instrument and discussed the content of each item to evaluate its validity, as well as its linguistic and cultural appropriateness. The panel reached a consensus, without major disagreements, that all items were clear, relevant, and appropriate within the context of preschool education in Serbia. They also concluded that the constructs and behaviors assessed by the scale (e.g., giving compliments to peers, appropriately expressing disagreement, responding to teasing) were easily understandable and culturally appropriate. Given the consistency of expert evaluations, a formal Content Validity Index (CVI) was not applied.

Additionally, cognitive interviews with end users were not conducted at this stage, as the consensus of experts provided a relevant and qualitatively grounded assessment of the cultural adequacy of the instrument’s content. Instead, a focus group with 15 preschool teachers was organized to further examine the face validity of the instrument in terms of relevance, clarity, and appropriateness of item content (Oluwatayo, 2012), from the perspective of practitioners in preschool settings. Feedback from the focus group led to minor linguistic adjustments, such as expanding the phrase “Invites others to join in activities” into a more explicit expression (“Demonstrates initiative to socialize and invites peers to participate in shared activities”) and clarifying items like “Puts work materials or school property away” by adding context emphasizing independence (“Regularly puts personal belongings and work materials in their designated place without needing reminders”). These minor modifications, along with expert evaluations, contributed to the assessment of the instrument’s initial validity. Conceptual equivalence and construct validity were further evaluated through confirmatory factor analysis.

Statistical analysis

To assess the factor structure of the original SSRS, Confirmatory Factor Analysis (CFA) was conducted to evaluate the model fit for the current dataset. Given the ordinal nature of the items and significant deviations from normality, the analysis used the Diagonally Weighted Least Squares (DWLS) estimator (Li, 2016), implemented in JASP 19.3 via the lavaan package.

Model fit was assessed using several common fit indices: Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI), where values above 0.95 indicate a good fit, and values above .90 represent adequate fit (Bentler, 1990; Hu and Bentler, 1999); Root Mean Square Error of Approximation (RMSEA), with values below 0.06 indicating good fit and below .08 representing a reasonable fit (Browne and Cudeck, 1993; Steiger, 1990); and Standardized Root Mean Square Residual (SRMR), with values under .05 denoting a good fit and under .08 signifying an adequate fit (Hu and Bentler, 1999). Measurement invariance was evaluated using the criteria for unequal group sizes (Chen, 2007): invariance was supported when ΔCFI ≤ .005, combined with ΔRMSEA ≤ .010 or ΔSRMR ≤ .025 for testing metric invariance, and ΔSRMR ≤ .005 for testing scalar or residual invariance.

The reliability of the scales was assessed by examining the internal consistency of the factors, as indicated by both Cronbach’s alpha and McDonald’s omega coefficients. Convergent validity was considered satisfactory when the Average Variance Extracted (AVE) for each construct reached or exceeded .50 (Hair et al., 2019). To evaluate discriminant validity, the heterotrait-monotrait ratio (HTMT) was applied; values below .90 were interpreted as evidence that the constructs represent distinct dimensions (Henseler et al., 2015).
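
For reference, the two validity indices can be written out as follows. These are the commonly used textbook definitions and are given here as an assumption about the computation, since the exact formulas applied by the software are not stated in the text. The notation is introduced only for this illustration: \lambda_{ij} is the standardized loading of item i on factor j, K_j is the number of items of factor j, and r_{jg,kh} is the correlation between item g of factor j and item h of factor k.

```latex
\mathrm{AVE}_j = \frac{1}{K_j} \sum_{i=1}^{K_j} \lambda_{ij}^{2}

\mathrm{HTMT}_{jk} =
\frac{\dfrac{1}{K_j K_k} \sum_{g=1}^{K_j} \sum_{h=1}^{K_k} r_{jg,kh}}
     {\sqrt{\dfrac{2}{K_j (K_j - 1)} \sum_{g=1}^{K_j - 1} \sum_{h=g+1}^{K_j} r_{jg,jh}
            \;\cdot\;
            \dfrac{2}{K_k (K_k - 1)} \sum_{g=1}^{K_k - 1} \sum_{h=g+1}^{K_k} r_{kg,kh}}}
```

In words, AVE is the mean squared standardized loading of a factor's items, and HTMT is the mean between-factor item correlation divided by the geometric mean of the two mean within-factor item correlations.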

Results

To evaluate the suitability of the data for factor analysis, preliminary tests of sampling adequacy were conducted. The Kaiser–Meyer–Olkin measure yielded an excellent value of .94, suggesting sufficient intercorrelations among items. Additionally, Bartlett’s test of sphericity was statistically significant, χ² (435) = 10624.328, p < .001, supporting the factorability of the correlation matrix. Descriptive statistics revealed that most items deviated from normality (Table 2). Z-scores for skewness and kurtosis exceeded the ±1.96 threshold for most items, indicating significant departures from normality. Most Social Skills Scale items showed negative skewness, suggesting responses were skewed toward higher ratings, while negative kurtosis values indicated a flatter distribution. For Problem Behaviors Scale items, skewness Z-scores ranged from 3.46 to 6.26, and kurtosis Z-scores ranged from 0.63 to 16.09, indicating significant positive skewness and heavy-tailed distributions.
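
As a rough illustration of the quantities in this paragraph, the sketch below computes the degrees of freedom of Bartlett's test for 30 items and the standard errors that are conventionally used to turn skewness and kurtosis into Z-scores. The standard-error formulas are the usual small-sample ones and are an assumption here; the text does not state which formulas the software applied, and the function names are illustrative.

```python
import math


def bartlett_df(n_items):
    """Degrees of freedom of Bartlett's test of sphericity for n_items variables."""
    return n_items * (n_items - 1) // 2


def skewness_se(n):
    """Conventional standard error of sample skewness for sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_se(n):
    """Conventional standard error of sample excess kurtosis for sample size n."""
    return 2.0 * skewness_se(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def z_score(statistic, standard_error):
    """Z-score of a skewness or kurtosis value; |Z| > 1.96 flags non-normality."""
    return statistic / standard_error


n_children = 450  # number of rated children in the sample
print(bartlett_df(30))  # 435, matching the reported degrees of freedom
print(skewness_se(n_children), kurtosis_se(n_children))
```

With 450 ratings the standard errors are small, so even modest skewness or kurtosis yields a Z-score beyond ±1.96, which is in line with most items being flagged.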

Table 2. Descriptive parameters of the SSRS

Item	Min	Max	M	SD	Z Skew	Z Kurt
Social Skills Scale
1	0	2	1.49	0.59	-5.79	-2.24
2	0	2	1.34	0.70	-5.09	-3.47
3	0	2	1.49	0.67	-8.18	-1.25
4	0	2	1.26	0.63	-2.32	-2.85
5	0	2	1.22	0.70	-2.79	-4.02
6	0	2	1.27	0.72	-4.01	-4.29
7	0	2	1.23	0.68	-2.77	-3.63
8	0	2	0.92	0.72	1.09	-4.66
9	0	2	1.54	0.64	-9.17	-0.01
10	0	2	1.57	0.58	-8.30	-0.37
11	0	2	1.12	0.72	-1.58	-4.56
12	0	2	0.66	0.68	4.66	-3.33
13	0	2	1.18	0.63	-1.45	-2.60
14	0	2	1.37	0.68	-5.37	-3.08
15	0	2	1.30	0.66	-3.62	-3.25
16	0	2	1.30	0.65	-3.35	-3.13
17	0	2	1.27	0.73	-4.03	-4.39
18	0	2	1.51	0.60	-6.78	-1.60
19	0	2	1.63	0.56	-10.40	1.96
20	0	2	1.19	0.65	-1.88	-3.05
21	0	2	1.36	0.64	-4.23	-2.90
22	0	2	1.44	0.61	-5.14	-2.50
23	0	2	1.14	0.63	-1.04	-2.33
24	0	2	1.33	0.66	-4.18	-3.21
25	0	2	1.33	0.70	-4.86	-3.60
26	0	2	1.02	0.64	-0.14	-2.49
27	0	2	1.36	0.66	-4.64	-3.05
28	0	2	1.05	0.61	-0.23	-1.54
29	0	2	1.39	0.70	-6.17	-2.97
30	0	2	1.21	0.74	-3.02	-4.77
Problem Behaviors Scale
1	0	2	0.67	0.73	5.23	4.03
2	0	2	0.75	0.75	3.46	4.86
3	0	2	0.55	0.63	6.26	2.06
4	0	2	0.62	0.69	5.83	3.01
5	0	2	0.28	0.56	16.09	10.52
6	0	2	0.46	0.59	7.98	0.66
7	0	2	0.41	0.60	10.12	1.39
8	0	2	0.64	0.63	4.08	1.59
9	0	2	0.43	0.64	10.57	1.41
10	0	2	0.46	0.60	8.03	0.63

Social Skills Scale Analysis

Following the descriptive analysis, a CFA was performed to assess the factorial validity of the Social Skills Scale. The chi-square test was statistically significant (χ² = 1026.083, df = 402, p < .001), indicating some discrepancy between the observed and model-implied covariance matrices. However, when using DWLS estimation, the chi-square statistic is not considered the most reliable indicator of model fit as its distribution is adjusted due to partial information in the weight matrix, which can affect its accuracy ( Kyriazos and Poga-Kyriazou, 2023 ; Li, 2016 ). Instead, the chi-square to degrees of freedom ratio (χ²/df) is commonly used, with values below 3.00 indicating good fit and values under 5.00 considered acceptable. To provide a more comprehensive assessment of model adequacy, we also reported main fit indices, including CFI, TLI, RMSEA, and SRMR.

The model demonstrated good overall fit, with χ²/df = 2.55, CFI = 0.983 and TLI = 0.982 both exceeding the recommended 0.95 threshold, indicating excellent comparative fit. RMSEA (0.059; 90% CI [0.054, 0.063]) was below the 0.06 cutoff, suggesting acceptable approximation error. Additionally, SRMR (0.073) fell within the acceptable range (< 0.08), further supporting the adequacy of the model fit. While internal consistency, as indicated by Cronbach’s Alpha and McDonald’s Omega coefficients, demonstrated excellent reliability, the convergent and discriminant validity values did not fully meet the threshold criteria.
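
The χ²/df ratio and the RMSEA point estimate for the initial model can be approximately reproduced from the reported chi-square, its degrees of freedom, and the sample size. The sketch below assumes N = 450 (the number of rated children) and the common single-group RMSEA formula with N - 1 in the denominator; software using DWLS may apply a scaled statistic, so this is a plausibility check and not the authors' computation.

```python
import math


def chi_square_ratio(chi2, df):
    """Chi-square to degrees-of-freedom ratio."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate for a single-group model with sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values reported for the initial 30-item Social Skills Scale model.
ratio = chi_square_ratio(1026.083, 402)
approx_error = rmsea(1026.083, 402, 450)
print(f"chi2/df = {ratio:.2f}, RMSEA = {approx_error:.3f}")
```

Under these assumptions, the values agree with the reported χ²/df = 2.55 and RMSEA = 0.059.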

The AVE for Factor 1 (Cooperation) and Factor 2 (Assertion) was above the recommended .50 threshold (.52 and .51, respectively), indicating good convergent validity. However, Factor 3 (Self-Control) had an AVE of .48, slightly below the threshold, suggesting weaker convergent validity. Given that HTMT values above .90 may indicate poor discriminant validity due to conceptual overlap between factors, the results suggest that Factor 1 and Factor 2 (HTMT = .98) may not be sufficiently distinct constructs, potentially requiring further refinement of the model. To address these concerns, we conducted a more detailed analysis of all obtained parameters, with a particular focus on examining the indicator loadings and modification indices.

All standardized factor loadings were statistically significant (p < .001), indicating that each observed variable significantly contributed to its respective latent construct. However, for Factor 1 (Cooperation), standardized loadings ranged from .348 (CO12) to .840 (CO6), with CO12 exhibiting the weakest loading, indicating that the variance it shared with the latent factor was relatively low compared to other items. For Factor 2 (Assertion), all items showed substantial contributions to the factor, with standardized loadings ranging from .591 (AS3) to .778 (AS19). For Factor 3 (Self-Control), standardized loadings ranged from .384 (SC26) to .785 (SC21), with SC26 exhibiting the weakest loading. Similar to item CO12, item SC26 had a loading below the recommended threshold of .40 for acceptable item reliability, suggesting that both items should be removed from the model (Hamid et al., 2017). Accordingly, items CO12 and SC26 were excluded, and the CFA was rerun.

Fit indices improved relative to the initial model, with χ²/df = 2.11, CFI increasing to 0.989 and TLI to 0.988. RMSEA decreased to 0.050 (90% CI [0.045, 0.055]), and SRMR dropped to 0.067, indicating a more refined and better-fitting model. All AVE values exceeded .50, indicating adequate convergent validity, while HTMT values (.89, .90, and .82) supported discriminant validity. However, the borderline HTMT value of .90 between Factors 1 and 3 warranted further consideration of construct overlap. Examination of modification indices revealed substantial cross-loadings for item SC14, with a modification index of 15.56 on Factor 1 and 22.12 on Factor 2. Consequently, this item was removed, and the CFA was rerun.

The exclusion of SC14 reduced the HTMT value between Factors 1 and 3 to .88, and between Factors 2 and 3 to .79, indicating reduced factor overlap and improved discriminant validity. AVE values for all three factors were as follows: Factor 1 (.58), Factor 2 (.51), and Factor 3 (.52), satisfying the threshold for convergent validity. The 27-item model demonstrated good fit, with χ²/df = 2.04, CFI = 0.990, TLI = 0.989, RMSEA = 0.048 (90% CI [0.043, 0.053]), and SRMR = 0.066; the difference was not statistically significant, but the fit was better than that of the initial model. Factor loadings and additional parameters of the final model are presented in Table 3. Standardized factor loadings ranged from .554 to .853, confirming that all retained items load meaningfully on their respective factors. These findings support both the construct validity and conceptual equivalence of the adapted instrument.

Table 3. Factor loadings – Social Skills Scale
Factor	Item	Estimate	Std. Error	z-value	p	95% CI Lower	95% CI Upper	Std. Est. (all)
Factor 1	CO1	0.417	0.020	20.348	< .001	0.377	0.457	0.706
Factor 1	CO6	0.617	0.022	28.523	< .001	0.575	0.660	0.853
Factor 1	CO9	0.508	0.025	20.045	< .001	0.459	0.558	0.794
Factor 1	CO10	0.464	0.019	24.376	< .001	0.427	0.502	0.802
Factor 1	CO16	0.485	0.025	19.292	< .001	0.436	0.535	0.749
Factor 1	CO18	0.441	0.026	16.865	< .001	0.390	0.492	0.738
Factor 1	CO22	0.467	0.022	21.422	< .001	0.424	0.510	0.767
Factor 1	CO27	0.443	0.029	15.160	< .001	0.386	0.500	0.672
Factor 1	CO29	0.523	0.027	19.574	< .001	0.471	0.575	0.752
Factor 2	AS2	0.484	0.030	16.353	< .001	0.426	0.542	0.696
Factor 2	AS3	0.392	0.035	11.311	< .001	0.324	0.460	0.587
Factor 2	AS5	0.522	0.026	19.991	< .001	0.471	0.573	0.749
Factor 2	AS8	0.527	0.023	22.964	< .001	0.482	0.572	0.730
Factor 2	AS11	0.536	0.025	21.860	< .001	0.488	0.584	0.747
Factor 2	AS17	0.546	0.026	20.920	< .001	0.495	0.597	0.749
Factor 2	AS19	0.439	0.025	17.384	< .001	0.389	0.488	0.782
Factor 2	AS24	0.436	0.026	16.790	< .001	0.385	0.487	0.658
Factor 2	AS25	0.464	0.027	17.170	< .001	0.411	0.516	0.663
Factor 2	AS30	0.560	0.026	21.641	< .001	0.509	0.611	0.760
Factor 3	SC4	0.351	0.033	10.615	< .001	0.286	0.415	0.554
Factor 3	SC7	0.429	0.033	13.190	< .001	0.365	0.492	0.633
Factor 3	SC13	0.496	0.025	19.483	< .001	0.443	0.542	0.777
Factor 3	SC15	0.488	0.024	20.558	< .001	0.441	0.534	0.740
Factor 3	SC20	0.467	0.026	18.293	< .001	0.417	0.517	0.718
Factor 3	SC21	0.510	0.026	19.870	< .001	0.460	0.561	0.802
Factor 3	SC23	0.496	0.025	19.632	< .001	0.446	0.545	0.788
Factor 3	SC28	0.440	0.024	17.993	< .001	0.392	0.488	0.715

All three factors of the Social Skills Scale showed excellent internal consistency (Factor 1: ω = .92, α = .92; Factor 2: ω = .90, α = .91; Factor 3: ω = .89, α = .89). The overall scale also demonstrated high reliability (ω = .96, α = .96), indicating strong internal consistency across all items.

Problem Behaviors Scale Analysis

In the analysis of the second part of the SSRS scale related to problem behaviors, which originally consists of two factors, CFA was conducted to assess the distribution of items into factors. The results indicated good fit indices, with all relevant values meeting the recommended thresholds. The chi-square test for the factor model was significant (χ² = 53.479, df = 34, p = .018), but other fit indices were excellent: χ²/df = 1.57, CFI = 0.995, TLI = 0.993, RMSEA = 0.036 (90% CI [0.015, 0.054]), and SRMR = 0.054. AVE values for Factors 1 (.54) and 2 (.59) exceeded .50, indicating adequate convergent validity, and the HTMT ratio (.82) supported discriminant validity (sufficient distinction between factors). Factor loadings and additional parameters of the model are presented in Table 4.

Table 4. Factor loadings – Problem Behaviors Scale

Factor	Item	Estimate	Std. Error	z-value	p	95% CI Lower	95% CI Upper	Std. Est. (all)
Factor 1	EPB31	0.439	0.017	25.838	< .001	0.406	0.473	0.599
Factor 1	EPB32	0.552	0.019	29.314	< .001	0.515	0.589	0.731
Factor 1	EPB33	0.462	0.017	26.829	< .001	0.428	0.496	0.734
Factor 1	EPB34	0.601	0.019	32.127	< .001	0.565	0.638	0.874
Factor 1	EPB37	0.430	0.017	25.217	< .001	0.397	0.464	0.713
Factor 1	EPB38	0.459	0.017	26.406	< .001	0.425	0.493	0.729
Factor 2	IPB35	0.405	0.020	20.139	< .001	0.365	0.444	0.725
Factor 2	IPB36	0.481	0.020	24.112	< .001	0.442	0.520	0.811
Factor 2	IPB39	0.506	0.021	23.688	< .001	0.464	0.548	0.793
Factor 2	IPB40	0.437	0.019	22.610	< .001	0.399	0.475	0.732

All standardized factor loadings were statistically significant (p < .001), supporting the hypothesized two-factor structure of the Problem Behaviors Scale (Table 4). For Factor 1 (Externalizing Problem Behaviors), standardized loadings ranged from .599 (EPB31) to .874 (EPB34), indicating moderate to strong relationships between the items and the latent construct. Similarly, Factor 2 (Internalizing Problem Behaviors) showed standardized loadings ranging from .725 (IPB35) to .811 (IPB36), also reflecting strong item-factor associations. These results suggest that all items are good indicators of their respective latent factors and support both the construct validity and conceptual equivalence of the adapted instrument.

The analysis of convergent and discriminant validity revealed that the AVE values for each factor exceeded .50, indicating adequate convergent validity. Moreover, the HTMT ratio for the two factors was below the .90 threshold, confirming adequate discriminant validity and clear differentiation between the constructs. These findings suggest that the factors are well represented by their indicators and sufficiently distinct from each other.
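The AVE values can be reproduced from the standardized loadings in Table 4. The following is a minimal sketch, assuming AVE is computed in the usual way as the mean of the squared standardized loadings of a factor's items; the function and variable names are illustrative only.

```python
# Standardized loadings ("Std. Est. (all)") copied from Table 4.
STD_LOADINGS = {
    "Factor 1 (Externalizing)": [0.599, 0.731, 0.734, 0.874, 0.713, 0.729],
    "Factor 2 (Internalizing)": [0.725, 0.811, 0.793, 0.732],
}


def average_variance_extracted(std_loadings):
    """Mean of squared standardized loadings (usual AVE definition)."""
    return sum(loading ** 2 for loading in std_loadings) / len(std_loadings)


ave = {name: average_variance_extracted(vals) for name, vals in STD_LOADINGS.items()}

for name, value in ave.items():
    # Convergent validity is conventionally considered adequate above .50.
    print(f"{name}: AVE = {value:.2f}, above .50: {value > 0.50}")
```

Under this definition the loadings yield approximately .54 and .59, in line with the values reported for Factors 1 and 2.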

The Problem Behaviors Scale demonstrated good internal consistency for both factors (Factor 1: ω = .87, α = .87; Factor 2: ω = .85, α = .85), while the overall scale showed excellent reliability (ω = .91, α = .91).
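The factor-level ω coefficients can likewise be approximated from the standardized loadings in Table 4. The sketch below assumes a congeneric model with uncorrelated residuals, so that each residual variance equals one minus the squared standardized loading; this is an assumption about how ω was obtained, not a description of the software output. The overall-scale coefficient is not reproduced, since it also depends on the factor correlation.

```python
# Standardized loadings ("Std. Est. (all)") copied from Table 4.
STD_LOADINGS = {
    "Factor 1 (Externalizing)": [0.599, 0.731, 0.734, 0.874, 0.713, 0.729],
    "Factor 2 (Internalizing)": [0.725, 0.811, 0.793, 0.732],
}


def mcdonald_omega(std_loadings):
    """Omega for one factor, assuming uncorrelated residuals."""
    true_variance = sum(std_loadings) ** 2
    # With standardized items, residual variance is 1 - loading squared.
    residual_variance = sum(1.0 - loading ** 2 for loading in std_loadings)
    return true_variance / (true_variance + residual_variance)


omega = {name: mcdonald_omega(vals) for name, vals in STD_LOADINGS.items()}

for name, value in omega.items():
    print(f"{name}: omega = {value:.2f}")
```

Under these assumptions the computed values round to .87 and .85, matching the factor-level coefficients reported above.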

The results confirmed our hypothesis that the factors derived from the original Social Skills Scale (Cooperation, Assertion, and Self-Control) were replicated in our sample of preschool teachers, along with the Internalizing and Externalizing problem behavior factors from the Problem Behaviors Scale.

Measurement invariance across developmental status and age groups

Addressing the third research question, the invariance of the SSRS scales (Social Skills and Problem Behaviors) across developmental status (Table 5) and age groups (Table 6) was tested following the criteria defined by Chen (2007) for unequal group sizes.

Table 5. Measurement invariance across developmental status on SSRS scales

SSRS scales	Invariance level	CFI	ΔCFI	RMSEA	ΔRMSEA	SRMR	ΔSRMR
Social skills	Configural	.989		.050		.079
	Metric	.990	.001	.047	-.003	.078	-.001
	Scalar	.991	.001	.045	-.002	.078	.000
Problem behaviors	Configural	.992		.042		.074
	Metric	.994	.002	.039	-.003	.071	-.003
	Scalar	.996	.002	.030	-.009	.071	.000

The observed changes in CFI, RMSEA, and SRMR values across different levels of invariance testing (metric vs. configural, scalar vs. metric) for developmental status ranged from 0 to .003, with the exception of a scalar invariance difference of .009 in RMSEA on the Problem Behaviors Scale, which remains within acceptable limits. Similarly, invariance testing across age groups showed changes in fit indices ranging from 0 to .004, with RMSEA differences of .008 for scalar invariance on the Problem Behaviors Scale and .009 for metric invariance on the Social Skills Scale, both within acceptable thresholds.

Table 6. Measurement invariance across age on SSRS scales

SSRS scales	Invariance level	CFI	ΔCFI	RMSEA	ΔRMSEA	SRMR	ΔSRMR
Social skills	Configural	.984		.061		.086
	Metric	.988	.004	.052	-.009	.082	-.004
	Scalar	.989	.001	.050	-.002	.082	.000
Problem behaviors	Configural	.994		.036		.071
	Metric	.994	.000	.037	.001	.070	-.001
	Scalar	.996	.002	.029	-.008	.070	.000

These results confirm that the structure of both the Social Skills and Problem Behaviors Scales remains invariant across developmental status and age groups (3–5 and 6–7 years). However, as presented in Table 6, the SRMR of the configural model for the Social Skills Scale across age groups is slightly elevated at .086. This minor elevation may be attributable to the unequal group sizes (310 vs. 140). As noted by Chen (2007), slight deviations above .08 in SRMR values are not uncommon under such conditions.
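The stepwise comparisons in Tables 5 and 6 can be reproduced from the reported fit indices. The sketch below recomputes each change in CFI, RMSEA, and SRMR and checks it against cutoffs commonly attributed to Chen (2007) for small or unequal groups (a CFI decrease of .005 or more, an RMSEA increase of .010 or more, or an SRMR increase of .025 or more for metric and .005 or more for scalar invariance). These cutoffs are not listed in the text and are an assumption here, as is the conservative rule of flagging a step when any single index crosses its cutoff.

```python
# Fit indices copied from Tables 5 and 6 as (CFI, RMSEA, SRMR),
# ordered configural -> metric -> scalar.
FIT = {
    ("developmental status", "Social skills"): [
        (0.989, 0.050, 0.079), (0.990, 0.047, 0.078), (0.991, 0.045, 0.078)],
    ("developmental status", "Problem behaviors"): [
        (0.992, 0.042, 0.074), (0.994, 0.039, 0.071), (0.996, 0.030, 0.071)],
    ("age group", "Social skills"): [
        (0.984, 0.061, 0.086), (0.988, 0.052, 0.082), (0.989, 0.050, 0.082)],
    ("age group", "Problem behaviors"): [
        (0.994, 0.036, 0.071), (0.994, 0.037, 0.070), (0.996, 0.029, 0.070)],
}

# Assumption: (CFI, RMSEA, SRMR) cutoffs for small or unequal groups;
# these values are not stated in the source text.
CUTOFFS = {"metric": (-0.005, 0.010, 0.025), "scalar": (-0.005, 0.010, 0.005)}


def step_deltas(previous, current):
    """Return (dCFI, dRMSEA, dSRMR), rounded to three decimals."""
    return tuple(round(c - p, 3) for p, c in zip(previous, current))


def supports_invariance(deltas, level):
    """True when no index worsens by its cutoff or more."""
    d_cfi, d_rmsea, d_srmr = deltas
    cfi_cut, rmsea_cut, srmr_cut = CUTOFFS[level]
    return d_cfi > cfi_cut and d_rmsea < rmsea_cut and d_srmr < srmr_cut


results = {}
for key, (configural, metric, scalar) in FIT.items():
    for level, prev, cur in (("metric", configural, metric),
                             ("scalar", metric, scalar)):
        deltas = step_deltas(prev, cur)
        results[key + (level,)] = (deltas, supports_invariance(deltas, level))
        print(key, level, deltas, results[key + (level,)][1])
```

Because nearly all changes are improvements (CFI increases, RMSEA and SRMR decrease), every step passes under these assumed cutoffs; the only worsening index is the RMSEA increase of .001 for metric invariance of the Problem Behaviors Scale across age groups.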

Considering the observed fit indices and invariance test results, we confirm that the SSRS instrument, validated within a sample of preschool teachers in our context, demonstrates measurement invariance across both developmental status and age groups within the preschool population.

Discussion

The aim of this study was to validate the factor structure of the original teacher version of the SSRS (Gresham and Elliott, 1990) in a sample of preschool teachers in Serbia. The SSRS was originally designed to assess the social competence of children aged 3 to 5 years based on evaluations by U.S. teachers. The instrument consists of two main scales: the Social Skills Scale and the Problem Behaviors Scale.

The original Social Skills Scale comprises three subscales (Cooperation, Assertion, and Self-Control), each containing 10 items. Our first research question addressed whether the factor structure of the original Social Skills Scale would demonstrate a good model fit in a sample of preschool teachers in Serbia. The original three-factor model, consisting of 30 items corresponding to the Cooperation, Assertion, and Self-Control subscales, demonstrated good fit indices and high reliability, as indicated by Cronbach’s Alpha and McDonald’s Omega coefficients. However, convergent and discriminant validity fell short of the recommended thresholds in our sample. Specifically, the AVE for the Self-Control factor was below .50 (Hair et al., 2019), and the HTMT ratio for the Cooperation and Assertion subscales exceeded the .90 threshold (Henseler et al., 2015).
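To illustrate how an HTMT ratio is obtained, the sketch below computes it for two constructs from an item correlation matrix: the mean correlation between items of different constructs is divided by the geometric mean of the average within-construct correlations. The 4 × 4 matrix is a hypothetical toy example and does not contain study data; the function name and item indices are likewise illustrative.

```python
from itertools import combinations, product
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def htmt(corr, items_a, items_b):
    """Heterotrait-monotrait ratio for two sets of item indices."""
    # Average correlation between items of different constructs.
    hetero = mean(corr[i][j] for i, j in product(items_a, items_b))
    # Average correlation among items of the same construct.
    mono_a = mean(corr[i][j] for i, j in combinations(items_a, 2))
    mono_b = mean(corr[i][j] for i, j in combinations(items_b, 2))
    return hetero / sqrt(mono_a * mono_b)


# Hypothetical correlations: items 0-1 belong to construct A,
# items 2-3 to construct B. Not taken from the study.
toy_corr = [
    [1.00, 0.60, 0.40, 0.35],
    [0.60, 1.00, 0.45, 0.40],
    [0.40, 0.45, 1.00, 0.50],
    [0.35, 0.40, 0.50, 1.00],
]

value = htmt(toy_corr, items_a=[0, 1], items_b=[2, 3])
print(f"HTMT = {value:.2f}; exceeds .90 threshold: {value > 0.90}")
```

A ratio close to or above .90 means that items from the two constructs correlate with each other almost as strongly as items within the same construct, which is the pattern described for the initial 30-item model.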

To address these issues, several modifications were made. Based on a detailed analysis of the data and guided by both theoretical and empirical criteria outlined by Hamid et al. (2017) and Hair et al. (2019), two items with low factor loadings (< .40) were removed: one from the Self-Control factor (Receives criticism well) and one from the Cooperation factor (Introduces himself or herself to new people without being told). In addition, the removed Cooperation item exhibited high cross-loadings with the Assertion factor. This overlap is theoretically justified, as assertion encompasses traits such as taking initiative, confident behavior, and self-presentation, which can also contribute to cooperative behavior (e.g., Vagos and Pereira, 2019). Furthermore, one additional item was removed due to a lack of discriminant validity between Cooperation and Self-Control, as indicated by an HTMT value above .90 (Henseler et al., 2015), suggesting considerable factor overlap. Although originally assigned to the Self-Control factor, this item (Cooperates with peers without prompting) is conceptually more aligned with the Cooperation factor.

These slight modifications, guided by both statistical criteria and theoretical considerations, improved the model’s validity and fit indices while maintaining high internal consistency. Cronbach’s Alpha and McDonald’s Omega coefficients ranged from .89 to .93 for each factor, and .96 for the overall scale. The revised model includes 27 out of the 30 original items: 9 items in the Cooperation subscale, 8 items in the Self-Control subscale, and all 10 items in the Assertion subscale. Consistent with psychometric literature on latent structure analysis (e.g., Brown, 2015; Kline, 2016), such modifications do not undermine the theoretical integrity of the model as long as the latent factor structure remains intact. In other words, aside from the three removed items, all other items were consistently aligned with the original version, indicating that preschool teachers in both Serbia and the United States similarly perceive and differentiate cooperation, assertion, and self-control in children.

Correlations among the factors confirmed that these social skills represent distinct yet interrelated dimensions, collectively forming a unified construct of social skills. Furthermore, the high internal consistency coefficients indicate that the factors consistently reflect the underlying social competence construct. However, reliability values exceeding .90 may not necessarily represent “excellent” internal consistency, as they can suggest item redundancy (Hamid et al., 2017). This raises the possibility that some items within each factor may overlap in content or fail to provide distinct information about the construct being measured. Similarly, the initial HTMT values exceeding the recommended threshold of .90 indicate substantial overlap between Cooperation and Assertion, as well as Cooperation and Self-Control. These findings are consistent with previous results reported by Gresham and Elliott (1990), as well as with studies conducted across different educational levels, age groups, and informant types (e.g., Jelić, 2015; Whiteside et al., 2007; Van der Oord et al., 2005), which also reported moderate to strong correlations between these factors. This pattern reflects the complex nature of social skills, in which various components are inherently interconnected. Although constructs such as cooperation, assertion, and self-control are theoretically distinct, they are often closely related in practice.

Our second research question investigated whether the factor structure of the original Problem Behaviors Scale, confirmed in the U.S. sample of teachers, would demonstrate good model fit in a Serbian sample of preschool teachers. The Problem Behaviors Scale includes two subscales: Externalizing Behaviors (6 items) and Internalizing Behaviors (4 items). The two-factor structure replicated well, with very good model fit indices. Reliability analysis yielded Cronbach’s Alpha and McDonald’s Omega coefficients above .80 for both subscales. The HTMT ratio and AVE values further confirmed the psychometric quality of the Problem Behaviors Scale in the Serbian sample.

Our final research question examined measurement invariance across developmental status and age groups. Multigroup analyses confirmed that both the Social Skills Scale and the Problem Behaviors Scale demonstrate measurement invariance across these groups, indicating that the instrument reliably assesses social competence among Serbian preschool children regardless of developmental differences or age.

Although the original SSRS was designed for children aged 3 to 5 years, our results confirm its applicability and reliability for assessing social skills and problem behaviors in Serbian preschool children aged 3 to 7 years. For the Social Skills Scale, three items were removed, resulting in a revised version of the scale with 27 items, compared to the original 30. The Problem Behaviors Scale was fully replicated as it appeared in the original version. The modified version of the SSRS, reflecting these changes, is provided in the appendix.

Based on the present findings, the instrument shows promise for evaluating both typically developing children and those with developmental disabilities (e.g., autism, intellectual disability, and other challenges). While this suggests potential suitability for inclusive preschool settings, clinical contexts, and group-based research, further evidence, particularly of external validity, is needed to fully confirm its appropriateness across diverse populations. Nonetheless, the instrument may support the development of individualized programs to enhance children’s social competence and provide useful information for group-based assessments in research.

Limitations and Future Directions

Despite its contributions, this study has some limitations that warrant further investigation. The sample included only preschool teachers. Although measurement invariance was confirmed across groups defined by developmental status and age, future studies should examine whether these results generalize across different informant types (e.g., special education teachers) and across SSRS versions (e.g., the parent form). In addition, while the sample encompassed participants from both urban and rural environments, the proportion of respondents from rural areas was limited to approximately 10%. Due to this imbalance, the sample cannot be considered fully representative of the rural preschool teacher population in Serbia. Future research should aim to recruit a more balanced sample to better reflect the diversity of educational contexts across urban and rural areas.

Although confirmatory factor analysis supported the conceptual equivalence and factorial validity of the adapted instrument, this study did not include formal quantitative assessments of content validity (e.g., Content Validity Index) or cognitive interviews due to reasons detailed in the methodology section. Future studies could further strengthen the adaptation process by incorporating additional qualitative methods and formal content validity indices.

Establishing inter-rater reliability between teacher and parent reports would provide a more comprehensive view of children’s social behavior and further support the broader applicability of the scale.

Additionally, this study did not include a comparison with other similar instruments, which limits the ability to evaluate the SSRS’s external discriminant validity. Future research could address this gap by comparing the SSRS with other well-established measures of social skills, thereby providing further support for its psychometric properties. Such comparisons would offer a more comprehensive understanding of how the SSRS differs from other tools in assessing social competence and its applicability across diverse settings.

Finally, the high inter- and intra-factor correlations—reflected in both Omega/Alpha coefficients and HTMT values—underscore the nuanced and interconnected nature of social skills. While this shared variance aligns with the theoretical expectations for closely related competencies, it also complicates the interpretation of the factors as entirely distinct. These results suggest a need for careful item selection to balance reliability and construct validity. Future research could refine the instrument by reducing item redundancy or applying alternative modeling approaches, such as bifactor or second-order models, to better distinguish shared from unique variance across dimensions.

Conclusion and implications

Social skills are critical for positive outcomes in various social settings. A validated tool for assessing these skills and challenging behaviors is a crucial step in supporting early childhood development and fostering children’s social competencies. This study represents the first empirical construct validation of the SSRS teacher form (Gresham and Elliott, 1990) in Serbia. By confirming the original three-dimensional structure with slight modifications, this research provides a reliable instrument for evaluating and monitoring the development of social skills and problem behaviors in children aged 3 to 7 years. Validating the modified model with a new sample is essential to ensure its stability and robustness. Additionally, the study offers an opportunity for researchers from other countries to compare their findings with ours.

Although this study did not examine the relationship between social skill deficits and challenging behaviors, the validated SSRS provides a reliable foundation for future research on these connections. Identifying social skill deficits in both typically and atypically developing children supports early interventions to enhance social competencies and prevent problem behaviors.

Future research should focus on validating the SSRS parent form, as differences in how parents and teachers assess social skills and behaviors, especially for children with developmental disabilities, could offer valuable insights for tailored interventions. Additionally, exploring the SSRS constructs across different educational levels (from preschool to high school) and from various sources (e.g., parents, teachers, and special educators) would broaden its applicability and deepen our understanding of social skill development, contributing to a more holistic approach to promoting social competence in diverse educational settings.

Conflict of interests

The authors declare no conflict of interest.

Acknowledgments

This research was supported by the Ministry of Science, Technological Development and Innovation [No 451-03-137/2025-03/200096], Republic of Serbia.

Appendix A: Serbian Version of the Modified SSRS - Item Content and Reliability by Factor

Skala socijalnih veština (Social Skills Scale)

F1: Kooperativnost (α = .92; ω = .92)

1. Pridržava se vaših uputstava.
6. Završava svoje obaveze i zadatke bez vašeg podsticanja na to.
9. Rado učestvuje u zajedničkim i grupnim aktivnostima.
10. Izvršava zadatke u skladu sa vašim instrukcijama.
16. Vreme koristi na prikladan način dok čeka pomoć vaspitača.
18. Vreme predviđeno za aktivnosti po izboru dete koristi na prihvatljiv način.
22. Završava zadatke u predviđenim rokovima.
27. Bez opominjanja na to, uredno odlaže svoje stvari i radni materijal na predviđeno mesto.
29. Pridružuje se aktivnosti ili grupi, bez da mu se to kaže.

F2: Asertivnost (α = .91; ω = .90)

2. Lako sklapa prijateljstva.
3. Na prikladan način saopštava kada smatra da ste bili nepravedni prema njemu/njoj.
5. Na odgovarajući način izražava sumnju u pravila koja smatra nepravednim.
8. U primerenim situacijama daje komplimente vršnjacima.
11. Pomaže Vam i kada to izričito ne zatražite od njega.
17. Govori lepe stvari o sebi u skladu sa situacijom.
19. Reaguje na pravi način kada mu se uputi kompliment i pohvala.
24. Inicira komunikaciju sa vršnjacima.
25. Inicira druženje i poziva vršnjake da mu se pridruže u nekoj aktivnosti.
30. Dobrovoljno pomaže vršnjacima u rešavanju zadataka.

F3: Samokontrola (α = .89; ω = .89)

4. Prikladno reaguje na zadirkivanja vršnjaka.
7. Kontroliše gnev u konfliktnim situacijama sa odraslima.
13. Prihvata ideje i predloge vršnjaka u vezi grupnih aktivnosti.
15. Strpljivo čeka svoj red u igri ili drugim grupnim aktivnostima.
20. Dobro kontroliše emocije u konfliktima sa vršnjacima.
21. Pridržava se i sledi pravila u igrama i aktivnostima sa drugim vršnjacima.
23. Prihvata kompromis u konfliktnim situacijama izmenom sopstvenih ideja u cilju postizanja dogovora.
28. Na adekvatan način reaguje na pritisak vršnjaka.

Skala problematičnog ponašanja (Problem Behaviors Scale)

F1: Eksternalizovani problemi (α = .87; ω = .87)

31. Ima napade besa.
32. Stalno se vrpolji, ne može da sedi na jednom mestu.
33. Bez realnog povoda, raspravlja se sa drugima.
34. Ometa rad.
37. Tuče se sa drugima.
38. Ne poštuje pravila.

F2: Internalizovani problemi (α = .85; ω = .85)

35. Kaže da njega (nju) niko ne voli.
36. Čini se da je usamljen (usamljena).
39. Ispoljava anksioznost u grupi druge dece.
40. Deluje kao da je tužan (tužna) ili depresivan (depresivna).