Spearman Rank Correlation Coefficient (rs)

  • It is a non-parametric measure of correlation.
  • The procedure uses the two sets of ranks assigned to the sample values of X and Y.
  • The Spearman rank correlation coefficient can be computed in the following cases:
    • Both variables are quantitative.
    • Both variables are qualitative ordinal.
    • One variable is quantitative and the other is qualitative ordinal.
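
Because the coefficient depends only on ranks, a strictly increasing transformation of either variable should leave it unchanged, which is one way to see what "non-parametric" means here. The sketch below illustrates this with made-up numbers (the arrays x and y are hypothetical, not data from this lesson) and contrasts it with Pearson's r, which does change.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired observations (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.2, 5.8, 5.1, 7.4])

# Spearman on the raw values and after a strictly increasing transform of y
rho_raw, _ = spearmanr(x, y)
rho_exp, _ = spearmanr(x, np.exp(y))   # exp() preserves the ordering of y

# Pearson on the same two versions of the data
r_raw, _ = pearsonr(x, y)
r_exp, _ = pearsonr(x, np.exp(y))

print(f"Spearman: raw = {rho_raw:.4f}, after exp = {rho_exp:.4f}")
print(f"Pearson:  raw = {r_raw:.4f}, after exp = {r_exp:.4f}")
```

The two Spearman values agree because the ranks of y and exp(y) are identical, while the two Pearson values differ.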

Procedure:

  1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample. Tied values each receive the average of the ranks they occupy.
  2. Rank the values of Y from 1 to n in the same way.
  3. Compute d_i for each pair of observations by subtracting the rank of Y_i from the rank of X_i.
  4. Square each d_i and compute ∑d_i², the sum of the squared differences.
  5. Apply the following formula:

     $$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
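
The five steps can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard rank-difference formula rs = 1 - 6∑d² / (n(n² - 1)) and data without tied values; the function name spearman_by_hand and the sample numbers are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def spearman_by_hand(x, y):
    """Spearman's rs from the rank-difference formula (assumes no tied values)."""
    rank_x = rankdata(x)               # step 1: rank X from 1 to n
    rank_y = rankdata(y)               # step 2: rank Y from 1 to n
    d = rank_x - rank_y                # step 3: rank differences d_i
    sum_d2 = np.sum(d ** 2)            # step 4: sum of squared differences
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # step 5: apply the formula

# Hypothetical data with no ties (illustration only)
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 41]

r_s = spearman_by_hand(x, y)
rho, _ = spearmanr(x, y)               # library value for comparison

print(f"By hand: {r_s:.4f}")           # 0.8000
print(f"SciPy:   {rho:.4f}")           # 0.8000
```

Here the rank differences are 0, -1, 1, -1, 1, so ∑d² = 4 and rs = 1 - 24/120 = 0.8, which matches the value returned by spearmanr.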

 

In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

Sample    Level of education (X)    Income (Y)
A         Preparatory               25
B         Primary                   10
C         University                8
D         Secondary                 10
E         Secondary                 15
F         Illiterate                50
G         University                60

 

Solutions:

Comment: There is a weak indirect (inverse) correlation between level of education and income.

Python Code:

import pandas as pd
from scipy.stats import rankdata, spearmanr

# Sample data: income (Y) and level of education (X) for samples A to G
data = {
    'Income': [25, 10, 8, 10, 15, 50, 60],
    'Education': ['Preparatory', 'Primary', 'University',
                  'Secondary', 'Secondary', 'Illiterate',
                  'University']
}

# Create DataFrame
df = pd.DataFrame(data)

# Ordinal code assigned to each education level
edu_order = {
    'Illiterate': 1,
    'Preparatory': 2,
    'Primary': 3,
    'Secondary': 4,
    'University': 5
}

# Rank the coded levels; tied levels share the average of their ranks
df['EducationRank'] = rankdata(df['Education'].map(edu_order), method='average')

# Spearman rank correlation between income and education rank
corr, p_value = spearmanr(df['Income'], df['EducationRank'])

# Output
print("DataFrame with Ranks:\n", df)
print(f"\nSpearman rank correlation: {corr:.4f}")
print(f"P-value: {p_value:.4f}")

 

Output:

DataFrame with Ranks:
    Income    Education  EducationRank
0      25  Preparatory            2.0
1      10      Primary            3.0
2       8   University            6.5
3      10    Secondary            4.5
4      15    Secondary            4.5
5      50   Illiterate            1.0
6      60   University            6.5

Spearman rank correlation: -0.2661
P-value: 0.5641

Comments:

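The correlation is negative (rs = -0.2661), so education and income are inversely related in this sample, i.e. more highly educated persons tended to earn less. The relationship is weak, and the p-value of 0.5641 is well above 0.05, so it is not statistically significant at the 5% level.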
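
The data contain tied values (two incomes of 10, two "Secondary" and two "University" entries), and in that case the rank-difference formula and spearmanr need not agree, because spearmanr computes the Pearson correlation of the ranks. The sketch below compares the two on the example data, assuming the same education coding as the edu_order dictionary in the code above; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

income = [25, 10, 8, 10, 15, 50, 60]
# Education coded as in edu_order:
# Illiterate=1, Preparatory=2, Primary=3, Secondary=4, University=5
education = [2, 3, 5, 4, 4, 1, 5]

rank_x = rankdata(education)           # average ranks: 2, 3, 6.5, 4.5, 4.5, 1, 6.5
rank_y = rankdata(income)              # average ranks: 5, 2.5, 1, 2.5, 4, 6, 7
d = rank_x - rank_y                    # rank of X minus rank of Y
sum_d2 = np.sum(d ** 2)                # 69.0
n = len(income)

r_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
r_scipy, _ = spearmanr(income, education)

print(f"Rank-difference formula: {r_formula:.4f}")   # -0.2321
print(f"spearmanr:               {r_scipy:.4f}")     # -0.2661
```

Both values point to a weak inverse relationship; the gap between them comes only from how the tied ranks are handled.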

 
