
Unit 1: Statistical Data & Descriptive Statistics, NEP FYUGP Business Statistics Notes BCom 3rd Semester

Get Gauhati University BCom 3rd Semester Business Statistics Unit 1: Statistical Data and Descriptive Statistics Notes with Most Important Topics!


Unit 1: Statistical Data and Descriptive Statistics

Classification of Data - Univariate, Bivariate and Multi-Variate Data

Classification of Data:

 → The process of arranging data into groups or classes according to their common characteristics is technically known as classification. Classification is the grouping of related facts into classes, and it is the first step towards tabulation.

 → In the words of Secrist, "Classification is the process of arranging data into sequences and groups according to their common characteristics or separating them into different but related parts."

Essentials of classification:

a) The classification must be exhaustive so that every unit of the distribution may find place in one group or another.

b) Classification must conform to the objects of investigation.

c) All the items constituting a group must be homogeneous.

d) Classification should be elastic so that new facts and figures may easily be adjusted.

e) Classification should be stable. If it changes with every enquiry, the data will not remain comparable from one enquiry to another.

f) Classes must not overlap; each item of the data must belong to one class only.

Classification of Data According to Nature:

Univariate Data: Univariate data refers to data that consists of observations on a single variable. It is the simplest type of data because it studies only one characteristic at a time. The main purpose of univariate analysis is to describe, summarize, and find patterns in the given dataset without examining relationships with other variables.

The following are the features of univariate data:

  1. It involves only one variable under study.

  2. It is used to describe the distribution and pattern of the dataset.

  3. It helps in identifying the central tendency (average value) of the data.

  4. It is useful to measure the variability or dispersion within the data.

  5. It does not study cause-and-effect relationships between variables.

The following are the examples of univariate data:

  • Age of 10 workers in a factory.

  • Marks scored by 50 students in Mathematics.

  • Daily wages of labourers.

  • Height of students in a class.

  • Population of different villages.

Methods of analysis:

  • Measures of Central Tendency: Mean, Median, Mode.

  • Measures of Dispersion: Range, Standard Deviation, Variance.

  • Graphical Methods: Bar graphs, Histograms, Pie charts, Line graphs.
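The univariate measures listed above can be illustrated with a short Python sketch using the standard `statistics` module; the marks below are hypothetical figures chosen for the example.

```python
import statistics

# Hypothetical marks of 10 students in Mathematics (illustrative data only).
marks = [45, 52, 52, 60, 61, 65, 70, 72, 80, 93]

mean = statistics.mean(marks)      # central tendency: arithmetic mean
median = statistics.median(marks)  # middle value of the ordered series
mode = statistics.mode(marks)      # most frequently occurring value
rng = max(marks) - min(marks)      # dispersion: range = L - S
sd = statistics.pstdev(marks)      # dispersion: population standard deviation

print(mean, median, mode, rng)
```

Each quantity summarises the single variable (marks) on its own, without reference to any other variable.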

Bivariate Data

Meaning:  Bivariate data refers to data that involves two variables observed simultaneously. Its analysis focuses on finding out the relationship, correlation, or cause-and-effect association between the two variables.

The following are the features of bivariate data:

  1. It consists of two variables studied together.

  2. It helps in understanding the degree of relationship between the two variables.

  3. It is often used to study dependency or association.

  4. It can identify whether changes in one variable affect the other.

  5. It is represented by scatter diagrams, tables, or paired data sets.

The following are the examples of bivariate data:

  • Age and weight of workers.

  • Demand and supply of a product.

  • Marks of students in Mathematics and English.

  • Temperature and sales of woollen clothes.

  • Price and demand of a commodity.

Methods of analysis:

  • Correlation Analysis.

  • Regression Analysis.

  • Scatter diagrams and trend lines.

  • Cross-tabulation.
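Correlation analysis on bivariate data can be sketched as below; the price and demand figures are hypothetical, and Karl Pearson's coefficient of correlation is computed from first principles.

```python
import math

# Hypothetical paired data: price (Rs.) and demand (units) of a commodity.
price = [10, 12, 14, 16, 18]
demand = [50, 48, 44, 40, 38]

n = len(price)
mean_x = sum(price) / n
mean_y = sum(demand) / n
# Covariance and standard deviations of the two series.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand)) / n
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in price) / n)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in demand) / n)

r = cov / (sd_x * sd_y)  # Karl Pearson's coefficient of correlation
print(round(r, 3))
```

A value of `r` close to −1 here reflects the inverse relationship between price and demand in the sample data.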

Multivariate Data

Multivariate data refers to data that involves three or more variables studied simultaneously. It is complex in nature and is used when multiple factors interact and influence the outcome together.

The following are the features of multivariate data:

  1. It consists of three or more variables.

  2. It is suitable for analyzing complex relationships among variables.

  3. It helps in prediction and decision-making by considering many factors at once.

  4. It requires the use of advanced statistical methods.

  5. It is widely used in business, economics, psychology, and social sciences.

The following are the examples of multivariate data:

  • Sales of a product in different regions considering price, advertisement, and income level.

  • Study of education, occupation, and income on lifestyle.

  • Weather forecasting considering temperature, humidity, and wind speed.

  • Medical research involving age, weight, blood pressure, and cholesterol levels.

  • Consumer preference based on price, quality, and brand image.

Methods of analysis:

  • Multiple Correlation and Multiple Regression.

  • Multivariate Analysis of Variance (MANOVA).

  • Factor Analysis.

  • Cluster Analysis.

  • Principal Component Analysis (PCA).
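A minimal sketch of how multivariate data is held and summarised, with hypothetical figures: each record observes three variables simultaneously, and the mean vector summarises all three together.

```python
# Hypothetical multivariate data: (price, advertisement spend, sales)
# observed together for each of four regions.
data = [
    (10, 2.0, 120),
    (12, 2.5, 115),
    (9, 3.0, 140),
    (11, 1.5, 100),
]

# The mean vector gives the average of each variable across all records.
n = len(data)
mean_vector = tuple(sum(row[i] for row in data) / n for i in range(3))
print(mean_vector)
```

Techniques such as multiple regression or PCA start from exactly this kind of table of simultaneous observations.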

Time Series Data

Time series data refers to data that is collected on a single variable at different points of time, usually at equal intervals such as daily, monthly, quarterly, or yearly. Its purpose is to study trends, seasonal variations, cyclical movements, and random fluctuations over time.

The following are the features of time series data:

  1. It involves one variable studied over time.

  2. Observations are arranged in chronological order.

  3. It is used to study past behavior of the variable.

  4. It helps in forecasting future trends.

  5. It is widely applied in economics, finance, business, and weather studies.

The following are the examples of time series data:

  • Sales of product X for the last 10 years.

  • Monthly rainfall in Assam during the last 5 years.

  • Daily closing prices of the stock market.

  • Annual production of wheat in India.

  • Yearly population growth of a city.

Methods of analysis:

  • Trend Analysis.

  • Moving Averages.

  • Time Series Decomposition.

  • Forecasting Models (e.g., ARIMA, Exponential Smoothing).
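The moving-average method above can be sketched in a few lines; the yearly sales figures are hypothetical.

```python
# Hypothetical yearly sales of product X; a 3-year moving average
# smooths short-term fluctuations to reveal the underlying trend.
sales = [120, 135, 128, 150, 160, 155, 170]

window = 3
moving_avg = [
    round(sum(sales[i:i + window]) / window, 2)
    for i in range(len(sales) - window + 1)
]
print(moving_avg)
```

Each entry of `moving_avg` averages one 3-year window, so the smoothed series is shorter than the original by `window - 1` values.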

Cross-Sectional Data

Meaning:  Cross-sectional data refers to data collected on two or more variables at the same point of time across different individuals, groups, or regions. It provides a snapshot for comparison but does not show changes over time.

The following are the features of cross-sectional data:

  1. It is collected at one point of time only.

  2. It compares different entities or groups simultaneously.

  3. It helps in identifying variations across individuals or regions.

  4. It is easier to collect compared to time series data.

  5. It is widely used in surveys, censuses, and comparative studies.

The following are the examples of cross-sectional data:

  • Sales of product X in four different states in 2025.

  • Income levels of 50 households in a village in the same year.

  • Marks obtained by students in different subjects in one examination.

  • Average expenditure of families in different districts.

  • Employment levels in different industries at a given time.

Methods of analysis:

  • Comparative Statistics (averages, ratios, percentages).

  • Cross-tabulation.

  • Regression and correlation.

  • Graphical comparison (bar charts, pie diagrams).
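Comparative statistics on cross-sectional data often reduce to shares and percentages; a small sketch with hypothetical state-wise sales in a single year:

```python
# Hypothetical cross-sectional data: sales of product X in four states
# in the same year, compared via percentage shares of the total.
sales_by_state = {"Assam": 240, "Bihar": 160, "Odisha": 120, "Kerala": 280}

total = sum(sales_by_state.values())
shares = {state: round(100 * s / total, 1) for state, s in sales_by_state.items()}
print(shares)
```

The shares compare the entities at one point of time; no time dimension is involved.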

 MEASURE OF CENTRAL TENDENCY (AVERAGE)

Meaning of Average:

One of the most important objectives of statistical analysis is to get one single value that describes the characteristics of the entire mass of data. Such a value is called the central value or an “average” or the expected value of the variables.

In the words of Croxton and Cowden, “An average value is a single value within the range of the data that is used to represent all the values in the series.”

In the words of Clark, “Average is an attempt to find one single figure to describe a whole of figures.”

From the above explanation we can say that average is a single value that represents a group of values. It depicts the characteristics of the whole group. The value of average lies between the maximum and minimum values of the series. That is why it is also called the measure of central tendency.

Objectives of averaging

The objectives of averaging are listed below:
a) To get a single value that describes the characteristic of the entire group.
b) To facilitate comparison of data either at a point of time or over a period of time.

Requisites of a good average

The following are the important properties which a good average should satisfy:

  1. It should be easy to understand.

  2. It should be simple to compute.

  3. It should be based on all the items.

  4. It should not be affected by extreme values.

  5. It should be rigidly defined.

  6. It should be capable of further algebraic treatment.

Types of average

Average is divided into three main categories:
a) Mean which is further classified as: Arithmetic Mean, Weighted Mean, Geometric Mean and Harmonic Mean.
b) Median
c) Mode

Arithmetic Mean: Meaning, Properties, Merits and Demerits


It is a value obtained by adding together all the items and by dividing the total by the number of items. It is also called average. It is the most popular and widely used measure for representing the entire data by one value. 

Arithmetic mean may be either:

(i)  Simple arithmetic mean, or

(ii)  Weighted arithmetic mean.

Properties of arithmetic mean:

1. The sum of deviations of the items from the arithmetic mean is always zero, i.e., ∑(X – X̄) = 0.

2. The sum of the squared deviations of the items from the A.M. is minimum, i.e., it is less than the sum of the squared deviations of the items about any other value.

3. If each item in the series is replaced by the mean, then the sum of these substitutions will be equal to the sum of the individual items.
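The first two properties can be verified numerically; the series below is a hypothetical example.

```python
# Numerical check of two properties of the arithmetic mean.
x = [4, 7, 9, 12, 18]
mean = sum(x) / len(x)  # 50 / 5 = 10.0

# Property 1: deviations from the mean sum to zero.
dev_sum = sum(xi - mean for xi in x)

# Property 2: squared deviations are minimised about the mean;
# any other point (here the median, 9) gives a larger sum.
ss_mean = sum((xi - mean) ** 2 for xi in x)
ss_other = sum((xi - 9) ** 2 for xi in x)
print(dev_sum, ss_mean, ss_other)
```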

Merits of A.M.:

(i)  It is simple to understand and easy to calculate.

(ii)  It is affected by the value of every item in the series.

(iii) It is rigidly defined.

(iv) It is capable of further algebraic treatment.

(v) It is a calculated value and is not based on position in the series.

Demerits of A.M.:

(i)  It is affected by extreme items i.e., very small and very large items.

(ii)  It can hardly be located by inspection.

(iii)  In some cases A.M. does not represent the actual item. For example, the average number of patients admitted in a hospital is 10.7 per day.

(iv)  A.M. is not suitable in extremely asymmetrical distributions.

Geometric Mean (GM): Meaning, Uses, Merits and Demerits


It is defined as the nth root of the product of n items or values, i.e., G.M. = ⁿ√(x₁ · x₂ · x₃ · … · xₙ).

Merits of G.M.:

(i)      It is not affected by the extreme items in the series.

(ii)    It is rigidly defined and its value is a precise figure.

(iii)   It is capable of further algebraic treatment.

(iv)  It is useful in calculating index numbers.

Demerits of G.M.:

(i)    It is difficult to understand and to compute. 

(ii)   It cannot be computed when one of the values is 0 or negative.

Uses of G.M.:

(i) It is used to find average of the rates of changes.

(ii) It is useful in measuring the growth of population.

(iii) It is considered to be the best average for the construction of index numbers.

Harmonic Mean (HM): Meaning, Uses, Merits and Demerits


It is defined as the reciprocal of the arithmetic mean of the reciprocal of the individual observations.

H.M. = n / (1/x₁ + 1/x₂ + 1/x₃ + … + 1/xₙ)

Merits of H.M. (Harmonic Mean):

  1. Like AM and GM, it is also based on all observations.

  2. It is the most appropriate average under conditions of wide variations among the items of a series since it gives larger weight to smaller items.

  3. It is capable of further algebraic treatment.

  4. It is extremely useful while averaging certain types of rates and ratios.

Demerits of H.M.:

  1. It is difficult to understand and to compute.

  2. It cannot be computed when one of the values is 0 or negative.

  3. It is necessary to know all the items of a series before it can be calculated.

  4. It is usually a value which may not be a member of the given set of numbers.

Uses of H.M.:  H.M. is appropriate when a variable combines two measurements taken together, e.g., tonne-mileage or speed (distance per unit of time). In tonne-mileage, tonne is one measurement and mileage is the other. H.M. is particularly used to calculate average speed.
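The average-speed use of H.M. can be checked with the standard `statistics` module; the trip below is hypothetical.

```python
import statistics

# Hypothetical trip: the same distance covered at 40 km/h one way and
# 60 km/h on the return. The correct average speed is the harmonic mean.
speeds = [40, 60]

hm = statistics.harmonic_mean(speeds)  # 2 / (1/40 + 1/60)
am = statistics.mean(speeds)           # 50, which overstates the true average
print(hm, am)
```

The arithmetic mean (50 km/h) is too high because more time is spent at the slower speed; the harmonic mean (48 km/h) weights the slower leg correctly.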


MODE: MEANING, MERITS & DEMERITS

Meaning of Mode:  Mode is that value of a dataset which is repeated most often in the series. In other words, mode is the value which is predominant in the series or is at the position of greatest density. Mode may or may not exist in a series; if it exists, it may not be unique, or its position may be somewhat uncertain.

Merits of Mode:

  1. Mode is the most representative value of distribution; it is useful to calculate modal wage.

  2. It is not affected by the extreme items in the series.

  3. It can be determined graphically.

  4. For open-ended classes, mode can be calculated.

  5. It can be located by inspection.

Demerits of Mode:

  1. It is not based on all observations.

  2. Mode cannot be calculated when frequency distribution is ill-defined.

  3. It is not capable of further algebraic treatment. Like mean, combined mode cannot be calculated.

  4. It is not a rigidly defined measure because several formulae to calculate mode are used.

Relationship between Mean, Median, and Mode:

  • In a normal distribution: Mean = Median = Mode

  • In an asymmetrical distribution the median always lies between the mean and the mode: in a positively skewed distribution Mean > Median > Mode, and in a negatively skewed distribution Mean < Median < Mode.

  • Mode = 3 × Median – 2 × Mean

  • Or, 3 × Median = 2 × Mean + Mode

Relation between Arithmetic Mean, Geometric Mean, and Harmonic Mean:

  1. AM is greater than or equal to GM, and GM is greater than or equal to HM:
    A.M. ≥ G.M. ≥ H.M.

  2. GM is the square root of the product of AM and HM:
    GM = √(Arithmetic Mean × Harmonic Mean)
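Both relations can be verified numerically with the `statistics` module. Note that the identity G.M. = √(A.M. × H.M.) holds exactly only for two values; the pair below is a hypothetical example.

```python
import statistics

# Check A.M. >= G.M. >= H.M. for a pair of values, and the two-value
# identity G.M. = sqrt(A.M. x H.M.).
a, b = 4, 16

am = statistics.mean([a, b])            # 10.0
gm = statistics.geometric_mean([a, b])  # sqrt(4 * 16) = 8.0
hm = statistics.harmonic_mean([a, b])   # 2 / (1/4 + 1/16) = 6.4
print(am, gm, hm)
```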

Which Average Should Be Used?

No single average is suitable for all situations, as each type of average has its own unique characteristics. The following factors should be considered when selecting the most appropriate average:

  1. The purpose for which the average is being calculated.

  2. The nature of the data available:

    1. Highly skewed data – avoid arithmetic mean.

    2. Data with gaps around the middle – avoid median.

    3. Unequal class intervals – avoid mode.

  3. Whether further computations are required.

  4. The typical value needed for the specific problem.

Uses of Various Types of Averages

1) Arithmetic Mean (AM):  The arithmetic mean is generally considered the best average, but it may not be suitable in the following situations:

  • In highly skewed distributions.

  • In distributions with open-ended intervals.

  • When there is an irregular difference in the range of data.

  • To average ratios or rates of change.

  • When the dataset contains extremely large or small values, which may distort the mean.

2) Median:  The median is usually the most appropriate average for open-ended or grouped distributions.

3) Mode:  Mode is most suitable when there is a large frequency of certain values. It is also applicable for qualitative data.

4) Geometric Mean (GM):  The geometric mean is used:

  • To calculate the average of rates of change.

  • To measure population growth.

  • As the preferred average in constructing index numbers.

5) Harmonic Mean (HM):  Harmonic mean is applied when two measurements are combined to measure a variable. For example, in tonne mileage or speed per hour, where tonne and mileage are separate measurements. HM is particularly useful to calculate average speed in such cases.

MEASURE OF DISPERSION

Meaning of Dispersion: The average of a given distribution is a single value that represents the entire data. However, the average alone cannot adequately describe a set of observations unless all observations are identical. It is necessary to describe the variability or dispersion of the observations. Two or more distributions may have the same central value, yet their spread can be very different. Measures of dispersion help study this important characteristic of a distribution.

  • In the words of Brooks and Dick:
    “Dispersion is the degree of the scatter or variation of the variable about a central value.”

  • In the words of Simpson and Kafka:
    “The measurement of the scatterness of the mass of figures in a series about an average is called a measure of variation or dispersion.”

From the above, it is clear that dispersion measures the variation of items. It indicates the extent to which the items vary from the central value. Dispersion is also known as the average of the second order, while mean, median, and mode are averages of the first order. Common measures of dispersion include:

  • Range

  • Mean Deviation

  • Quartile Deviation

  • Standard Deviation

Purpose and Significance of Measures of Dispersion

Measures of dispersion are needed for the following purposes:

  1. Reliability of the average: A small dispersion indicates a reliable average, whereas a large dispersion suggests the average may be unreliable.

  2. Control of variability: Helps in determining the nature and cause of variation.

  3. Comparison of distributions: Enables comparison of two or more series regarding their variability. High variation indicates less uniformity, while low variation indicates greater uniformity.

  4. Facilitating other statistical analyses: Dispersion is essential in correlation analysis, statistical quality control, regression analysis, etc.

Types of Measures of Dispersion

Measures of dispersion can be broadly classified into two types:

a) Absolute Measures of Dispersion

  • Range

  • Mean Deviation

  • Standard Deviation

  • Quartile Deviation

  • Lorenz Curve

b) Relative Measures of Dispersion

  • Coefficient of Range

  • Coefficient of Mean Deviation

  • Coefficient of Variation

  • Coefficient of Quartile Deviation

Difference Between Absolute and Relative Measures of Dispersion

| Point | Absolute Measures | Relative Measures |
|---|---|---|
| 1 | Dependent on the unit of the variable | Unit-free |
| 2 | Not suitable for comparing two or more distributions | Suitable for comparing distributions |
| 3 | Easier to compute and understand | More difficult to compute and comprehend |

Desirable Properties of a Good Measure of Dispersion

A good measure of dispersion should satisfy the following properties:

  1. Simple to understand and easy to compute.

  2. Based on all the items in the dataset.

  3. Not affected by extreme values.

  4. Rigidly defined.

  5. Capable of further algebraic treatment.

  6. Possess sampling stability.

Meaning of Range/ Merits and Demerits of Range


Range:  Range is defined as the difference between the value of the smallest item and the value of the largest item included in the distribution. It is the simplest method of measuring dispersion. Symbolically,

Range = Largest value (L) – Smallest value (S)

The relative measure corresponding to range, called the coefficient of range, is obtained by applying the following formula: Coefficient of Range = (L – S) / (L + S)
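A short sketch of both formulas, using hypothetical daily wages:

```python
# Range and coefficient of range for hypothetical daily wages (Rs.).
wages = [180, 210, 195, 250, 160, 230]

L, S = max(wages), min(wages)
rng = L - S                      # Range = L - S
coeff_range = (L - S) / (L + S)  # Coefficient of Range, a unit-free number
print(rng, round(coeff_range, 3))
```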

Merits of Range:

(i) It is simple to understand and easy to calculate.

(ii) It is less time consuming.

Demerits of Range:

(i) It is not based on each and every item of the distribution.

(ii) It is very much affected by the extreme values.

(iii) The value of Range is affected more by sampling fluctuations 

(iv) Range cannot be computed in case of open-end distribution. 


Quartile Deviation (Q.D) / Semi Inter-Quartile Range

Meaning: The Quartile Deviation (Q.D) is half of the difference between the upper quartile (Q3) and the lower quartile (Q1).

QD = 1/2 (Q3 - Q1)

QD is an absolute measure of dispersion. Its corresponding relative measure, called the coefficient of QD, is calculated as:

Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)

The coefficient of QD can be used to compare the degree of variation across different distributions.
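Both formulas can be sketched with `statistics.quantiles`; the series is hypothetical, and the "inclusive" quartile convention is used (other conventions give slightly different Q1 and Q3).

```python
import statistics

# Quartile deviation and its coefficient for a hypothetical series.
data = [10, 20, 30, 40, 50, 60, 70]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
qd = (q3 - q1) / 2                      # semi inter-quartile range
coeff_qd = (q3 - q1) / (q3 + q1)        # relative measure
print(q1, q3, qd, coeff_qd)
```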

Merits of QD:

  1. It is based on 50% of the observations.

  2. It is not affected by the presence of extreme values.

  3. It can be computed for open-end distributions.

Demerits of QD:

  1. It does not consider every item in the distribution.

  2. It is not capable of further algebraic treatment.

  3. The value of QD is more affected by sampling fluctuations.

Mean Deviation (M.D)

Meaning: For a given set of observations, Mean Deviation (M.D) is defined as the arithmetic mean of the absolute deviations of the observations from an appropriate measure of central tendency.

MD = Σ|D| / N, where |D| = |X – average| denotes the absolute deviations.

M.D is an absolute measure of dispersion. Its corresponding relative measure, the coefficient of MD, is obtained by dividing the mean deviation by the particular average used in its computation.

Coefficient of MD = MD / (Mean or Median)
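A minimal sketch of mean deviation about the median, with hypothetical values:

```python
import statistics

# Mean deviation about the median, and its coefficient.
x = [5, 8, 10, 13, 14]

med = statistics.median(x)                    # 10
md = sum(abs(xi - med) for xi in x) / len(x)  # mean of absolute deviations
coeff_md = md / med                           # divide by the average used
print(md, coeff_md)
```

Because absolute values are taken, the algebraic signs of the deviations are ignored, which is exactly the drawback noted below.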

Merits of M.D:

  1. Simple to understand and easy to compute.

  2. Based on every item in the dataset.

  3. Less affected by extreme values compared to standard deviation.

Demerits of M.D:

  1. The algebraic signs of deviations are ignored, which is a drawback.

  2. Not capable of further algebraic treatment.

  3. Much less popular compared to standard deviation.

MEANING OF STANDARD DEVIATION (S.D), Merits & Demerits

STANDARD DEVIATION:

Standard Deviation, represented by the symbol ‘σ’ (sigma), is regarded as the most popular and widely applied measure of dispersion. It is defined as the positive square root of the second central moment (i.e., of the variance) and is always calculated with reference to the arithmetic mean. In simple terms, Standard Deviation is the root-mean-square of the deviations from the mean.

Merits of Standard Deviation (SD)

Standard Deviation is considered the most reliable and widely used measure of dispersion because of the following reasons:

  1. Based on all observations: SD uses every single value in the dataset and is strictly defined.

  2. Algebraic usability: It allows further mathematical treatment. For example, the combined standard deviation of multiple groups can be calculated.

  3. Sampling stability: Compared to other measures, SD is less influenced by random fluctuations in sampling.

  4. Comparative tool: The coefficient of variation, which is the most suitable measure for comparing variability across datasets, is derived from SD and the mean.

  5. Practical utility: SD plays a central role in advanced statistical analysis, hypothesis testing, and research applications.

Demerits of Standard Deviation (SD)

  1. Complex computation: The process of calculating SD is comparatively lengthy and not as easy to grasp as other measures.

  2. Influenced by extreme values: Greater importance is given to values far from the mean, while values close to the mean receive less emphasis.

Difference between Mean Deviation and Standard Deviation

| Basis of Difference | Mean Deviation (MD) | Standard Deviation (SD) |
|---|---|---|
| 1. Algebraic Signs | Ignores the signs of deviations (all are taken as positive). | Considers algebraic signs (squares of deviations are used). |
| 2. Reference Point | Can be calculated from mean, median, or mode. | Always calculated from the arithmetic mean. |
| 3. Simplicity | Easier to understand and compute. | More complex to calculate and understand. |
| 4. Mathematical Use | Limited algebraic treatment. | Extensively used in advanced algebraic and statistical operations. |
| 5. Stability in Sampling | More affected by sampling fluctuations. | Less affected by sampling fluctuations, hence more reliable. |
| 6. Weightage to Values | Treats all deviations equally. | Assigns greater weight to extreme values due to squaring. |
| 7. Practical Application | Rarely used in higher statistical analysis. | Widely used in research, probability, and inferential statistics. |
| 8. Accuracy in Measuring Dispersion | Provides a rough idea of dispersion. | Provides the most accurate and scientific measure of dispersion. |


VARIANCE AND COEFFICIENT OF VARIATION

Variance: The term variance was first used by Fisher in 1913 to describe the square of the standard deviation. Variance is particularly important in advanced analysis where the total variation can be split into parts attributable to different factors. It is calculated as the square of the standard deviation:

Variance = (S.D.)²

Coefficient of Variation (C.V.): The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. It is also called the coefficient of variability and is defined as:

C.V. = (S.D. / Mean) × 100

Purpose of Coefficient of Variation:

  • Used to compare variability between two or more series.

  • A higher C.V. indicates greater variability (less uniform, less stable, or less consistent).

  • A lower C.V. indicates less variability (more uniform, more stable, or more consistent).
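The comparison use of C.V. can be sketched as follows; the two series of scores are hypothetical.

```python
import statistics

# Coefficient of variation to compare the consistency of two hypothetical
# batsmen: the series with the lower C.V. is the more consistent one.
a = [40, 50, 60, 50, 50]  # scores of batsman A
b = [10, 90, 50, 5, 95]   # scores of batsman B

def cv(series):
    """C.V. = (S.D. / Mean) x 100, using the population S.D."""
    return statistics.pstdev(series) / statistics.mean(series) * 100

print(round(cv(a), 2), round(cv(b), 2))
```

Both series have the same mean (50), so the averages alone cannot distinguish them; the C.V. makes the difference in consistency explicit.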

Difference Between Variance and Coefficient of Variation:

  • Variance measures the total variation around the mean.

  • Coefficient of Variation expresses variation as a percentage relative to the mean.

Lorenz Curve

The Lorenz Curve, devised by Max O. Lorenz, is a graphical method to study dispersion, initially used to measure income and wealth distribution. It is also used to study profit, wages, turnover, etc. The curve shows cumulative percentages of items against cumulative percentages of other variables (wealth, profits, turnover, etc.).

Procedure to Draw a Lorenz Curve:

  1. Cumulate the size of items and frequencies, then calculate percentages.

  2. Plot cumulative frequency percentages on the X-axis (0 to 100).

  3. Plot cumulative variable percentages on the Y-axis (0 to 100).

  4. Draw a diagonal line from 0 to 100 representing equal distribution.

  5. Plot the points for the distribution and join them to form the Lorenz curve. The curve always lies below the diagonal unless distribution is exactly equal. If multiple curves are plotted, the curve furthest from the diagonal shows greater inequality.
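Steps 1 to 3 of the procedure (the cumulative percentages that are plotted) can be sketched numerically; the income shares below are hypothetical.

```python
# Cumulative-percentage points for a Lorenz curve, from hypothetical income
# shares of five equal-sized population groups (poorest to richest).
incomes = [5, 10, 15, 30, 40]  # income of each 20% group

total = sum(incomes)
cum_pop = [20 * (i + 1) for i in range(len(incomes))]  # X-axis: population %
cum_inc = []                                           # Y-axis: income %
running = 0
for inc in incomes:
    running += inc
    cum_inc.append(round(100 * running / total, 1))
print(list(zip(cum_pop, cum_inc)))
```

Plotting these points against the diagonal shows the curve sagging below the line of equal distribution, since every cumulative income share lags the corresponding population share.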

Relationship Between Mean and Standard Deviation

In a symmetrical distribution:

  • Mean ± 1 S.D. covers 68.27% of items.

  • Mean ± 2 S.D. covers 95.45% of items.

  • Mean ± 3 S.D. covers 99.73% of items.

Relationship Between Measures of Dispersion

In a normal distribution, the measures of dispersion are related as follows:

  • Quartile Deviation (QD) = 2/3 S.D. → S.D. = 3/2 QD

  • Mean Deviation (MD) = 4/5 S.D. → S.D. = 5/4 MD

Relationship Between Mean and Other Measures of Dispersion

  • Mean ± QD includes 50% of items.

  • Mean ± MD includes 57.31% of items.

  • Mean ± S.D. includes 68.27% of items (approximately 2/3 of items).

SKEWNESS, MOMENTS AND KURTOSIS

MEANING OF SKEWNESS

There are two other comparable characteristics called skewness and kurtosis that help us to understand a distribution. Two distributions may have the same mean and standard deviation but may differ widely in their overall appearance, as can be seen from the following:

[Figure not reproduced: two frequency curves, one symmetrical and one skewed, each with Mean = 5 and S.D. = 5.]

In both these distributions the value of mean and standard deviation is the same (Mean = S.D. = 5). But it does not imply that the distributions are alike in nature. The distribution on the left-hand side is a symmetrical one, whereas the distribution on the right-hand side is asymmetrical or skewed. Measures of skewness help us to distinguish between different types of distributions. Some definitions of skewness are as follows:

1)    “When a series is not symmetrical it is said to be asymmetrical or skewed.” – Croxton & Cowden. 

2)    “Skewness refers to the asymmetry or lack of symmetry in the shape of a frequency distribution.” – Morris Hamburg. 

The analysis of the above definitions shows that the term ‘SKEWNESS’ refers to lack of symmetry, i.e., when a distribution is not symmetrical (or is asymmetrical) it is called a skewed distribution. Any measure of skewness indicates the difference between the manner in which items are distributed in a particular distribution and the manner in which they would be distributed in a symmetrical (or normal) distribution.

If, for example, skewness is positive, the frequencies in the distribution are spread out over a greater range of values on the high-value end of the curve (the right-hand side) than on the low-value end. If the curve is normal, the spread will be the same on both sides of the centre point, and the mean, median and mode will all have the same value.

The concept of skewness gains importance from the fact that statistical theory is often based upon the assumption of the normal distribution. A measure of skewness is therefore necessary in order to guard against the consequences of this assumption.

Difference between Dispersion and Skewness:

Dispersion is concerned with the amount of variation rather than with its direction. Skewness tells us about the direction of the variation or the departure from Symmetry. In fact, measures of skewness are dependent upon the amount of dispersion. 

It may be noted that although skewness is an important characteristic for defining the precise pattern of a distribution, it is rarely calculated in business and economic series. Variation is by far the most important characteristic of a distribution. 

Requisites of a Good Measure of Skewness

A good measure of skewness should have three properties. It should:

1)    Be a pure number in the sense that its value should be independent of the units of the series and also of the degree of variation in the series. 

2)    Have a zero value, when the distribution is symmetrical and 

3)    Have some meaningful scale of measure so that we could easily interpret the measured value. 

MEASURES OF SKEWNESS

Measures of skewness tell us the direction and extent of asymmetry in a series, and permit us to compare two or more series with regard to these. They may either be absolute or relative. 

Absolute Measures of Skewness:

Skewness can be measured in absolute terms by taking the difference between mean and mode. Symbolically:

Absolute Skewness = Mean - Mode

If the value of mean is greater than mode skewness will be positive, i.e., we shall get a plus sign in the above formula. Conversely, if the value of mode is greater than mean, we shall get a minus sign meaning thereby that the distribution is negatively skewed. 

Relative Measures of Skewness:

There are four important measures of relative skewness, namely, 

1.    The Karl Pearson’s coefficient of skewness. 

2.    The Bowley’s coefficient of skewness. 

3.    The Kelly’s coefficient of skewness. 

4.    Measure of skewness based on moments. 

KARL PEARSON’S COEFFICIENT OF SKEWNESS

This method of measuring skewness, also known as Pearson's Coefficient of Skewness, was suggested by Karl Pearson (1857–1936), a British biometrician and statistician. It is based on the difference between the mean and the mode. This difference is divided by the standard deviation to give a relative measure. The formula is:

SKP = (Mean – Mode) / Standard Deviation

Here, SKP represents Karl Pearson’s Coefficient of Skewness.

  • There is no theoretical limit to this measure, which is a slight drawback.

  • In practice, the value rarely lies outside the range of ±1.

Bowley’s Coefficient of Skewness

An alternative measure, proposed by Professor Bowley, is based on quartiles. In a symmetrical distribution, the third quartile (Q3) is the same distance above the median as the first quartile (Q1) is below it:

Q3 – Median = Median – Q1 or Q3 + Q1 – 2Median = 0

  • In a positively skewed distribution, the top 25% of values are farther from the median than the bottom 25%, i.e., Q3 is farther from the median than Q1.

  • For negative skewness, the reverse is true.

Bowley’s coefficient of skewness is calculated as:

SKB = (Q3 + Q1 – 2Median) / (Q3 – Q1)

Notes:

  • Results from these two measures should not be compared, as their numerical values are not directly related.

  • Bowley’s measure is limited to values between –1 and +1, while Pearson’s measure has no such limits.
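Both coefficients can be sketched on the same hypothetical, positively skewed series; `statistics.quantiles` with the "inclusive" convention supplies the quartiles for Bowley's measure.

```python
import statistics

# Pearson's and Bowley's coefficients of skewness for a hypothetical
# positively skewed series (both should come out positive here).
x = [2, 3, 3, 3, 4, 5, 6, 8, 11]

mean = statistics.mean(x)
mode = statistics.mode(x)
sd = statistics.pstdev(x)
skp = (mean - mode) / sd                      # Karl Pearson's coefficient

q1, med, q3 = statistics.quantiles(x, n=4, method="inclusive")
skb = (q3 + q1 - 2 * med) / (q3 - q1)         # Bowley's coefficient
print(round(skp, 3), round(skb, 3))
```

As the notes caution, the two numerical values should not be compared with each other; only their common sign (positive here) is meaningful across the measures.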

Moments and Kurtosis

Moments:- The term moment originates from mechanics and refers to the measure of a force's tendency to cause rotation. The strength of this tendency depends on both the magnitude of the force and its distance from the point of origin where it is applied.

In statistics, the term moment is used differently. It describes the characteristics of a frequency distribution, such as central tendency, variation, skewness, and kurtosis. Interestingly, the formula for a moment coefficient is identical to that of an arithmetic mean, which is why the arithmetic mean is often referred to as the “first moment about the origin.”

Purpose of Moments

Moments are highly significant in statistical analysis as they help measure:

  • Central tendency of a dataset

  • Variability of observations

  • Asymmetry of the distribution

  • Peakedness of the frequency curve

Calculating the first four moments about the mean is often the first step in analyzing a frequency distribution.

| Moment | What It Measures |
|---|---|
| First moment about origin | Mean |
| Second moment about mean | Variance |
| Third moment about mean | Skewness |
| Fourth moment about mean | Kurtosis |
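The first four moments can be computed directly from their definitions; the series below is a hypothetical, symmetrical one, so its third central moment comes out as zero (no skewness).

```python
import statistics

# First four moments of a hypothetical series: the first moment about the
# origin is the mean; the second central moment is the variance.
x = [2, 4, 6, 8, 10]

mean = statistics.mean(x)  # first moment about the origin

def central_moment(data, r):
    """r-th moment about the mean: average of (x - mean) ** r."""
    m = statistics.mean(data)
    return sum((xi - m) ** r for xi in data) / len(data)

mu2 = central_moment(x, 2)  # variance
mu3 = central_moment(x, 3)  # zero for a symmetrical series
mu4 = central_moment(x, 4)  # used in measuring kurtosis
print(mean, mu2, mu3, mu4)
```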

Kurtosis

The word Kurtosis comes from Greek, meaning “bulginess.” In statistics, kurtosis measures the degree of flatness or peakedness around the mode of a frequency curve. It indicates whether a distribution is more peaked or flatter than the normal curve:

  • Leptokurtic: More peaked than a normal curve.

  • Platykurtic: Less peaked than a normal curve.

  • Mesokurtic: Normal curve.

Kurtosis is rarely used in statistical analysis but provides insight into the shape of the distribution around its mode.
