4  Ordered and unordered factors – R Manuals :: An Introduction to R (2024)

A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and unordered factors. While the “real” application of factors is with model formulae (see Contrasts), we here look at a specific example.

4.1 A specific example

Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia¹ and their individual state of origin is specified by a character vector of state mnemonics as

> state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")

¹ Readers should note that there are eight states and territories in Australia, namely the Australian Capital Territory, New South Wales, the Northern Territory, Queensland, South Australia, Tasmania, Victoria and Western Australia.

Notice that in the case of a character vector, “sorted” means sorted in alphabetical order.

A factor is similarly created using the factor() function:

> statef <- factor(state)

The print() function handles factors slightly differently from other objects:

> statef
 [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa
[16] tas sa  nt  wa  vic qld nsw nsw wa  sa  act nsw vic vic act
Levels: act nsw nt qld sa tas vic wa

To find out the levels of a factor the function levels() can be used.

> levels(statef)
[1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"
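As a brief illustration of how a factor relates to its levels, the following sketch inspects the integer codes behind statef. The variable name codes is introduced here for illustration only and does not appear in the original text.

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
           "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
           "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
statef <- factor(state)

# Number of distinct levels (one per state or territory)
nlevels(statef)

# Each element is stored as an integer code that indexes into levels(statef)
codes <- as.integer(statef)   # 'codes' is an illustrative name
head(codes)

# Mapping the codes back through the levels recovers the original strings
identical(levels(statef)[codes], state)

# table() counts how many accountants fall into each level
table(statef)
```

The level order is alphabetical here because no levels argument was given to factor().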

4.2 The function tapply() and ragged arrays

To continue the previous example, suppose we have the incomes of the same tax accountants in another vector (in suitably large units of money)

> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, 59, 46, 58, 43)

To calculate the sample mean income for each state we can now use the special function tapply():

> incmeans <- tapply(incomes, statef, mean)

giving a means vector with the components labelled by the levels

   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250

The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second argument, here statef², as if they were separate vector structures. The result is a structure of the same length as the levels attribute of the factor containing the results. The reader should consult the help document for more details.

² Note that tapply() also works in this case when its second argument is not a factor, e.g., tapply(incomes, state, mean), and this is true for quite a few other functions, since arguments are coerced to factors when necessary (using as.factor()).

Suppose further we needed to calculate the standard errors of the state income means. To do this we need to write an R function to calculate the standard error for any given vector. Since there is a built-in function var() to calculate the sample variance, such a function is a very simple one-liner, specified by the assignment:

> stdError <- function(x) sqrt(var(x)/length(x))

(Writing functions will be considered later in Writing your own functions. Note that R’s built-in function sd() is something different: it computes the sample standard deviation, not the standard error of the mean.) After this assignment, the standard errors are calculated by

> incster <- tapply(incomes, statef, stdError)

and the values calculated are then

> incster
   act    nsw     nt    qld     sa    tas    vic     wa
   1.5 4.3102    4.5 4.1061 2.7386    0.5  5.244 2.6575
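To make the distinction between sd() and the standard error concrete, here is a minimal sketch using the two "act" incomes from the example. The vector name x is an illustrative choice.

```r
stdError <- function(x) sqrt(var(x)/length(x))

# The two incomes recorded for "act" in the example data
x <- c(46, 43)

sd(x)         # sample standard deviation of the observations
stdError(x)   # standard error of the mean, smaller by a factor sqrt(length(x))

# The two quantities are related by stdError(x) = sd(x) / sqrt(n)
all.equal(stdError(x), sd(x) / sqrt(length(x)))
```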

As an exercise you may care to find the usual 95% confidence limits for the state mean incomes. To do this you could use tapply() once more with the length() function to find the sample sizes, and the qt() function to find the percentage points of the appropriate t-distributions. (You could also investigate R’s facilities for t-tests.)
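One possible approach to this exercise is sketched below. It assumes the usual two-sided t-based interval, mean ± t × standard error, with n − 1 degrees of freedom per state. The names n, tcrit, lower and upper are illustrative and not part of the original text.

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
           "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
           "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61,
             61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
statef <- factor(state)
stdError <- function(x) sqrt(var(x)/length(x))

incmeans <- tapply(incomes, statef, mean)
incster  <- tapply(incomes, statef, stdError)

# Sample size in each state
n <- tapply(incomes, statef, length)

# Upper 2.5% point of the t-distribution with n - 1 degrees of freedom
tcrit <- qt(0.975, df = n - 1)

# 95% confidence limits for each state mean
lower <- incmeans - tcrit * incster
upper <- incmeans + tcrit * incster

round(cbind(n, mean = incmeans, lower, upper), 2)
```

States with only two observations have a single degree of freedom, so their intervals are very wide.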

The function tapply() can also be used to handle more complicated indexing of a vector by multiple categories. For example, we might wish to split the tax accountants by both state and sex. However, in this simple instance (just one factor), what happens can be thought of as follows. The values in the vector are collected into groups corresponding to the distinct entries in the factor. The function is then applied to each of these groups individually. The value is a vector of function results, labelled by the levels attribute of the factor.
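The grouping described above can be reproduced by hand with split() and sapply(). This is only a sketch of the idea, not a description of how tapply() is implemented; the names groups and byhand are illustrative.

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
           "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
           "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61,
             61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
statef <- factor(state)

# Step 1: collect the incomes into one group per level of the factor
groups <- split(incomes, statef)

# Step 2: apply the function to each group individually
byhand <- sapply(groups, mean)

# The result agrees with tapply(), including the labels
incmeans <- tapply(incomes, statef, mean)
all.equal(as.vector(incmeans), unname(byhand))
identical(names(byhand), levels(statef))
```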

The combination of a vector and a labelling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same the indexing may be done implicitly and much more efficiently, as we see in the next section.

4.3 Ordered factors

The levels of factors are stored in alphabetical order, or in the order they were specified to factor if they were specified explicitly.

Sometimes the levels will have a natural ordering that we want to record and want our statistical analysis to make use of. The ordered() function creates such ordered factors but is otherwise identical to factor. For most purposes the only difference between ordered and unordered factors is that the former are printed showing the ordering of the levels, but the contrasts generated for them in fitting linear models are different.
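A small sketch of the difference follows. The vector sizes and its values are hypothetical data made up for illustration; they are not part of the tax accountant example.

```r
# Hypothetical data with a natural ordering low < medium < high
sizes <- c("medium", "low", "high", "low", "medium")

# Unordered factor: levels default to alphabetical order
sizef <- factor(sizes)
levels(sizef)

# Ordered factor: the levels are given explicitly in their natural order
sizeo <- ordered(sizes, levels = c("low", "medium", "high"))
sizeo               # the Levels line is printed with "<" between levels
is.ordered(sizeo)

# Comparisons are meaningful for ordered factors
sizeo < "high"

# The default contrasts used in model fitting differ between the two
contrasts(sizef)
contrasts(sizeo)
```

With the default options, the unordered factor gets treatment contrasts and the ordered factor gets polynomial contrasts.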
