Provides powers-of-ten formatting for a numerical vector or data frame column when creating documents in rmarkdown or quarto markdown.
Author
Richard Layton
Published
2022-11-21
Summary
Convert the elements of a numerical vector or data frame column to character strings in which the numbers are formatted using powers-of-ten notation in scientific or engineering form and delimited for rendering as inline equations in an rmarkdown document.
Initial release of the formatdown R package providing tools for formatting output in rmarkdown or quarto markdown documents.
This first version has one function only, format_power(), for converting numbers to character strings formatted in powers-of-ten notation and delimited in $...$ for rendering as inline equations in .Rmd or .qmd output documents. It provides two powers-of-ten formatting options, scientific notation and engineering notation, with an option to omit powers-of-ten notation for a specified range of exponents.
To illustrate the different formats, I show in Table 1 the same number rendered using different formats, all with 4 significant digits.
The R code for the post is listed under the “R code” pointers. In the examples, I use data.table syntax for data manipulation, though the code can be translated into base R or dplyr syntax if desired.
Table 1: Rendering a number using different formats
Background
My first attempt to provide powers-of-ten formatting was in my 2016 package, docxtools. That implementation has several shortcomings.
I wrote its formatting function to accept a data frame as input, which entailed a lot of programming overhead to separate numerical from non-numerical variable classes and to reassemble them after the numerical columns were formatted. This could have been simplified with judicious use of lapply(), with which I was not sufficiently experienced at the time. I also failed to take advantage of formatC() in constructing the output.
With formatdown, my goal is to provide similar functionality but with more concise code, greater flexibility, and a more balanced approach to package dependencies.
Improvements
The primary design change is that the format_power() function operates on a numerical vector instead of a data frame. The benefits of this change are: 1) simpler code that should be easier to revise and maintain; 2) scalar values can be formatted for rendering inline; and 3) data frames can still be formatted, by column, using lapply().
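As a rough illustration of the third benefit, here is a minimal base-R sketch of formatting the numeric columns of a data frame with lapply(). The formatter fmt_sci() is a hypothetical stand-in for a vector-based function such as format_power(), not the package's own code, and the small data frame reuses a few values from the density data shown later in this post.

```r
# Hypothetical stand-in for a vector formatter (not the formatdown code).
# Assumes nonzero, finite input.
fmt_sci <- function(x, digits = 3) {
  exponent <- floor(log10(abs(x)))
  # "fg" with flag "#" keeps trailing zeros so every value shows `digits` digits
  coeff <- formatC(x / 10^exponent, format = "fg", digits = digits, flag = "#")
  paste0("$", coeff, "\\times{10}^{", exponent, "}$")
}

df <- data.frame(
  trial = c("a", "b"),
  p_Pa  = c(101100, 101000),
  T_K   = c(294.05, 294.15)
)

# Pick out the numeric columns, then format them one vector at a time
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
df[num_cols] <- lapply(df[num_cols], fmt_sci, digits = 5)
df
```

Because the formatter only ever sees a vector, the non-numeric columns never need to be separated out and reassembled.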
To illustrate formatting a scalar value inline, the markup for Avogadro’s number (x = 6.0221E+23) in engineering format is given by,
$N_A =$ `r format_power(x, digits = 5, format = "engr")`
which is rendered (in this output document) as \(N_A =\)\(602.21\times{10}^{21}\).
The second improvement is the addition of an option for scientific notation. For example, the markup for Avogadro’s number in scientific notation is given by,
$N_A =$ `r format_power(x, digits = 5, format = "sci")`
which renders as \(N_A =\)\(6.0221\times{10}^{23}\).
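The difference between the two notations comes down to how the exponent is chosen: scientific notation uses the exponent of the leading digit, while engineering notation rounds that exponent down to a multiple of three. The sketch below shows the idea in base R. The helper power_of_ten() is hypothetical and is not the formatdown implementation; it assumes nonzero, finite input and ignores edge cases such as a coefficient that rounds up to the next power of ten.

```r
# Hypothetical helper illustrating the exponent logic (not formatdown's code)
power_of_ten <- function(x, digits = 4, format = c("engr", "sci")) {
  format <- match.arg(format)
  exponent <- floor(log10(abs(x)))           # scientific exponent
  if (format == "engr") {
    exponent <- 3 * floor(exponent / 3)      # round down to a multiple of 3
  }
  coeff <- formatC(x / 10^exponent, format = "fg", digits = digits, flag = "#")
  paste0("$", coeff, "\\times{10}^{", exponent, "}$")
}

x <- 6.0221E+23
power_of_ten(x, digits = 5, format = "engr")
power_of_ten(x, digits = 5, format = "sci")
```

The two calls return the strings $602.21\times{10}^{21}$ and $6.0221\times{10}^{23}$, matching the inline results above.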
The third improvement is the addition of an option for omitting powers-of-ten notation over a range of exponents. For example, the markup for x = 1.23E-4 in decimal notation is given by,
A final (internal) improvement is a more balanced approach to package dependencies. With a tighter focus on what formatdown is to accomplish compared to docxtools, I have reduced the dependencies to checkmate, wrapr, and data.table.
The package vignette illustrates package usage in detail.
However, having successfully submitted the package to CRAN, I started working on this post and immediately (!) uncovered an issue that had not appeared while working on the package vignettes.
Delimiter issue
I wrote the package vignette using the rmarkdown::html_vignette output style per usual. All the formatted output rendered as expected in that document. I write this blog using quarto. As seen in the examples above, inline math is rendered as expected.
The issue arises when using knitr::kable() and kableExtra::kbl() to display data tables in this blog post. To illustrate, consider this data frame, included with formatdown (ideal gas properties of air at room temperature).
R code
density
date trial humidity T_K p_Pa R density
<Date> <char> <fctr> <num> <num> <int> <num>
1: 2018-06-12 a low 294.05 101100 287 1.197976
2: 2018-06-13 b high 294.15 101000 287 1.196384
3: 2018-06-14 c medium 294.65 101100 287 1.195536
4: 2018-06-15 d low 293.35 101000 287 1.199647
5: 2018-06-16 e high 293.85 101100 287 1.198791
Formatting the pressure column, the markup looks OK.
date trial humidity T_K p_Pa R density
<Date> <char> <fctr> <num> <char> <int> <num>
1: 2018-06-12 a low 294.05 $101.1\\times{10}^{3}$ 287 1.197976
2: 2018-06-13 b high 294.15 $101.0\\times{10}^{3}$ 287 1.196384
3: 2018-06-14 c medium 294.65 $101.1\\times{10}^{3}$ 287 1.195536
4: 2018-06-15 d low 293.35 $101.0\\times{10}^{3}$ 287 1.199647
5: 2018-06-16 e high 293.85 $101.1\\times{10}^{3}$ 287 1.198791
knitr::kable() yields the expected output with pressure formatted in engineering notation.
R code
knitr::kable(DT, align = "r")

| date | trial | humidity | T_K | p_Pa | R | density |
|---:|---:|---:|---:|---:|---:|---:|
| 2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
Problem
kableExtra::kbl() does not render the math markup as expected.
R code
kableExtra::kbl(DT, align = "r")

| date | trial | humidity | T_K | p_Pa | R | density |
|---:|---:|---:|---:|---:|---:|---:|
| 2018-06-12 | a | low | 294.05 | $101.1\times{10}^{3}$ | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | $101.0\times{10}^{3}$ | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | $101.1\times{10}^{3}$ | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | $101.0\times{10}^{3}$ | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | $101.1\times{10}^{3}$ | 287 | 1.198791 |
In fact, having loaded kableExtra above, knitr::kable() now fails in the same way.
R code
knitr::kable(DT, align = "r")

| date | trial | humidity | T_K | p_Pa | R | density |
|---:|---:|---:|---:|---:|---:|---:|
| 2018-06-12 | a | low | 294.05 | $101.1\times{10}^{3}$ | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | $101.0\times{10}^{3}$ | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | $101.1\times{10}^{3}$ | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | $101.0\times{10}^{3}$ | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | $101.1\times{10}^{3}$ | 287 | 1.198791 |
Solution
I found a suggestion from MathJax to replace the $ ... $ delimiters with \\( ... \\). I wrote a short function (below) to do that.
R code
# Substitute math delimiters
sub_delim <- function(x) {
  x <- sub("\\$", "\\\\(", x) # first $
  x <- sub("\\$", "\\\\)", x) # second $
}
DT$p_Pa <- sub_delim(DT$p_Pa)
DT
date trial humidity T_K p_Pa R density
<Date> <char> <fctr> <num> <char> <int> <num>
1: 2018-06-12 a low 294.05 \\(101.1\\times{10}^{3}\\) 287 1.197976
2: 2018-06-13 b high 294.15 \\(101.0\\times{10}^{3}\\) 287 1.196384
3: 2018-06-14 c medium 294.65 \\(101.1\\times{10}^{3}\\) 287 1.195536
4: 2018-06-15 d low 293.35 \\(101.0\\times{10}^{3}\\) 287 1.199647
5: 2018-06-16 e high 293.85 \\(101.1\\times{10}^{3}\\) 287 1.198791
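For comparison, the same substitution can be sketched as a single regular expression with a capture group, which leaves strings that are not wrapped in $ ... $ untouched. This is only an alternative sketch; swap_delim() is a hypothetical name and is not part of formatdown.

```r
# Hypothetical alternative: swap both delimiters in one pass
swap_delim <- function(x) {
  # ^\$ matches the opening $, \$$ matches the closing $,
  # and (.*) captures the math markup between them
  sub("^\\$(.*)\\$$", "\\\\(\\1\\\\)", x)
}

p <- c("$101.1\\times{10}^{3}$", "$101.0\\times{10}^{3}$")
swap_delim(p)
swap_delim("287")  # no delimiters, so the string is returned unchanged
```

Anchoring the pattern at both ends means a stray $ elsewhere in a string (a currency value, say) would not be rewritten.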
knitr::kable() yields the expected output.
R code
knitr::kable(DT, align = "c")

| date | trial | humidity | T_K | p_Pa | R | density |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
kableExtra::kbl() yields the expected output.
R code
kableExtra::kbl(DT, align = "c")

| date | trial | humidity | T_K | p_Pa | R | density |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
I can use the features from kableExtra to print a pretty table.
To address this issue, the next version of format_power() will include a new delim argument,
format_power(x, digits, format, omit_power, delim)
that allows a user to set the math delimiters to $ ... $ or \\( ... \\) or even custom left and right markup to suit their environment.
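One way such an argument might behave is sketched below in base R. Everything here is an assumption for illustration: the helper name add_delim(), the accepted values, and the error message are hypothetical, and the released argument may work differently.

```r
# Hypothetical sketch of delimiter handling (not the released API)
add_delim <- function(x, delim = "$") {
  if (length(delim) == 1) {
    # Expand a known shorthand into a left/right pair
    delim <- switch(delim,
      "$"   = c("$", "$"),
      "\\(" = c("\\(", "\\)"),
      stop("unknown delimiter; supply c(left, right) for custom markup")
    )
  }
  stopifnot(is.character(delim), length(delim) == 2)
  paste0(delim[1], x, delim[2])
}

coeff <- "101.1\\times{10}^{3}"
add_delim(coeff)                               # default $ ... $
add_delim(coeff, delim = "\\(")                # \( ... \)
add_delim(coeff, delim = c("<m>", "</m>"))     # custom left and right markup
```

Accepting either a shorthand or an explicit left/right pair keeps the common cases short while still allowing markup the function does not know about.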
Fixed exponents
Preparing this post, I adapted a table of water properties from the hydraulics package to use as an example and discovered another, more subtle issue. First, I’ll construct the data frame.
The viscosity column displays three values using \(10^{-6}\) and two using \(10^{-9}\). Visually comparing the values in a column is easier if the powers of ten are identical. The table below illustrates the desired result, created by manually editing the two viscosity values.
This revision satisfies two conventions of tabulating empirical engineering information.
Units. With all values reported to the same power of ten, the units can all be interpreted in the same way. In this case, for example, the viscosity coefficients (1.73, 1.31, etc.) are all in micropascal-seconds (\(\mu\)Pa-s).
Uncertainty. In rewriting the two viscosity values, I changed from three significant digits to two decimal places, consistent with the assumption that empirical information is reported to the same level of uncertainty unless noted otherwise.
Potential revision
Add the water data to formatdown and the following functionality to format_power().
A new argument (perhaps fixed_power) that automatically selects a fixed exponent for a numerical vector or permits the user to directly assign a fixed exponent.
format_power(x, digits, format, omit_power, delim, fixed_power)
In conjunction with the fixed power-of-ten, I would also round all numbers in a column to the same number of decimal places to address the uncertainty assumption. This could be a separate argument.
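A minimal base-R sketch of how this might work is shown below. The function name format_fixed_power(), the decimals argument, and the rule for choosing the exponent automatically (the engineering exponent of the largest magnitude in the vector) are all assumptions for illustration, not a committed design. The viscosity values are consistent with the figures quoted in this post.

```r
# Hypothetical sketch: one shared exponent and a fixed number of decimals.
# Assumes nonzero, finite input.
format_fixed_power <- function(x, fixed_power = NULL, decimals = 2) {
  if (is.null(fixed_power)) {
    # Assumed default: engineering exponent of the largest magnitude
    largest <- max(abs(x))
    fixed_power <- 3 * floor(floor(log10(largest)) / 3)
  }
  # Same number of decimal places for every coefficient
  coeff <- sprintf(paste0("%.", decimals, "f"), x / 10^fixed_power)
  paste0("$", coeff, "\\times{10}^{", fixed_power, "}$")
}

viscosity <- c(1.73e-6, 1.31e-6, 1.02e-6, 8.17e-7, 6.70e-7)
format_fixed_power(viscosity)                    # exponent chosen automatically
format_fixed_power(viscosity, fixed_power = -9)  # exponent set by the user
```

With the automatic choice, all five values share the exponent -6 and the last two coefficients become 0.82 and 0.67.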
Units
And now for something completely different!
Thinking about measurement units, I looked for relevant R packages and found units. With appropriate units, powers-of-ten notation can be practically eliminated. For example, a pressure reading of \(2.02\times{10}^{9}\) Pa can be reported as \(2.02\) GPa.
With tools from the units package, I can define a symbol uP to represent micropoise (a non-SI viscosity unit equal to \(10^{-7}\) Pa-s). And I can write a short function to convert the numbers from basic units to displayed units, for example, converting Pa to GPa (gigapascal) or Pa-s to \(\mu\)P (micropoise).
R code
library("units")

# Define the uP units
install_unit("uP", "micropoise", "micropoise")

# Function to assign and convert units
assign_units <- function(x, base_unit, display_unit) {
  # convert x to "Units" class in base units
  units(x) <- base_unit
  # convert from basic to display units
  units(x) <- as_units(display_unit)
  # return value
  x
}
Convert each column and output the results.
R code
# Apply to one variable at a time
DT <- copy(water)
DT$temperature <- assign_units(DT$temperature, "K", "degree_C")
DT$density <- assign_units(DT$density, "kg/m^3", "kg/m^3")
DT$specific_weight <- assign_units(DT$specific_weight, "N/m^3", "kN/m^3")
DT$viscosity <- assign_units(DT$viscosity, "Pa*s", "uP")
DT$bulk_modulus <- assign_units(DT$bulk_modulus, "Pa", "GPa")

# Output
DT |>
  kbl(align = "r") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
| temperature | density | specific_weight | viscosity | bulk_modulus |
|---:|---:|---:|---:|---:|
| 0 [°C] | 1000 [kg/m^3] | 9.809 [kN/m^3] | 17.30 [uP] | 2.02 [GPa] |
| 10 [°C] | 1000 [kg/m^3] | 9.807 [kN/m^3] | 13.10 [uP] | 2.10 [GPa] |
| 20 [°C] | 998 [kg/m^3] | 9.793 [kN/m^3] | 10.20 [uP] | 2.18 [GPa] |
| 30 [°C] | 996 [kg/m^3] | 9.768 [kN/m^3] | 8.17 [uP] | 2.25 [GPa] |
| 40 [°C] | 992 [kg/m^3] | 9.734 [kN/m^3] | 6.70 [uP] | 2.28 [GPa] |
The entries in the data frame are still numeric but are of the “Units” class, enabling math operations among values with compatible units. See the units website for details.
R code
str(DT)
Classes 'data.table' and 'data.frame': 5 obs. of 5 variables:
$ temperature : Units: [°C] num 0 10 20 30 40
$ density : Units: [kg/m^3] num 1000 1000 998 996 992
$ specific_weight: Units: [kN/m^3] num 9.81 9.81 9.79 9.77 9.73
$ viscosity : Units: [uP] num 17.3 13.1 10.2 8.17 6.7
$ bulk_modulus : Units: [GPa] num 2.02 2.1 2.18 2.25 2.28
- attr(*, ".internal.selfref")=<externalptr>
If I were to refine this table further, I would report the numerical values without labels in each cell, moving the unit labels to a sub-header row. Possible future work.
Potential revision
Incorporate tools from the units package to create a new function (perhaps format_units()) that would convert basic units to display units that can substitute for powers-of-ten notation.
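To make the idea concrete without depending on the units package, here is a rough base-R sketch that swaps a power of ten for an SI prefix. The helper with_si_prefix() and its prefix table are hypothetical; a real format_units() would presumably delegate the conversion to units, and this sketch handles only prefixes from nano to giga and nonzero input.

```r
# Hypothetical helper (base R only); the units package is not used here
with_si_prefix <- function(x, unit, digits = 3) {
  prefixes <- c("-9" = "n", "-6" = "u", "-3" = "m", "0" = "",
                "3" = "k", "6" = "M", "9" = "G")
  # Engineering exponent, clamped to the range covered by the lookup table
  exponent <- 3 * floor(floor(log10(abs(x))) / 3)
  exponent <- pmin(pmax(exponent, -9), 9)
  coeff <- formatC(x / 10^exponent, format = "fg", digits = digits, flag = "#")
  paste0(coeff, " ", unname(prefixes[as.character(exponent)]), unit)
}

with_si_prefix(2.02e9, "Pa")               # bulk modulus, given in Pa
with_si_prefix(9809, "N/m^3", digits = 4)  # specific weight, given in N/m^3
```

These calls give "2.02 GPa" and "9.809 kN/m^3", the same displayed values as in the table above.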
Closing
The new formatdown package formats numbers in powers-of-ten notation for inline math markup. A new argument is already in the works for managing the math delimiters. Potential new features include a fixed power-of-ten option as well as replacing powers-of-ten notation with deliberate manipulation of physical units.