.. _pandas:
Pandas support
==============
It is convenient to use the `Pandas package`_ when dealing with numerical data, so Pint provides `PintArray`. A `PintArray` is a `Pandas Extension Array`_, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.
For this to work, we rely on `Pandas Extension Types`_, which are still experimental. As a result, we are currently pinned to a specific commit of Pandas, with version id ``0.24.0.dev0+625.gbdb7a16``.
Basic example
-------------
This example gives you the basics, but is slightly fiddly as you are not reading from a file. A more normal use case is given in `Reading a csv`_.
To use Pint with Pandas, first ensure that you have the Pandas version described above installed. Then import the relevant packages and create a unit registry, with `Q_` as a shorthand for its Quantity class:
.. doctest::
>>> import pandas as pd
>>> import numpy as np
>>> import pint
>>> from pint.pandas_interface import PintArray
>>> ureg = pint.UnitRegistry()
>>> Q_ = ureg.Quantity
.. testsetup:: *
import pandas as pd
import numpy as np
import pint
from pint.pandas_interface import PintArray
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity
Next, we can create a DataFrame with PintArrays as columns:
.. doctest::
>>> torque = PintArray(Q_([1, 2, 2, 3], "lbf ft"))
>>> angular_velocity = PintArray(Q_([1000, 2000, 2000, 3000], "rpm"))
>>> df = pd.DataFrame({"torque": torque, "angular_velocity": angular_velocity})
>>> print(df)
torque angular_velocity
0 1 1000
1 2 2000
2 2 2000
3 3 3000
Operations on columns are unit-aware, so they behave as we would intuitively expect:
.. doctest::
>>> df['power'] = df['torque'] * df['angular_velocity']
>>> print(df)
torque angular_velocity power
0 1 1000 1000
1 2 2000 4000
2 2 2000 4000
3 3 3000 9000
Each column can be accessed as a Pandas Series:
.. doctest::
>>> print(df.power)
0 1000
1 4000
2 4000
3 9000
Name: power, dtype: pint
Each Series holds a PintArray:
.. doctest::
>>> print(df.power.values)
PintArray([1000 foot * force_pound * revolutions_per_minute,
4000 foot * force_pound * revolutions_per_minute,
4000 foot * force_pound * revolutions_per_minute,
9000 foot * force_pound * revolutions_per_minute],
dtype='pint')
The PintArray in turn holds a Quantity:
.. doctest::
>>> print(df.power.values.data)
[1000 4000 4000 9000] foot * force_pound * revolutions_per_minute
Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.
.. doctest::
>>> print(df.power.pint.dimensionality)
[length] ** 2 * [mass] / [time] ** 3
>>> print(df.power.pint.to("kW"))
0 0.14198092353610375
1 0.567923694144415
2 0.567923694144415
3 1.2778283118249338
Name: power, dtype: pint
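
As a rough illustration of why the product of a torque and an angular velocity has the dimensionality shown above, the dimension exponents can be added by hand. This is a plain-Python sketch that does not use Pint; the dictionaries and the `combine` helper are hypothetical names introduced only for this example.

.. code-block:: python

    # Dimension exponents written out by hand (an assumption of this
    # sketch, not values read from Pint's registry).
    FORCE = {"[mass]": 1, "[length]": 1, "[time]": -2}
    LENGTH = {"[length]": 1}
    PER_TIME = {"[time]": -1}  # rpm: revolutions are dimensionless


    def combine(*dims):
        """Multiplying quantities adds their dimension exponents."""
        total = {}
        for dim in dims:
            for name, exponent in dim.items():
                total[name] = total.get(name, 0) + exponent
        # Drop dimensions whose exponents cancelled out.
        return {name: exp for name, exp in total.items() if exp != 0}


    torque = combine(FORCE, LENGTH)
    power = combine(torque, PER_TIME)
    print(power)

The resulting exponents correspond to ``[length] ** 2 * [mass] / [time] ** 3``.
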
Standard pint conversions can still be performed on the underlying quantity, and will still return a quantity.
.. doctest::
>>> print(df.power.values.data.to("kW"))
[0.14198092 0.56792369 0.56792369 1.27782831] kilowatt
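
The kilowatt values above can be cross-checked without Pint. The sketch below multiplies out the standard SI definitions of the pound-force, the foot and the revolution per minute; the constant names and the `lbf_ft_rpm_to_kw` helper are hypothetical and exist only for this check.

.. code-block:: python

    import math

    # Standard SI definitions, written out here rather than looked up
    # in Pint's registry.
    LBF_IN_NEWTON = 4.4482216152605       # 1 pound-force in newtons
    FOOT_IN_METRE = 0.3048                # 1 foot in metres
    RPM_IN_RAD_PER_S = 2 * math.pi / 60   # 1 rpm in radians per second


    def lbf_ft_rpm_to_kw(value):
        """Convert a magnitude in lbf * ft * rpm to kilowatts."""
        watts = value * LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S
        return watts / 1000.0


    for raw in (1000, 4000, 9000):
        print(raw, "->", round(lbf_ft_rpm_to_kw(raw), 6), "kW")

The printed values should agree with the converted Series to the displayed precision.
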
Reading a csv
-------------
Thanks to the DataFrame accessors, reading files that carry unit information is straightforward: the accessors convert plain columns to PintArrays and back.
Setup
~~~~~
Here we create the DataFrame and save it to a file; next we will show you how to load and read it.
We start with a DataFrame with column headers only.
.. doctest::
>>> speed = [1000, 1100, 1200, 1200]
>>> mech_power = [np.nan, np.nan, np.nan, np.nan]
>>> torque = [10, 10, 10, 10]
>>> rail_pressure = [1000, 1000000000000, 1000, 1000]
>>> fuel_flow_rate = [10, 10, 10, 10]
>>> fluid_power = [np.nan, np.nan, np.nan, np.nan]
>>> df_init = pd.DataFrame({"speed": speed, "mech power": mech_power, "torque": torque, "rail pressure": rail_pressure, "fuel flow rate": fuel_flow_rate, "fluid power": fluid_power,})
>>> print(df_init)
speed mech power torque rail pressure fuel flow rate fluid power
0 1000 NaN 10 1000 10 NaN
1 1100 NaN 10 1000000000000 10 NaN
2 1200 NaN 10 1000 10 NaN
3 1200 NaN 10 1000 10 NaN
Then we add a second column header level which contains the units information:
.. doctest::
>>> units = ["rpm", "kW", "N m", "bar", "l/min", "kW"]
>>> df_to_save = df_init.copy()
>>> df_to_save.columns = pd.MultiIndex.from_arrays([df_init.columns, units])
>>> print(df_to_save)
speed mech power torque rail pressure fuel flow rate fluid power
rpm kW N m bar l/min kW
0 1000 NaN 10 1000 10 NaN
1 1100 NaN 10 1000000000000 10 NaN
2 1200 NaN 10 1000 10 NaN
3 1200 NaN 10 1000 10 NaN
Now we save this to disk as a csv to give us our starting point.
.. doctest::
>>> test_csv_name = "pandas_test.csv"
>>> df_to_save.to_csv(test_csv_name, index=False)
Now we are in a position to read the csv we just saved. Let's start by reading the file with units as a level in a multiindex column.
.. doctest::
>>> df = pd.read_csv(test_csv_name, header=[0,1])
>>> print(df)
speed mech power torque rail pressure fuel flow rate fluid power
rpm kW N m bar l/min kW
0 1000 NaN 10 1000 10 NaN
1 1100 NaN 10 1000000000000 10 NaN
2 1200 NaN 10 1000 10 NaN
3 1200 NaN 10 1000 10 NaN
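
To make the file layout concrete, the sketch below parses a shortened stand-in for the saved file using only the standard library: the first row holds the column names and the second row holds the unit strings. It only illustrates the two-row header; it is not how Pandas or Pint read the file, and the `csv_text` sample and `columns` structure are assumptions of this example.

.. code-block:: python

    import csv
    import io

    # A shortened stand-in for the csv written above: names on the first
    # row, unit strings on the second, data on the remaining rows.
    csv_text = (
        "speed,torque,rail pressure\n"
        "rpm,N m,bar\n"
        "1000,10,1000\n"
        "1100,10,1000000000000\n"
    )

    reader = csv.reader(io.StringIO(csv_text))
    names = next(reader)   # first header row
    units = next(reader)   # second header row
    rows = [[float(cell) for cell in row] for row in reader]

    # Pair every column name with its unit string and its values.
    columns = {
        name: {"unit": unit, "values": [row[i] for row in rows]}
        for i, (name, unit) in enumerate(zip(names, units))
    }
    print(columns["rail pressure"])
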
Then use the DataFrame's `pint.quantify` method to convert the columns from `np.ndarray` objects to PintArrays, with units taken from the bottom column level.
.. doctest::
>>> df_ = df.pint.quantify(ureg, level=-1)
>>> print(df_)
speed mech power torque rail pressure fuel flow rate fluid power
0 1000.0 nan 10.0 1000.0 10.0 nan
1 1100.0 nan 10.0 1000000000000.0 10.0 nan
2 1200.0 nan 10.0 1000.0 10.0 nan
3 1200.0 nan 10.0 1000.0 10.0 nan
As before, operations between DataFrame columns are unit-aware:
.. doctest::
>>> df_['mech power'] = df_.speed*df_.torque
>>> df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
>>> print(df_)
speed mech power torque rail pressure fuel flow rate fluid power
0 1000.0 10000.0 10.0 1000.0 10.0 10000.0
1 1100.0 11000.0 10.0 1000000000000.0 10.0 10000000000000.0
2 1200.0 12000.0 10.0 1000.0 10.0 10000.0
3 1200.0 12000.0 10.0 1000.0 10.0 10000.0
The DataFrame's `pint.dequantify` method then allows us to retrieve the units information as a header row once again
.. doctest::
>>> print(df_.pint.dequantify())
speed mech power \
revolutions_per_minute meter * newton * revolutions_per_minute
0 1000.0 10000.0
1 1100.0 11000.0
2 1200.0 12000.0
3 1200.0 12000.0
torque rail pressure fuel flow rate fluid power
meter * newton bar liter / minute bar * liter / minute
0 10.0 1.000000e+03 10.0 1.000000e+04
1 10.0 1.000000e+12 10.0 1.000000e+13
2 10.0 1.000000e+03 10.0 1.000000e+04
3 10.0 1.000000e+03 10.0 1.000000e+04
This enables some rather powerful operations. For example, we can change the units of a single column
.. doctest::
>>> df_['fluid power'] = df_['fluid power'].pint.to("kW")
>>> df_['mech power'] = df_['mech power'].pint.to("kW")
>>> print(df_.pint.dequantify())
speed mech power torque rail pressure \
revolutions_per_minute kilowatt meter * newton bar
0 1000.0 1.047198 10.0 1.000000e+03
1 1100.0 1.151917 10.0 1.000000e+12
2 1200.0 1.256637 10.0 1.000000e+03
3 1200.0 1.256637 10.0 1.000000e+03
fuel flow rate fluid power
liter / minute kilowatt
0 10.0 1.666667e+01
1 10.0 1.666667e+10
2 10.0 1.666667e+01
3 10.0 1.666667e+01
or the entire table's units
.. doctest::
>>> print(df_.pint.to_base_units().pint.dequantify())
speed mech power \
radian / second kilogram * meter ** 2 / second ** 3
0 104.719755 1047.197551
1 115.191731 1151.917306
2 125.663706 1256.637061
3 125.663706 1256.637061
torque rail pressure \
kilogram * meter ** 2 / second ** 2 kilogram / meter / second ** 2
0 10.0 1.000000e+08
1 10.0 1.000000e+17
2 10.0 1.000000e+08
3 10.0 1.000000e+08
fuel flow rate fluid power
meter ** 3 / second kilogram * meter ** 2 / second ** 3
0 0.000167 1.666667e+04
1 0.000167 1.666667e+13
2 0.000167 1.666667e+04
3 0.000167 1.666667e+04
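
The kilowatt figures in the single-column conversion can be reproduced by hand from standard SI factors. The sketch below is a plain-Python cross-check, not part of Pint; the constants and the two helper functions are hypothetical names introduced for this example.

.. code-block:: python

    import math

    # Standard SI factors, written out by hand for this sketch.
    RPM_IN_RAD_PER_S = 2 * math.pi / 60     # 1 rpm in rad/s
    BAR_IN_PASCAL = 1e5                     # 1 bar in Pa
    LITRE_PER_MIN_IN_M3_PER_S = 1e-3 / 60   # 1 l/min in m^3/s


    def mech_power_kw(speed_rpm, torque_newton_metre):
        """Shaft power: angular speed times torque, returned in kW."""
        watts = speed_rpm * RPM_IN_RAD_PER_S * torque_newton_metre
        return watts / 1000.0


    def fluid_power_kw(flow_litre_per_min, pressure_bar):
        """Hydraulic power: volume flow times pressure, returned in kW."""
        flow_si = flow_litre_per_min * LITRE_PER_MIN_IN_M3_PER_S
        pressure_si = pressure_bar * BAR_IN_PASCAL
        return flow_si * pressure_si / 1000.0


    # First row of the table: 1000 rpm, 10 N m, 10 l/min, 1000 bar.
    print(round(mech_power_kw(1000, 10), 6))
    print(round(fluid_power_kw(10, 1000), 5))

The intermediate values also match the base-unit table: 1000 bar is ``1e8`` pascal and 10 l/min is roughly ``0.000167`` cubic metres per second.
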
Comments
--------
What follows is a short discussion of Pint's `PintArray` object.
It is first useful to distinguish between three different things:
1. A scalar value
.. doctest::
>>> print(Q_(123,"m"))
123 meter
2. A 1d array or list
.. doctest::
>>> print(Q_([1, 2, 3], "m"))
[1 2 3] meter
3. A 2d+ array or list
.. doctest::
>>> print(Q_([[1, 2], [3, 4]], "m"))
[[1 2] [3 4]] meter
The first, a single scalar value, is not intended to be stored in the PintArray as it's not an array, and should raise an error (TODO). The scalar Quantity is the scalar form of the PintArray, and is returned when performing operations that use `__getitem__`, e.g. indexing. A PintArray can be created from a list of scalar Quantities using `PintArray._from_sequence`.
The second, a 1d array or list, is what the PintArray is intended to hold, and is stored in the `PintArray.data` attribute.
The third, a 2d+ array or list, is beyond the capabilities of ExtensionArrays, which are limited to 1d arrays, so it cannot be stored in the array and should raise an error (TODO).
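
The distinction between the three cases can be sketched as a simple check on nesting depth. This is only an illustration of the intended rule; `nesting_depth` and `check_one_dimensional` are hypothetical helpers and not part of Pint, where the corresponding errors are still marked TODO.

.. code-block:: python

    def nesting_depth(values):
        """Return 0 for a scalar, 1 for a flat list, 2 for a list of lists."""
        if not isinstance(values, (list, tuple)):
            return 0
        if not values:
            return 1
        return 1 + max(nesting_depth(item) for item in values)


    def check_one_dimensional(values):
        """Accept only flat sequences (hypothetical helper, not Pint API)."""
        depth = nesting_depth(values)
        if depth != 1:
            raise ValueError(f"expected a 1d sequence, got {depth}d input")
        return list(values)


    print(check_one_dimensional([1, 2, 3]))
    for rejected in (123, [[1, 2], [3, 4]]):
        try:
            check_one_dimensional(rejected)
        except ValueError as error:
            print("rejected:", error)
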
Most operations on the PintArray act on the Quantity stored in `PintArray.data`, so will behave similarly to operations on a Quantity, with some caveats:
1. An operation that would return a 1d Quantity will return a PintArray containing the Quantity. This allows pandas to assign the result to a Series.
2. Arithmetic and comparison operations are limited to scalars and sequences of the same length as the stored Quantity. This ensures results are the same length as the stored Quantity, so can be added to the same DataFrame.
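
The two caveats can be illustrated with a toy container. `ToyUnitArray` below is a hypothetical stand-in written for this example and is not how `PintArray` is implemented: it returns its own type from arithmetic and accepts only scalars or operands of matching length.

.. code-block:: python

    class ToyUnitArray:
        """Hypothetical 1d container used for illustration, not PintArray."""

        def __init__(self, magnitudes, unit):
            self.magnitudes = list(magnitudes)
            self.unit = unit

        def __len__(self):
            return len(self.magnitudes)

        def __mul__(self, other):
            if isinstance(other, (int, float)):
                # A scalar is applied to every element.
                factors = [other] * len(self)
            else:
                factors = list(other)
                if len(factors) != len(self):
                    raise ValueError("operand length must match the array length")
            product = [m * f for m, f in zip(self.magnitudes, factors)]
            # Returning the same container type keeps the result usable
            # as a column of the same length.
            return ToyUnitArray(product, self.unit)


    lengths = ToyUnitArray([1, 2, 3], "m")
    print((lengths * 2).magnitudes)
    print((lengths * [10, 20, 30]).magnitudes)

Multiplying by a sequence of a different length raises ``ValueError`` in this sketch, which mirrors the rule that results must keep the length of the stored data.
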
.. _`Pandas package`: https://pandas.pydata.org/pandas-docs/stable/index.html
.. _`Pandas Dataframes`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
.. _`Pandas Extension Array`: https://pandas.pydata.org/pandas-docs/stable/extending.html#extensionarray
.. _`Pandas Extension Types`: https://pandas.pydata.org/pandas-docs/stable/extending.html#extension-types
.. _`Pandas README`: https://github.com/pandas-dev/pandas/blob/master/README.md