Analyzer

To stay at a high level of abstraction and remain technology-agnostic, Hadar uses objects as glue for the optimizer. Objects are convenient, but they are too cumbersome to manipulate for data analysis. The Analyzer module contains tools to help analyze studies and results.

Today, there is only ResultAnalyzer, with two feature levels:

  • high level: the user directly asks for aggregated values such as global cost, global remaining capacity, etc.

  • low level: the user builds a query and gets raw data as a pandas DataFrame.

Before describing these features, let's see how the data is transformed.

Flatten Data

As said above, objects are nice for encapsulating data and representing it in an agnostic form. Objects can be serialized into JSON or another format and used by other software, possibly written in another language. But keeping objects for data analysis is painful.

Python has a very efficient tool for data analysis: pandas. The challenge is therefore to transform objects into a pandas DataFrame. The solution is to flatten the data into a table.
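
Concretely, every elementary value becomes one row of a table, alongside the coordinates that locate it. The toy snippet below illustrates the target shape with plain pandas, using the consumption columns detailed in the next section:

import pandas as pd

# Each row mixes content columns (cost, asked, given) with
# coordinate columns (node, name, scn, t, network).
df = pd.DataFrame([
    {'cost': 10, 'asked': 5, 'given': 5,
     'node': 'fr', 'name': 'load', 'scn': 0, 't': 0, 'network': 'default'},
    {'cost': 10, 'asked': 7, 'given': 7,
     'node': 'fr', 'name': 'load', 'scn': 0, 't': 1, 'network': 'default'},
])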

Consumption

Take consumption as an example. The data in the Study is the cost and the asked quantity; in the Result it is the cost (the same) and the given quantity. This tuple (cost, asked, given) exists for each node, each consumption attached to this node, each scenario and each timestep. Flattening this data means filling the following table:

cost   asked   given   node   name   scn   t   network
10     5       5       fr     load   0     0   default
10     7       7       fr     load   0     1   default
10     7       5       fr     load   1     0   default
10     6       6       fr     load   1     1   default

Building this table is the purpose of the _build_consumption(study: Study, result: Result) -> pd.DataFrame method.
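
As a rough sketch of how such a builder can proceed (the attribute names networks, nodes, consumptions, asked, given, nb_scn and horizon below are assumptions for illustration, not Hadar's actual layout):

import pandas as pd

def build_consumption(study, result) -> pd.DataFrame:
    """Sketch: one row per (network, node, consumption, scenario, timestep)."""
    rows = []
    for net_name, net in result.networks.items():      # assumed layout
        for node_name, node in net.nodes.items():
            for cons in node.consumptions:
                for scn in range(study.nb_scn):
                    for t in range(study.horizon):
                        rows.append({'cost': cons.cost,
                                     'asked': cons.asked[scn, t],  # from the study
                                     'given': cons.given[scn, t],  # from the result
                                     'node': node_name, 'name': cons.name,
                                     'scn': scn, 't': t, 'network': net_name})
    return pd.DataFrame(rows)

The other builders below differ only in the content columns they emit.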

Production

Productions follow the same pattern. However, they don't have asked and given quantities, but available and used quantities. The table therefore looks like:

cost   avail   used   node   name   scn   t   network
10     100     21     fr     coal   0     0   default
10     100     36     fr     coal   0     1   default
10     100     12     fr     coal   1     0   default
10     100     81     fr     coal   1     1   default

It's done by the _build_production(study: Study, result: Result) -> pd.DataFrame method.

Storage

Storages follow the same pattern. The table therefore looks like:

max_capacity   capacity   max_flow_in   flow_in   max_flow_out   flow_out   cost   init_capacity   eff   node   name   scn   t   network
12000          678        400           214       400            0          10     0               .99   fr     cell   0     0   default
12000          892        400           53        400            0          10     0               .99   fr     cell   0     1   default
12000          945        400           0         400            87         10     0               .99   fr     cell   1     0   default
12000          853        400           0         400            0          10     0               .99   fr     cell   1     1   default

It's done by the _build_storage(study: Study, result: Result) -> pd.DataFrame method.

Converter

Converters follow the same pattern, but are split into two tables. One for the source elements:

max   ratio   flow   node   name   scn   t   network
100   .4      52     fr     conv   0     0   default
100   .4      87     fr     conv   0     1   default
100   .4      23     fr     conv   1     0   default
100   .4      58     fr     conv   1     1   default

It's done by the _build_src_converter(study: Study, result: Result) -> pd.DataFrame method.

And another for the destination elements. The two tables are nearly identical: the source has a special attribute called ratio, the destination a special attribute called cost:

max   cost   flow   node   name   scn   t   network
100   20     52    fr     conv   0     0   default
100   20     87    fr     conv   0     1   default
100   20     23    fr     conv   1     0   default
100   20     58    fr     conv   1     1   default

It's done by the _build_dest_converter(study: Study, result: Result) -> pd.DataFrame method.

Low-level analysis power with a FluentAPISelector

Looking at the flat data, there are two kinds of columns: content, like cost, given, asked; and indexes, described by node, name, scn, t.

The low-level analysis API provided by ResultAnalyzer lets the user:

  1. Organize the index levels, for example time, then scenario, then name, then node.

  2. Filter the indexes, for example only timesteps from 10 to 150, or only the 'fr' node.

The user can ask: give me the productions of the 'fr' node, for the first scenario, from timestep 50 to 60. In this case ResultAnalyzer will return:

          used  cost  avail
t   name
50  oil     21    10    100
    coal    36    10    100
60  oil     12    10    100
    coal    81    10    100

If a top-level index, like node and scenario here, has only one element, it is removed.

This result is obtained with the following lines of code:

import hadar as hd

agg = hd.ResultAnalyzer(study, result)
df = agg.network().node('fr').scn(0).time(slice(50, 60)).production()

For the analyzer, the fluent API respects these rules:

  • API flow begins with network().

  • API flow must contain node(), time() and scn(), each exactly once.

  • API flow must contain exactly one of link(), production(), consumption().

  • Except for network(), calls can appear in any order. The order is left to the user and defines the data hierarchy, as illustrated after this list.

  • Following the rules above, an API flow is always 5 elements long.
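
For instance, both flows below are valid and select the same data; only the index hierarchy of the returned DataFrame differs (study and result are assumed to be already computed, as in the snippet above):

import hadar as hd

agg = hd.ResultAnalyzer(study, result)

# node-major hierarchy: node, then scenario, then time
df1 = agg.network().node('fr').scn(0).time(slice(50, 60)).production()

# time-major hierarchy: time, then scenario, then node
df2 = agg.network().time(slice(50, 60)).scn(0).node('fr').production()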

Behind this mechanism are Index objects, as you can see directly in the code:

...
self.consumption = lambda x=None: self._append(ConsIndex(x))
...
self.time = lambda x=None: self._append(TimeIndex(x))
...

Each kind of index has to inherit from the Index class. An Index object encapsulates the column metadata to use and the set of filtered elements to keep (accessible by overriding the __getitem__ method). Hadar then provides child classes with the right parameters: ConsIndex, ProdIndex, NodeIndex, ScnIndex, TimeIndex, LinkIndex, DestIndex. For example, you can find the NodeIndex implementation below:

class NodeIndex(Index[str]):
    """Index implementation to filter nodes"""
    def __init__(self):
        Index.__init__(self, column='node')
[UML class diagram of the Index hierarchy: ../_images/ulm-index.png]
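
For illustration, here is a minimal sketch of what the Index base class could look like, assuming a simple keep-or-filter semantic (this is not Hadar's exact code):

from typing import Generic, List, TypeVar, Union

T = TypeVar('T')

class Index(Generic[T]):
    """Sketch: column metadata plus the set of elements to keep."""
    def __init__(self, column: str, index: Union[T, List[T], None] = None):
        self.column = column
        self.all = index is None            # no filter given: keep everything
        if index is None:
            self.index = ()
        elif isinstance(index, (list, tuple)):
            self.index = tuple(index)
        else:
            self.index = (index,)

    def __getitem__(self, item: T) -> bool:
        # True if the element survives the filter
        return self.all or item in self.index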

Index instantiation is completely hidden from the user. Hadar will then:

  1. check that the mandatory indexes are given, with the _assert_index method.

  2. pivot the table to recreate the indexing according to the filtering and ordering asked, with the _pivot method (see the sketch after this list).

  3. remove one-size top-level indexes, with the _remove_useless_index_level method.
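
Steps 2 and 3 could be sketched as follows, reusing the Index sketch above (hypothetical code, not Hadar's implementation):

import pandas as pd

def pivot(df: pd.DataFrame, indexes: list) -> pd.DataFrame:
    """Sketch: filter rows, then re-index in the order asked by the user."""
    mask = pd.Series(True, index=df.index)
    for i in indexes:
        if not i.all:
            mask &= df[i.column].isin(i.index)
    return df[mask].set_index([i.column for i in indexes]).sort_index()

def remove_useless_index_level(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: drop the top index level while it holds a single value."""
    while df.index.nlevels > 1 and df.index.get_level_values(0).nunique() == 1:
        df = df.droplevel(0)
    return df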

As you can see, the low-level analysis provides an efficient way to extract data from an adequacy study result. However, the data returned remains raw and is not ready for business purposes.

High Level Analysis

Unlike the low level, the high level focuses on providing ready-to-use data. Its features have to be designed one by one, each for a specific business purpose. Today there are two features:

  • get_cost(self, node: str) -> np.ndarray: returns, for the given node, a matrix of shape (scenario, horizon) with the summarized cost.

  • get_balance(self, node: str) -> np.ndarray: returns, for the given node, a matrix of shape (scenario, horizon) with the exchange balance (i.e. the sum of exports minus the sum of imports).
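
For example (study and result being already computed, as in the snippets above):

import hadar as hd

agg = hd.ResultAnalyzer(study, result)

cost = agg.get_cost('fr')        # ndarray of shape (nb scenarios, horizon)
balance = agg.get_balance('fr')  # positive where 'fr' exports more than it imports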