For a high level of abstraction, and to stay technology-agnostic, Hadar uses objects as the glue for the optimizer. Objects are convenient, but they are too cumbersome to manipulate for data analysis. The analyzer module contains tools to help analyze studies and results.
Today there is only ResultAnalyzer, which offers two feature levels:
- high level: the user directly asks for computed values such as global cost, global remaining capacity, etc.
- low level: the user builds a query and gets raw data as a pandas DataFrame.
Before describing these features, let's see how the data is transformed.
As said above, objects are nice to encapsulate data and represent it in an agnostic form. Objects can be serialized into JSON or another format to be used by other software, perhaps written in another language. But keeping objects around to analyze data is painful.
Python has a very efficient tool for data analysis: pandas. The challenge is therefore to transform objects into a pandas DataFrame. The solution is to flatten the data so it fits into a table.
Take consumption as an example. The data in Study is the cost and the asked quantity; in Result it is the cost (the same) and the given quantity. This tuple (cost, asked, given) exists for each network, each node, each consumption attached to that node, each scenario and each timestep. To flatten the data, we need to fill this table:
| cost | asked | given | network | node | name | scn | t |
|------|-------|-------|---------|------|------|-----|---|
| 10   | 5     | 7     | default | fr   | load | 0   | 6 |
| …    | …     | …     | …       | …    | …    | …   | … |

(cost and asked come from the Study, given from the Result; the remaining columns are index columns.)
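The flattening step itself can be sketched with plain pandas. The nested dict below is a hypothetical stand-in for Hadar's real Study and Result objects (which are richer); only the flattening idea matters:

```python
import pandas as pd

# Hypothetical, simplified study/result layout: network -> node -> name -> data.
study = {'default': {'fr': {'load': {'cost': 10, 'asked': [5, 5]}}}}
result = {'default': {'fr': {'load': {'given': [5, 3]}}}}

def build_consumption(study, result, scn=0):
    """Flatten nested objects into one row per (network, node, name, scn, t)."""
    rows = []
    for network, nodes in study.items():
        for node, consumptions in nodes.items():
            for name, cons in consumptions.items():
                given = result[network][node][name]['given']
                for t, asked in enumerate(cons['asked']):
                    rows.append({'cost': cons['cost'], 'asked': asked,
                                 'given': given[t], 'network': network,
                                 'node': node, 'name': name, 'scn': scn, 't': t})
    return pd.DataFrame(rows)

df = build_consumption(study, result)
# one row per timestep, one column per content and index field
```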
Building this table is the purpose of _build_consumption(study: Study, result: Result) -> pd.DataFrame.
Production follows the same pattern. However, productions don't have asked and given quantities, but available (avail) and used quantities. The table therefore looks like:
| cost | avail | used | network | node | name | scn | t  |
|------|-------|------|---------|------|------|-----|----|
| 36   | 100   | 21   | default | fr   | coal | 12  | 81 |
| …    | …     | …    | …       | …    | …    | …   | …  |
It is done by the _build_production(study: Study, result: Result) -> pd.DataFrame method.
Storage follows the same pattern. The table therefore looks like:
| max_capacity | capacity | max_flow_in | flow_in | max_flow_out | flow_out | init_capacity | eff | network | node | name | scn | t  |
|--------------|----------|-------------|---------|--------------|----------|---------------|-----|---------|------|------|-----|----|
| 12000        | 678      | 400         | 214     | 945          | 892      | 853           | .99 | default | fr   | cell | 53  | 87 |
| …            | …        | …           | …       | …            | …        | …             | …   | …       | …    | …    | …   | …  |
It is done by the _build_storage(study: Study, result: Result) -> pd.DataFrame method.
Links follow the same pattern, but the hierarchical naming changes: there are no node and name columns, but source (src) and destination (dest) columns. The table therefore looks like:
| cost | avail | used | network | src | dest | scn | t |
|------|-------|------|---------|-----|------|-----|---|
| …    | …     | …    | default | fr  | uk   | …   | … |
It is done by the _build_link(study: Study, result: Result) -> pd.DataFrame method.
Converters follow the same pattern, except that they are split into two tables. One for the source element:
| max | ratio | flow | network | node | name | scn | t  |
|-----|-------|------|---------|------|------|-----|----|
| 52  | .4    | 23   | default | fr   | conv | …   | 58 |
It is done by the _build_src_converter(study: Study, result: Result) -> pd.DataFrame method.
And another for the destination element. The two tables are nearly identical: the source has a special attribute called ratio, and the destination has a special attribute called cost:

| max | cost | flow | network | node | name | scn | t  |
|-----|------|------|---------|------|------|-----|----|
| 52  | 20   | 23   | default | fr   | conv | …   | 58 |
It is done by the _build_dest_converter(study: Study, result: Result) -> pd.DataFrame method.
When you look at the flat data, there are two kinds of columns: content columns such as cost, given and asked, and index columns such as network, node, name, scn and t.
The low level API provided by ResultAnalyzer lets the user:
- organize the index levels, for example time, then scenario, then name, then node;
- filter the index, for example only timesteps from 10 to 150, only the 'fr' node, etc.
For example, a user can say: I want the productions of the 'fr' node, for the first scenario, from timestep 50 to 60. In this case ResultAnalyzer will return:

| t  | name | cost | avail | used |
|----|------|------|-------|------|
| 50 | oil  | …    | …     | …    |
| …  | …    | …    | …     | …    |
| 60 | oil  | …    | …     | …    |
If a top-level index such as node or scenario has only one element, it is removed from the result.
This result is produced by these lines of code:

```python
agg = hd.ResultAnalyzer(study, result)
df = agg.network().node('fr').scn(0).time(slice(50, 60)).production()
```
For the analyzer, the fluent API respects these rules:
- an API flow begins with network();
- an API flow must contain exactly one of each of the node(), time() and scn() elements;
- an API flow must contain exactly one of the link(), production() and consumption() elements;
- except for network(), the API imposes no order: the order is free, and it is how the user chooses the data hierarchy.
Given these rules, an API flow is always five elements long.
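The mechanism behind such a fluent API can be illustrated with a toy builder (hypothetical, not Hadar's real class): each call records an index and returns self, so the call order becomes the index hierarchy of the output:

```python
# Toy fluent query builder: each call appends (column, filter) and returns
# self, so calls chain and their order defines the index hierarchy.
class Query:
    def __init__(self):
        self.indexes = []   # (column, filter) pairs, in user call order

    def _append(self, column, x):
        self.indexes.append((column, x))
        return self          # returning self is what makes the API fluent

    def network(self, x='default'): return self._append('network', x)
    def node(self, x=None): return self._append('node', x)
    def scn(self, x=None): return self._append('scn', x)
    def time(self, x=None): return self._append('t', x)
    def production(self, x=None): return self._append('name', x)

q = Query().network().node('fr').scn(0).time(slice(50, 60)).production()
# hierarchy recorded in call order: network, node, scn, t, name
```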
Behind this mechanism there are Index objects, as you can see directly in the code:

```python
...
self.consumption = lambda x=None: self._append(ConsIndex(x))
...
self.time = lambda x=None: self._append(TimeIndex(x))
...
```
Each kind of index has to inherit from this class. An Index object encapsulates the column metadata to use and the range of filtered elements to keep (accessible by overriding the __getitem__ method). Hadar then has child classes with the right parameters: ConsIndex, ProdIndex, NodeIndex, ScnIndex, TimeIndex, LinkIndex, DestIndex. For example, you can find the NodeIndex implementation below:
```python
class NodeIndex(Index[str]):
    """Index implementation to filter nodes"""
    def __init__(self):
        Index.__init__(self, column='node')
```
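A minimal sketch of what such an Index base class could look like (hypothetical and simplified: the real Hadar class exposes filtering through __getitem__, here a filter helper is used instead):

```python
from typing import Generic, TypeVar, Union

import pandas as pd

T = TypeVar('T')

class Index(Generic[T]):
    """Encapsulate the column metadata and the range of elements to keep."""
    def __init__(self, column: str, x: Union[T, list, slice, None] = None):
        self.column = column
        self.all = x is None            # no filter given: keep everything
        self.index = x

    def filter(self, df: pd.DataFrame) -> pd.Series:
        """Return a boolean mask selecting the rows to keep."""
        if self.all:
            return pd.Series(True, index=df.index)
        if isinstance(self.index, slice):
            return df[self.column].between(self.index.start, self.index.stop)
        if isinstance(self.index, list):
            return df[self.column].isin(self.index)
        return df[self.column] == self.index

class NodeIndex(Index[str]):
    """Index implementation to filter nodes"""
    def __init__(self, x=None):
        Index.__init__(self, column='node', x=x)
```

With this layout, adding a new index kind only means declaring the column name it filters on.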
Index instantiation is completely hidden from the user. Hadar will then:
- check that the mandatory indexes are given, with the _assert_index method;
- pivot the table to recreate the indexing according to the filters and ordering asked, with the _pivot method;
- remove one-size top-level indexes, with the _remove_useless_index_level method.
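The pivot and cleanup steps can be sketched with pandas (hypothetical helpers, not Hadar's actual implementation):

```python
import pandas as pd

# Flat consumption-like data: one node, one scenario, four timesteps.
df = pd.DataFrame({'cost': [10, 10, 10, 10],
                   'given': [5, 3, 7, 2],
                   'node': ['fr'] * 4,
                   'name': ['load'] * 4,
                   'scn': [0] * 4,
                   't': [0, 1, 2, 3]})

def pivot(df, *indexes):
    """Recreate the indexing in the order asked by the user."""
    return df.set_index(list(indexes)).sort_index()

def remove_useless_index_level(df):
    """Drop top index levels that contain a single value."""
    while df.index.nlevels > 1 and len(df.index.get_level_values(0).unique()) == 1:
        df = df.droplevel(0)
    return df

out = remove_useless_index_level(pivot(df, 'node', 'scn', 't', 'name'))
# 'node' and 'scn' each hold a single value, so only ('t', 'name') remain
```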
As you can see, the low level analyzer provides an efficient way to extract data from an adequacy study result. However, the returned data remains rather raw and is not ready for business purposes.
Unlike the low level API, the high level API focuses on providing ready-to-use data. Its features have to be designed one by one for each business purpose. Today there are two features:
- get_cost(self, node: str) -> np.ndarray: for the given node, returns a matrix of shape (scenario, horizon) with the summarized cost;
- get_balance(self, node: str) -> np.ndarray: for the given node, returns a matrix of shape (scenario, horizon) with the exchange balance (i.e. the sum of exports minus the sum of imports).
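A balance computation of this kind can be sketched from a flattened link table (hypothetical data and helper, not Hadar's actual code):

```python
import numpy as np
import pandas as pd

# Hypothetical flattened link table: one row per (src, dest, scn, t).
links = pd.DataFrame({'src':  ['fr', 'fr', 'uk', 'uk'],
                      'dest': ['uk', 'uk', 'fr', 'fr'],
                      'used': [8, 6, 2, 0],
                      'scn':  [0, 0, 0, 0],
                      't':    [0, 1, 0, 1]})

def get_balance(links, node, nb_scn=1, nb_t=2):
    """Exports minus imports for a node, as a (scenario, horizon) matrix."""
    balance = np.zeros((nb_scn, nb_t))
    for _, row in links[links['src'] == node].iterrows():
        balance[row['scn'], row['t']] += row['used']   # exportation
    for _, row in links[links['dest'] == node].iterrows():
        balance[row['scn'], row['t']] -= row['used']   # importation
    return balance

# fr exports 8 then 6 and imports 2 then 0, so its balance is [[6, 6]]
```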