Hadar is an adequacy Python library for deterministic and stochastic computation.
You are in the technical documentation.
Except where otherwise noted, this content is Copyright (c) 2020, RTE and licensed under a CC-BY-4.0 license.
Every kind of network has adequacy needs. On one side, some network nodes need to consume items such as watts, liters, or packages. On the other side, some network nodes produce items. Performing adequacy on a network means finding the best available exchanges to avoid any lack, at the best cost.
For example, an electric grid can have some nodes which produce too much power and some nodes which don't produce enough.
In this case, at t=0, A produces 10 units too many and B lacks 10. At t=1, the nodes are balanced. And at t=2, B produces 10 too many and A lacks 10.
For this example, performing adequacy results in an exchange of ten units from A to B, then zero, and finally ten units from B to A.
Hadar computes adequacy from simple to complex networks. For example, computing the network above takes just a few lines:
First, install hadar: ``pip install hadar``
import hadar as hd
study = hd.Study(horizon=3)\
    .network()\
        .node('a')\
            .consumption(cost=10 ** 6, quantity=[20, 20, 20], name='load')\
            .production(cost=10, quantity=[30, 20, 10], name='prod')\
        .node('b')\
            .consumption(cost=10 ** 6, quantity=[20, 20, 20], name='load')\
            .production(cost=10, quantity=[10, 20, 30], name='prod')\
        .link(src='a', dest='b', quantity=[10, 10, 10], cost=2)\
        .link(src='b', dest='a', quantity=[10, 10, 10], cost=2)\
    .build()
optimizer = hd.LPOptimizer()
res = optimizer.solve(study)
Then you can analyze the result by yourself, or use Hadar's aggregator and plotting modules:
plot = hd.HTMLPlotting(agg=hd.ResultAnalyzer(study, res), node_coord={'a': [2.33, 48.86], 'b': [4.38, 50.83]})
plot.network().node('a').stack()
At the start, A exports its production. Then it needs to import.
plot.network().node('b').stack(scn=0)
At the start, B needs to import; then it can export its production.
plot.network().map(t=0, zoom=2.5)
plot.network().map(t=2, zoom=2.5)
Welcome to the next tutorial!
We will discover why Hadar uses costs and how to use them.
Hadar is an adequacy optimizer; like every optimizer, it needs costs to determine the best solution. In Hadar, the cost to optimize represents the cost of performing network adequacy. That means Hadar will always try to:
- use the cheapest production
- use the cheapest path inside the network
- if Hadar can't match the asked consumption, shed the consumption with the cheapest unavailability cost
Let's start with an example on a single node with 3 types of production: solar, nuclear, oil. We want to use all solar first, then switch to nuclear, and use oil only as a last resort. To see the production priority, we attach a growing consumption to this node.
import numpy as np
import hadar as hd
study = hd.Study(horizon=30)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=np.arange(30))\
            .production(name='solar', cost=10, quantity=10)\
            .production(name='nuclear', cost=100, quantity=10)\
            .production(name='oil', cost=1000, quantity=10)\
    .build()
# tip: if you give just one element, hadar will extend it according to horizon and scenario size
optimizer = hd.LPOptimizer()
res = optimizer.solve(study)
agg = hd.ResultAnalyzer(study=study, result=res)
plot = hd.HTMLPlotting(agg=agg)
Consumption is a bit different: consumption cost is an unavailability cost. Therefore, unlike production, Hadar matches the consumptions with the highest cost first.
For this example, imagine you are in the future: hydrogen is the only energy source. You have the classic load, which must absolutely be matched. Then you have car-recharging consumption, which has to be matched but can be paused from time to time. And you also have bitcoin mining, which can be stopped whenever you want.
study = hd.Study(horizon=30)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=10)\
            .consumption(name='car', cost=10**4, quantity=10)\
            .consumption(name='bitcoin', cost=10**3, quantity=10)\
            .production(name='hydrogen', cost=10, quantity=np.arange(30))\
    .build()
res = optimizer.solve(study)
agg = hd.ResultAnalyzer(study=study, result=res)
plot = hd.HTMLPlotting(agg=agg)
plot.network().node(node='a').stack(cons_kind='given')
As for production, a link cost is a cost of use: Hadar will always select the cheapest path first.
For example, Belgium produces a lot of wind power. That's good news, because England and France have a consumption peak. However, sending energy to England through a submarine cable is more expensive than sending it to France through a traditional line. When we model the network, we keep this technical cost gap. That way, Hadar will first send energy to France and, if some energy remains, it will be sent to England.
study = hd.Study(horizon=2)\
    .network()\
        .node('be').production(name='eolien', cost=100, quantity=[10, 20])\
        .node('fr').consumption(name='load', cost=10**6, quantity=10)\
        .node('uk').consumption(name='load', cost=10**6, quantity=10)\
        .link(src='be', dest='fr', cost=10, quantity=10)\
        .link(src='be', dest='uk', cost=50, quantity=10)\
    .build()
res = optimizer.solve(study)
agg = hd.ResultAnalyzer(study=study, result=res)
plot = hd.HTMLPlotting(agg=agg,
                       node_coord={'fr': [2.33, 48.86], 'be': [4.38, 50.83], 'uk': [0, 52]})
plot.network().map(t=0, zoom=2.7)
At t=0, Belgium does not have enough energy for both. Hadar sends it to France to minimize the transfer cost.
plot.network().map(t=1, zoom=2.7)
At t=1, Belgium has enough energy for both.
In this example, we will test Hadar on a realistic (yet simplified) use case. We will perform adequacy between France and Germany during one day.
import pandas as pd
import numpy as np
import hadar as hd
fr = pd.read_csv('fr.csv')
de = pd.read_csv('de.csv')
study = hd.Study(horizon=48).network()
France loves nuclear, so in this example most of the production is nuclear. France also has a bit of solar and, when needed, the country can turn coal generators on and off. We want to optimize adequacy while reducing CO2 emissions. Therefore:
- solar is the cheapest at 10
- then we use nuclear at 30
- and coal at 100
study = study.node('fr')\
    .consumption(name='load', cost=10**6, quantity=fr['cons'])\
    .production(name='solar', cost=10, quantity=fr['solar'])\
    .production(name='nuclear', cost=30, quantity=fr['nuclear'])\
    .production(name='coal', cost=100, quantity=fr['coal'])
Germany has stopped nuclear to switch to renewable energy, so we increase solar and wind production. When renewable energy is off, Germany needs to start coal generation to match its consumption. As for France, we want to minimize CO2 emissions:
- solar at 10
- wind at 15
- coal at 100
study = study.node('de')\
    .consumption(name='load', cost=10**6, quantity=de['cons'])\
    .production(name='solar', cost=10, quantity=de['solar'])\
    .production(name='eolien', cost=15, quantity=de['eolien'])\
    .production(name='coal', cost=100, quantity=de['coal'])
Then links in both directions are set with the same cost of 5. In this network, Germany will import French nuclear power before starting coal, and France will use German coal to avoid any loss of load.
study = study\
    .link(src='fr', dest='de', cost=5, quantity=4000)\
    .link(src='de', dest='fr', cost=5, quantity=4000)\
    .build()
optimizer = hd.LPOptimizer()
res = optimizer.solve(study)
agg = hd.ResultAnalyzer(study, res)
plot = hd.HTMLPlotting(agg=agg,
                       unit_symbol='MW',         # set the quantity unit
                       time_start='2020-02-01',  # set the time interval
                       time_end='2020-02-02')
plot.network().rac_matrix()
plot.network().node(node='fr').stack(prod_kind='used', cons_kind='asked')
plot.network().node('fr').consumption('load').gaussian(scn=0)
plot.network().node(node='de').stack()
Hadar found a loss of load near 6h in Germany, which imports from France. Then France has a loss of load, and Hadar exports from Germany to France.
plot.network().node('de').consumption(name='load').gaussian(scn=0)
In this example, you will learn to use the ResultAnalyzer. You have already used it in previous examples to instantiate plotting:
agg = hd.ResultAnalyzer(study, result)
Let's begin by building a small study with two nodes (A and B); both have a sinus-like load between 500 and 1500. Node A has a constant nuclear plant; node B has wind power with uniform random output.
import hadar as hd
import numpy as np
import pandas as pd
t = np.linspace(0, np.pi * 14, 168)
load = 1000 + np.sin(t) * 500
eolien = np.random.rand(t.size) * 1000
study = hd.Study(horizon=t.size, nb_scn=1)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10 ** 6, quantity=load)\
            .production(name='nuclear', cost=100, quantity=1500)\
        .node('b')\
            .consumption(name='load', cost=10 ** 6, quantity=load)\
            .production(name='eolien', cost=50, quantity=eolien)\
        .link(src='a', dest='b', cost=5, quantity=2000)\
        .link(src='b', dest='a', cost=5, quantity=2000)\
    .build()
opt = hd.LPOptimizer()
res = opt.solve(study)
The analyzer provides a low-level API: the result may not be ready-to-use, but it's a very flexible way to analyze data. The low-level API lets you do two things:
- set order: data has four levels (node, element, scenario and time); the low-level API organizes these levels for you
- filter: for each level you can apply a filter, e.g. select only node 'a', or timesteps 10 to 35
For example, you want to select the consumption named load over all nodes, just for timesteps 57 to 78:
agg.network().scn(0).consumption('load').node().time(slice(57, 78))
TIP: if a filter returns only one element, put it first. Leading indexes with a single element are removed to avoid useless index levels.
Another example: analyze all productions over the first 24 timesteps.
agg.network().scn(0).node().production().time(slice(0,24))
To summarize the low-level API: you can organize and filter data by network, scenario, time, node, and the elements on a node.
The high-level API returns ready-to-use data. It gives you business-oriented data about adequacy. Today we have:
- get_balance to compute the net position of a node
- get_cost to compute the cost on a node
- get_rac to compute the Remaining Available Capacity
import plotly.graph_objects as go
def plot(y):
    return go.Figure(go.Scatter(x=t, y=y.flatten()))
data = agg.get_balance(node='a')  # compute the net position for all scenarios and timesteps
plot(data)
data = agg.get_cost(node='b')  # compute the cost for all scenarios and timesteps
plot(data)
data = agg.get_rac()  # compute the Remaining Available Capacity for all scenarios and timesteps
plot(data)
Welcome to this new tutorial. Of course, Hadar is designed to compute network adequacy studies; you can run Hadar to compute adequacy for the next second or the next year.
But Hadar can also be used as an asset investment tool. In this example, thanks to Hadar, we will make the best choice for renewable energy and network investment.
We have a small region with a metropolis which doesn't produce anything, a nuclear plant, and two small cities with production.
First step: parse the data with pandas (and plot it).
import numpy as np
import pandas as pd
import hadar as hd
import plotly.graph_objects as go
a = pd.read_csv('a.csv', index_col='date')
fig = go.Figure()
fig.add_traces(go.Scatter(x=a.index, y=a['consumption'], name='load'))
fig.add_traces(go.Scatter(x=a.index, y=a['gas'], name='gas'))
fig.update_layout(title_text='Node A', yaxis_title='MW')
b = pd.read_csv('b.csv', index_col='date')
fig = go.Figure()
fig.add_traces(go.Scatter(x=b.index, y=b['consumption'], name='load'))
fig.update_layout(title_text='Node B (only consumption)', yaxis_title='MW')
c = pd.read_csv('c.csv', index_col='date')
fig = go.Figure()
fig.add_traces(go.Scatter(x=c.index, y=c['nuclear'], name='nuclear'))
fig.update_layout(title_text='Node C (only production)', yaxis_title='MW')
d = pd.read_csv('d.csv', index_col='date')
fig = go.Figure()
fig.add_traces(go.Scatter(x=d.index, y=d['consumption'], name='load'))
fig.add_traces(go.Scatter(x=d.index, y=d['eolien'], name='eolien'))
fig.update_layout(title_text='Node D', yaxis_title='MW')
Next step: code this network with Hadar.
line = np.ones(8760) * 2000 # 2000 MW
base = hd.Study(horizon=8760)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=a['consumption'])\
            .production(name='gas', cost=80, quantity=a['gas'])\
        .node('b')\
            .consumption(name='load', cost=10**6, quantity=b['consumption'])\
        .node('c')\
            .production(name='nuclear', cost=50, quantity=c['nuclear'])\
        .node('d')\
            .consumption(name='load', cost=10**6, quantity=d['consumption'])\
            .production(name='eolien', cost=20, quantity=d['eolien'])\
        .link(src='a', dest='b', cost=5, quantity=line)\
        .link(src='b', dest='c', cost=5, quantity=line)\
        .link(src='c', dest='a', cost=5, quantity=line)\
        .link(src='c', dest='b', cost=10, quantity=line)\
        .link(src='c', dest='d', cost=10, quantity=line)\
        .link(src='d', dest='c', cost=10, quantity=line)\
    .build()
optimizer = hd.LPOptimizer()
def compute_cost(study):
    res = optimizer.solve(study)
    agg = hd.ResultAnalyzer(study=study, result=res)
    return agg.get_cost().sum(axis=1), res.benchmark
def print_bench(bench):
    print('mapper', bench.mapper)
    print('modeler', sum(bench.modeler))
    print('solver', sum(bench.solver))
    print('total', bench.total)
base_cost, bench = compute_cost(base)
base_cost = base_cost[0]
An investor wants to build a solar park. According to the latest weather data, he can forecast the amount of production from this park. (Solar radiation is the same on each node of the network.)
What is the best node to install these solar panels on? (B is excluded because there is not enough space.)
park = pd.read_csv('solar.csv', index_col='date')
fig = go.Figure()
fig.add_traces(go.Scatter(x=park.index, y=park['solar'], name='solar'))
fig.update_layout(title_text='Forecast Solar Park Power', yaxis_title='MW')
We could build one study for each different scenario. However, Hadar can compute many scenarios at once, which is more efficient, and the results are the same. The ability to compute many scenarios at once is very important for the next topic, Stochastic Study.
def build_sparce_data(total: int, at, data) -> np.ndarray:
    """
    Build a multi-scenario input where every scenario is filled with ones
    except the selected one(s).

    :param total: number of scenarios to generate
    :param at: scenario index (or indexes) to fill
    :param data: data to fill into the selected scenario(s)
    :return: matrix with shape (nb_scn, horizon)
    """
    if isinstance(data, pd.DataFrame):
        data = data.values.flatten()
    sparce = np.ones((total, data.size))
    sparce[at, :] = data
    return sparce
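A quick sanity check of the helper (illustrative values, not taken from the dataset):

demo = build_sparce_data(total=3, at=1, data=pd.DataFrame({'q': [5., 6., 7.]}))
print(demo.shape)  # (3, 3)
print(demo[1])     # [5. 6. 7.] -> the filled scenario
print(demo[0])     # [1. 1. 1.] -> other scenarios keep placeholder ones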
We run a single study with three scenarios, one per candidate node:
solar = hd.Study(horizon=8760, nb_scn=3)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=a['consumption'])\
            .production(name='gas', cost=80, quantity=a['gas'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=3, at=0, data=park))\
        .node('b')\
            .consumption(name='load', cost=10**6, quantity=b['consumption'])\
        .node('c')\
            .production(name='nuclear', cost=50, quantity=c['nuclear'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=3, at=1, data=park))\
        .node('d')\
            .consumption(name='load', cost=10**6, quantity=d['consumption'])\
            .production(name='eolien', cost=20, quantity=d['eolien'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=3, at=2, data=park))\
        .link(src='a', dest='b', cost=5, quantity=line)\
        .link(src='b', dest='c', cost=5, quantity=line)\
        .link(src='c', dest='a', cost=5, quantity=line)\
        .link(src='c', dest='b', cost=10, quantity=line)\
        .link(src='c', dest='d', cost=10, quantity=line)\
        .link(src='d', dest='c', cost=10, quantity=line)\
    .build()
costs, bench = compute_cost(solar)
costs = pd.Series(data=costs, name='cost', index=['a', 'c', 'd'])
(base_cost - costs) / base_cost * 100
a    8.070145
c    2.342062
d    2.736793
Name: cost, dtype: float64
As we can see, the network is more efficient if the solar park is installed on node A (8% more efficient, versus only 2-3% for the other nodes).
Let's add an extra difficulty! The region wants to invest in a new line among A->C, D->B, A->D, D->A.
In this case, what is the best place to install the solar park, and which line is the most useful to build?
solar_line = hd.Study(horizon=8760, nb_scn=12)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=a['consumption'])\
            .production(name='gas', cost=80, quantity=a['gas'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=12, at=[0, 3, 6, 9], data=park))\
        .node('b')\
            .consumption(name='load', cost=10**6, quantity=b['consumption'])\
        .node('c')\
            .production(name='nuclear', cost=50, quantity=c['nuclear'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=12, at=[1, 4, 7, 10], data=park))\
        .node('d')\
            .consumption(name='load', cost=10**6, quantity=d['consumption'])\
            .production(name='eolien', cost=20, quantity=d['eolien'])\
            .production(name='solar', cost=10, quantity=build_sparce_data(total=12, at=[2, 5, 8, 11], data=park))\
        .link(src='a', dest='b', cost=5, quantity=line)\
        .link(src='b', dest='c', cost=5, quantity=line)\
        .link(src='c', dest='a', cost=5, quantity=line)\
        .link(src='c', dest='b', cost=10, quantity=line)\
        .link(src='c', dest='d', cost=10, quantity=line)\
        .link(src='d', dest='c', cost=10, quantity=line)\
        .link(src='a', dest='c', cost=10, quantity=build_sparce_data(total=12, at=[0, 1, 2], data=line))\
        .link(src='d', dest='b', cost=10, quantity=build_sparce_data(total=12, at=[3, 4, 5], data=line))\
        .link(src='a', dest='d', cost=10, quantity=build_sparce_data(total=12, at=[6, 7, 8], data=line))\
        .link(src='d', dest='a', cost=10, quantity=build_sparce_data(total=12, at=[9, 10, 11], data=line))\
    .build()
costs2, bench = compute_cost(solar_line)
costs2 = pd.DataFrame(data=costs2.reshape(4, 3),
                      index=['a->c', 'd->b', 'a->d', 'd->a'],
                      columns=['a', 'c', 'd'])
(base_cost - costs2) / base_cost * 100
Very interesting: the new line is a game changer. D->A and D->B seem to be the most valuable lines. If D->B is built, it becomes more efficient to install the solar park on node D!
When you want to simulate network adequacy, you can perform a deterministic computation. That means you believe there won't be too much random behavior in the future. If you perform adequacy for the next hour or day, that's a good hypothesis. But if you simulate the network for the next week, month or year, it sounds dubious.
Are you sure the wind will blow next week, or the sun will shine? If not, your wind or solar production could change. Can you guarantee that no failure will occur on your network next month or next year?
Of course, we cannot predict the future with such precision. That's why we use stochastic computation. Stochastic means there is random behavior in the physics we want to simulate. A single simulation is quite useless if the result can change with small variations.
The ideal solution would be a God function which tells you, for each input variation (solar production, lines, consumptions), what the adequacy result is. Hadar would then just have to analyze this function, its derivatives, minima, maxima, etc. to predict the future. But this God function doesn't exist; we only have an algorithm which tells us the adequacy for one fixed set of input data.
That's why we use the Monte Carlo method. Monte Carlo runs many scenarios to analyze many different behaviors: scenarios with more consumption in cities, less solar production, less coal production, or one line down due to a crash. By this method we approximate the God function by sampling it.
We will reuse the network seen in Network Investment. If you haven't read that part, don't worry: we just reuse the network, nothing more. It looks like:
We use data generated in the next topic, Workflow. The input data represents 10 scenarios with different loads and wind production. There are also random faults for nuclear and gas. These 10 scenarios are unique; they are 10 random samples of the God function, used to predict network adequacy more broadly.
import hadar as hd
import numpy as np
def read_csv(name):
    return np.genfromtxt('%s.csv' % name, delimiter=' ').T
line = 2000
study = hd.Study(horizon=168, nb_scn=10)\
    .network()\
        .node('a')\
            .consumption(name='load', cost=10**6, quantity=read_csv('load_A'))\
            .production(name='gas', cost=80, quantity=read_csv('gas'))\
        .node('b').consumption(name='load', cost=10**6, quantity=read_csv('load_B'))\
        .node('c').production(name='nuclear', cost=50, quantity=read_csv('nuclear'))\
        .node('d')\
            .consumption(name='load', cost=10**6, quantity=read_csv('load_D'))\
            .production(name='eolien', cost=20, quantity=read_csv('eolien'))\
        .link(src='a', dest='b', cost=5, quantity=line)\
        .link(src='b', dest='c', cost=5, quantity=line)\
        .link(src='c', dest='a', cost=5, quantity=line)\
        .link(src='c', dest='b', cost=10, quantity=line)\
        .link(src='c', dest='d', cost=10, quantity=line)\
        .link(src='d', dest='c', cost=10, quantity=line)\
    .build()
optimizer = hd.LPOptimizer()
res = optimizer.solve(study)
agg = hd.ResultAnalyzer(study, res)
plot = hd.HTMLPlotting(agg=agg, unit_symbol='MW',
                       time_start='2020-06-19', time_end='2020-06-27',
                       node_coord={'a': [1.6264, 47.8842], 'b': [1.9061, 47.9118],
                                   'c': [1.6175, 47.7097], 'd': [1.9314, 47.7090]})
Let's start with a quick overview of adequacy by plotting the remaining available capacity. Blue squares mean the network has enough energy to sustain consumption; red squares mean the network has a lack of adequacy.
As you can see, stochastic matters. Some scenarios, like the 5th, are a complete success. But with more consumption and less production due to unpredictable events, you get a lack of adequacy.
plot.network().node('b').consumption('load').timeline()
plot.network().node(node='b').stack(scn=7)
Hadar can also display valuable information about production. For example, the gas plant seems turned off most of the time:
plot.network().node('a').production('gas').monotone(scn=7)
plot.network().node('d').production('eolien').timeline()
Then we can plot a map to see the exchanges inside the network:
plot.network().map(t=4, scn=7, zoom=1.6)
When you want to simulate the adequacy of a network for the next weeks or months, you need to create a stochastic study and generate scenarios (c.f. Begin Stochastic).
Workflow is the preprocessing module of Hadar. It helps the user generate scenarios and sample them to create a stochastic study. It's a toolbox for creating pipelines that transform data for the optimizer.
With workflow, you plug stages together to create a pipeline. You can use ready-made stages or develop your own.
To understand the power of workflow, we will generate the data previously used in Begin Stochastic.
Let's begin with constant productions like nuclear and gas. These productions are not stochastic by default; however, faults can occur, and that's what we will generate. For this example, all stages belong to hadar's ready-to-use library.
import hadar as hd import numpy as np import pandas as pd import plotly.graph_objects as go
# We generate 5 fault scenarios where a fault removes 300 MW with a probability of 1% per timestep;
# minimum downtime is one step (one hour) and maximum downtime is 12 steps.
fault_pipe = hd.RepeatScenario(n=5) \
    + hd.Fault(loss=300, occur_freq=0.01, downtime_min=1, downtime_max=12) \
    + hd.ToShuffler('quantity')
In this case, we have to develop our own stage. Let's begin with wind. We know the maximum wind power; we apply a uniform random value between 0 and this maximum for each timestep.
class WindRandom(hd.Stage):
    def __init__(self):
        hd.Stage.__init__(self, plug=hd.FreePlug())  # we will see in another example what FreePlug is

    # Method to implement from Stage to create your own stage with its behaviour
    def _process_timeline(self, timeline: pd.DataFrame) -> pd.DataFrame:
        return timeline * np.random.rand(*timeline.shape)
wind_pipe = hd.RepeatScenario(n=3) + WindRandom() + hd.ToShuffler('quantity')
Then we generate the load. For the load, we add a Gaussian random walk (the cumulative sum of normal draws) to the given values.
class LoadRandom(hd.Stage):
    def __init__(self):
        hd.Stage.__init__(self, plug=hd.FreePlug())

    # Method to implement from Stage to create your own stage with its behaviour
    def _process_timeline(self, timeline: pd.DataFrame) -> pd.DataFrame:
        return timeline + np.cumsum(np.random.randn(*timeline.shape) * 10, axis=0)
load_pipe = hd.RepeatScenario(n=3) + LoadRandom() + hd.ToShuffler('quantity')
We use the Shuffler object to generate data through the pipelines and then sample 10 scenarios:
ones = pd.DataFrame({'quantity': np.ones(168)})
# Loads are simply sinus-shaped
sinus = pd.DataFrame({'quantity': np.sin(np.linspace(-1, -1 + np.pi * 14, 168)) * .2 + .8})

shuffler = hd.Shuffler()
shuffler.add_pipeline(name='gas', data=ones * 1000, pipeline=fault_pipe)
shuffler.add_pipeline(name='nuclear', data=ones * 5000, pipeline=fault_pipe)
shuffler.add_pipeline(name='eolien', data=ones * 1000, pipeline=wind_pipe)
shuffler.add_pipeline(name='load_A', data=sinus * 2000, pipeline=load_pipe)
shuffler.add_pipeline(name='load_B', data=sinus * 3000, pipeline=load_pipe)
shuffler.add_pipeline(name='load_D', data=sinus * 1000, pipeline=load_pipe)
sampling = shuffler.shuffle(nb_scn=10)
def input_plot(title, raw, generate):
    x = np.arange(raw.size)
    fig = go.Figure()
    for i, scn in enumerate(generate):
        fig.add_trace(go.Scatter(x=x, y=scn, name='scn %d' % i,
                                 line=dict(color='rgba(100, 100, 100, 0.2)')))
    fig.add_traces(go.Scatter(x=x, y=raw.values.T[0], name='raw'))
    fig.update_layout(title_text=title)
    return fig
input_plot('Gas', ones * 1000, sampling['gas'])
input_plot('Nuclear', ones * 5000, sampling['nuclear'])
input_plot('eolien', ones * 1000, sampling['eolien'])
input_plot('load_A', sinus * 2000, sampling['load_A'])
# for name, values in sampling.items(): # np.savetxt('../Begin Stochastic/%s.csv' % name, values.T, delimiter=' ', fmt='%04.2f')
In Workflow we saw how to easily create simple stages and link stages to build a pipeline. It's time to see the complete workflow features to create more complex Stages.
First, take a look at how data is represented inside a stage; a, b, c are column names provided by the user and used by the stages:
        scn 0           scn 1           scn …
t       a    b    …     a    b    …     …
0       _    _    _     _    _    _     _
1       _    _    _     _    _    _     _
…       _    _    _     _    _    _     _
Pipelines can be more flexible and allow user input without scenarios. In that case, the input is standardized by adding a default 0th scenario:
t       a    b    …
0       _    _    _
1       _    _    _
…       _    _    _
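A minimal pandas sketch of this standardization (illustrative only, not Hadar's internal code):

import pandas as pd

# The user gives a timeline without a scenario level...
raw = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# ...which is standardized by adding a default 0th scenario as the first column level.
std = raw.copy()
std.columns = pd.MultiIndex.from_product([[0], raw.columns])
print(std)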
As you can see above, data contains scenarios, and each scenario contains columns with generic names. These names become a constraint: some stages expect to receive strict names, or produce other columns with new names. Hadar provides a mechanism to handle this complexity, called Plug. You have already seen hd.FreePlug, which means the stage has no constraint: it doesn't expect any particular input and doesn't produce specific columns.
For example, if you just need to multiply data by two, you can create a Stage with a FreePlug:
class Twice(hd.Stage):
    def __init__(self):
        hd.Stage.__init__(self, plug=hd.FreePlug())

    def _process_timeline(self, tl):
        return tl * 2
It's simple, but sometimes you expect strict column names to process the timeline. In this case you will use hd.RestrictedPlug(inputs, outputs): inputs declares which column names you expect to perform the computation; outputs says which new column names are created during the computation.
Now that we care about column names, we often need to apply the computation scenario by scenario rather than on the global dataframe. To handle this mechanism, hadar provides a FocusStage, which gives you a _process_scenarios(nb_scn, tl) method to implement.
In the last example, we created a Stage to generate wind power just by applying uniform random generation. Now we want more precise generation. Whereas the previous stage used just a max variable to generate uniform random values, here we use two variables, mean and std, to generate normal random values.
class Wind(hd.FocusStage):
    # Computation is done scenario by scenario, so we use FocusStage
    def __init__(self):
        # Use a RestrictedPlug to enforce the constraint
        hd.FocusStage.__init__(self, plug=hd.RestrictedPlug(inputs=['mean', 'std'], outputs=['wind']))

    def _process_scenarios(self, nb_scn, tl):
        return tl['mean'] + np.random.randn(tl.shape[0]) * tl['std']
Wind can be plugged: upstream stages have to provide mean and std, and downstream stages should use wind. For example, hd.Clip and hd.RepeatScenario have a FreePlug, so you can plug them anywhere:
pipe = hd.RepeatScenario(5) + Wind() + hd.Clip(lower=0)  # make sure no negative production is generated
But if you try to plug Fault, an error will be raised, because Fault expects a quantity column:
try:
    pipe = hd.RepeatScenario(5) + Wind() + hd.Clip(lower=0) \
        + hd.Fault(occur_freq=0.01, loss=100, downtime_min=1, downtime_max=10)
except ValueError as e:
    print('ValueError:', e)
ValueError: Pipeline can't be added current outputs are ['wind'] and Fault has input ['quantity']
In this case, you can use hd.Rename to fix stages with the right column names. To summarize, the pipeline:
1. copies the data 5 times into new scenarios
2. applies random generation to each scenario
3. caps data below 0 (negative production doesn't exist)
4. renames the data column from wind to quantity
5. generates random faults for each scenario
pipe = hd.RepeatScenario(5) + Wind() + hd.Clip(lower=0) \
    + hd.Rename(wind='quantity') \
    + hd.Fault(occur_freq=0.01, loss=100, downtime_min=1, downtime_max=10)
Checks are performed when stages are linked together, but also when the user provides input data. The lines below will raise an error since the input doesn't have a mean column:
t = np.linspace(0, 4*3.14, 168)
try:
    i = pd.DataFrame({'NOT-mean': np.sin(t) * 1000 + 1000, 'std': np.sin(t * 2) * 200 + 200})
    o = pipe(i)
except ValueError as e:
    print('ValueError:', e)
ValueError: Pipeline accept ['mean', 'std'] in input, but receive ['NOT-mean' 'std']
i = pd.DataFrame({'mean': np.sin(t) * 1000 + 1000, 'std': np.sin(t * 2) * 200 + 200})
o = pipe(i.copy())
fig = go.Figure()
fig.add_traces(go.Scatter(x=t, y=i['mean'], name='mean'))
fig.add_traces(go.Scatter(x=t, y=i['std'] + i['mean'], name='std+', line=dict(color='red', dash='dash')))
fig.add_traces(go.Scatter(x=t, y=-i['std'] + i['mean'], name='std-', line=dict(color='red', dash='dash')))
for n in range(5):
    fig.add_traces(go.Scatter(x=t, y=o[n]['quantity'], name='wind %d' % n,
                              line=dict(color='rgba(100, 100, 100, 0.5)')))
fig
We have already seen Consumption, Production and Link attached to nodes. Hadar also has a Storage element. We will work on a simple network with two nodes: one with two productions (stochastic and constant), the other with consumption and storage.
np.random.seed(12684681)
eolien = np.random.rand(168) * 500 + 200  # random between 200 and 700
load = np.sin(np.linspace(-1, -1 + np.pi * 14, 168)) * 250 + 750  # sinus moving between 500 and 1000
Let's start storage... by removing storage!
study = hd.Study(horizon=eolien.size)\
    .network()\
        .node('a')\
            .production(name='gas', cost=100, quantity=200)\
            .production(name='nuclear', cost=50, quantity=300)\
            .production(name='eolien', cost=10, quantity=eolien)\
        .node('b')\
            .consumption(name='load', cost=10 ** 6, quantity=load)\
        .link(src='a', dest='b', cost=1, quantity=2000)\
    .build()

optim = hd.LPOptimizer()
res = optim.solve(study)
plot_without = hd.HTMLPlotting(agg=hd.ResultAnalyzer(study=study, result=res), unit_symbol='MW')
plot_without.network().node('b').stack()
Node B has a lot of loss of load. The network doesn't have enough power to sustain consumption during peaks.
plot_without.network().node('a').stack()
Productions are used immediately, just to match the load.
Now we add a storage. In our case, the cell efficiency is 80%; efficiency must be < 1, and Hadar uses eff=0.99 by default. The other important parameter is cost: it represents the cost of storing one quantity during one timestep. A zero or positive cost means we want to minimize storage use. By default, Hadar uses cost=0.
So in this configuration, cost=0 and eff=0.80. Therefore a stored quantity costs 25% more (\(\frac{1}{0.8} = 1.25\)) than the same production delivered without storage. At any time, Hadar can choose between these productions and costs.
Beyond just fixing loss of load, storage can also optimize production. Look: stored nuclear or wind production is cheaper than direct gas production. Hadar knows it and will use it!
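To make the trade-off concrete with the costs of this study: delivering one stored quantity of nuclear costs \(\frac{50}{0.8} = 62.5\) and one stored quantity of wind \(\frac{10}{0.8} = 12.5\), both below the direct gas cost of 100, so discharging the cell beats starting gas.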
study = hd.Study(horizon=eolien.size)\
    .network()\
        .node('a')\
            .production(name='gas', cost=100, quantity=200)\
            .production(name='nuclear', cost=50, quantity=300)\
            .production(name='eolien', cost=10, quantity=eolien)\
        .node('b')\
            .consumption(name='load', cost=10 ** 6, quantity=load)\
            .storage(name='cell', init_capacity=200, capacity=800, flow_in=400, flow_out=400, eff=.8)\
        .link(src='a', dest='b', cost=1, quantity=2000)\
    .build()

res = optim.solve(study)
plot = hd.HTMLPlotting(agg=hd.ResultAnalyzer(study=study, result=res), unit_symbol='MW')
plot.network().node('b').stack()
Yeah! We avoided a network shutdown!
plot.network().node('b').storage('cell').candles()
Hadar fills the cell before each peak.
And yes, Hadar starts nuclear before the peak and uses less gas during the peak.
What happens if we use a negative cost?
In this case, storing earns interest. If the interest is higher than the gain from optimizing production, Hadar will automatically fill the cell.
study = hd.Study(horizon=eolien.size)\
    .network()\
        .node('a')\
            .production(name='gas', cost=100, quantity=200)\
            .production(name='nuclear', cost=50, quantity=300)\
            .production(name='eolien', cost=10, quantity=eolien)\
        .node('b')\
            .consumption(name='load', cost=10 ** 6, quantity=load)\
            .storage(name='cell', init_capacity=200, capacity=800, flow_in=400, flow_out=400, eff=.8, cost=-10)\
        .link(src='a', dest='b', cost=1, quantity=2000)\
    .build()

res = optim.solve(study)
plot_cost_neg = hd.HTMLPlotting(agg=hd.ResultAnalyzer(study=study, result=res), unit_symbol='MW')
plot_cost_neg.network().node('b').storage('cell').candles()
Hadar no longer tries to optimize imports; now it saves into storage to earn interest.
Hadar is designed to manage many kinds of energy. Indeed, the only restriction is the mathematical equation applied to each node: if the user's case fits this equation, Hadar can handle it.
That means Hadar is not designed for one specific energy. Moreover, Hadar can handle many energies in one study; this is called multi-energy. To do that, Hadar uses networks which group nodes by energy. Nodes inside the same network manage the same energy, and we use Link to plug them together. If the user has many networks, hence many energies, he has to use a Converter. A Converter is more powerful than a Link: the user can specify a conversion from many different nodes to one node.
Set problem data
There is no electricity in this tutorial; we will model a combustion engine. There are three kinds of energy in an engine: oil (grams), compressed air (grams) and work (joules).
Problem data:
- 1 g of oil = 41868 J
- the engine air/fuel ratio is 15:1, i.e. 15 g of air for 1 g of oil
- the engine efficiency is about 36%
In Hadar, we have to set ratios \(R_i\) such that \(In_i \times R_i = Out\) for each input \(i\).
The equation applied to the oil conversion gives \(R_{oil} = 41868 \times 0.36 \approx 15072.5\) joules per gram of oil.
The equation applied to the air conversion gives \(R_{air} = \frac{41868}{15} \times 0.36 \approx 1005\) joules per gram of air.
Work is modeled by a consumption such as \(10000 \times (1-e^{-t/25})\):
work = 10000*(1 - np.exp(-np.arange(100)/25))
study = hd.Study(horizon=100)\
    .network('work')\
        .node('work')\
            .consumption(name='work', cost=10**6, quantity=work)\
    .network('oil')\
        .node('oil')\
            .production(name='oil', cost=10, quantity=10)\
            .to_converter(name='engine', ratio=15072.5)\
    .network('air')\
        .node('air')\
            .production(name='air', cost=10, quantity=150)\
            .to_converter(name='engine', ratio=1005)\
    .converter(name='engine', to_network='work', to_node='work', max=10000)\
    .build()
optim = hd.LPOptimizer()
res = optim.solve(study)
agg = hd.ResultAnalyzer(study=study, result=res)
plot = hd.HTMLPlotting(agg=agg, unit_symbol='J')
plot.network('work').node('work').stack()
Work energy comes from the engine converter. If we analyze the oil and air used in the result, we find the correct ratio:
oil = agg.network('oil').scn(0).node('oil').production('oil').time()['used']
air = agg.network('air').scn(0).node('air').production('air').time()['used']
(air / oil).plot()
<AxesSubplot:xlabel='t'>
Welcome to the Hadar Architecture Documentation.
Hadar's purpose is to be an adequacy library for everyone.
Why these goals?
We designed Hadar in the same spirit as python libraries like numpy or scipy, and moreover like scikit-learn. Before scikit-learn, people who wanted to do machine learning had to have a strong mathematical background to develop their own code. Some ready-to-go codes existed, but they were neither easy to use nor flexible.
Scikit-learn unleashed the power of machine learning by abstracting complex algorithms into a very straightforward API. It was designed as a toolbox covering the full machine learning framework, where users can simply assemble scikit-learn components or build their own.
Hadar wants to be the next scikit-learn for adequacy. Hadar has to be easy to use and flexible, which, translated into architecture terms, means a high abstraction level and independent modules.
Users have the choice: use only Hadar components, assemble them, and create a full solution to generate, solve and analyze an adequacy study; or build their own parts.
To meet this constraint, we split Hadar into 4 main modules which can be used together or separately:
As said, these modules can be used together to handle the complete adequacy study lifecycle, or used separately.
TODO graph architecture module
Each of the above modules is like a tiny independent library. Therefore each module has a high-level API. High abstraction is a bit fuzzy to define and benchmark. For us, high abstraction means the user doesn't need to know mathematical or technical details to use the library.
Scikit-learn is the best example of a high-abstraction API. For example, if we just want to run a complete SVM regression:
from sklearn.svm import SVR

svm = SVR()
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
How many people using this feature know that scikit-learn tries to project data into a higher-dimensional space to find a linear regression inside? And that, to accelerate computation, it uses a mathematical feature called the kernel trick, because the problem respects strict requirements? Perhaps just a few people, and that's all the beauty of a high-level API: it hides the background gears.
Hadar tries to keep this high level of abstraction. Look at the Get Started example:
import hadar as hd

study = hd.Study(horizon=3)\
    .network()\
        .node('a')\
            .consumption(cost=10 ** 6, quantity=[20, 20, 20], name='load')\
            .production(cost=10, quantity=[30, 20, 10], name='prod')\
        .node('b')\
            .consumption(cost=10 ** 6, quantity=[20, 20, 20], name='load')\
            .production(cost=10, quantity=[10, 20, 30], name='prod')\
        .link(src='a', dest='b', quantity=[10, 10, 10], cost=2)\
        .link(src='b', dest='a', quantity=[10, 10, 10], cost=2)\
    .build()

optim = hd.LPOptimizer()
res = optim.solve(study)
Create a study like you would draw it on paper. Place your nodes, attach productions, consumptions and links, and run the optimizer.
The Optimizer, Analyzer and Viewer parts are built around the same API, called in the code a Fluent API Selector. Each part has its own flavour.
Now that the goals are fixed, we can go deeper into module-specific documentation. The whole architecture focuses on high abstraction and independent modules. You can also read the best practices guide to understand more of the development choices made in Hadar.
Let's start the code explanation.
Workflow is the preprocessing module of Hadar. It's a toolbox for creating pipelines that transform data for the optimizer.
Of course, we cannot predict the future with such precision. That's why we use stochastic computation. Stochastic means there is random behavior in the physics we want to simulate. A simulation is quite useless if it yields a single, unique result.
Workflow helps the user generate these scenarios and sample them to create a stochastic study.
The main issue when helping people generate their scenarios is that there are as many generating processes as there are users. Therefore workflow is built upon a Stage and Pipeline architecture.
A Stage is an atomic process applied to data. In workflow, data is a pandas DataFrame: the index is time, the first column level is the scenario, and the second is the data itself (it could be anything, like mean, max, sigma, …). The DataFrame is represented below:
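A sketch of that layout (same shape as the table shown in the workflow tutorial above):

        scn 0           scn 1           scn …
t       a    b    …     a    b    …     …
0       _    _    _     _    _    _     _
1       _    _    _     _    _    _     _
…       _    _    _     _    _    _     _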
A stage performs a computation on this DataFrame. As you can guess, stages can be linked together to create a pipeline. Hadar ships its own very generic stages, and each user can build their own stages and pipelines.
For example, you have many coal plants. Each plant has 10 generators of 100 MW, so one coal plant has 1,000 MW of power. You know that sometimes generators crash or need a shutdown for maintenance. With Hadar you can create a pipeline to generate these fault scenarios:
# In this example, one timestep = one hour
import numpy as np
import pandas as pd
import hadar as hd
import matplotlib.pyplot as plt

# Coal production over 8 weeks with hourly steps
coal = pd.DataFrame({'quantity': np.ones(8 * 168) * 1000})

# Copy the scenario ten times
copy = hd.RepeatScenario(n=10)
# Apply a random fault on each scenario: the power drop is 100 MW with a 0.1% chance of failure each hour;
# a failure lasts at least a whole day and at most a week.
fault = hd.Fault(loss=100, occur_freq=0.001, downtime_min=24, downtime_max=168)

pipe = copy + fault
out = pipe.compute(coal)

out.plot()
plt.show()
Output:
RepeatScenario, Fault and all the others are built upon the Stage abstract class. A Stage is specified by its Plug (we will see that soon) and a _process_timeline(self, timeline: pd.DataFrame) -> pd.DataFrame method to implement. The timeline variable inside this method is the data passed through the pipeline to transform.
For example, if you need to multiply by 2 inside your pipeline, you can create your stage like this:
class Twice(Stage):
    def __init__(self):
        Stage.__init__(self, FreePlug())

    def _process_timeline(self, timelines: pd.DataFrame) -> pd.DataFrame:
        return timelines * 2
Implementing Stage will work every time. Often, though, you want to apply a function independently to each scenario. You can of course handle this mechanism yourself: split the current timeline, apply the method, and rebuild it at the end. Or use FocusStage, which is the same thing already coded. In this case, you need to inherit from FocusStage and implement the _process_scenarios(self, n_scn: int, scenario: pd.DataFrame) -> pd.DataFrame method.
For example, you have thousands of scenarios, and your stage has to generate Gaussian series according to the given mean and sigma:
class Gaussian(FocusStage):
    def __init__(self):
        FocusStage.__init__(self, plug=RestrictedPlug(input=['mean', 'sigma'], output=['gaussian']))

    def _process_scenarios(self, n_scn: int, scenario: pd.DataFrame) -> pd.DataFrame:
        scenario['gaussian'] = np.random.randn(scenario.shape[0])
        scenario['gaussian'] *= scenario['sigma']
        scenario['gaussian'] += scenario['mean']
        return scenario.drop(['mean', 'sigma'], axis=1)
You have already seen FreePlug and RestrictedPlug. What are they?
Stages are linked together to build a pipeline. Some stages accept anything as input, like Twice, but others need specific data, like Gaussian. How do we know that stages can be linked together, and that data given at the beginning of the pipeline is correct for the whole pipeline?
The first solution is to say: we don't care; during execution, if data is missing, an error will be raised, and that's enough. Indeed… that works, but if the pipeline job is heavy and takes hours, failing just because of a misspelled column name is ugly.
The Plug object describes the linkable constraints of Stages and Pipelines. Like Stages, Plugs can be added together; in that case, constraints are merged. Use FreePlug to say that a Stage is unconstrained and doesn't expect any particular column name to run, or RestrictedPlug(inputs=[], outputs=[]) to specify the mandatory input columns and the new columns generated.
Plug arithmetic rules are described below (\(\emptyset\) = FreePlug).
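A plausible reconstruction of these rules, inferred from the FreePlug and RestrictedPlug behaviour shown in the examples above (an assumption, not the verified implementation):
- \(\emptyset + \emptyset = \emptyset\)
- \(\emptyset + RestrictedPlug(I, O) = RestrictedPlug(I, O)\), and symmetrically
- \(RestrictedPlug(I_1, O_1) + RestrictedPlug(I_2, O_2)\) is valid only if \(I_2 \subseteq O_1\) (otherwise linking raises a ValueError, as in the Fault example); the merged plug keeps \(I_1\) as inputs and \(O_2\) as outputs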
Users can create as many pipelines as they want. At the end, they may have some pipelines plus input data, or directly pre-generated input data. They need to sample this dataset to create a study. For example, a user could have 10 coal generations, 25 solar ones and 10 consumptions, and needs to create a study with 100 scenarios.
Of course, he can develop his own sampling algorithm, but he can also use the Shuffler. Indeed, the Shuffler does a bit more than just sampling: it handles both raw data (wrapped in a Timeline) and pipelines with their input data (wrapped in a PipelineTimeline), computes the pipelines, then samples scenarios across all sources.
Below is an example of how to use the Shuffler:
shuffler = Shuffler()
# Add raw data as a numpy array
shuffler.add_data(name='solar', data=np.array([[1, 2, 3], [5, 6, 7]]))
# Add a pipeline and its input data
i = pd.DataFrame({(0, 'a'): [3, 4, 5], (1, 'a'): [7, 8, 9]})
pipe = RepeatScenario(2) + ToShuffler('a')
shuffler.add_pipeline(name='load', data=i, pipeline=pipe)
# Shuffle to sample 3 scenarios
res = shuffler.shuffle(3)
# Get results by the given names
solar = res['solar']
load = res['load']
Optimizer is the heart of Hadar. Behind it, there are:
- Study
- Result
Therefore Optimizer is an abstract class built on the Strategy pattern. Users can select an optimizer or create their own by implementing Optimizer.solve(study: Study) -> Result.
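Since solve is the whole contract, a custom optimizer is any object exposing that method. A minimal sketch under that assumption (LoggingOptimizer is a hypothetical name, not part of Hadar):

import hadar as hd

class LoggingOptimizer:
    """Hypothetical optimizer honouring the interface solve(study: Study) -> Result.
    It delegates to LPOptimizer and adds a trace."""
    def __init__(self):
        self._inner = hd.LPOptimizer()

    def solve(self, study):
        print('solving study...')  # any custom behaviour goes here
        return self._inner.solve(study)

# Thanks to the Strategy pattern, it's a drop-in replacement:
# res = LoggingOptimizer().solve(study)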
Today, two optimizers are available: LPOptimizer and RemoteOptimizer.
Let's start with the simplest. RemoteOptimizer is a client to a Hadar server. As you may know, Hadar exists as a python library, but there is also a tiny project packaging hadar inside a web server. You can find more details about this server in its repository.
The client implements the Optimizer interface. That way, only one line of code changes to deploy the computation to a data-center:
import hadar as hd

# Normal: optim = hd.LPOptimizer()
optim = hd.RemoteOptimizer(host='example.com')
res = optim.solve(study=study)
Before reading this chapter, we kindly advise you to read the Linear Model chapter.
LPOptimizer translates data into an optimization problem. Hadar's algorithms focus only on modeling the problem and use or-tools to solve it.
To achieve this modeling goal, LPOptimizer is designed to receive a Study object and convert its data into or-tools Variables. The Variables are then placed inside objective and constraint equations, the equations are solved by or-tools, and finally the Variables are converted into a Result object.
Let's analyze that in detail.
If you look at the code, you will see three domains: one at hadar.optimizer.input, one at hadar.optimizer.output and another at hadar.optimizer.lp.domain. If you look carefully, they seem the same: Consumption and OutputConsumption on one hand, LPConsumption on the other. The only change is a new attribute in the LP* classes called variable. Variables are the parameters of the problem: they are what or-tools has to find, i.e. the power used for a production, the capacity used for a link, and the loss of load for a consumption.
Therefore, InputMapper's role is just to create new objects with or-tools Variables initialized, as we can see in this code snippet:
# hadar.optimizer.lp.mapper.InputMapper.get_var
LPLink(dest=l.dest,
       cost=float(l.cost),
       src=name,
       quantity=l.quantity[scn, t],
       variable=self.solver.NumVar(0, float(l.quantity[scn, t]),
                                   'link on {} to {} at t={} for scn={}'.format(name, l.dest, t, scn)))
At the end, OutputMapper does the reverse. LP* objects now hold computed Variables, and we need to extract the result found by or-tools into a Result object.
The mapping of LPProduction and LPLink is straightforward. I propose you look at the LPConsumption code:
self.nodes[name].consumptions[i].quantity[scn, t] = vars.consumptions[i].quantity - vars.consumptions[i].variable.solution_value()
The line seems strange due to the complex indexing. First we select the right node name, then the right consumption i, then the right scenario scn, and finally the right timestep t. Rewritten without indexes, this line means \(Cons_{given} = Cons_{asked} - Cons_{var}\).
Keep in mind that \(Cons_{var}\) is the loss of load, so we need to subtract it from the initial consumption to get the consumption actually sustained.
Hadar has to build the optimization problem. These algorithms are encapsulated inside dedicated builders.
ObjectiveBuilder receives nodes through its add_node method. Then, for all productions, consumptions and links, it adds \(variable \times cost\) to the objective equation.
StorageBuilder builds the constraints for each storage element. The constraints enforce strict volume integrity (i.e. the volume is the sum of the previous volume plus input minus output).
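A sketch of this constraint (assuming, as described in the storage tutorial, that efficiency applies to the input flow), for each timestep \(t\):

\[ Vol_t = Vol_{t-1} + \Gamma^{in}_t \times eff - \Gamma^{out}_t \]

with \(Vol_0\) given by init_capacity.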
ConverterBuilder builds the ratio constraints between each converter input and its output.
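Using the notation of the multi-energy chapter, where \(In_i \times R_i = Out\), the constraint can be sketched, for each converter input \(i\), as:

\[ \Gamma_{in_i} \times R_i = \Gamma_{out} \]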
AdequacyBuilder is a bit more tricky. For each node, it creates a new adequacy constraint equation (c.f. Linear Model). The coefficients here are 1 or -1, depending on whether power flows in or out. Have you seen these lines?
self.constraints[(t, link.src)].SetCoefficient(link.variable, -1)  # Export from src
self.importations[(t, link.src, link.dest)] = link.variable  # Import to dest
Hadar has to add the imported power to the dest node's equation. But maybe this node is not set up yet and its constraint equation doesn't exist. Therefore the builder stores all constraint equations and all link capacities. At the end, build() is called, which adds the importation terms into all adequacy constraints to finalize the equations.
def build(self):
    """
    Called when all nodes have been added. Apply all import flows for each node.

    :return:
    """
    # Apply import links in adequacy constraints
    for (t, src, dest), var in self.importations.items():
        self.constraints[(t, dest)].SetCoefficient(var, 1)
The solve_batch method solves the study for one scenario. It iterates over nodes and time, calls InputMapper, then builds the problem with the *Builder classes and asks or-tools to solve it.
solve_lp applies the last iteration, over scenarios; it's the entry point of the linear programming optimizer. After all scenarios are solved, the results are mapped into a Result object.
Scenarios are distributed over cores by the multiprocessing library. solve_batch is the compute method called by multiprocessing. Therefore all input data received by this method and all output data it returns must be serializable by pickle (used by multiprocessing). However, the output holds or-tools Variable objects, which are not serializable.
Hadar doesn't need the complete Variable object; it just wants the solution value found by or-tools. So we help pickle by creating a simpler object, carefully recreating the same solution_value() API to stay compliant with downstream code:
class SerializableVariable(DTO):
    def __init__(self, var: Variable):
        self.val = var.solution_value()

    def solution_value(self):
        return self.val
Then we specify clearly how to serialize the object by implementing the __reduce__ method:
# hadar.optimizer.lp.domain.LPConsumption
def __reduce__(self):
    """
    Help pickle to serialize the object, especially the variable attribute.

    :return: (constructor, values...)
    """
    return self.__class__, (self.quantity, SerializableVariable(self.variable),
                            self.cost, self.name)
It should work, but in fact it doesn't… I don't know why: when multiprocessing wants to serialize the returned data, the or-tools Variables are empty and multiprocessing fails. Whatever, we just handle serialization ourselves:
# hadar.optimizer.lp.solver._solve_batch
return pickle.dumps(variables)
A Study is a hierarchy of domain objects such as InputNetwork, InputNode, Production and Storage.
The most important attribute is probably quantity, which represents the quantity of power used in the network. For a link, it's a transfer capacity; for a production, a generation capacity; for a consumption, a forced load to sustain.
The user can construct a Study step by step thanks to a Fluent API Selector.
In the optimizer, the Fluent API Selector is implemented by the NetworkFluentAPISelector and NodeFluentAPISelector classes. As you can guess from the example above, the optimizer rules for the API selector involve:
- network()
- node()
- consumption()
- converter()
To help the user, the quantity and cost fields are flexible: a single value is extended over every scenario and timestep, a one-dimensional array of horizon length is repeated for each scenario, and a (nb_scn, horizon) matrix is used as-is.
Study also includes a check mechanism to make sure that nodes exist, consumption names are unique, etc.
Result looks like Study: it has the same hierarchical structure and the same elements, just with different naming, to respect Domain Driven Development. Indeed, Result is used as the computation output, so we can't reuse the same objects. Result is the glue between the optimizer and the analyzer (or any other postprocessing).
Result shouldn't be created by the user; the user only reads it. So Result has no fluent API to help construction.
For high abstraction, and to stay technology-agnostic, Hadar uses objects as glue around the optimizer. Objects are nice, but too complicated to manipulate for data analysis. Analyzer contains tools to help analyze study and result.
Today, there is only ResultAnalyzer, with two feature levels:
Before speaking about these features, let's see how data is transformed.
As said above, objects are nice to encapsulate data and represent it in an agnostic form. Objects can be serialized into JSON or something else to be used by other software, maybe written in another language. But keeping objects to analyze data is awful.
Python has a very efficient tool for data analysis: pandas. Therefore the challenge is to transform objects into a pandas DataFrame. The solution is to flatten the data to fill a table.
For example, with consumption: in Study, the data is cost and asked quantity; in Result, it's cost (the same) and given quantity. This tuple (cost, asked, given) exists for each node, each consumption attached to the node, each scenario and each timestep. If we want to flatten the data, we need to fill this table:
It is the purpose of _build_consumption(study: Study, result: Result) -> pd.DataFrame to build this table.
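A sketch of the flattened layout (the exact column order is an assumption), with one row per (node, name, scenario, timestep):

cost     asked    given    node    name      scn    t
10^6     20       20       'a'     'load'    0      0
…        …        …        …       …         …      …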
Productions follow the same pattern. However, they don't have asked and given quantities, but available and used quantities. The table therefore looks similar.
It's done by the _build_production(study: Study, result: Result) -> pd.DataFrame method.
Storage follows the same pattern, with an analogous table.
It's done by the _build_storage(study: Study, result: Result) -> pd.DataFrame method.
Links follow the same pattern, but the hierarchical naming changes: there is no node and name, but source and destination.
It's done by the _build_link(study: Study, result: Result) -> pd.DataFrame method.
Converters follow the same pattern, just split into two tables. One is for the source elements.
It's done by the _build_src_converter(study: Study, result: Result) -> pd.DataFrame method.
And another is for the destination elements; the tables are nearly identical. Source has a special attribute called ratio, and destination has a special attribute called cost.
It's done by the _build_dest_converter(study: Study, result: Result) -> pd.DataFrame method.
When you observe the flat data, there are two kinds of columns: content, like cost, given, asked; and indexes, described by node, name, scn, t.
The low-level analysis API provided by ResultAnalyzer lets the user organize the index levels and filter them.
The user can say: "I want 'fr' node productions for the first scenario, from timestep 50 to 60." In this case, ResultAnalyzer returns the corresponding sub-table.
If leading indexes, like node and scenario here, have only one element, they are removed.
This result can be obtained with this line of code:
agg = hd.ResultAnalyzer(study, result)
df = agg.network().node('fr').scn(0).time(slice(50, 60)).production()
For the analyzer, the Fluent API respects these rules:
- time()
- scn()
- link()
- production()
Behind this mechanism there are Index objects, as you can see directly in the code:
Index
...
self.consumption = lambda x=None: self._append(ConsIndex(x))
...
self.time = lambda x=None: self._append(TimeIndex(x))
...
Each kind of index has to inherit from this class. An Index object encapsulates the column metadata to use and the range of filtered elements to keep (accessible by overriding the __getitem__ method). Then Hadar has child classes with the right parameters: ConsIndex, ProdIndex, NodeIndex, ScnIndex, TimeIndex, LinkIndex, DestIndex. For example, you can find the NodeIndex implementation below:
class NodeIndex(Index[str]):
    """Index implementation to filter nodes"""

    def __init__(self):
        Index.__init__(self, column='node')
Index instantiation is completely hidden from the user. Then Hadar will:
- check that the given indexes are valid (_assert_index)
- pivot the table according to them (_pivot)
- remove single-element leading index levels (_remove_useless_index_level)
As you can see, the low-level analyzer provides efficient methods to extract data from adequacy study results. However, the data returned remains rather raw and is not ready for business purposes.
Unlike the low level, the high level focuses on providing ready-to-use data, with features designed one by one for business purposes. Today we have two features:
get_cost(self, node: str) -> np.ndarray
get_balance(self, node: str) -> np.ndarray
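A minimal usage sketch of these two features, reusing a study and result computed as earlier:

agg = hd.ResultAnalyzer(study=study, result=res)
cost = agg.get_cost(node='a')        # matrix (scn, time) of adequacy cost
balance = agg.get_balance(node='a')  # timeline of exchange balances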
Even with the highest-level analyzer features, data remains simple matrices or tables. Viewer is the final layer of the Hadar framework: it creates plots that bring the most valuable data to human analysis.
Viewer uses the Analyzer API to build plots. It is like an extra layer that converts numeric results into visual results.
Viewer is split into two parts. The first part implements the FluentAPISelector: it uses ResultAnalyzer to compute results and performs the last computations before displaying graphics. This behaviour is coded inside the *FluentAPISelector classes.
These classes are used directly by the user when asking for a graphic:
plot = ...
plot.network().node('fr').consumption('load').gaussian(t=4)
plot.network().map(t=0, scn=0)
plot.network().node('de').stack(scn=7)
For the Viewer, the Fluent API follows the same kind of rules: the query begins with network(), drills down to a node and an element, and ends with a plot method.
The second part of Viewer handles the plotting itself. Hadar can handle many different libraries and technologies for plotting: a new plotting backend just has to implement ABCPlotting and ABCElementPlotting. Today one HTML implementation exists, built on the plotly library, inside HTMLPlotting and HTMLElementPlotting.
Data sent to the plotting classes is complete, pre-computed and ready to display.
The main optimizer is LPOptimizer. It creates a linear programming problem representing the network adequacy. We will build the mathematical problem step by step:
Basic adequacy equations
Add lack-of-adequacy terms (loss of load and spillage)
As you will see, \(\Gamma_x\) represents a quantity in the network, \(\overline{\Gamma_x}\) is its maximum, \(\underline{\Gamma_x}\) is its minimum, and \(\overline{\underline{\Gamma_x}}\) is both maximum and minimum, i.e. a forced quantity. Upper-case Greek letters are used for quantities, and the lower-case Greek letter \(\gamma_x\) for the cost associated with a quantity.
Let’s begin with the basic adequacy behavior. We have a graph \(G(N, L)\) with \(N\) nodes and \(L\) unidirectional edges.
Edge variables
Production variables
Consumption variables
The first constraint comes from Kirchhoff’s law and describes the balance between productions and consumptions:
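As a sketch in the notation above, writing \(C_n\), \(P_n\) for the consumptions and productions attached to node \(n\), and \(L_n^{in}\), \(L_n^{out}\) for its incoming and outgoing edges (these set names are ours):

\[ \sum_{c \in C_n} \overline{\underline{\Gamma_c}} + \sum_{l \in L_n^{out}} \Gamma_l = \sum_{p \in P_n} \Gamma_p + \sum_{l \in L_n^{in}} \Gamma_l \qquad \forall n \in N \]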
Then productions and edges need to be bounded:
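Sketched in the same notation:

\[ 0 \le \Gamma_p \le \overline{\Gamma_p} \quad \forall p, \qquad 0 \le \Gamma_l \le \overline{\Gamma_l} \quad \forall l \]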
Sometimes there is a lack of adequacy because there is not enough production; this is called loss of load.
Just as \(\Gamma_x\) means a quantity present in the network, \(\Lambda_x\) represents a lack in the network (consumption or production) needed to reach adequacy. As for \(\Gamma_x\), the lower-case Greek letter \(\lambda_x\) is the cost associated with this lack.
The objective gains a new term:
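Sketched with the cost notation above, the objective could read:

\[ \min \; \sum_{p} \gamma_p \Gamma_p + \sum_{l} \gamma_l \Gamma_l + \sum_{c} \lambda_c \Lambda_c \]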
Kirchhoff’s law needs an update too. Loss of load is represented as a phantom import of energy to reach adequacy:
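That is, the balance sketched earlier gains a \(\Lambda_c\) term on the production side:

\[ \sum_{c \in C_n} \overline{\underline{\Gamma_c}} + \sum_{l \in L_n^{out}} \Gamma_l = \sum_{p \in P_n} \Gamma_p + \sum_{l \in L_n^{in}} \Gamma_l + \sum_{c \in C_n} \Lambda_c \qquad \forall n \in N \]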
Loss of load must be bounded:
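A natural bound is the asked consumption itself (our sketch):

\[ 0 \le \Lambda_c \le \overline{\underline{\Gamma_c}} \qquad \forall c \]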
Storage is an element inside Hadar to store quantity on a node. We have:
Kirchhoff’s law needs an update too. Beware of the naming: the input flow of a storage is an output flow for its node, so it goes on the consuming side. Conversely, the output flow of a storage is an input flow for its node, and goes on the production side.
And all these quantities are bounded:
Storage also has a new constraint. This constraint applies over time to ensure capacity integrity:
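A sketch of such a constraint, writing \(S_t\) for the stored quantity at time \(t\), \(\Gamma^{in}_t\) and \(\Gamma^{out}_t\) for the storage input and output flows, and \(\eta\) for its efficiency (symbols ours):

\[ S_t = S_{t-1} + \eta \, \Gamma^{in}_t - \Gamma^{out}_t \qquad \forall t \]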
Hadar handles multi-energies. In the code, one energy lives inside one network, so multi-energies means multi-networks. Mathematically, they are all the same. That’s why we don’t talk about a multi-graph: there is always one graph \(G\), nodes remain the same, with the same equations for every kind of energy.
The only difference is how we link nodes together. If nodes belong to the same network, we use the links (or edges) seen before. When nodes belong to different energies, we need to use a converter. Everything above remains true; we just add a new element, the \(V\) converters, to this graph \(G(N, L, V)\).
A converter can take energy from many nodes in different networks. Each converter input has a ratio between output quantity and input quantity. A converter has only one output, to only one node.
Of course Kirchhoff’s law needs a little update. As for storage, beware of the naming! A converter input is a consuming flow for its node; a converter output is a production flow for its node.
Now we need to enforce the conversion ratios with a new constraint:
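One plausible form, writing \(\Gamma^{in}_{n,v}\) for the flow taken by converter \(v\) from node \(n\), \(\Gamma^{out}_v\) for its single output flow, and \(\alpha_{n,v}\) for the ratio attached to that input (symbols ours):

\[ \Gamma^{out}_v = \alpha_{n,v} \, \Gamma^{in}_{n,v} \qquad \forall v \in V, \; \forall n \text{ feeding } v \]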
First off, thank you for considering contributing to Hadar. We believe technology can change the world, but only a great community and open source can improve it.
Following these guidelines helps to communicate that you respect the time of the developers managing and developing this open source project. In return, they should reciprocate that respect in addressing your issue, assessing changes, and helping you finalize your pull requests.
We try to describe most of Hadar’s behavior and organization to avoid any grey areas. Additionally, you can read the Dev Guide section or Architecture to learn about Hadar’s purposes and processes.
You can participate in Hadar in many ways:
The issue tracker is only for features, bugs or improvements, not for support. If you have a question, please go to TODO . Any support issue will be closed.
Small changes can be sent directly as a pull request, such as:
For everything else, you first need to create an issue. If the issue receives good feedback, you can then fork the project, work on your side and send a pull request.
If you find a security bug, please DON’T create an issue. Contact us at admin@hadar-simulator.org
First, be sure it’s a bug and not a misuse! Issues are not for technical support. To speed up bug fixing (and rule out misuse), you need to explain the bug clearly, with the simplest step-by-step guide to reproduce it. Give us all the details, such as OS, Hadar version and so on.
Please answer these questions:
- What version of Hadar and Python are you using?
- What operating system and processor architecture are you using?
- What did you do?
- What did you expect to see?
- What did you see instead?
We try to write the clearest and most maintainable software we can. Your pull request has to follow some good practices:
TL;DR: code like Uncle Bob!
The Hadar repository is split into several parts:
- hadar/ : the main python package
- tests/ : unit and integration tests
- examples/ : example notebooks
- docs/ : this documentation
- .github/ : CI and repository configuration
We use all GitHub features to organize development. We follow an Agile methodology and try to recreate Jira behavior in GitHub. Therefore we map Jira features to GitHub, such as:
We respect the git flow pattern. Main developments happen on the develop branch. We accept feature/** branches, but they are not mandatory.
CI pipelines are based on git flow; actions are summed up in the table below:
hadar.analyzer.result.
Bases: object
Single object to encapsulate all postprocessing aggregation.
check_index
Check indexes cohesion.
:param indexes: list of indexes
:param type: Index type to check inside the list
:return: True if at least one index of this type is in the list, False otherwise
filter
Aggregate according to index level and filter.
get_balance
Compute balance over time on the asked node.
timeline array with exchange balance values
get_cost
Compute adequacy cost on a node, network or whole study.
matrix (scn, time)
get_elements_inside
Get the number of elements by node.
(nb of consumptions, nb of productions, nb of storages, nb of links (export), nb of converters (export), nb of converters (import))
get_rac
Compute Remaining Available Capacity on the network.
horizon
Shortcut to get study horizon.
nb_scn
Shortcut to get study number of scenarios.
Entry point for the fluent API.
:param name: network name, ‘default’ by default
:return: Fluent API Selector
nodes
Shortcut to get list of node names
Fluent API Selector to analyze network elements.
The user can join network, node, consumption, production, link, time and scn to create a filter and organize the hierarchy. Joins can be made in any order, except:
- the join begins with network
- each join is unique: only one each of node, time, scn is expected per query
- production, consumption and link are mutually exclusive: only one of them is expected per query
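For example, the following query respects these rules (a sketch reusing the ResultAnalyzer instance built earlier):

df = agg.network().node('fr').scn(0).time(slice(50, 60)).consumption()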
FULL_DESCRIPTION
hadar.optimizer.domain.input.
Bases: hadar.optimizer.utils.JSON
Consumption element.
from_json
Link element
Production element
Storage element
Converter element
to_json
Network element
Node element
Main object to facilitate building a study.
add_link
Add a link inside network.
add_network
Entry point to create study with the fluent api.
Network level of Fluent API Selector.
build
Build study.
converter
Add a converter element.
link
Add a link on network.
NetworkAPISelector with new link.
Go to network level.
node
Go to node level.
Node level of Fluent API Selector
consumption
Add consumption on node.
NodeFluentAPISelector with new consumption
Go to different node level.
production
Add production on node.
NodeFluentAPISelector with new production
storage
Create storage.
to_converter
Add an output to a converter.
hadar.optimizer.domain.numeric.
ColumnNumericValue
Bases: hadar.optimizer.domain.numeric.NumpyNumericalValue
Implementation with one time step per scenario, with shape (nb_scn, 1)
flatten
Flatten data into a 1D array.
:return: [v[0, 0], v[0, 1], v[0, 2], …, v[1, i], v[2, i], …, v[j, i]]
MatrixNumericalValue
Implementation with a full matrix of shape (nb_scn, horizon)
NumericalValue
Bases: hadar.optimizer.utils.JSON, abc.ABC, typing.Generic
Interface to handle numerical values in a study
NumericalValueFactory
create
NumpyNumericalValue
Bases: hadar.optimizer.domain.numeric.NumericalValue, abc.ABC
Partial implementation with a numpy array as numerical value. Implements only comparison methods.
RowNumericValue
Implementation with one scenario, with shape (horizon,).
ScalarNumericalValue
Bases: hadar.optimizer.domain.numeric.NumericalValue
Implements one scalar numerical value, i.e. float or int
hadar.optimizer.domain.output.
OutputProduction
OutputNode
build_like_input
Use an input node to create an output node. Keeps the list of elements, fills quantities with zeros.
an OutputNode like the InputNode, with all quantities at zero
OutputStorage
OutputLink
Consumption element
OutputNetwork
OutputConverter
Result of study
hadar.optimizer.lp.domain.
JSONLP
Bases: hadar.optimizer.utils.JSON, abc.ABC
Bases: hadar.optimizer.lp.domain.JSONLP
Consumption element for linear programming.
LPConverter
Converter element for linear programming
Link element for linear programming
LPNetwork
Network element for linear programming
LPNode
Node element for linear programming
Production element for linear programming.
LPStorage
LPTimeStep
create_like_study
hadar.optimizer.lp.mapper.
Input mapper from global domain to linear programming specific domain
get_conv_var
Map Converter to LPConverter.
get_node_var
Map InputNode to LPNode.
LPNode according to node name at t in study
Output mapper from specific linear programming domain to global domain.
get_result
Get result.
set_converter_var
set_node_var
Map linear programming node to global node (stored inside an internal attribute).
None (use get_result)
hadar.optimizer.lp.optimizer.
Build adequacy flow constraint.
add_converter
Add a converter element to the equations. Sources are treated like consumptions, the destination like a production.
Add flow constraint for a specific node.
Called when all nodes are added. Applies all import flows for each node.
ConverterMixBuilder
Build the equations that determine the mix ratio between converter sources.
Build objective cost function.
Add a converter. Applies cost on the converter output.
Add costs to the objective for each node element.
Build storage constraints
Solve adequacy flow problem with a linear optimizer.
Result object with optimal solution
hadar.optimizer.remote.optimizer.
ServerError
Bases: Exception
check_code
solve_remote
Send study to remote server.
result received from server
hadar.optimizer.optimizer.
Bases: hadar.optimizer.optimizer.Optimizer
Basic Optimizer works with linear programming.
solve
Solve adequacy study.
Use a remote optimizer to compute in the cloud.
hadar.optimizer.utils.
DTO
Implements basic methods for DTO objects
JSON
Bases: hadar.optimizer.utils.DTO, abc.ABC
Object to be serialized to JSON
convert
hadar.viewer.abc.
Bases: abc.ABC
Abstract interface to implement to plot graphics
candles
Plot candlestick with open and close.
:param open: candle open data
:param close: candle close data
:param title: title to plot
gaussian
Plot gaussian.
map_exchange
Plot map with exchanges as arrow.
matrix
Plot matrix (heatmap)
monotone
Plot monotone.
stack
Plot stack.
Plot timeline with all scenarios.
Abstract method to plot optimizer result.
Entry point to use fluent API.
ConsumptionFluentAPISelector
Bases: hadar.viewer.abc.FluentAPISelector
Consumption level of fluent api.
Plot gaussian graphics
Plot monotone graphics.
Plot timeline graphics.
DestConverterFluentAPISelector
Destination converter level of fluent api
FluentAPISelector
not_both
LinkFluentAPISelector
Link level of fluent api
Network level of fluent API
map
Plot map exchange graphics
Go to node level of fluent API.
:param node: node name
:return: NodeFluentAPISelector
rac_matrix
Plot RAC matrix graphics
Node level of fluent api
Go to consumption level of fluent API
from_converter
Get a converter importation level fluent API.
:param name: converter name
Go to link level of fluent API
Go to production level of fluent API
Plot productions stacked as areas and consumptions stacked as dashed lines.
plotly figure or jupyter widget to plot
Go to storage level of fluent API
Get a converter exportation level fluent API.
:param name: converter name
ProductionFluentAPISelector
Production level of fluent api
SrcConverterFluentAPISelector
StorageFluentAPISelector
Storage level of fluent API
hadar.viewer.html.
Bases: hadar.viewer.abc.ABCPlotting
Plotting implementation for interactive HTML graphics (uses plotly).
hadar.workflow.pipeline.
Bases: hadar.workflow.pipeline.Plug
Implementation where a stage expects the presence of specific columns.
linkable_to
Defines whether the next stage is linkable with the current one. In this implementation, the plug is linkable only if the inputs of the next stage are present in the outputs of the current stage.
Plug implementation for stages that can use any kind of DataFrame, whatever columns are present inside.
Defines whether the next stage is linkable with the current one. In this implementation, the plug is always linkable.
Abstract class representing a unit of computation. Stages can be added together to create a workflow pipeline.
build_multi_index
Create column multi index.
multi-index like [(scn, type), …]
get_names
get_scenarios
standardize_column
A timeline must have its first column for the scenario and the second for the timeline data. Adds the 0th scenario index if not present.
Bases: hadar.workflow.pipeline.Stage, abc.ABC
Stage that applies the same behaviour to every scenario.
Drop
Bases: hadar.workflow.pipeline.Stage
Drop columns by name.
Rename
Rename column names.
Bases: hadar.workflow.pipeline.FocusStage
Generate a random fault for each scenario.
Repeat the current scenarios n times.
ToShuffler
Bases: hadar.workflow.pipeline.Rename
Connects a pipeline to the shuffler
Pipeline
Compute many stages sequentially.
assert_computable
Verify timeline is computable by pipeline.
assert_to_shuffler
Clip
Cut data according to upper and lower boundaries. Same as the np.clip function.
hadar.workflow.shuffler.
Receives all data sources, such as raw matrices or pipelines. Schedules pipeline generation and shuffles all timelines to create scenarios.
add_data
Add raw data as a numpy array. If you generate data by pipeline, use add_pipeline instead: it will parallelize computation and manage swap.
:param name: timeline name
:param data: numpy array with shape (scenario, horizon)
:return: self
add_pipeline
Add data by pipeline and input data for pipeline.
self
shuffle
Start pipeline generation and shuffle the results to create the scenario sampling.
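A minimal usage sketch, assuming Shuffler is exposed at the package top level and that shuffle takes the number of scenarios to sample (the nb_scn parameter name is an assumption):

import numpy as np
import hadar as hd

shuffler = hd.Shuffler()
# two raw timelines, each with shape (scenario, horizon)
shuffler.add_data(name='load', data=np.array([[1, 2, 3], [4, 5, 6]]))
shuffler.add_data(name='solar', data=np.array([[9, 8, 7], [6, 5, 4]]))
sampling = shuffler.shuffle(nb_scn=10)  # assumed: sample 10 scenarios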
Manages the data used to generate a timeline. Performs sampling too.
compute
Compute method called before sampling. For Timeline, this method just returns the data.
sample
Perform sampling. Data must be computed beforehand.