While researching Chaos Engineering, I heard that "OpenStack also has a project that tests OpenStack with fault injection," so I looked into it.
OpenStack Eris
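A project to build a test framework and test suites that put OpenStack under various kinds of stress, with the goal of improving its performance and making it more resilient. It can (or is planned to) do:
- Load Injection
- Fault Injection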
Information about OpenStack Eris
I was told that there were two forum sessions about it at the OpenStack Summit Sydney.
Extreme/Destructive Testing
LCOO = Large Contributing OpenStack Operators
Topics discussed at the forum:
- What sort of tests do you run today that are destructive in nature?
- What is your desired workflow for issues that come up?
- How do you determine success/failure of your testing scenarios?
- Do you publicly publish your results (if so where)?
- What KPI do you evaluate today?
- What workloads do you run and against what architectures?
The session was split into three parts:
- References: Extreme testing is non-deterministic. Such testing generally is valid with the following reference.
- Test Suite: What test scenarios are we trying to get done?
- Frameworks & Tools: How do we enable the test suite?
The etherpad is here: https://etherpad.openstack.org/p/SYD-extreme-testing
The forum seems to have focused on what kinds of tests should be run. The log also records a Q&A about how Eris differs from Rally.
- What is the difference between Eris and Rally? (From spec it looks the same) (boris-42)
  - (samP) Discussion is here http://lists.openstack.org/pipermail/openstack-dev/2017-November/124156.html
  - (gautamdivgi) We are looking for a solution/mechanism where there the capacity for
    - Load generation (Control + Data Plane)
    - Failure injection (all over the cloud)
    - Data capture (all touch points - not just API requests)
    - KPI calculation (all touch points - not just API requests)
    - Most importantly - have one part "influence" the other (E.g. run failures or a sequence of failures based on system criteria like netstat session counts or free memory remaining)
    Rally does load generation and there are failure injection hooks - but cannot participate in more complex feedback, monitoring & computation mechanisms. Rally is used today to generate load. Definitely looking to contribute some plugins for load gen & metrics collections enhancements.
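Below is a minimal sketch of the idea of having one part "influence" the other: the next failure is chosen from live system criteria such as session counts or free memory. It is written in Python and is not Eris's actual design. The `Metrics` fields, the fault names, the thresholds and `plan_faults` are all hypothetical. The code only records which fault would fire and never injects anything.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Metrics:
    """Hypothetical snapshot of the system criteria named in the discussion."""
    tcp_sessions: int    # e.g. established sessions, as netstat would count them
    free_memory_mb: int  # free memory remaining on the target node


@dataclass
class FaultRule:
    name: str                             # hypothetical fault label
    condition: Callable[[Metrics], bool]  # fire when this returns True
    fired: bool = False                   # each fault fires at most once


def plan_faults(samples: List[Metrics], rules: List[FaultRule]) -> List[str]:
    """Walk metric samples in time order and record which fault fires at which step.

    Nothing is injected here: a real harness would call a fault-injection
    backend at the point where the log entry is appended.
    """
    log = []
    for step, metrics in enumerate(samples):
        for rule in rules:
            if not rule.fired and rule.condition(metrics):
                rule.fired = True
                log.append(f"t={step}: inject {rule.name}")
    return log


if __name__ == "__main__":
    rules = [
        FaultRule("restart-message-queue", lambda m: m.tcp_sessions > 1000),
        FaultRule("kill-api-worker", lambda m: m.free_memory_mb < 512),
    ]
    samples = [Metrics(200, 4096), Metrics(1500, 2048), Metrics(1800, 300)]
    for entry in plan_faults(samples, rules):
        print(entry)
```

With these sample values, the message-queue fault fires at t=1 and the API-worker fault fires at t=2. Each failure is triggered by the observed system state rather than by a fixed schedule. This is the kind of feedback the discussion says Rally's failure injection hooks cannot take part in.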
LCOO-Extreme Testing-QA-ERIS
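An Eris demo was apparently given at this session, but since it was a forum session, there seems to be no video.
Steps for the Eris demo
The etherpad is here: https://etherpad.openstack.org/p/LCOO-Extreme_Testing-QA-ERIS
The log records a Q&A about whether tools such as os-faults will be used. The suggestion was to support them as plugins to the framework.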
- Will Eris support os-faults in the future? https://github.com/openstack/os-faults
  - Need for extensible pluggable framework for fault injection, e.g.
    - Define levels of redundancy within infrastructure and inject faults up to the maximum which can be tolerated given that amount of redundancy (e.g. 2 node failures in a 5-node cluster)
    - Use os-faults to test failure scenarios defined by Self-Healing SIG and ensure that self-healing works
  - Preference for reusing existing projects if possible
  - If not possible, include gaps analysis in the spec
- Maybe split spec into smaller pieces?
- Can some example results / demo be published so that people can start looking at Eris and get a feel for what it can do?
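The redundancy-based idea can be made concrete with a little arithmetic. The discussion does not name a quorum model, so this sketch assumes one: the cluster stays available only while a strict majority of its nodes is up. Under that model, an n-node cluster tolerates floor((n-1)/2) failures, which matches the example of 2 node failures in a 5-node cluster. The Python sketch below turns this into an escalating fault plan. `max_tolerable_failures`, `failure_plan` and the node names are hypothetical.

```python
from typing import List


def max_tolerable_failures(cluster_size: int) -> int:
    """Nodes that can fail while a strict majority (assumed quorum model) survives."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return (cluster_size - 1) // 2


def failure_plan(nodes: List[str]) -> List[List[str]]:
    """Hypothetical escalating plan: fail 1 node, then 2, ... up to the tolerable maximum."""
    budget = max_tolerable_failures(len(nodes))
    return [nodes[:k] for k in range(1, budget + 1)]


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4", "node5"]
    for step in failure_plan(cluster):
        print(f"fail {len(step)} node(s): {step}")
```

For the 5-node example, the plan first fails `node1`, then `node1` and `node2` together. A third failure would leave only 2 of 5 nodes, which is no longer a majority.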
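Summary
The only information I could find is about two years old, and the project does not seem to have made much progress since then. If you want to apply Chaos Engineering to OpenStack, you may be better off using a different tool.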