Skirmish

A game day orchestration tool
This application allows users to run scripted game day events and restore services if required.

Usage

To run skirmish, follow the steps here so that the application can assume a role. If a skirmish is intended to run across multiple projects, the account needs the correct permissions in each one.

Starting a skirmish is as simple as:

skirmish --plan-path path/to/plan.yml

Because a skirmish can run for several hours, running it in a CI environment with time-limited jobs is not recommended.

Note: Skirmish has built-in checks to ensure it can restore services when running in repairable mode, but it makes no guarantees if it receives a SIGKILL.

An example game day plan looks like this:

mode: repairable
projects: # projects listed here act as an allow-list; a step fails if it names a project that is mistyped or not part of the game day
    - staging
    - canary
steps:
    - name: Fail random instances
      description: |-
        Ensure that the systems are in place for maintaining the instance count, or
        that the correct procedures and alerts are triggered.
      operations:
        - instance
      projects:
        - staging
      exclude:
        wildcards:        # wildcards support regular expressions
          - "data-node*"
          - "demo-server"
        regions:          # regions / zones support prefix matching
          - "us-west"
      wait: "10m"         # wait 10 minutes before restoring instances
      sample: 80.0        # each valid instance will have an 80% chance of being paused
    - name: Stop communication of integration platform components
      description: |-
        Ensure that our platform is still operational when the integration pipeline is cut off
        from communicating
      operations: 
        - egress
      projects:
        - canary
      settings:
        network:
          name: data-ingestion
          deny:
            - protocol: "tcp"
              ports:
                - "8080"
                - "443"
                - "80"
      wait: "30m"
    - name: Instill fear in the cold-hearted
      description: |-
        Let the orchestration platform destroy as much of the project as possible to highlight the worst-case scenario.
        This will automatically recover once the step has reached its wait time.
      operations:
        - instance
        - egress
        - ingress
      projects:
        - staging
        - canary
      settings:
        network:
          # apply to the default network by leaving name blank
          deny:
            - protocol: "tcp"
              ports:
                - "443"
                - "80"
      wait: "20m"