safe_storage_timeouts

Commits

List of commits on branch main.

349b1470aeb35b9bb53bc44b2785de5fbc0cf58d
Update README.md to mention future test development
jonathanunderwood committed 4 years ago

695410049c230ccff06f2eb9fded42f8c520eae8
Run molecule container in privileged mode
jonathanunderwood committed 4 years ago

e0970d1ed1cb2b7ad98b3055dab4a3a1a418ed1a
Fix installation of udev in molecule container
jonathanunderwood committed 4 years ago

fcacbb0c3d1fb6c7872c08b3681fb207472c62b6
Set base docker image to fedora:latest
jonathanunderwood committed 4 years ago

240fde94715d717cf7c0f9331233663c28e0941b
Set dockerfile in molecule.yml
jonathanunderwood committed 4 years ago

f21e00c50795c91e44abe2eb76c1750226142136
Add Dockerfile.j2
jonathanunderwood committed 4 years ago

README

safe-storage-timeouts

This Ansible role adds a udev rule that attempts to set safe kernel driver timeouts for drives depending on whether they have SCTERC/TLER functionality enabled.

This is an attempt to address the issue of drives dropping out of RAID arrays caused by incorrect timeouts.

An earlier effort at managing this situation identified only those disks with redundant (raid1 or higher) mdraid partitions and set the timeout for those disks. However, as Chris Murphy points out, that approach is insufficient because the problem with incorrect timeouts also affects non-redundant disks and other RAID implementations such as BTRFS.

The approach taken with this role is to install a udev rule that sets the kernel driver timeout for each drive using the following logic:

  1. If the drive has SCTERC enabled, set the kernel timeout to around 5 seconds more than the SCTERC read timeout.
  2. Otherwise, if the drive supports SCTERC but has it disabled, attempt to enable it with device read and write timeouts of 7 seconds.
  3. If enabling SCTERC fails, or if the drive does not support SCTERC, set the kernel timeout to 180 seconds.

Requirements

This role will install the smartmontools package using the operating system package manager. The role requires a version of smartmontools recent enough to support the --json command line option of the smartctl command.
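
One way to confirm this requirement on a host is a small check play. This is a sketch rather than part of the role: it assumes smartctl is already installed and on the PATH, and it relies on versions of smartctl without JSON support rejecting the unknown --json option with a non-zero exit status, which makes the task fail.

```yaml
# Hypothetical pre-flight check; not shipped with the role.
- hosts: servers
  tasks:
    - name: Check that smartctl accepts the --json option
      ansible.builtin.command: smartctl --json --version
      changed_when: false  # read-only check, so never report a change
```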

Role Variables

| Variable | Default | Description |
| --- | --- | --- |
| helper_script_dir | /usr/local/lib/udev | The directory in which to install the required udev helper script. |
| smartmontools_pkg_name | smartmontools | The distribution package name for smartmontools. |
| devices | [] (empty list) | A list of devices to apply the rules to. If the list is empty, the rules are applied to all devices. |

Dependencies

None

Example Playbook

- hosts: servers
  roles:
    - role: safe-storage-timeouts
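
The variables from the Role Variables section can be overridden per play. The following is a minimal sketch, not taken from the repository. In particular, the README does not say whether entries in devices should be kernel names (sda) or device paths (/dev/sda), so the values below are an assumption to check against the role's defaults.

```yaml
- hosts: servers
  roles:
    - role: safe-storage-timeouts
      vars:
        # Same as the documented default; repeated only for illustration.
        helper_script_dir: /usr/local/lib/udev
        # Assumption: entries are kernel device names. The README does not
        # specify the expected format, so verify before relying on this.
        devices:
          - sda
          - sdb
```

Leaving devices unset keeps the documented default of an empty list, in which case the rules are applied to all devices.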

Possible Improvements

  1. Expose the device and kernel timeouts as variables that can be set. As a first step, these would be values applied to all devices; as a follow-up, it might be useful to be able to set values per device.
  2. Testing: molecule is set up and configured for testing. At present it checks that the role runs successfully with default values and lints the code. Expanding the testing would be valuable in the future.

License

GPL 3.0+

Author Information

Jonathan G. Underwood

Further Reading