sanlock

See https://pagure.io/sanlock

Mailing list https://lists.fedorahosted.org/admin/lists/sanlock-devel.lists.fedorahosted.org/

From sanlock(8) at sanlock.git/src/sanlock.8

::

SANLOCK(8)               System Manager's Manual               SANLOCK(8)

NAME
   sanlock - shared storage lock manager

SYNOPSIS
   sanlock [COMMAND] [ACTION] ...

DESCRIPTION
   sanlock is a lock manager built on shared storage.  Hosts with access
   to the storage can perform locking.  An application running on the
   hosts is given a small amount of space on the shared block device or
   file, and uses sanlock for its own application-specific
   synchronization.  Internally, the sanlock daemon manages locks using
   two disk-based lease algorithms: delta leases and paxos leases.

   · delta leases are slow to acquire and demand  regular  i/o  to  shared
     storage.   sanlock  only  uses them internally to hold a lease on its
     "host_id" (an integer host identifier from 1-2000).  They prevent two
     hosts  from using the same host identifier.  The delta lease renewals
     also indicate if a host is alive.  ("Light-Weight Leases for Storage-
     Centric Coordination", Chockler and Malkhi.)

   · paxos  leases are fast to acquire and sanlock makes them available to
     applications as general purpose  resource  leases.   The  disk  paxos
     algorithm uses host_id's internally to represent different hosts, and
     the owner of a paxos lease.  delta leases  provide  unique  host_id's
     for  implementing  paxos  leases, and delta lease renewals serve as a
     proxy for paxos lease renewal.  ("Disk Paxos", Eli Gafni  and  Leslie
     Lamport.)

   Externally, the sanlock daemon exposes a locking interface through
   libsanlock in terms of "lockspaces" and "resources".  A lockspace is a
   locking context that an application creates for itself on shared
   storage.  When the application on each host is started, it "joins" the
   lockspace.  It can then create "resources" on the shared storage.  Each
   resource represents an application-specific entity.  The application
   can acquire and release leases on resources.

   To use sanlock from an application:

   · Allocate  shared  storage for an application, e.g. a shared LUN or LV
     from a SAN, or files from NFS.

   · Provide the storage to the application.

   · The application  uses  this  storage  with  libsanlock  to  create  a
     lockspace and resources for itself.

   · The application joins the lockspace when it starts.

   · The application acquires and releases leases on resources.

   How lockspaces and resources translate to delta leases and paxos leases
   within sanlock:

   Lockspaces

   · A lockspace is based on delta leases held  by  each  host  using  the
     lockspace.

   · A  lockspace  is  a series of 2000 delta leases on disk, and requires
     1MB of storage.  (See Storage below for size variations.)

   · A lockspace can support up to 2000 concurrent hosts  using  it,  each
     using a different delta lease.

   · Applications  can  i)  create,  ii)  join and iii) leave a lockspace,
     which corresponds to i) initializing the set of delta leases on disk,
     ii)  acquiring  one  of the delta leases and iii) releasing the delta
     lease.

   · When a lockspace is created, a unique lockspace name and disk
     location are provided by the application.

   · When a lockspace is created/initialized, sanlock formats the sequence
     of 2000 on-disk delta lease structures on  the  file  or  disk,  e.g.
     /mnt/leasefile (NFS) or /dev/vg/lv (SAN).

   · The  2000  individual  delta  leases in a lockspace are identified by
     number: 1,2,3,...,2000.

   · Each delta lease is a 512 byte sector in the 1MB lockspace, offset by
     its  number,  e.g. delta lease 1 is offset 0, delta lease 2 is offset
     512, delta lease 2000 is offset 1023488.  (See Storage below for size
     variations.)

   · When  an application joins a lockspace, it must specify the lockspace
     name, the lockspace location  on  shared  disk/file,  and  the  local
     host's  host_id.  sanlock then acquires the delta lease corresponding
     to the host_id, e.g. joining the lockspace with  host_id  1  acquires
     delta lease 1.

   · The terms delta lease, lockspace lease, and host_id lease are used
     interchangeably.

   · sanlock acquires a delta lease by writing the host's unique  name  to
     the delta lease disk sector, reading it back after a delay, and veri‐
     fying it is the same.

   · If a unique host name is not specified, sanlock generates a  uuid  to
     use  as  the host's name.  The delta lease algorithm depends on hosts
     using unique names.

   · The application on each host  should  be  configured  with  a  unique
     host_id, where the host_id is an integer 1-2000.

   · If hosts are misconfigured and have the same host_id, the delta lease
     algorithm is designed to detect this conflict, and only one host will
     be able to acquire the delta lease for that host_id.

   · A  delta  lease  ensures  that a lockspace host_id is being used by a
     single host with the unique name specified in the delta lease.

   · Resolving delta lease conflicts is slow,  because  the  algorithm  is
     based  on waiting and watching for some time for other hosts to write
     to the same delta lease sector.  If multiple hosts  try  to  use  the
     same  delta  lease,  the delay is increased substantially.  So, it is
     best to configure applications to use unique host_id's that will  not
     conflict.

   · After sanlock acquires a delta lease, the lease must be renewed until
     the application leaves the lockspace (which corresponds to  releasing
     the delta lease on the host_id.)

   · sanlock  renews delta leases every 20 seconds (by default) by writing
     a new timestamp into the delta lease sector.

   · When a host acquires a delta lease in a lockspace, it can be referred
     to  as "joining" the lockspace.  Once it has joined the lockspace, it
     can use resources associated with the lockspace.
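
   The offset arithmetic described above can be sketched as follows.  This
   is only an illustration, not sanlock code: the function name and its
   parameters are hypothetical, and it assumes the default 512 byte sector
   layout with host_ids 1-2000.

```python
SECTOR_SIZE = 512  # bytes per delta lease in the default 512/1M layout


def delta_lease_offset(host_id, lockspace_offset=0,
                       sector_size=SECTOR_SIZE, max_hosts=2000):
    """Return the byte offset of the delta lease for host_id.

    Hypothetical helper: delta lease N occupies the Nth sector of the
    lockspace, so lease 1 sits at the start of the lockspace area.
    """
    if not 1 <= host_id <= max_hosts:
        raise ValueError(f"host_id must be in 1..{max_hosts}, got {host_id}")
    return lockspace_offset + (host_id - 1) * sector_size


for host_id in (1, 2, 2000):
    print(host_id, delta_lease_offset(host_id))
```

   For host_ids 1, 2 and 2000 this gives offsets 0, 512 and 1023488,
   matching the values listed above.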

   Resources

   · A lockspace is a  context  for  resources  that  can  be  locked  and
     unlocked by an application.

   · sanlock uses paxos leases to implement leases on resources.  The
     terms paxos lease and resource lease are used interchangeably.

   · A paxos lease exists on shared storage and requires 1MB of space.  It
     contains a unique resource name and the name of the lockspace.

   · An  application assigns its own meaning to a sanlock resource and the
     leases on it.  A sanlock resource could represent some shared  object
     like a file, or some unique role among the hosts.

   · Resource leases are associated with a specific lockspace and can only
     be used by hosts that have joined that lockspace (they are holding  a
     delta lease on a host_id in that lockspace.)

   · An  application  must  keep  track  of  the  disk  locations  of  its
     lockspaces and resources.  sanlock does not maintain  any  persistent
     index  or directory of lockspaces or resources that have been created
     by applications, so applications need to  remember  where  they  have
     placed their own leases (which files or disks and offsets).

   · sanlock  does  not  renew  paxos leases directly (although it could).
     Instead, the renewal of a host's delta lease represents  the  renewal
     of  all  that  host's  paxos  leases  in the associated lockspace. In
     effect, many paxos lease renewals are factored  out  into  one  delta
     lease renewal.  This reduces i/o when many paxos leases are used.

   · The  disk  paxos  algorithm  allows  multiple hosts to all attempt to
     acquire the same paxos lease at once, and will produce a single  win‐
     ner/owner  of  the  resource lease.  (Shared resource leases are also
     possible in addition to the default exclusive leases.)

   · The disk paxos algorithm involves a specific sequence of reading  and
     writing  the  sectors  of the paxos lease disk area.  Each host has a
     dedicated 512 byte sector in the  paxos  lease  disk  area  where  it
     writes  its own "ballot", and each host reads the entire disk area to
     see the ballots of other hosts.  The first sector of the disk area is
     the  "leader  record" that holds the result of the last paxos ballot.
     The winner of the paxos ballot writes the result of the ballot to the
     leader  record  (the  winner  of the ballot may have selected another
     contending host as the owner of the paxos lease.)

   · After a paxos lease is acquired, no further i/o is done in the  paxos
     lease disk area.

   · Releasing  the  paxos lease involves writing a single sector to clear
     the current owner in the leader record.

   · If a host holding a paxos lease fails, the disk  area  of  the  paxos
     lease  still  indicates  that  the paxos lease is owned by the failed
     host.  If another host attempts to acquire the paxos lease, and finds
     the  lease  is held by another host_id, it will check the delta lease
     of that host_id.  If the delta lease of the host_id is being renewed,
     then  the  paxos lease is owned and cannot be acquired.  If the delta
     lease of the owner's host_id has expired, then  the  paxos  lease  is
     expired  and  can  be  taken  (by going through the paxos lease algo‐
     rithm.)

   · The "interaction" or "awareness" between hosts of each other is  lim‐
     ited  to the case where they attempt to acquire the same paxos lease,
     and need to check if the referenced delta lease has expired or not.

   · When hosts do not attempt to lock the  same  resources  concurrently,
     there  is  no host interaction or awareness.  The state or actions of
     one host have no effect on others.

   · To speed up checking delta lease expiration (in the case of  a  paxos
     lease  conflict), sanlock keeps track of past renewals of other delta
     leases in the lockspace.
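
   The ownership check described above (a paxos lease held by another
   host_id can only be taken once that host's delta lease has expired) can
   be modelled roughly as below.  This is a toy model, not the disk paxos
   algorithm itself: the function, its arguments and the expiry value are
   all assumptions made for illustration, and the real daemon derives its
   timing from its own configuration.

```python
def can_take_paxos_lease(owner_host_id, last_renewal, now, expire_seconds):
    """Decide whether a contended paxos lease may be taken (toy model).

    owner_host_id   host_id recorded as owner, or None if the lease is free
    last_renewal    dict mapping host_id to the time of the last delta
                    lease renewal observed for that host
    expire_seconds  hypothetical age after which a delta lease is expired
    """
    if owner_host_id is None:
        return True  # nobody owns the lease
    renewed_at = last_renewal.get(owner_host_id)
    if renewed_at is None:
        return False  # no renewal observed yet: stay conservative
    return now - renewed_at > expire_seconds


EXPIRE = 80  # illustrative value only, not taken from this man page
observed = {1: 95, 2: 10}
print(can_take_paxos_lease(1, observed, now=100, expire_seconds=EXPIRE))
print(can_take_paxos_lease(2, observed, now=100, expire_seconds=EXPIRE))
```

   Host 1 renewed recently, so its lease is still owned; host 2 has not
   renewed for longer than the assumed expiry, so its lease could be taken
   by going through the paxos lease algorithm.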

   Resource Index

   The resource index (rindex) is an optional sanlock feature that  appli‐
   cations  can  use to keep track of resource lease offsets.  Without the
   rindex, an application must keep track of  where  its  resource  leases
   exist on disk and find available locations when creating new leases.

   The  sanlock  rindex  uses  two  align-size areas on disk following the
   lockspace.  The first area holds rindex entries; each entry  records  a
   resource  lease  name  and  location.   The second area holds a private
   paxos lease, used by sanlock internally to protect rindex updates.

   The application creates the rindex on disk with the "format"  function.
   Format  is  a  disk-only  operation and does not interact with the live
   lockspace, so it can be called  without  first  calling  add_lockspace.
   The application needs to follow the convention of writing the lockspace
   at the start of the device (offset 0) and formatting the rindex immedi‐
   ately  following  the lockspace area.  When formatting, the application
   must set flags for sector size and align size to match  those  for  the
   lockspace.

   To use the rindex, the application:

   · Uses  the  "create"  function to create a new resource lease on disk.
     This takes the place of  the  write_resource  function.   The  create
     function  requires the location of the rindex and the name of the new
     resource lease.  sanlock finds a free  lease  area,  writes  the  new
     resource  lease  at  that  location,  updates  the  rindex  with  the
     name:offset, and returns the offset to the caller.  The  caller  uses
     this offset when acquiring the resource lease.

   · Uses the "delete" function to remove a resource lease on disk (also
     corresponding to the write_resource function.)  sanlock clears the
     resource lease and the rindex entry for it.  A subsequent call to
     create may use this same disk location for a different resource
     lease.

   · Uses the "lookup" function to discover the offset of a resource lease
     given the resource lease name.  The caller would typically call  this
     prior to acquiring the resource lease.

   · Uses  the  "rebuild" function to recreate the rindex if it is damaged
     or becomes inconsistent.  This function scans the disk  for  resource
     leases and creates new rindex entries to match the leases it finds.

   · The  "update" function manipulates rindex entries directly and should
     not normally be used by the application.  In normal usage, the create
     and  delete  functions  manipulate  rindex entries.  Update is mainly
     useful for testing or repairs.
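
   The create, delete and lookup behaviour described above can be pictured
   with a small in-memory model.  This is a sketch only: the class and its
   methods are hypothetical, nothing is written to disk, and the starting
   offset is inferred from the layout described above (lockspace at offset
   0, followed by the two align-size rindex areas).

```python
class ToyRindex:
    """In-memory stand-in for the rindex: resource name -> lease offset."""

    def __init__(self, align_size=1048576):
        self.align_size = align_size
        # Assumed layout: area 0 is the lockspace, areas 1 and 2 belong to
        # the rindex, so resource leases start at the fourth area.
        self.first_offset = 3 * align_size
        self.entries = {}

    def create(self, name):
        """Pick the lowest free lease area, record it, return its offset."""
        if name in self.entries:
            raise KeyError(f"resource {name!r} already exists")
        used = set(self.entries.values())
        offset = self.first_offset
        while offset in used:
            offset += self.align_size
        self.entries[name] = offset
        return offset

    def delete(self, name):
        """Forget the entry; its area becomes available to create again."""
        del self.entries[name]

    def lookup(self, name):
        """Return the offset recorded for an existing resource lease."""
        return self.entries[name]


rindex = ToyRindex()
offset_a = rindex.create("RA")
offset_b = rindex.create("RB")
rindex.delete("RA")
offset_c = rindex.create("RC")  # reuses the area freed by RA
print(offset_a, offset_b, offset_c, rindex.lookup("RB"))
```

   The last create call returns the same offset that "RA" used, which
   mirrors the note above that a later create may reuse a deleted lease's
   disk location.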

   Expiration

   · If a host fails to renew its delta lease, e.g. it loses access to
     the storage, its delta lease will eventually expire and another host
     will be able to take over any resource leases held by the host.
     sanlock must ensure that the application on two different hosts is
     not holding and using the same lease concurrently.

   · When sanlock has failed to renew a delta lease for a period of  time,
     it  will begin taking measures to stop local processes (applications)
     from using any resource leases associated with the expiring lockspace
     delta  lease.   sanlock enters this "recovery mode" well ahead of the
     time when another host could take  over  the  locally  owned  leases.
     sanlock  must  have  sufficient time to stop all local processes that
     are using the expiring leases.

   · sanlock uses three methods to stop local  processes  that  are  using
     expiring leases:

     1.  Graceful  shutdown.   sanlock  will execute a "graceful shutdown"
     program that the application previously specified for this case.  The
     shutdown  program  tells  the  application  to  shut down because its
     leases are expiring.  The application must respond  by  stopping  its
     activities  and  releasing  its  leases (or exit).  If an application
     does not specify a graceful shutdown program, sanlock  sends  SIGTERM
     to  the process instead.  The process must release its leases or exit
     in a prescribed amount of time (see -g), or sanlock proceeds  to  the
     next method of stopping.

     2. Forced shutdown.  sanlock will send SIGKILL to processes using the
     expiring leases.  The processes have a fixed amount of time  to  exit
     after  receiving  SIGKILL.   If any do not exit in this time, sanlock
     will proceed to the next method.

     3. Host reset.  sanlock will trigger the host's  watchdog  device  to
     forcibly  reset  it.   sanlock  carefully  manages  the timing of the
     watchdog device so that it fires shortly before any other host  could
     take over the resource leases held by local processes.
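
   The escalation through the three stopping methods can be sketched as a
   simple function of elapsed time.  This is a simplification for
   illustration: the function name, the stage labels and the timeout
   values are hypothetical, and it assumes the processes never exit, which
   is the only case where the later stages are reached.

```python
def recovery_action(elapsed, graceful_seconds, kill_seconds):
    """Return the stopping method in effect `elapsed` seconds into recovery.

    Toy model: assumes the local processes have not released their leases
    or exited, so each stage runs for its full allotted time.
    """
    if elapsed < graceful_seconds:
        return "graceful"  # shutdown program, or SIGTERM if none was given
    if elapsed < graceful_seconds + kill_seconds:
        return "sigkill"  # forced shutdown
    return "watchdog_reset"  # host reset through the watchdog device


# Illustrative timeouts only; the real values depend on daemon settings.
for seconds in (0, 40, 60):
    print(seconds, recovery_action(seconds, graceful_seconds=40,
                                   kill_seconds=20))
```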

   Failures

   If  a  process holding resource leases fails or exits without releasing
   its leases, sanlock  will  release  the  leases  for  it  automatically
   (unless persistent resource leases were used.)

   If  the  sanlock daemon cannot renew a lockspace delta lease for a spe‐
   cific period of time (see Expiration),  sanlock  will  enter  "recovery
   mode"  where  it  attempts  to  stop  and/or kill any processes holding
   resource leases in the expiring lockspace.  If  the  processes  do  not
   exit  in  time, sanlock will force the host to be reset using the local
   watchdog device.

   If the sanlock daemon crashes or hangs, it will not  renew  the  expiry
   time  of the per-lockspace connections it had to the wdmd daemon.  This
   will lead to the expiration of the local watchdog device, and the  host
   will be reset.

   Watchdog

   sanlock  uses  the wdmd(8) daemon to access /dev/watchdog.  wdmd multi‐
   plexes multiple timeouts onto  the  single  watchdog  timer.   This  is
   required because delta leases for each lockspace are renewed and expire
   independently.

   sanlock maintains a wdmd connection  for  each  lockspace  delta  lease
   being  renewed.  Each connection has an expiry time for some seconds in
   the future.  After each successful delta lease renewal, the expiry time
   is  renewed for the associated wdmd connection.  If wdmd finds any con‐
   nection expired, it will not  renew  the  /dev/watchdog  timer.   Given
   enough  successive  failed  renewals, the watchdog device will fire and
   reset the host.  (Given the multiplexing nature of wdmd, shorter  over‐
   lapping  renewal failures from multiple lockspaces could cause spurious
   watchdog firing.)

   The direct link between delta lease renewals and watchdog renewals pro‐
   vides  a  predictable watchdog firing time based on delta lease renewal
   timestamps that are visible from other hosts.  sanlock knows  the  time
   the  watchdog  on another host has fired based on the delta lease time.
   Furthermore, if the watchdog device on another host fails to fire  when
   it should, the continuation of delta lease renewals from the other host
   will make this evident and prevent leases from  being  taken  from  the
   failed host.

   If sanlock is able to stop/kill all processes using an expiring
   lockspace, the associated wdmd connection for that lockspace is
   removed.  The expired wdmd connection will no longer block
   /dev/watchdog renewals, and the host should avoid being reset.
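
   The multiplexing rule described in this section (a single expired
   connection blocks renewal of the one watchdog timer) can be sketched as
   below.  This is a model of the rule only, not of wdmd itself; the
   function name, the dictionary layout and the lockspace names are
   assumptions.

```python
def should_renew_watchdog(connection_expiry, now):
    """Return True if no per-lockspace connection has expired.

    connection_expiry maps a lockspace name to the expiry timestamp of its
    wdmd connection (a hypothetical representation).
    """
    return all(expiry > now for expiry in connection_expiry.values())


connections = {"ls_a": 130, "ls_b": 95}
print(should_renew_watchdog(connections, now=100))  # ls_b expired at 95

# Stopping every process that used ls_b lets its connection be removed,
# after which nothing blocks the watchdog renewal in this model.
del connections["ls_b"]
print(should_renew_watchdog(connections, now=100))
```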

   Storage

   The sector size and the align size should be  specified  when  creating
   lockspaces and resources (and rindex).  The "align size" is the size on
   disk of a lockspace or a resource, i.e. the amount  of  disk  space  it
   uses.   Lockspaces  and  resources should use matching sector and align
   sizes, and must use offsets in multiples of the align  size.   The  max
   number  of  hosts  that  can use a lockspace or resource depends on the
   combination of sector size and align size, shown below.  The host_id of
   hosts using the lockspace can be no larger than the max_hosts value for
   the lockspace.

   Accepted combinations of sector size and align  size,  and  the  corre‐
   sponding max_hosts (and max host_id) are:

   sector_size 512, align_size 1M, max_hosts 2000
   sector_size 4096, align_size 1M, max_hosts 250
   sector_size 4096, align_size 2M, max_hosts 500
   sector_size 4096, align_size 4M, max_hosts 1000
   sector_size 4096, align_size 8M, max_hosts 2000

   When sector_size and align_size are not specified, the behavior matches
   the behavior before these sizes could be configured: on  devices  which
   report  sector  size  512, 512/1M/2000 is used, on devices which report
   sector size 4096, 4096/8M/2000 is used, and on  files,  512/1M/2000  is
   always  used.  (Other combinations are not compatible with sanlock ver‐
   sion 3.6 or earlier.)
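
   The accepted combinations above lend themselves to a small lookup
   table.  The sketch below is one way an application might validate its
   own settings before calling sanlock; the function and constant names
   are hypothetical, and the checks cover only the rules stated in this
   section.

```python
MIB = 1024 * 1024

# (sector_size, align_size) -> max_hosts, copied from the list above.
MAX_HOSTS = {
    (512, 1 * MIB): 2000,
    (4096, 1 * MIB): 250,
    (4096, 2 * MIB): 500,
    (4096, 4 * MIB): 1000,
    (4096, 8 * MIB): 2000,
}


def check_layout(sector_size, align_size, host_id, offset):
    """Validate a host_id and lease offset; return max_hosts if valid."""
    try:
        max_hosts = MAX_HOSTS[(sector_size, align_size)]
    except KeyError:
        raise ValueError(
            f"unsupported combination {sector_size}/{align_size}") from None
    if not 1 <= host_id <= max_hosts:
        raise ValueError(f"host_id {host_id} exceeds max_hosts {max_hosts}")
    if offset % align_size != 0:
        raise ValueError(f"offset {offset} is not a multiple of align size")
    return max_hosts


print(check_layout(512, 1 * MIB, host_id=2000, offset=0))
print(check_layout(4096, 2 * MIB, host_id=500, offset=4 * MIB))
```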

   Using sanlock on shared block devices that do host based  mirroring  or
   replication  is  not  likely  to work correctly.  When using sanlock on
   shared files, all sanlock io should go to one file server.

   Example

   This is an example of creating and using lockspaces and resources from
   the command line.  (Most applications would use sanlock through
   libsanlock rather than through the command line.)

   1.  Allocate shared storage for sanlock leases.

       This example assumes 512 byte sectors on the device, in which  case
       the lockspace needs 1MB and each resource needs 1MB.

       The  example  shared  block  device  accessible  to  all  hosts  is
       /dev/leases.

   2.  Start sanlock on all hosts.

       The -w 0 disables use of the watchdog for testing.

       # sanlock daemon -w 0

   3.  Start a dummy application on all hosts.

       This sanlock command registers with sanlock, then execs  the  sleep
       command  which  inherits the registered fd.  The sleep process acts
       as the dummy application.  Because the sleep process is  registered
       with sanlock, leases can be acquired for it.

       # sanlock client command -c /bin/sleep 600 &

   4.  Create a lockspace for the application (from one host).

       The lockspace is named "test".

       # sanlock client init -s test:0:/dev/leases:0

   5.  Join the lockspace for the application.

       Use a unique host_id on each host.

       host1:
       # sanlock client add_lockspace -s test:1:/dev/leases:0
       host2:
       # sanlock client add_lockspace -s test:2:/dev/leases:0

   6.  Create two resources for the application (from one host).

       The  resources  are  named  "RA" and "RB".  Offsets are used on the
       same device as the lockspace.  Different LVs or files could also be
       used.

       # sanlock client init -r test:RA:/dev/leases:1048576
       # sanlock client init -r test:RB:/dev/leases:2097152

   7.  Acquire resource leases for the application on host1.

       Acquire an exclusive lease (the default) on the first resource, and
       a shared lease (SH) on the second resource.

       # export P=`pidof sleep`
       # sanlock client acquire -r test:RA:/dev/leases:1048576 -p $P
       # sanlock client acquire -r test:RB:/dev/leases:2097152:SH -p $P

   8.  Acquire resource leases for the application on host2.

       Acquiring the exclusive lease  on  the  first  resource  will  fail
       because  it  is  held  by host1.  Acquiring the shared lease on the
       second resource will succeed.

       # export P=`pidof sleep`
       # sanlock client acquire -r test:RA:/dev/leases:1048576 -p $P
       # sanlock client acquire -r test:RB:/dev/leases:2097152:SH -p $P

   9.  Release resource leases for the application on both hosts.

       The sleep pid could also be killed, which will result in the
       sanlock daemon releasing its leases when it exits.

       # sanlock client release -r test:RA:/dev/leases:1048576 -p $P
       # sanlock client release -r test:RB:/dev/leases:2097152 -p $P

   10. Leave the lockspace for the application.

       host1:
       # sanlock client rem_lockspace -s test:1:/dev/leases:0
       host2:
       # sanlock client rem_lockspace -s test:2:/dev/leases:0

   11. Stop sanlock on all hosts.

       # sanlock shutdown
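
   The -s and -r arguments used throughout this example follow a regular
   pattern, so a script driving these commands could generate them.  The
   helpers below are a sketch with hypothetical names; they only build the
   strings and do not run sanlock.  They assume the 512 byte sector layout
   used in the example, where every lockspace or resource takes 1MB.

```python
ALIGN_SIZE = 1048576  # 1MB per lockspace or resource (512 byte sectors)


def lockspace_arg(name, host_id, path, offset=0):
    """Build a LOCKSPACE string; host_id 0 is what the init step uses."""
    return f"{name}:{host_id}:{path}:{offset}"


def resource_arg(lockspace, resource, path, slot, shared=False):
    """Build a RESOURCE string for the align-size area number `slot`.

    Slot 0 holds the lockspace in this example, so resources start at 1.
    """
    arg = f"{lockspace}:{resource}:{path}:{slot * ALIGN_SIZE}"
    return arg + ":SH" if shared else arg


print(lockspace_arg("test", 1, "/dev/leases"))
print(resource_arg("test", "RA", "/dev/leases", slot=1))
print(resource_arg("test", "RB", "/dev/leases", slot=2, shared=True))
```

   The three printed strings are the ones passed to add_lockspace on host1
   and to the two acquire commands in steps 5 and 7.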

OPTIONS
   COMMAND can be one of three primary top level choices

   sanlock daemon   start daemon
   sanlock client   send request to daemon (default command if none given)
   sanlock direct   access storage directly (no coordination with daemon)

Daemon Command

   sanlock daemon [options]

   -D no fork and print all logging to stderr

   -Q 0|1 quiet error messages for common lock contention

   -R 0|1 renewal debugging, log debug info for each renewal

   -L pri write logging at priority level and up to logfile (-1 none)

   -S pri write logging at priority level and up to syslog (-1 none)

   -U uid user id

   -G gid group id

   -t num max worker threads

   -g sec seconds for graceful recovery

   -w 0|1 use watchdog through wdmd

   -h 0|1 use high priority (RR) scheduling

   -l num use mlockall (0 none, 1 current, 2 current and future)

   -b sec seconds a host id bit will remain set in delta lease bitmap

   -e str local host name used in delta leases

Client Command

   sanlock client action [options]

   sanlock client status

   Print processes, lockspaces, and resources being managed by the sanlock
   daemon.  Add -D to show extra internal  daemon  status  for  debugging.
   Add  -o  p  to  show  resources  by  pid,  or -o s to show resources by
   lockspace.

   sanlock client host_status

   Print state of host_id delta  leases  read  during  the  last  renewal.
   State  of  all  lockspaces  is shown (use -s to select one).  Add -D to
   show extra internal daemon status for debugging.

   sanlock client gets

   Print lockspaces being managed by the sanlock  daemon.   The  LOCKSPACE
   string  will  be  followed  by ADD or REM if the lockspace is currently
   being added or removed.  Add -h 1 to also show hosts in each lockspace.

   sanlock client renewal -s LOCKSPACE

   Print a history of renewals with timing details.  See the Renewal  his‐
   tory section below.

   sanlock client log_dump

   Print the sanlock daemon internal debug log.

   sanlock client shutdown

   Ask  the  sanlock daemon to exit.  Without the force option (-f 0), the
   command will be ignored if any lockspaces exist.  With the force option
   (-f  1), any registered processes will be killed, their resource leases
   released, and lockspaces removed.  With the wait  option  (-w  1),  the
   command  will  wait for a result from the daemon indicating that it has
   shut down and is exiting, or cannot shut down because lockspaces  exist
   (command fails).

   sanlock client init -s LOCKSPACE

   Tell  the  sanlock  daemon  to  initialize a lockspace on disk.  The -o
   option can be used to specify the io  timeout  to  be  written  in  the
   host_id  leases.  The -Z and -A options can be used to specify the sec‐
   tor size and align size, and both should be set  together.   (Also  see
   sanlock direct init.)

   sanlock client init -r RESOURCE

   Tell the sanlock daemon to initialize a resource lease on disk.  The -Z
   and -A options can be used to specify the sector size and  align  size,
   and both should be set together.  (Also see sanlock direct init.)

   sanlock client read -s LOCKSPACE

   Tell  the  sanlock  daemon  to  read  a  lockspace from disk.  Only the
   LOCKSPACE path and offset are required.  If host_id is zero, the  first
   record  at  offset  (host_id  1)  is  used.   The complete LOCKSPACE is
   printed.  Add -D to print other  details.   (Also  see  sanlock  direct
   read_leader.)

   sanlock client read -r RESOURCE

   Tell  the  sanlock daemon to read a resource lease from disk.  Only the
   RESOURCE path and  offset  are  required.   The  complete  RESOURCE  is
   printed.   Add  -D  to  print  other details.  (Also see sanlock direct
   read_leader.)

   sanlock client add_lockspace -s LOCKSPACE

   Tell the sanlock  daemon  to  acquire  the  specified  host_id  in  the
   lockspace.   This will allow resources to be acquired in the lockspace.
   The -o option can be used to specify the io timeout  of  the  acquiring
   host, and will be written in the host_id lease.

   sanlock client inq_lockspace -s LOCKSPACE

   Inquire about the state of the lockspace in the sanlock daemon, whether
   it is being added or removed, or is joined.

   sanlock client rem_lockspace -s LOCKSPACE

   Tell the sanlock  daemon  to  release  the  specified  host_id  in  the
   lockspace.   Any  processes  holding  resource leases in this lockspace
   will be killed, and the resource leases not released.

   sanlock client command -r RESOURCE -c path args

   Register with the sanlock daemon, acquire the specified resource lease,
   and  exec  the  command at path with args.  When the command exits, the
   sanlock daemon will release the lease.  -c must be the final option.

   sanlock client acquire -r RESOURCE -p pid
   sanlock client release -r RESOURCE -p pid

   Tell the sanlock daemon to acquire or release  the  specified  resource
   lease  for  the given pid.  The pid must be registered with the sanlock
   daemon.  acquire  can  optionally  take  a  versioned  RESOURCE  string
   RESOURCE:lver,  where  lver  is  the  version of the lease that must be
   acquired, or fail.

   sanlock client convert -r RESOURCE -p pid

   Tell the sanlock daemon to convert the mode of the  specified  resource
   lease  for the given pid.  If the existing mode is exclusive (default),
   the mode of the lease can be converted to shared with RESOURCE:SH.   If
   the  existing mode is shared, the mode of the lease can be converted to
   exclusive with RESOURCE (no :SH suffix).

   sanlock client inquire -p pid

   Print the resource leases held by the given pid.  The format is a
   versioned RESOURCE string "RESOURCE:lver" where lver is the version of
   the lease held.

   sanlock client request -r RESOURCE -f force_mode

   Request the owner of a resource do something specified  by  force_mode.
   A  versioned  RESOURCE:lver  string must be used with a greater version
   than is presently held.  Zero lver and force_mode clears the request.

   sanlock client examine -r RESOURCE

   Examine the request record for the currently held  resource  lease  and
   carry out the action specified by the requested force_mode.

   sanlock client examine -s LOCKSPACE

   Examine  requests  for  all resource leases currently held in the named
   lockspace.  Only lockspace_name is used from the LOCKSPACE argument.

   sanlock client set_event -s LOCKSPACE -i host_id -g gen -e num -d num

   Set an event for another host.  When the sanlock daemon next renews its
   delta  lease  for the lockspace it will: set the bit for the host_id in
   its bitmap, and set the generation, event and data values  in  its  own
   delta  lease.   An application that has registered for events from this
   lockspace on the destination host will get the event that has been  set
   when  the  destination  sees  the  event  during  its  next delta lease
   renewal.

   sanlock client set_config -s LOCKSPACE

   Set a configuration value for a lockspace.  Only lockspace_name is used
   from the LOCKSPACE argument.  The USED flag has the same effect on a
   lockspace as a process holding a resource lease that will not exit.
   The USED_BY_ORPHANS flag means that an orphan resource lease will have
   the same effect as the USED flag.
   -u 0|1 Set (1) or clear (0) the USED flag.
   -O 0|1 Set (1) or clear (0) the USED_BY_ORPHANS flag.

   sanlock client format -x RINDEX

   Create a resource index on disk.  Use -Z and -A to set the sector  size
   and align size to match the lockspace.

   sanlock client create -x RINDEX -e resource_name

   Create  a  new  resource lease on disk, using the rindex to find a free
   offset.

   sanlock client delete -x RINDEX -e resource_name[:offset]

   Delete an existing resource lease on disk.

   sanlock client lookup -x RINDEX -e resource_name

   Look up the offset of an existing resource lease by name on disk, using
   the rindex.  With no -e option, lookup returns the next free lease
   offset.  If -e specifies both name and offset, the lookup verifies both
   are correct.

   sanlock client update -x RINDEX -e resource_name[:offset] [-z 0|1]

   Add (-z 0) or remove (-z 1) an rindex entry on disk.

   sanlock client rebuild -x RINDEX

   Rebuild the rindex entries by scanning the disk for resource leases.

Direct Command

   sanlock direct action [options]

   -o sec io timeout in seconds

   sanlock direct init -s LOCKSPACE
   sanlock direct init -r RESOURCE

   Initialize  storage  for  a  lockspace  or resource.  Use the -Z and -A
   flags to specify the sector size and align size.  The  max  hosts  that
   can use the lockspace/resource (and the max possible host_id) is deter‐
   mined by the sector/align size combination.  Possible combinations are:
   512/1M,  4096/1M,  4096/2M, 4096/4M, 4096/8M.  Lockspaces and resources
   both use the same amount of space (align_size)  for  each  combination.
   When  initializing  a  lockspace,  sanlock initializes delta leases for
   max_hosts in the given space.  When initializing  a  resource,  sanlock
   initializes  a single paxos lease in the space.  With -s, the -o option
   specifies the io timeout to be written in the host_id leases.  With -r,
   the  -z 1 option invalidates the resource lease on disk so it cannot be
   used until reinitialized normally.
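
   As a rough illustration of why the sector/align combination limits the
   host count, the sketch below counts the sectors in one align_size area.
   It assumes the paxos lease layout shown under INTERNALS (one sector per
   host after a leader_record sector and a request_record sector) and the
   documented host_id range of 1-2000.  The function name is hypothetical
   and the result is only an upper bound, not the max_hosts value sanlock
   itself reports.

```python
# Sector/align combinations listed in the text above.
SECTOR_ALIGN_COMBINATIONS = [
    (512, 1 << 20),
    (4096, 1 << 20),
    (4096, 2 << 20),
    (4096, 4 << 20),
    (4096, 8 << 20),
]

MAX_HOST_ID = 2000  # host_id is documented as an integer from 1-2000


def host_upper_bound(sector_size, align_size):
    """Return an upper bound on hosts that fit in one align_size area.

    Assumption: two leading sectors (leader_record, request_record) are
    followed by one sector per host, as in the 512-byte example layout.
    """
    sectors = align_size // sector_size
    return min(sectors - 2, MAX_HOST_ID)


for sector_size, align_size in SECTOR_ALIGN_COMBINATIONS:
    print(sector_size, align_size, host_upper_bound(sector_size, align_size))
```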

   sanlock direct read_leader -s LOCKSPACE
   sanlock direct read_leader -r RESOURCE

   Read a leader record from disk and print the fields.  The leader record
   is  the  single sector of a delta lease, or the first sector of a paxos
   lease.

   sanlock direct dump path[:offset[:size]]

   Read disk sectors and print leader records for delta or  paxos  leases.
   Add  -f 1 to print the request record values for paxos leases, host_ids
   set in delta lease bitmaps, and rindex entries.

   sanlock direct format -x RINDEX
   sanlock direct lookup -x RINDEX -e resource_name
   sanlock direct update -x RINDEX -e resource_name[:offset] [-z 0|1]
   sanlock direct rebuild -x RINDEX

   Access the resource index on disk without  going  through  the  sanlock
   daemon.   This  precludes  using  the  internal  paxos lease to protect
   rindex modifications.  See client equivalents for descriptions.

LOCKSPACE option string -s lockspace_name:host_id:path:offset

   lockspace_name name of lockspace
   host_id local host identifier in lockspace
   path path to storage to use for leases
   offset offset on path (bytes)

RESOURCE option string -r lockspace_name:resource_name:path:offset

   lockspace_name name of lockspace
   resource_name name of resource
   path path to storage to use for leases
   offset offset on path (bytes)

RESOURCE option string with suffix -r lockspace_name:resource_name:path:offset:lver

   lver leader version

   -r lockspace_name:resource_name:path:offset:SH

   SH indicates shared mode

RINDEX option string -x lockspace_name:path:offset

   lockspace_name name of lockspace
   path path to storage to use for leases
   offset offset on path (bytes) of rindex
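
   The option strings above are colon-separated fields.  A minimal parsing
   sketch follows; the class and function names are hypothetical, and it
   assumes that paths contain no colon characters.

```python
from typing import NamedTuple, Optional


class Lockspace(NamedTuple):
    lockspace_name: str
    host_id: int
    path: str
    offset: int


class Resource(NamedTuple):
    lockspace_name: str
    resource_name: str
    path: str
    offset: int
    lver: Optional[int] = None   # optional leader version suffix
    shared: bool = False         # True when the suffix is SH


def parse_lockspace(arg):
    """Parse lockspace_name:host_id:path:offset."""
    name, host_id, path, offset = arg.split(":")
    return Lockspace(name, int(host_id), path, int(offset))


def parse_resource(arg):
    """Parse lockspace_name:resource_name:path:offset[:lver|:SH]."""
    fields = arg.split(":")
    if len(fields) not in (4, 5):
        raise ValueError("expected 4 or 5 colon-separated fields: %r" % arg)
    name, resource, path, offset = fields[:4]
    lver, shared = None, False
    if len(fields) == 5:
        if fields[4] == "SH":
            shared = True
        else:
            lver = int(fields[4])
    return Resource(name, resource, path, int(offset), lver, shared)


print(parse_lockspace("foo:1:/path:0"))
print(parse_resource("foo:example1:/path:1048576:SH"))
```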

Defaults sanlock help shows the default values for the options above.

   sanlock version shows the build version.

OTHER Request/Examine The first part of making a request for a resource is writing the request record of the resource (the sector following the leader record). To make a successful request:

   · RESOURCE:lver must be greater than the lver  presently  held  by  the
     other  host.  This implies the leader record must be read to discover
     the lver, prior to making a request.

   · RESOURCE:lver must be greater than or equal  to  the  lver  presently
     written  to the request record.  Two hosts may write a new request at
     the same time for the same lver, in which case  both  would  succeed,
     but the force_mode from the last would win.

   · The force_mode must be greater than zero.

   · To  unconditionally  clear  the  request  record  (set  both lver and
     force_mode to 0), make request with RESOURCE:0 and force_mode 0.

   The owner of the requested resource will not know of the request unless
   it  is  explicitly  told  to  examine  its  resources via the "examine"
   api/command, or otherwise notified.
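
   The request rules listed above can be summarized as a single predicate.
   This is a sketch of the stated conditions only, not sanlock's code; the
   function and parameter names are hypothetical.

```python
def request_allowed(req_lver, force_mode, held_lver, recorded_lver):
    """Return True if a request satisfies the documented conditions.

    req_lver      -- the lver given in RESOURCE:lver
    force_mode    -- the requested force_mode
    held_lver     -- lver presently held by the other host (leader record)
    recorded_lver -- lver presently written in the request record
    """
    # RESOURCE:0 with force_mode 0 unconditionally clears the record.
    if req_lver == 0 and force_mode == 0:
        return True
    return (
        req_lver > held_lver
        and req_lver >= recorded_lver
        and force_mode > 0
    )


print(request_allowed(req_lver=5, force_mode=1, held_lver=4, recorded_lver=5))
```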

   The second part of making a request is  notifying  the  resource  lease
   owner  that  it  should  examine  the  request  records of its resource
   leases.  The notification will cause the lease owner  to  automatically
   run  the  equivalent  of  "sanlock client examine -s LOCKSPACE" for the
   lockspace of the requested resource.

   The notification is made using a bitmap in each  host_id  delta  lease.
   Each  bit represents each of the possible host_ids (1-2000).  If host A
   wants to notify host B to examine its resources, A sets the bit in  its
   own  bitmap  that  corresponds to the host_id of B.  When B next renews
   its delta lease, it reads the delta leases for  all  hosts  and  checks
   each  bitmap  to see if its own host_id has been set.  It finds the bit
   for its own host_id set  in  A's  bitmap,  and  examines  its  resource
   request  records.   (The  bit  remains  set  in A's bitmap for set_bit‐
   map_seconds.)
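
   A small model of the notification bitmap may help.  The sketch keeps
   one 2000-bit bitmap per host in memory.  The bit ordering (host_id N at
   bit N-1, least significant bit first within each byte) and the helper
   names are assumptions made for illustration; the on-disk encoding is
   not described here.

```python
BITMAP_BYTES = 250  # 2000 bits, one per possible host_id


def set_bit(bitmap, host_id):
    """Set the bit for host_id (assumed position: bit host_id - 1)."""
    index = host_id - 1
    bitmap[index // 8] |= 1 << (index % 8)


def bit_is_set(bitmap, host_id):
    index = host_id - 1
    return bool(bitmap[index // 8] & (1 << (index % 8)))


def hosts_notifying(bitmaps, my_host_id):
    """Return the host_ids whose bitmap has the bit for my_host_id set."""
    return sorted(h for h, bm in bitmaps.items() if bit_is_set(bm, my_host_id))


# Host A (host_id 1) notifies host B (host_id 2) by setting B's bit in
# A's own bitmap.  B finds it when it reads all delta leases on renewal.
bitmaps = {1: bytearray(BITMAP_BYTES), 2: bytearray(BITMAP_BYTES)}
set_bit(bitmaps[1], 2)
print(hosts_notifying(bitmaps, 2))
```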

   force_mode determines the action the resource lease owner should take:

   · FORCE (1): kill the process holding the  resource  lease.   When  the
     process has exited, the resource lease will be released, and can then
     be acquired by anyone.  The kill signal is  SIGKILL  (or  SIGTERM  if
     SIGKILL is restricted.)

   · GRACEFUL  (2): run the program configured by sanlock_killpath against
     the process holding the resource lease.  If no killpath  is  defined,
     then FORCE is used.
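
   The choice between the two force_mode values, including the fallback
   when no killpath is defined, can be sketched as follows.  The function
   name and the returned tuples are hypothetical.

```python
FORCE = 1
GRACEFUL = 2


def choose_action(force_mode, killpath=None):
    """Map a force_mode to the action the lease owner takes."""
    if force_mode == GRACEFUL and killpath:
        return ("run_killpath", killpath)
    if force_mode in (FORCE, GRACEFUL):
        # GRACEFUL without a configured killpath falls back to FORCE.
        return ("kill", None)
    raise ValueError("force_mode must be greater than zero")


print(choose_action(GRACEFUL, killpath="/usr/local/bin/stop-app"))
print(choose_action(GRACEFUL))
```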

Persistent and orphan resource leases A resource lease can be acquired with the PERSISTENT flag (-P 1). If the process holding the lease exits, the lease will not be released, but kept on an orphan list. Another local process can acquire an orphan lease using the ORPHAN flag (-O 1), or release the orphan lease using the ORPHAN flag (-O 1). All orphan leases can be released by setting the lockspace name (-s lockspace_name) with no resource name.

Renewal history sanlock saves a limited history of lease renewal information in each lockspace. See sanlock.conf renewal_history_size to set the amount of history or to disable (set to 0).

   IO times are measured in delta lease renewal (each delta lease  renewal
   includes one read and one write).

   For each successful renewal, a record is saved that includes:

   · the timestamp written in the delta lease by the renewal

   · the time in milliseconds taken by the delta lease read

   · the time in milliseconds taken by the delta lease write

   Also counted and recorded are the number of io timeouts and other io
   errors that occur between successful renewals.

   Two consecutive successful renewals would be recorded as:
   timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0
   timestamp=5353 read_ms=99 write_ms=3161 next_timeouts=0 next_errors=0

   Those fields are:

   · timestamp is the value written  into  the  delta  lease  during  that
     renewal.

   · read_ms/write_ms   are   the   milliseconds  taken  for  the  renewal
     read/write ios.

   · next_timeouts is the number of io timeouts that occurred after the
     renewal recorded on that line, and before the next successful renewal
     on the following line.

   · next_errors is the number of io errors (not timeouts) that occurred
     after the renewal recorded on that line, and before the next
     successful renewal on the following line.

   The command 'sanlock client renewal -s lockspace_name' reports the full
   history  of renewals saved by sanlock, which by default is 180 records,
   about 1 hour of history when using a 20 second renewal interval  for  a
   10 second io timeout.
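
   The record format shown above is a series of key=value fields, so it is
   simple to parse.  The sketch below reads one record and checks the
   "about 1 hour" figure; the function names are hypothetical.

```python
def parse_renewal(line):
    """Parse one renewal record into a dict of integer fields."""
    return {key: int(value)
            for key, value in (field.split("=") for field in line.split())}


def history_seconds(records, renewal_interval):
    """Time span covered by a number of records at a fixed interval."""
    return records * renewal_interval


record = parse_renewal(
    "timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0"
)
print(record["read_ms"] + record["write_ms"])  # total io time in ms
print(history_seconds(180, 20))                # seconds of history
```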

INTERNALS Disk Format · This example uses 512 byte sectors.

   · Each  lockspace  is 1MB.  It holds 2000 delta_leases, one per sector,
     supporting up to 2000 hosts.

   · Each paxos_lease is 1MB.  It is used as a lease for one resource.

   · The leader_record structure is used differently by each lease type.

   · To display all leader_record fields, see sanlock direct read_leader.

   · A lockspace is often followed on disk by the paxos_leases used within
     that lockspace, but this layout is not required.

   · The request_record and host_id bitmap are used for requests/events.

   · The mode_block contains the SHARED flag indicating a lease is held in
     the shared mode.

   · In a  lockspace,  the  host  using  host_id  N  writes  to  a  single
     delta_lease in sector N-1.  No other hosts write to this sector.  All
     hosts read all lockspace sectors when renewing their own delta_lease,
     and are able to monitor renewals of all delta_leases.

   · In a paxos_lease, each host has a dedicated sector it writes to, con‐
     taining its own paxos_dblock and mode_block structures.   Its  sector
     is based on its host_id; host_id 1 writes to the dblock/mode_block in
     sector 2 of the paxos_lease.

   · The paxos_dblock structures are used by  the  paxos_lease  algorithm,
     and the result is written to the leader_record.

   0x000000 lockspace foo:0:/path:0

   (There  is  no representation on disk of the lockspace in general, only
   the sequence of specific delta_leases which collectively represent  the
   lockspace.)

   delta_lease foo:1:/path:0
   0x000 0         leader_record         (sector 0, for host_id 1)
                   magic: 0x12212010
                   space_name: foo
                   resource_name: host uuid/name
                   ...
                   host_id bitmap        (leader_record + 256)

   delta_lease foo:2:/path:0
   0x200 512       leader_record         (sector 1, for host_id 2)
                   magic: 0x12212010
                   space_name: foo
                   resource_name: host uuid/name
                   ...
                   host_id bitmap        (leader_record + 256)

   delta_lease foo:3:/path:0
   0x400 1024      leader_record         (sector 2, for host_id 3)
                   magic: 0x12212010
                   space_name: foo
                   resource_name: host uuid/name
                   ...
                   host_id bitmap        (leader_record + 256)

   delta_lease foo:2000:/path:0
   0xF9E00         leader_record         (sector 1999, for host_id 2000)
                   magic: 0x12212010
                   space_name: foo
                   resource_name: host uuid/name
                   ...
                   host_id bitmap        (leader_record + 256)

   0x100000 paxos_lease foo:example1:/path:1048576
   0x000 0         leader_record         (sector 0)
                   magic: 0x06152010
                   space_name: foo
                   resource_name: example1

   0x200 512       request_record        (sector 1)
                   magic: 0x08292011

   0x400 1024      paxos_dblock          (sector 2, for host_id 1)
   0x480 1152      mode_block            (paxos_dblock + 128)

   0x600 1536      paxos_dblock          (sector 3, for host_id 2)
   0x680 1664      mode_block            (paxos_dblock + 128)

   0x800 2048      paxos_dblock          (sector 4, for host_id 3)
   0x880 2176      mode_block            (paxos_dblock + 128)

   0xFA200         paxos_dblock          (sector 2001, for host_id 2000)
   0xFA280         mode_block            (paxos_dblock + 128)

   0x200000 paxos_lease foo:example2:/path:2097152
   0x000 0         leader_record         (sector 0)
                   magic: 0x06152010
                   space_name: foo
                   resource_name: example2

   0x200 512       request_record        (sector 1)
                   magic: 0x08292011

   0x400 1024      paxos_dblock          (sector 2, for host_id 1)
   0x480 1152      mode_block            (paxos_dblock + 128)

   0x600 1536      paxos_dblock          (sector 3, for host_id 2)
   0x680 1664      mode_block            (paxos_dblock + 128)

   0x800 2048      paxos_dblock          (sector 4, for host_id 3)
   0x880 2176      mode_block            (paxos_dblock + 128)

   0xFA200         paxos_dblock          (sector 2001, for host_id 2000)
   0xFA280         mode_block            (paxos_dblock + 128)
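
   The offsets in the layout above follow directly from the sector rules
   (host_id N uses sector N-1 of a lockspace and sector N+1 of a paxos
   lease).  A sketch that reproduces them for 512 byte sectors, with
   hypothetical function names:

```python
SECTOR_SIZE = 512


def delta_lease_offset(lockspace_offset, host_id):
    """Byte offset of the delta_lease leader_record for host_id."""
    return lockspace_offset + (host_id - 1) * SECTOR_SIZE


def host_id_bitmap_offset(lockspace_offset, host_id):
    """The host_id bitmap sits 256 bytes after the leader_record."""
    return delta_lease_offset(lockspace_offset, host_id) + 256


def paxos_dblock_offset(lease_offset, host_id):
    """Byte offset of the paxos_dblock for host_id within a paxos_lease.

    Sector 0 holds the leader_record and sector 1 the request_record.
    """
    return lease_offset + (host_id + 1) * SECTOR_SIZE


def mode_block_offset(lease_offset, host_id):
    """The mode_block sits 128 bytes after the paxos_dblock."""
    return paxos_dblock_offset(lease_offset, host_id) + 128


print(hex(delta_lease_offset(0, 2000)))   # offset within the lockspace
print(hex(paxos_dblock_offset(0, 2000)))  # offset within one paxos_lease
```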

Lease ownership Not shown in the leader_record structures above are the owner_id, owner_generation and timestamp fields. These are the fields that define the lease owner.

   The  delta_lease at sector N for host_id N+1 has leader_record.owner_id
   N+1.  The leader_record.owner_generation is incremented each  time  the
   delta_lease   is   acquired.   When  a  delta_lease  is  acquired,  the
   leader_record.timestamp field is set to the time of the  host  and  the
   leader_record.resource_name  is  set  to  the  unique name of the host.
   When   the   host   renews   the   delta_lease,   it   writes   a   new
   leader_record.timestamp.  When a host releases a delta_lease, it writes
   zero to leader_record.timestamp.

   When a host acquires a  paxos_lease,  it  uses  the  host_id/generation
   value  from  the  delta_lease  it holds in the lockspace.  It uses this
   host_id/generation to identify itself in the paxos_dblock when  running
   the  paxos  algorithm.   The  result  of  the  algorithm is the winning
   host_id/generation - the new owner of  the  paxos_lease.   The  winning
   host_id/generation      are      written     to     the     paxos_lease
   leader_record.owner_id and  leader_record.owner_generation  fields  and
   leader_record.timestamp is set.  When a host releases a paxos_lease, it
   sets leader_record.timestamp to 0.

   When a paxos_lease is free  (leader_record.timestamp  is  0),  multiple
   hosts  may  attempt  to  acquire  it.   The  paxos algorithm, using the
   paxos_dblock structures, will select only one of the hosts as  the  new
   owner, and that owner is written in the leader_record.  The paxos_lease
   will no longer be free (non-zero timestamp).  Other hosts will see this
   and will not attempt to acquire the paxos_lease until it is free again.

   If  a  paxos_lease is owned (non-zero timestamp), but the owner has not
   renewed its delta_lease for a specific length of time, then  the  owner
   value  in the paxos_lease becomes expired, and other hosts will use the
   paxos algorithm to acquire the paxos_lease, and set a new owner.
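
   The three paxos_lease states described above (free, owned, expired) can
   be sketched as a small classifier.  The expiry threshold is left as a
   parameter because its length is not stated here; all names are
   hypothetical.

```python
def paxos_lease_state(timestamp, owner_last_renewal, now, expire_seconds):
    """Classify a paxos_lease from its leader_record timestamp.

    timestamp          -- leader_record.timestamp of the paxos_lease
    owner_last_renewal -- last delta_lease renewal time seen for the owner
    now                -- current time, same clock as owner_last_renewal
    expire_seconds     -- assumed expiry threshold (not given in the text)
    """
    if timestamp == 0:
        return "free"
    if now - owner_last_renewal >= expire_seconds:
        return "expired"
    return "owned"


print(paxos_lease_state(0, 100, 200, 80))
print(paxos_lease_state(5332, 100, 150, 80))
print(paxos_lease_state(5332, 100, 200, 80))
```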

FILES /etc/sanlock/sanlock.conf

   · quiet_fail = 1
     See -Q

   · debug_renew = 0
     See -R

   · logfile_priority = 4
     See -L

   · logfile_use_utc = 0
     Use UTC instead of local time in log messages.

   · syslog_priority = 3
     See -S

   · names_log_priority = 4
     Log resource names at this priority level (uses syslog priority  num‐
     bers).   If  this  is greater than or equal to logfile_priority, each
     requested resource name and location is recorded in sanlock.log.

   · use_watchdog = 1
     See -w

   · high_priority = 1
     See -h

   · mlock_level = 1
     See -l

   · sh_retries = 8
     The number of times to try acquiring a paxos lease when  acquiring  a
     shared lease when the paxos lease is held by another host acquiring a
     shared lease.

   · uname = sanlock
     See -U

   · gname = sanlock
     See -G

   · our_host_name = <str>
     See -e

   · renewal_read_extend_sec = <seconds>
     If a renewal read i/o times out, wait this  many  additional  seconds
     for  that  read  to  complete  at the start of the subsequent renewal
     attempt.  When  not  configured,  sanlock  waits  for  an  additional
     io_timeout seconds for a previous timed out read to complete.

   · renewal_history_size = 180
     See -H

   · paxos_debug_all = 0
     Include all details in the paxos debug logging.

   · debug_io = <str>
     Add  debug logging for each i/o.  "submit" (no quotes) produces debug
     output at submission time, "complete" produces debug output  at  com‐
     pletion time, and "submit,complete" (no space) produces both.

   · max_sectors_kb = <str>|<num>
     Set  to  "ignore"  (no  quotes)  to  prevent sanlock from checking or
     changing max_sectors_kb  for  the  lockspace  disk  when  starting  a
     lockspace.   Set to "align" (no quotes) to set max_sectors_kb for the
     lockspace disk to the align size of the lockspace.  Set to  a  number
     to set a specific number of KB for all lockspace disks.
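
   The settings above use a "name = value" form.  A minimal reader for
   that form is sketched below.  Skipping blank lines and lines starting
   with "#" is an assumption about the file syntax, and the function name
   is hypothetical.

```python
def parse_conf(text):
    """Return a dict of name -> value strings from sanlock.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "name = value" line
        settings[key.strip()] = value.strip()
    return settings


example = """
renewal_history_size = 180
max_sectors_kb = align
debug_io = submit,complete
"""
print(parse_conf(example))
```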

SEE ALSO wdmd(8)

                              2015-01-23                        SANLOCK(8)

WDMD(8) System Manager's Manual WDMD(8)

NAME wdmd - watchdog multiplexing daemon

SYNOPSIS wdmd [OPTIONS]

DESCRIPTION This daemon opens /dev/watchdog and allows multiple independent sources to determine whether each KEEPALIVE is done. Every test interval (10 seconds), the daemon tests each source. If any test fails, the KEEPALIVE is not done. In a standard configuration, the watchdog timer will reset the system if no KEEPALIVE is done for 60 seconds ("fire timeout"). This means that if a single test fails 5-6 times in a row, the watchdog will fire and reset the system. With multiple test sources, fewer separate failures back to back can also cause a reset, e.g.

   T seconds, P pass, F fail
   T00: test1 P, test2 P, test3 P: KEEPALIVE done
   T10: test1 F, test2 F, test3 P: KEEPALIVE skipped
   T20: test1 F, test2 P, test3 P: KEEPALIVE skipped
   T30: test1 P, test2 F, test3 P: KEEPALIVE skipped
   T40: test1 P, test2 P, test3 F: KEEPALIVE skipped
   T50: test1 F, test2 F, test3 P: KEEPALIVE skipped
   T60: test1 P, test2 F, test3 P: KEEPALIVE skipped
   T60: watchdog fires, system resets

   (Depending  on timings, the system may be reset sometime shortly before
   T60, and the tests at T60 would not be run.)

   A crucial aspect to the design and function of wdmd is that if any sin‐
   gle  source  does  not pass tests for the fire timeout, the watchdog is
   guaranteed to fire, regardless of whether other sources on  the  system
   have passed or failed.  A spurious reset due to the combined effects of
   multiple failing tests, as shown above, is an accepted side effect.
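
   The timeline above can be reproduced with a short simulation of the
   rule "KEEPALIVE only when every test passes".  It assumes the watchdog
   is armed at T00 and fires exactly when the fire timeout has elapsed
   since the last KEEPALIVE; the function name is hypothetical.

```python
TEST_INTERVAL = 10  # seconds between test rounds
FIRE_TIMEOUT = 60   # seconds without KEEPALIVE before the watchdog fires


def simulate(results):
    """Return the time the watchdog fires, or None if it never does.

    results is a list of per-interval tuples of booleans, one boolean per
    test source, starting at T00.
    """
    last_keepalive = 0  # assumption: watchdog armed at T00
    for step, tests in enumerate(results):
        now = step * TEST_INTERVAL
        if now - last_keepalive >= FIRE_TIMEOUT:
            return now  # system resets; this round of tests is not run
        if all(tests):
            last_keepalive = now  # KEEPALIVE done
    return None


P, F = True, False
timeline = [
    (P, P, P),  # T00
    (F, F, P),  # T10
    (F, P, P),  # T20
    (P, F, P),  # T30
    (P, P, F),  # T40
    (F, F, P),  # T50
    (P, F, P),  # T60
]
print(simulate(timeline))
```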

   The wdmd init script will load the softdog module if no other  watchdog
   module has been loaded.

   wdmd  cannot be used on the system with any other program that needs to
   open /dev/watchdog, e.g. watchdog(8).

Test Source: clients Using libwdmd, programs connect to wdmd via a unix socket, and send regular messages to wdmd to update an expiry time for their connection. Every test interval, wdmd will check if the expiry time for a connection has been reached. If so, the test for that client fails.

Test Source: scripts wdmd will run scripts from a designated directory every test interval. If a script exits with 0, the test is considered a success, otherwise a failure. If a script does not exit by the end of the test interval, it is considered a failure.
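
   The pass/fail rule for a script test can be sketched with Python's
   subprocess module.  This only illustrates the rule (exit status 0
   passes, a non-zero status or a script still running at the end of the
   interval fails); it is not how wdmd is implemented, and the function
   name is hypothetical.

```python
import subprocess
import sys


def run_script_test(argv, test_interval=10):
    """Return True if the script exits with 0 within test_interval seconds."""
    try:
        completed = subprocess.run(
            argv,
            timeout=test_interval,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False  # did not exit by the end of the test interval
    except OSError:
        return False  # could not be started at all
    return completed.returncode == 0


# Stand-in "scripts" run through the current Python interpreter.
print(run_script_test([sys.executable, "-c", "raise SystemExit(0)"]))
print(run_script_test([sys.executable, "-c", "raise SystemExit(3)"]))
```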

OPTIONS --version, -V Print version.

   --help, -h
            Print usage.

   --dump, -d
            Print debug information from the daemon.

   --probe, -p
            Print path of functional watchdog device.  Exit code 0
            indicates a functional device was found.  Exit code 1
            indicates a functional device was not found.

   -D
            Enable debugging to stderr and don't fork.

   -H 0|1
            Enable (1) or disable (0) high priority features such as
            realtime scheduling priority and mlockall.

   -G name
            Group ownership for the socket.

   -S 0|1
            Enable (1) or disable (0) script tests.

   -s path
            Path to scripts dir.

   -k num
            Kill unfinished scripts after num seconds.

   -w path
            The path to the watchdog device to try first.

                              2011-08-01                           WDMD(8)

::