Steps in Making a Solaris Device Driver DR/AP Compliant

Follow these guidelines to make a Solaris device driver compliant with the new Sun technologies of Dynamic Reconfiguration (DR) and Alternate Pathing (AP).

You will need enough hardware information or vendor support to shut off and restart interrupts on the device. This capability might already be coded in the driver as part of the existing detach routines, but it might require vendor support. Acquiring this information could be the single most difficult part of the task.

1. Make the driver able to be unloaded without generating memory leaks. You should be able to sustain thousands of driver configurations and modunloads without errors or memory leaks. (This can be tested with a script.)

2.  Make the driver capable of detaching a single instance. XX_detach() should touch only the data structures for the specified instance. Global resources should be deallocated in _fini(). (This can be tested by adding an ioctl().)

3.  Make the driver capable of sustaining hundreds of passes of attach/detach/attach/... of a single instance.

4.  Make the XX_detach() routine robust enough to handle a subsequent call to XX_detach() for the same device, in case an earlier call ran into problems and exited before completion.
    For example:

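#if NONONONO	/* wrong: a repeated call frees the buffer a second time */
    kmem_free(un->un_buf, un->un_buf_len);
#else		/* right: free only if still held, then clear the pointer */
    if (un->un_buf) {
        kmem_free(un->un_buf, un->un_buf_len);
        un->un_buf = NULL;
    }
#endif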
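
The same guard can be applied to every resource an instance holds. The following is a minimal user-space sketch, not driver code: malloc()/free() stand in for the kernel allocator, and struct xx_state, xx_remove_intr(), xx_teardown() and the xx_fail_intr_once test hook are hypothetical names invented for this illustration. It shows a teardown that exits early on a failure and can simply be called again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance state; the field names are illustrative only. */
struct xx_state {
    void   *buf;        /* stands in for un->un_buf */
    size_t  buf_len;
    void   *ring;       /* a second resource, released last */
    int     intr_added; /* nonzero while the interrupt handler is registered */
};

/* Test hook (assumption): when nonzero, interrupt removal fails once. */
static int xx_fail_intr_once;

static int
xx_remove_intr(struct xx_state *un)
{
    if (xx_fail_intr_once) {
        xx_fail_intr_once = 0;
        return (-1);
    }
    un->intr_added = 0;
    return (0);
}

/*
 * Release everything held by one instance. Each step checks whether its
 * resource is still held and clears the field afterwards, so calling this
 * again after an early exit is harmless.
 */
static int
xx_teardown(struct xx_state *un)
{
    assert(un != NULL);

    if (un->buf != NULL) {
        free(un->buf);
        un->buf = NULL;
        un->buf_len = 0;
    }
    if (un->intr_added && xx_remove_intr(un) != 0)
        return (-1);    /* exit early; a later call picks up from here */
    if (un->ring != NULL) {
        free(un->ring);
        un->ring = NULL;
    }
    return (0);
}
```

In this sketch a first call that fails at the interrupt step leaves the ring allocated, a second call finishes the job, and a third call is a no-op.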

5.  Make the driver capable of dealing with non-monotonically increasing instance numbers.
    For example:

sysbd 0 contains device XX instance 7
sysbd 3 contains device XX instance 2
sysbd 8 contains device XX instance 1

Since instances will be attached by XX_attach() in hardware order, the driver must be able to handle out-of-sequence presentation of instance numbers. (This can be tested by careful editing of /etc/path_to_inst.)
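
One way to stay independent of attach order is to key all per-instance state by the instance number itself and to tolerate gaps. In a real driver the DDI soft state routines (ddi_soft_state_zalloc(), ddi_get_soft_state()) normally fill this role. The sketch below is a user-space model only; XX_MAX_INSTANCES, xx_table and the xx_model_* functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define XX_MAX_INSTANCES 64     /* assumed upper bound for this sketch */

struct xx_state {
    int instance;
    int sysbd;                  /* system board the device sits on */
};

/* Slots are indexed by instance number, never by arrival order. */
static struct xx_state *xx_table[XX_MAX_INSTANCES];

static int
xx_model_attach(int instance, int sysbd)
{
    struct xx_state *un;

    if (instance < 0 || instance >= XX_MAX_INSTANCES)
        return (-1);
    if (xx_table[instance] != NULL)
        return (-1);            /* already attached */
    un = calloc(1, sizeof (*un));
    if (un == NULL)
        return (-1);
    un->instance = instance;
    un->sysbd = sysbd;
    xx_table[instance] = un;
    return (0);
}

static struct xx_state *
xx_model_lookup(int instance)
{
    if (instance < 0 || instance >= XX_MAX_INSTANCES)
        return (NULL);
    return (xx_table[instance]);    /* NULL for a gap in the numbering */
}

static void
xx_model_detach(int instance)
{
    struct xx_state *un = xx_model_lookup(instance);

    /* A slot, when present, must describe the instance it is filed under. */
    assert(un == NULL || un->instance == instance);
    if (un != NULL) {
        free(un);
        xx_table[instance] = NULL;
    }
}
```

Attaching instances 7, 2 and 1 in that order, as in the sysbd listing above, works the same as any other order, and a lookup of an instance that was never attached returns NULL instead of touching a neighbor's state.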

6.  Add the suspend/resume code.

a. XX_attach(): support the DDI_RESUME cmd
        1. clear the XXSUSPENDED flag
        2. for each active interface, re-enable interrupts and restart timeout() routines and helper threads.
b. XX_detach(): support the DDI_SUSPEND cmd
        1. set the XXSUSPENDED flag
        2. for each active interface, disable interrupts and kill timeout() routines and helper threads.
c. XX_detach(): support the DDI_DETACH cmd
        1. return DDI_FAILURE if the instance is SUSPENDED or RUNNING
        2. perform the regular DDI_DETACH operation
        3. execute ddi_set_driver_private(dip, NULL);
d. add the remaining SUSPEND code
        1. XXinit:
           if (un->un_flags & XXSUSPENDED) {
               ddi_dev_is_needed(un->un_dip, 0, 1);
           }
Note the following:
DDI_SUSPEND is followed by DDI_RESUME with no power interruption.
DDI_PM_SUSPEND may be followed by a power interruption, and is followed by DDI_PM_RESUME.
DDI_DETACH may be followed by a power interruption; any further reference to the device must be preceded by a DDI_ATTACH.

(Suspend/resume testing can be performed on a machine running Solaris 2.6. Generate I/O on your device, then press the top-right button on the keyboard to suspend the machine. Resume the machine. There should be no panics, error messages, or data corruption.)
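
The command handling in step 6 can be read as a small state machine. The sketch below models it in user space under stated assumptions: the XX_CMD_* values, the XXATTACHED and XXRUNNING flags, the XX_OK/XX_FAIL return codes and both function names are hypothetical stand-ins for the DDI commands and return values, and a single intr_enabled field stands in for interrupts, timeout() routines and helper threads.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI_ATTACH, DDI_RESUME, DDI_DETACH and DDI_SUSPEND. */
enum xx_cmd { XX_CMD_ATTACH, XX_CMD_RESUME, XX_CMD_DETACH, XX_CMD_SUSPEND };

#define XX_OK        0      /* stands in for DDI_SUCCESS */
#define XX_FAIL      (-1)   /* stands in for DDI_FAILURE */

#define XXATTACHED   0x1
#define XXSUSPENDED  0x2
#define XXRUNNING    0x4    /* instance is in use */

struct xx_state {
    unsigned flags;
    int      intr_enabled;  /* models interrupts, timeouts and threads */
};

static int
xx_attach_cmd(struct xx_state *un, enum xx_cmd cmd)
{
    assert(un != NULL);
    switch (cmd) {
    case XX_CMD_ATTACH:
        if (un->flags & XXATTACHED)
            return (XX_FAIL);
        un->flags = XXATTACHED;
        un->intr_enabled = 1;
        return (XX_OK);
    case XX_CMD_RESUME:
        if ((un->flags & XXSUSPENDED) == 0)
            return (XX_FAIL);
        un->flags &= ~XXSUSPENDED;  /* step a.1 */
        un->intr_enabled = 1;       /* step a.2 */
        return (XX_OK);
    default:
        return (XX_FAIL);
    }
}

static int
xx_detach_cmd(struct xx_state *un, enum xx_cmd cmd)
{
    assert(un != NULL);
    switch (cmd) {
    case XX_CMD_SUSPEND:
        if ((un->flags & XXATTACHED) == 0 || (un->flags & XXSUSPENDED))
            return (XX_FAIL);
        un->flags |= XXSUSPENDED;   /* step b.1 */
        un->intr_enabled = 0;       /* step b.2 */
        return (XX_OK);
    case XX_CMD_DETACH:
        if (un->flags & (XXSUSPENDED | XXRUNNING))
            return (XX_FAIL);       /* step c.1 */
        un->flags = 0;              /* steps c.2 and c.3, modeled */
        un->intr_enabled = 0;
        return (XX_OK);
    default:
        return (XX_FAIL);
    }
}
```

In this model a detach request that arrives while the instance is suspended is refused, and succeeds only after a resume has cleared the flag.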

7. Follow these suggestions on the use of timeout() routines:
a. Avoid using them if at all possible. They are generally an indication of a poorly designed driver.
b. Be careful that you do not have multiple instances of the same routine pending. For example, a second call to un->un_tid = timeout(XX_to, arg, ticks) overwrites un_tid, which removes the ability to untimeout() the first timeout() routine.
c. Be aware that self-rescheduling timeout() routines (those that contain a call to timeout() to reschedule themselves) are particularly difficult to kill. The timeout() routine should take the form of:

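static void XX_start_to(struct xx *un);

static void
XX_to(caddr_t arg)
{
    struct xx *un = (struct xx *)arg;

    mutex_enter(&un->un_lock);
    un->un_tid = 0;    /* this timeout has fired; allow a reschedule */
    .....
    XX_start_to(un);
    mutex_exit(&un->un_lock);
}

static void
XX_start_to(struct xx *un)
{
    ASSERT(MUTEX_HELD(&un->un_lock));
    /* schedule only if nothing is pending and no stop was requested */
    if ((un->un_tid == 0) && ((un->un_flags & XXSTOP) == 0)){
        un->un_tid = timeout(XX_to, (caddr_t)un, ticks);
    }
}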

static void
XX_stop_to(struct xx *un)
{
    int tid;
    mutex_enter(&un->un_lock);
    if ((tid = un->un_tid) != 0) {
        un->un_flags |= XXSTOP; /* do not reschedule */
        mutex_exit(&un->un_lock); /* no hold across */
        (void) untimeout(tid);
    } else {
        mutex_exit(&un->un_lock);
    }
}
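
The flag logic in these routines can be exercised outside the kernel. The sketch below is a single-threaded user-space model with no locking: model_timeout() is a hypothetical stand-in that only hands out a nonzero id, the fired counter is added for illustration, and this model's stop routine always sets the stop flag and clears the id, which is a simplification of the routine above. It shows that a second start does not overwrite a pending id and that a handler racing with the stop request does not reschedule itself.

```c
#include <assert.h>
#include <stddef.h>

#define XXSTOP 0x1

struct xx {
    int      un_tid;    /* 0 means no timeout is pending */
    unsigned un_flags;
    int      fired;     /* how many times the handler has run */
};

static int model_next_tid = 1;

/* Stand-in for timeout(): "schedules" the handler and returns a nonzero id. */
static int
model_timeout(void)
{
    return (model_next_tid++);
}

static void
xx_start_to(struct xx *un)
{
    assert(un != NULL);
    /* Schedule only if nothing is pending and no stop was requested. */
    if (un->un_tid == 0 && (un->un_flags & XXSTOP) == 0)
        un->un_tid = model_timeout();
}

/* The handler: clear the pending id, do the work, try to reschedule. */
static void
xx_to(struct xx *un)
{
    assert(un != NULL);
    un->un_tid = 0;
    un->fired++;
    xx_start_to(un);
}

static void
xx_stop_to(struct xx *un)
{
    assert(un != NULL);
    un->un_flags |= XXSTOP; /* do not reschedule */
    un->un_tid = 0;         /* models a completed untimeout() */
}
```

Calling xx_to() once more after xx_stop_to() models a handler that was already running when the stop request arrived: it does its work but leaves un_tid at 0.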

8. For STREAMS network device drivers, add AP support code to allow write modification of the kstat space. This code allows the XX per-instance copy of the statistics to be updated with the values in the kstat structure when the kstat update callback (the ks_update routine set on the kstat before kstat_install() is called) receives a KSTAT_WRITE command.

For example:

static int
XXstat_kstat_update(kstat_t *ksp, int rw)
{
    struct XX *XXp;
    struct XXkstat *XXkp;

    XXp = (struct XX *)ksp->ks_private;
    XXkp = (struct XXkstat *)ksp->ks_data;

    if (rw == KSTAT_WRITE) {
        XXp->XX_ipackets = XXkp->XXk_ipackets.value.ul;
        XXp->XX_ierrors = XXkp->XXk_ierrors.value.ul;
        XXp->XX_opackets = XXkp->XXk_opackets.value.ul;
        XXp->XX_oerrors = XXkp->XXk_oerrors.value.ul;
        XXp->XX_txcoll = XXkp->XXk_txcoll.value.ul;
    } else {
        [....]
    }
    return (0);
}
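
The two directions of the update callback can be modeled in user space as follows. This is a sketch under assumptions, not the driver's code: struct xx_counters, the MODEL_KSTAT_* constants and xx_model_kstat_update() are hypothetical, plain unsigned long fields replace the named kstat entries, and the read branch shown here (publishing the driver's counters) is an assumed counterpart to the write branch, since the example above elides it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for KSTAT_READ and KSTAT_WRITE. */
enum { MODEL_KSTAT_READ, MODEL_KSTAT_WRITE };

/* The five counters used in the example above, as plain integers. */
struct xx_counters {
    unsigned long ipackets;
    unsigned long ierrors;
    unsigned long opackets;
    unsigned long oerrors;
    unsigned long txcoll;
};

/*
 * drv is the driver's per-instance copy; ks is the externally visible copy.
 * A write pulls supplied values into the driver, so that counters can be
 * carried over to this instance. A read (assumed here) publishes the
 * driver's current values.
 */
static int
xx_model_kstat_update(struct xx_counters *drv, struct xx_counters *ks, int rw)
{
    assert(drv != NULL && ks != NULL);

    if (rw == MODEL_KSTAT_WRITE)
        *drv = *ks;
    else
        *ks = *drv;
    return (0);
}
```

After a write, the driver keeps counting from the supplied values, so a later read reports totals that continue from the earlier figures instead of restarting at zero.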