
Bringing Your Application Into the Zone

 
By Paul Lovvik, Joseph Balenzano, May 27, 2005  
Abstract: This article offers a process to help you support your application within a zone (using the Solaris Zones feature of the Solaris 10 OS). The process includes software installation and configuration as well as zone limitations with workarounds. Also discussed are best practices for application vendors.

Contents:

  1. Getting Software to Work in a Local Zone - An Overview
  2. Zone Limitations
  3. Software That Doesn't Work Correctly in a Local Zone
  4. Installation and Configuration in a Local Zone
  5. QA
  6. Key Points
  7. Resources
  8. About the Authors

Getting Software to Work in a Local Zone - An Overview

"Will my software work in a Zone?"

Having worked with many partners that are interested in integrating some of the new features of the Solaris 10 Operating System into their software, I find this is the single most common question I am asked about supporting the Solaris Zones feature. While most software supported on the Solaris 10 OS will run properly in a non-global zone, the question is not an easy one to answer, because there are very real issues that can stand in the way of non-global zone support. Without a process to guide them through non-global zone support, application vendors might be reluctant or unable to bring zones support to their products.

The aim of this article is to help you through the process of supporting your application within a zone. This process includes software installation and configuration as well as zone limitations, with workarounds where possible. Also included is a discussion of best practices for application vendors who want to support zones.

Before we continue, a few words on the term "zone" are in order. Zones are the software partitioning technology used to virtualize operating system services and provide an isolated and secure environment for running applications. The Solaris 10 OS supports the notion of the global zone and the non-global zone. For all intents and purposes, the global zone is the global view of the Solaris operating environment. There is always exactly one global zone per Solaris instance, appropriately named global. Each software partition that is created within the Solaris instance is known as a non-global zone. Many non-global zones can exist within an instance of the Solaris OS, each with a unique name. For the remainder of this document, the term zone refers to a non-global zone. When a distinction between a non-global zone and the global zone is required, the fully qualified zone type will be used.

The Simplest Answer

If you want to add zone support to your software, you would benefit from knowing the fastest way to get started. What problems will you encounter? How can you estimate the amount of effort it would take?

Zones provide the standard Solaris interfaces and application environment, and do not impose a new ABI or API. In the vast majority of cases, applications do not need to be recompiled in order to work within a zone. That is big news, and that fact alone probably qualifies your application. A small number of limitations are imposed on processes that run in a zone; these limitations guarantee that software in a zone cannot harm other zones or the global zone. Each of them will be explored in detail later, but the trait they all share is that the restricted operations required privileges available only to the superuser (root) in prior releases of Solaris.

If your software runs as an unprivileged user, the simple answer to the question "will my software run in a zone?" is yes. This is the easiest way to cut the zones support question down to size. For example, if your software ran under Solaris 8 or Solaris 9 as a non-root user, you know that you are not using any system or library calls that require the sort of permission that would be limited in a zone under the Solaris 10 OS. If this is the case, it is fantastic news for you, as you will not have to perform a full qualification cycle to verify that your software runs properly in a zone.

That is a pretty simple answer, and it covers most software that runs on Solaris, but there is a catch even in this simple case: a privileged user is probably required to perform certain actions during the installation, configuration, and administration of your application. Also, if your software includes executables with the SUID (set user ID) permission bit set, you must look deeper to find out whether those parts of the software need more qualification. These are the areas that will require your attention. At the very least, actually install the software and perform sanity tests before claiming support for a local zone configuration. It is easy to be surprised by a dependency on a device or file system that is not exposed to a zone by default. Even if your software is guaranteed to run in a zone, it makes sense to understand the zone configuration required to support it correctly.

If a privileged user is required to run your software, your software will likely work in a zone, but to be certain, you have to do a bit more investigation.

The Harder Road

If your software does require a privileged user for execution, you have to do more investigation to determine whether or not your application will run correctly in a non-global zone.

You must determine whether your software uses any APIs that, for security reasons, are restricted in ways that keep them from functioning as expected in a zone. The next section, titled Zone Limitations, lists all of the known system call limitations of zones.

If you determine that you are using only APIs that are unrestricted in zones, your software will run correctly and completely in a zone. Rest assured that this is the case for most software. If you are using restricted APIs, your software will have limitations when running in a zone, but it is still possible to support execution in a non-global zone. The software might simply ship with known limitations, or it could be modified to be "zone aware" and behave slightly differently when executed in a zone. For example, it could turn off functionality or features that are known to fail in a zone rather than run into problems.
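One way to make software "zone aware" is to check once, at startup, whether it is running in a non-global zone and switch off features that depend on privileges a zone does not grant. The following is a minimal sketch, assuming the Solaris getzoneid(3C) and getzonenamebyid(3C) interfaces from <zone.h>; the struct app_features type, its fields, and the helper names are hypothetical. On systems without zones, the sketch reports the global zone.

```c
#include <stdio.h>
#include <string.h>
#ifdef __sun
#include <zone.h>   /* getzoneid(3C), getzonenamebyid(3C), GLOBAL_ZONEID */
#endif

/* Hypothetical feature switches for an application that degrades
 * gracefully when privileged interfaces are unavailable. */
struct app_features {
    int lock_memory;   /* would use mlock(3C) or shmctl(SHM_LOCK) */
    int raw_sockets;   /* would use SOCK_RAW with IPPROTO_RAW */
};

/* Hypothetical helper: 1 in a non-global zone, 0 otherwise.
 * Systems without zones are treated like the global zone. */
int in_nonglobal_zone(void)
{
#ifdef __sun
    return getzoneid() != GLOBAL_ZONEID;
#else
    return 0;
#endif
}

/* Hypothetical helper: copies the current zone name into buf.
 * Returns 0 on success, -1 on error. */
int current_zone_name(char *buf, size_t len)
{
#ifdef __sun
    return getzonenamebyid(getzoneid(), buf, len) < 0 ? -1 : 0;
#else
    int n = snprintf(buf, len, "%s", "global");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
#endif
}

/* Hypothetical helper: enables everything in the global zone and turns off
 * features that need proc_lock_memory or net_rawaccess, which a non-global
 * zone does not have (see Table 1). */
void init_features(struct app_features *f)
{
    int zoned = in_nonglobal_zone();

    f->lock_memory = !zoned;
    f->raw_sockets = !zoned;
}
```

Calling init_features() once, before any privileged call is attempted, keeps the zone check in a single place instead of scattering it through the code.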

Zone Limitations

Because zones do not define a new ABI or API, most software that runs on the Solaris 10 OS will work correctly in a zone. This section is dedicated to those system calls and associated library calls that serve as exceptions to this rule.

There are several ways to approach the problem of finding such limitations in your software. Automated searching through the source code will probably prove to be the most complete type of search, but it is also possible to catch issues solely through testing or at runtime with tools such as privilege debugging, apptrace(1), truss(1), and dtrace(1M).

This section is dedicated to describing as fully as practical what each of the limitations entails. In a subsequent section, the use of various system utilities to find these issues will be explored.

Before we address the particulars of the various system call behaviors in non-global zones, discussions are in order around zone security, the new Process Rights Management framework introduced in the Solaris 10 OS, and zone resources and services virtualization.

Zone Security

Each non-global zone has a security boundary around it. The security boundary is maintained by:

  • adopting Solaris 10 Process Rights Management (privileges(5)),
  • isolating name spaces (for example, /proc and /dev), and
  • allowing zones to communicate with each other only through networking APIs such as sockets.
Process Rights Management
 

Process rights management enables processes to be restricted at the command, user, role, or system level. The Solaris OS implements process rights management through privileges. Privileges decrease the security risk that is associated with one user or one process having full superuser capabilities on a system.

A privilege is a discrete right that a process requires to perform an operation. The right is enforced in the kernel. A program that operates within the bounds of the Solaris basic set of privileges operates within the bounds of the system security policy. Setuid programs are examples of programs that operate outside the bounds of the system security policy. By using privileges, programs eliminate the need for calls to setuid.

Privileges discretely enumerate the kinds of operations that are possible on a system. Programs can be run with the exact privileges that enable the program to succeed. For example, a program that sets the date and writes the date to an administrative file might require the file_dac_write and sys_time privileges. This capability eliminates the need to run any program as root.

Historically, systems have not followed the privilege model. Rather, systems used the superuser model. In the superuser model, processes run as root or as a user. User processes were limited to acting on the user's directories and files. root processes could create directories and files anywhere on the system. A process that required creation of a directory outside the user's directory would run with a UID=0, that is, as root. Security policy relied on DAC, discretionary access control, to protect system files. Device nodes were protected by DAC. For example, devices owned by group sys could be opened only by members of group sys. However, setuid programs, file permissions, and administrative accounts are vulnerable to misuse. The actions that a setuid process is permitted are more numerous than the process requires to complete its operation. A setuid program can be compromised by an intruder who then runs as the all-powerful root user. Similarly, any user with access to the root password can compromise the entire system. In contrast, a system that enforces policy with privileges allows a gradation between user capabilities and root capabilities. A user can be granted privileges to perform activities that are beyond the capabilities of ordinary users, and root can be limited to fewer privileges than root currently possesses.

The privilege model provides greater security than the superuser model. Privileges that have been removed from a process cannot be exploited. Process privileges prevent a program or administrative account from gaining access to all capabilities. Process privileges can provide an additional safeguard for sensitive files, where DAC protections alone can be exploited to gain access. Privileges, then, can restrict programs and processes to just the capabilities that the program requires. This capability is called the principle of least privilege. On a system that implements least privilege, an intruder who captures a process has access to only those privileges that the process has. The rest of the system cannot be compromised.

The privileges(5) man page describes every privilege. The command ppriv -lv prints a description of every privilege to standard output.

For most privileges, absence of the privilege simply results in a failure (EPERM error). In some instances, the absence of a privilege can cause system calls to behave differently. In other instances, the removal of a privilege can force a set-uid application to seriously malfunction.
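Because a missing privilege usually surfaces as EPERM, a program can either ask whether it holds a privilege before making a call, or treat EPERM afterward as "feature unavailable." The following is a minimal sketch, assuming the Solaris priv_ineffect(3C) interface from <priv.h>, which reports whether a named privilege is in the effective set; the helper names are hypothetical. Where that interface does not exist, the sketch answers -1 ("unknown"), and the caller falls back to attempting the call and checking errno.

```c
#include <errno.h>
#ifdef __sun
#include <priv.h>   /* priv_ineffect(3C) */
#endif

/* Hypothetical helper: 1 if the named privilege (for example
 * "proc_lock_memory") is in the effective set, 0 if it is not,
 * -1 if this platform has no privilege API to ask. */
int have_privilege(const char *name)
{
#ifdef __sun
    return priv_ineffect(name) ? 1 : 0;
#else
    (void)name;
    return -1;
#endif
}

/* Hypothetical helper: nonzero if errno value err means "not enough
 * privilege" for the common case described above. */
int is_privilege_error(int err)
{
    return err == EPERM;
}

/* Hypothetical helper: skip a privileged operation only when we know
 * for certain that the privilege is missing. */
int should_attempt(const char *priv_name)
{
    return have_privilege(priv_name) != 0;
}
```

For example, in a non-global zone should_attempt("proc_lock_memory") would return 0, since that privilege is absent there (see Table 1), letting the caller skip memory locking entirely.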

Zone Process Privileges
 

All processes running in a zone are privilege aware. That means all processes in a zone are constrained by the privilege sets that are assigned to them when the process is created. When the system creates a non-global zone, an init(1M) process is created for it and that process is the root process of the zone. In general, all processes in a non-global zone are descendants of this init(1M) process. The inheritable privilege set of init determines the effective privilege set of processes in the zone.

In prior releases, the "basic" privileges were always available to unprivileged processes, and by default, processes still have the basic privileges. Unprivileged processes executing in a non-global zone share the same "basic" privilege set as unprivileged processes running in the global zone. This is why, from a privilege standpoint, your unprivileged software is guaranteed to run in a zone (provided the zone was configured properly). Of the privileges listed in Table 1, file_link_any, proc_info, proc_session, proc_fork, and proc_exec make up the "basic" privilege set.

Table 1 lists the privileges available in the Solaris 10 OS and whether each is available in a non-global zone. The set of privileges in a non-global zone is a subset of the privileges available in the global zone. The functionality that the missing privileges provide (with the exception of the DTrace privileges, which are new in the Solaris 10 OS) was available only to the superuser in prior releases of Solaris.

Table 1: Solaris 10 Privileges and Their Non-Global Zone Availability
Privilege           Available in a Non-Global Zone
------------------  ------------------------------
contract_event      yes
contract_observer   yes
cpc_cpu             no
dtrace_kernel       no
dtrace_proc         no
dtrace_user         no
file_chown          yes
file_chown_self     yes
file_dac_execute    yes
file_dac_read       yes
file_dac_search     yes
file_dac_write      yes
file_link_any       yes
file_owner          yes
file_setid          yes
ipc_dac_read        yes
ipc_dac_write       yes
ipc_owner           yes
net_icmpaccess      yes
net_privaddr        yes
net_rawaccess       no
proc_audit          yes
proc_chroot         yes
proc_clock_highres  no
proc_exec           yes
proc_fork           yes
proc_info           yes
proc_lock_memory    no
proc_owner          yes
proc_priocntl       no
proc_session        yes
proc_setid          yes
proc_taskid         yes
proc_zone           no
sys_acct            yes
sys_admin           yes
sys_audit           yes
sys_config          no
sys_devices         no
sys_ipc_config      no
sys_linkdir         no
sys_mount           yes
sys_net_config      no
sys_nfs             yes
sys_res_config      no
sys_resource        yes
sys_suser_compat    no
sys_time            no
 
 
System Calls

Because a process in a non-global zone has a restricted set of privileges, certain system calls return errors when called with certain parameters. In most cases, EPERM is returned to a process that does not possess the required privilege. All of the failing cases required superuser privilege in prior versions of Solaris.

adjtime, stime, ntp_adjtime

adjtime(2) - correct the time to allow synchronization of the system clock
stime(2) - set system time and date
ntp_adjtime(2) - adjust local clock parameters

Limitation: Cannot set the system's notion of time in a non-global zone.

Required Privilege: sys_time

Impact: Software that needs to adjust the system's idea of the current time (for example, to synchronize with another machine).

Workaround: N/A

Associated Command(s): date(1), ntpdate(1M), xntpd(1M)

creat, chmod, open

creat(2) - create a new file or rewrite an existing one
chmod(2) - change the permissions mode of a file
open(2) - open a file

Limitation: Creating or changing a regular file with the S_ISVTX mode (sticky bit) set.

Required Privilege: sys_config

Impact: The sticky bit set on a regular file (that is, not a directory) that does not have the executable mode set indicates that the file is a swap file. Therefore, the system's page cache will not be used to hold the contents of the file. The impact of this limitation is likely to be minimal, as few applications create files with the sticky bit set. The impact is felt more by a system administrator who would use this mode directly, or perhaps indirectly through mkfile(1M). Note that backup and restoration utilities that preserve such modes could read and record the sticky bit for files, but would not be able to recreate the file with that mode upon restoration.

Workaround: The sticky bit can be applied to regular files only from the global zone. No workaround for executing within a zone is known at this time. Operations that attempt to set the sticky bit on a regular file in a local zone fail silently: no error or warning is returned, and the bit is simply not set.

Associated Command(s): mkfile(1M), chmod(1), tar(1)

ioctl

ioctl(2) - device control

Limitation: Cannot pop a streams module if an anchor is in place.

Required Privilege: sys_net_config

Impact: An anchor (I_ANCHOR) is a lock that prevents the removal of a STREAMS module with an I_POP ioctl call. You place an anchor in a stream on a module you want to lock. All modules at or below the anchor are locked, and can only be popped by a sufficiently privileged process. In a zone, this privilege is not available.

Workaround: N/A

Associated Command: autopush(1M)

link, unlink

link(2), unlink(2) - link and unlink files and directories

Limitation: Cannot create a hard link to a directory, or unlink a directory, in a zone.

Required Privilege: sys_linkdir

Impact: This could have an impact during the installation/configuration of software that creates links to directories. This also has an impact on software that may create temporary directories that are later removed with calls to unlink(2).

Workaround: Symbolic links (symlink(2)) to directories are allowed in a zone. The unlink(2) directory functionality can be replaced by the rmdir(2) system call.

Associated Command(s): link(1M), unlink(1M)
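Both workarounds can be sketched as follows: use symlink(2) where an installer would have hard-linked a directory, and route directory removal through rmdir(2) so that unlink(2) is never applied to a directory. The helper names are hypothetical.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: makes linkpath refer to the directory target.
 * A hard link would need sys_linkdir, so a symbolic link is used. */
int link_directory(const char *target, const char *linkpath)
{
    return symlink(target, linkpath);
}

/* Hypothetical helper: removes a file, symbolic link, or empty directory
 * without ever passing a directory to unlink(2). Returns 0 on success,
 * -1 with errno set on failure. */
int remove_path(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)       /* lstat: do not follow symlinks */
        return -1;
    if (S_ISDIR(st.st_mode))
        return rmdir(path);          /* works in a zone */
    return unlink(path);             /* file, or symlink (even to a dir) */
}
```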

memcntl

memcntl(2) - memory management control

Limitation: MC_LOCK, MC_LOCKAS, MC_UNLOCK, and MC_UNLOCKAS are not supported; therefore, a process cannot lock and unlock memory.

Required Privilege: proc_lock_memory

Impact: This can affect software that needs to lock memory. For instance, a database program may want to lock memory to keep data table buffers in non-pageable memory for performance reasons.

Workaround: If you are locking a shared memory segment, refer to the workaround for shmctl(2).

mknod

mknod(2) - make a special file

Limitation: Cannot create a block (S_IFBLK) or character (S_IFCHR) special file.

Required Privilege: sys_devices

Impact: Software that needs to create device nodes on the fly (for example, Sun Ray Server Software) is impacted by this. Backup and restoration utilities (for example, tar(1)) could read and preserve special files, but would not be able to recreate the special files upon restoration.

Workaround: The special file creation could be omitted from the software. Instead, the zone's configuration as specified by zonecfg(1M) can include a "device" resource which will specify that the device file in question should be created when the zone is booted. Restoration of special files must be performed from the global zone.

Associated Command(s): cpio(1), disks(1M), mknod(1M), tapes(1M), tar(1)

msgctl

msgctl(2) - message control operations

Limitation: IPC_SET cannot be used to increase the message queue bytes (msg_qbytes).

Required Privilege: sys_ipc_config

Impact: Software that dynamically sizes the message queue is affected by this.

Workaround: The system-defined limit used to initialize msg_qbytes is the minimum enforced value of the calling process' process.max-msg-qbytes resource control. It is therefore possible to have msg_qbytes initialized to the upper limit your application requires when the message queue is created, rather than growing the queue afterward.
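In practice, this means treating msg_qbytes as a value that can only shrink at run time. The following is a minimal sketch with a hypothetical helper name: it reads the current size with IPC_STAT and issues IPC_SET only when the requested size is not larger, reporting a would-be increase to the caller instead of attempting it.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: sets the queue's msg_qbytes to want if that does not
 * grow the queue. Returns 0 if the size was applied, 1 if growing would be
 * required (which needs sys_ipc_config and fails with EPERM in a zone),
 * and -1 on error. */
int resize_queue_no_grow(int msqid, unsigned long want)
{
    struct msqid_ds ds;

    if (msgctl(msqid, IPC_STAT, &ds) != 0)
        return -1;
    if (want > ds.msg_qbytes)
        return 1;   /* size the queue via the resource control instead */
    ds.msg_qbytes = want;
    return msgctl(msqid, IPC_SET, &ds) == 0 ? 0 : -1;
}
```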

nice

nice(2) - change priority of a process

Limitation: This call will fail if the increment argument is negative or greater than 40.

Required Privilege: proc_priocntl

Impact: Depending upon the nature of your application requirements, your software may need to set the scheduling priority. Calling the nice function has no effect on the priority of processes or threads with the scheduling policy SCHED_FIFO or SCHED_RR.

Workaround: If your software really wants to adjust (raise) its priority using nice(2), then some other process in the global zone will need to perform that on behalf of the client in the non-global zone. Or, binding the non-global zone that the application runs in to a pool can also achieve the same effect (unless the process is competing for CPU with other processes in the same zone, in which case the Fair Share Scheduler can be used to specify which projects should get more of the CPU).

Associated Command: nice(1)
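Where a higher priority is a tuning preference rather than a requirement, one option is to attempt the change and carry on at the default priority if it is refused. The following is a minimal sketch with a hypothetical helper name. Because nice(2) can legitimately return -1 as the new nice value, errno is cleared before the call and checked afterward.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: changes the process's nice value by incr (negative
 * means higher priority). Returns 1 if the change was made, 0 if it was
 * refused for lack of privilege (proc_priocntl in a non-global zone),
 * -1 on any other error. */
int try_nice(int incr)
{
    errno = 0;
    if (nice(incr) == -1 && errno != 0)
        return errno == EPERM ? 0 : -1;
    return 1;
}
```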

p_online

p_online(2) - return or change processor operational status

Limitation: P_ONLINE, P_OFFLINE, P_NOINTR, P_FAULTED, P_SPARE, and P_FORCED flags are not supported.

Required Privilege: sys_res_config

Impact: This will impact software that needs to disable/enable CPUs.

Workaround: N/A

Associated Command: psradm(1M)

priocntl

priocntl(2) - process scheduler control

Limitation: Changing the scheduling parameters of an LWP (using PC_SETPARMS or PC_SETXPARMS) is not supported.

Required Privilege: proc_priocntl

Impact: Depending upon the nature of your application requirements, your software may need to set the kernel-level scheduling priority of an LWP.

Workaround: N/A

Associated Command: priocntl(1)

pset_create, pset_destroy, pset_assign, pset_bind, pset_setattr, processor_bind

pset_create(2), pset_destroy(2), pset_assign(2) - manage set of processors
pset_bind(2) - bind LWPs to a set of processors
pset_setattr(2) - set processor set attributes

Limitation: These functions control the creation and management of sets of processors. Since processors are systemwide resources, manipulation of them from within a zone is not allowed.

Required Privilege: sys_res_config

Impact: Software that takes advantage of SMP systems to bind LWPs to a specific set of processors for performance, concurrency or resource control reasons. Your software may limit itself to the number of processors it can run on for licensing reasons.

Workaround: You can set up a resource pool using poolcfg(1M) and pooladm(1M) and then bind the zone that the application will run in to the resource pool using zonecfg(1M) and the "pool" property. You can use processor_bind(2) to bind LWPs to a single processor.

Associated Command: psrset(1M)

shmctl

shmctl(2) - shared memory control operations

Limitation: SHM_LOCK and SHM_UNLOCK are not supported; therefore, a process cannot lock and unlock memory.

Required Privilege: proc_lock_memory

Impact: This can have an impact on software that needs to lock memory. For instance, a database program may want to lock memory to keep data table buffers in non-pageable memory for performance reasons.

Workaround: If you are locking memory for performance reasons, you may want to investigate the Intimate Shared Memory (ISM) feature of Solaris (shmat(2) with SHM_SHARE_MMU). ISM has numerous benefits, one of which is that ISM pages are locked, which significantly improves performance by shortening the kernel code path and by preventing the pages from being swapped out. Note that the use of ISM can cause certain Dynamic Reconfiguration events (for example, those invoked using the cfgadm(1M) command) to fail.
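A minimal sketch of the ISM approach, assuming the Solaris SHM_SHARE_MMU flag for shmat(2) mentioned above. The helper names and the ATTACH_FLAGS macro are hypothetical; the flag is used only where the header defines it, so on other systems the segment is attached normally and its pages are not locked.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifdef SHM_SHARE_MMU
#define ATTACH_FLAGS SHM_SHARE_MMU   /* Solaris: request ISM */
#else
#define ATTACH_FLAGS 0               /* elsewhere: ordinary attach */
#endif

/* Hypothetical helper: creates a private segment of size bytes and attaches
 * it, requesting ISM where available instead of calling shmctl(SHM_LOCK),
 * which a zone does not permit. Returns the address and stores the segment
 * id in *idp, or returns NULL on failure. */
void *attach_ism(size_t size, int *idp)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    void *addr;

    if (id < 0)
        return NULL;
    addr = shmat(id, NULL, ATTACH_FLAGS);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* do not leak the segment */
        return NULL;
    }
    *idp = id;
    return addr;
}

/* Hypothetical helper: detaches the segment and marks it for removal. */
void release_ism(void *addr, int id)
{
    shmdt(addr);
    shmctl(id, IPC_RMID, NULL);
}
```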

socket

socket(2) - create an endpoint for communication

Limitation: Attempts to create a raw socket with protocol set to IPPROTO_RAW or IPPROTO_IGMP return an EPROTONOSUPPORT error.

Required Privilege: net_rawaccess

Impact: This will impact software that is using the raw socket interface to implement network protocols or software that needs to create/inspect TCP/IP headers.

Workaround: N/A

Associated Command: N/A
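Software that can live without raw IP access can probe for it once and disable the dependent feature. The following is a minimal sketch with a hypothetical helper name. It treats EPROTONOSUPPORT, the error described above for a zone, as "unavailable," along with EPERM and EACCES, which unprivileged processes can receive on other systems (Linux returns EPERM, for example).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: 1 if a raw IP socket can be opened, 0 if raw access
 * is refused, -1 on any other error. The probe socket is closed at once. */
int raw_ip_available(void)
{
    int s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    if (s >= 0) {
        close(s);
        return 1;
    }
    if (errno == EPROTONOSUPPORT || errno == EPERM || errno == EACCES)
        return 0;
    return -1;
}
```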

swapctl

swapctl(2) - manage swap space.

Limitation: Cannot add (SC_ADD) or remove (SC_REMOVE) swapping resources.

Required Privilege: sys_config

Impact: Any software that needs to add or remove swap resources will be affected. This will most likely affect your installation and configuration.

Workaround: Swap space is a systemwide resource, therefore it has to be configured from the global zone.

Associated Command: swap(1M)

uadmin

uadmin(2) - administrative control

Limitation: The A_REMOUNT, A_FREEZE, and A_DUMP commands are not supported (ENOTSUP). The AD_IBOOT function of the A_SHUTDOWN command is not supported (ENOTSUP).

Required Privilege: sys_config

Impact: This could impact software that may want to force a crash dump under certain conditions.

Workaround: N/A

Associated Command: uadmin(1M)

Library Functions

As with system calls, because of the restricted privileges of a process in a zone, certain library calls may return errors. In most cases, EPERM is returned to a process that does not possess the appropriate privilege. The failing cases required superuser privilege in prior versions of Solaris.

clock_settime

clock_settime(3RT) - high resolution clock operations

Limitation: Cannot set the CLOCK_REALTIME and CLOCK_HIGHRES clocks since they are systemwide clocks.

Required Privilege: sys_time

Impact: Realtime software is most likely affected by the inability to set the clock.

Workaround: N/A

cpc_bind_cpu

cpc_bind_cpu(3CPC) - bind request sets to hardware counters

Limitation: This function binds the set to the specified CPU and measures events occurring on that CPU regardless of which LWP is running. This is not allowed in a zone because you could monitor the CPU events of processes not in your zone. The call fails because the function tries to open up a special file in the /devices directory which represents the CPU and the /devices directory is not part of the name space of a zone. Because there is no /devices, the open(2) system call issued by cpc_bind_cpu(3CPC) will generate an ENOENT return code.

Required Privilege: cpc_cpu

Impact: This could impact your development environment. For instance, you could be making calls to cpc_bind_cpu(3CPC) to determine the cache hit ratio of your code.

Workaround: The cpc_bind_curlwp(3CPC) function is allowed in a zone, so you can monitor CPU counters for the LWP from which the call was issued.

mlock, munlock, mlockall, munlockall, plock

mlock(3C), munlock(3C) - lock or unlock pages in memory
mlockall(3C), munlockall(3C) - lock or unlock address space
plock(3C) - lock or unlock into memory process, text, or data

Limitation: Cannot use these library functions to lock and unlock memory. This is the same issue as for memcntl(2).

Required Privilege: proc_lock_memory

Impact: This can have an impact on software that needs to lock memory. For instance, a database program may want to lock memory to keep data table buffers in non-pageable memory for performance reasons. It should be noted that locking memory can cause certain Dynamic Reconfiguration events (for example, those invoked using the cfgadm(1M) command) to fail.

Workaround: If you are locking a shared memory segment, consider the workaround described for shmctl(2).

pthread_setschedparam

pthread_setschedparam(3C) - access dynamic thread scheduling parameters

Limitation: Cannot change the underlying scheduling policy and parameters for a thread. This is the same issue as for priocntl.

Required Privilege: proc_priocntl

Impact: Depending upon the nature of your application requirements, your software may need to set the kernel-level scheduling priority of a thread and the underlying LWP.

Workaround: N/A
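The following is a minimal sketch, with a hypothetical helper name, of requesting a real-time policy for the calling thread and keeping the current policy if the request is refused. Note that pthread_setschedparam() returns an error number rather than setting errno.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: asks for SCHED_RR at the lowest real-time priority
 * for the calling thread. Returns 1 if applied, 0 if refused for lack of
 * privilege (proc_priocntl in a non-global zone), -1 on any other error.
 * When the request is refused, the thread keeps its existing policy. */
int try_realtime_thread(void)
{
    struct sched_param sp = {0};
    int err;

    sp.sched_priority = sched_get_priority_min(SCHED_RR);
    if (sp.sched_priority == -1)
        return -1;
    /* The error number is returned directly; errno is not used. */
    err = pthread_setschedparam(pthread_self(), SCHED_RR, &sp);
    if (err == 0)
        return 1;
    return err == EPERM ? 0 : -1;
}
```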

timer_create

timer_create(3RT) - create a timer

Limitation: Cannot create a timer using the high-resolution system clock (CLOCK_HIGHRES).

Required Privilege: proc_clock_highres

Impact: Software that requires high-resolution timers.

Workaround: N/A

t_open

t_open(3NSL) - establish a transport endpoint

Limitation: The STREAMS driver /dev/rawip is the TLI transport provider that provides raw access to IP. This device node is not available in a zone, so this call will return the ENOENT error when used for this driver.

Required Privilege: net_rawaccess

Impact: This will also impact software that is using the /dev/rawip device to implement network protocols, software that needs to create/inspect TCP/IP headers, and so on.

Workaround: N/A

Libraries

The APIs provided by the following libraries are not supported in a zone. The shared objects are present in the zone's /usr/lib directory, so no link-time errors will occur if your code references these libraries. You can inspect your makefiles to determine whether your application has explicit bindings to any of these libraries, and use pmap(1) while the application is executing to verify that none of them are dynamically loaded.

  • libdevinfo(3LIB) - device information library
  • libcfgadm(3LIB) - configuration administration library
  • libpool(3LIB) - pool configuration manipulation library
  • libtnfctl(3LIB) - TNF probe control library
  • libsysevent(3LIB) - system event interface library
Devices

Zones have a restricted set of devices, consisting primarily of pseudo devices that form part of the Solaris programming API. These include /dev/null, /dev/zero, /dev/poll, /dev/random, /dev/tcp, and so on. Physical devices are not directly accessible from within a zone unless configured by an administrator. Because devices are, in general, shared system resources, making them available in a zone requires some restrictions so that system security is not compromised.

  • The /dev name space consists of symbolic links (logical paths) to the physical paths in /devices. The /devices name space, which is available only in the global zone, reflects the current state of attached device instances created by the driver. Only the logical path /dev is visible in a non-global zone.
  • As noted in the Zone Limitations section, processes within a zone cannot create new device nodes (that is, mknod(2) will fail). The creat(2), link(2), mkdir(2), rename(2), symlink(2), and unlink(2) system calls fail with EACCES if a file in /dev is specified. You can create a symbolic link (symlink(2)) to an entry in /dev, but that link cannot be created in /dev.
  • Devices that expose system data are only available in the global zone. Examples of such devices are: dtrace(7D), kmem(7D), ksyms(7D), kmdb(7D), trapstat(1M), lockstat(7D), and so on.
  • The /dev name space consists of device nodes made up of a default, "safe" set of drivers as well as device nodes specified for the zone by the zonecfg(1M) command.
  • None of the NIC device nodes that support the DLPI programming interface are accessible in a non-global zone. Examples of such device nodes are hme(7D), ce(7D), ge(7D), eri(7D), bge(7D), dmfe(7D), dnet(7D), e1000g(7D), elxl(7D), iprb(7D), pcelx(7D), pcn(7D), qfe(7D), rtls(7D), sk98sol(7D), skfp(7D), and spwr(7D).

The following devices are not visible in the namespace of a non-global zone. Except for cpuid, fcip, and ksyms, the interfaces to these devices are not public (Interface Stability: Private), so this should have no effect on well-behaved software.

  • mem(7D), kmem(7D), allkmem(7D) - physical or virtual memory access
  • kmdb(7D) - kernel debugger
  • ksyms(7D) - kernel symbols
  • dtrace(7D) - DTrace dynamic tracing facility
  • lockstat(7D) - DTrace kernel lock instrumentation provider
  • fcip(7D) - IP/ARP over Fibre Channel datagram encapsulation driver
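Because these nodes are simply absent from a zone's /dev, a program that uses an optional device can probe for it and disable the dependent feature when open(2) fails with ENOENT, the same error cpc_bind_cpu(3CPC) reports when it cannot find its device. The following is a minimal sketch with a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path can be opened read-only, 0 if the node
 * does not exist in this zone's namespace (ENOENT), -1 for other failures
 * such as EACCES. */
int device_available(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == ENOENT ? 0 : -1;
}
```

Opening some devices has side effects (a rewinding tape device, for example), so a probe like this suits only devices for which a plain open and close is harmless.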
Networking

Each non-global zone has its own logical network and loopback interface. Bindings between upper layer streams and logical interfaces are restricted such that a stream may only establish bindings to logical interfaces in the same zone. Likewise, packets from a logical interface can only be passed to upper layer streams in the same zone as the logical interface. Bindings to the loopback address are kept within a zone with one exception: when a stream in one zone attempts to access the IP address of an interface in another zone.

While applications within a zone can bind to privileged network ports, they have no control over the network configuration, including IP addresses and the routing table.

Software That Doesn't Work Correctly in a Local Zone

Not all software works properly in a local zone. This section examines how to detect and diagnose the source of the execution problem and how to make the software work, perhaps by disabling features, when it is running in a local zone.

Detecting Software Breakages in a Local Zone

When a system call fails with a permission error, it is not always immediately obvious what caused the problem. To debug such a problem, you can use privilege debugging. When privilege debugging is enabled for a process, the kernel reports missing privileges on the controlling terminal of the process. (Enable debugging for a process with the -D option of ppriv(1).) Additionally, the administrator can enable system-wide privilege debugging by adding the line set priv_debug = 1 to the global zone's /etc/system file.

global# zlogin redzone
redzone# ls -l /tmp
total 8
drwxr-xr-x   2 root     root          69 Apr 19 22:11 testdir
redzone# ppriv -D -e unlink /tmp/testdir
unlink[1245]: missing privilege "sys_linkdir" 
              (euid = 0, syscall = 10) 
              needed at tmp_remove+0x6e
unlink: Not owner
redzone# ppriv -D -e rmdir /tmp/testdir
redzone# ls -l /tmp
total 0
 

The Solaris 10 OS offers a number of tools that you can use to identify and inspect, at runtime, the system and library calls that your application issues. We will explore three such tools: apptrace(1), dtrace(1M), and truss(1). Although the dtrace(1M) command is not supported in a non-global zone, you can use DTrace from the global zone to monitor a process that is executing in a non-global zone, because the global zone has visibility to all processes on the system.

Once you have identified a system or library call that may not work in a zone, you can inspect its arguments with apptrace(1), dtrace(1M), or, for system calls, truss(1). The following program, msgctltst.c, will not work in a zone because it uses IPC_SET to increase the size of a message queue. In a zone, you can decrease the size of a queue, but you cannot increase it.

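redzone# cat msgctltst.c 

#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int
main(int argc, char *argv[])
{
    struct msqid_ds msgc;
    int rc, msgid;

    /* Create a private message queue. */
    if ((msgid = msgget(IPC_PRIVATE, IPC_CREAT)) < 0) {
        fprintf(stderr, "msgget(IPC_PRIVATE),  errno = %d\n", errno);
        return (1);
    }

    /* Read the current queue attributes, including msg_qbytes. */
    if ((rc = msgctl(msgid, IPC_STAT, &msgc)) < 0) {
        fprintf(stderr, "msgctl(IPC_STAT),  errno = %d\n", errno);
        return (1);
    }

    /* Shrinking the queue is permitted in a non-global zone. */
    msgc.msg_qbytes--;
    if ((rc = msgctl(msgid, IPC_SET, &msgc)) < 0) {
        fprintf(stderr, "msgctl(IPC_SET),  errno = %d\n", errno);
    }

    /*
     * Growing the queue requires the sys_ipc_config privilege, which
     * is not available in a non-global zone, so this call fails.
     */
    msgc.msg_qbytes++;
    if ((rc = msgctl(msgid, IPC_SET, &msgc)) < 0) {
        fprintf(stderr, "msgctl(IPC_SET) growing queue,  errno = %d\n",
            errno);
    }

    /* Remove the queue so the test does not leak IPC resources. */
    (void) msgctl(msgid, IPC_RMID, NULL);
    return (0);
}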
 

The following example illustrates the use of truss(1) to inspect the system calls issued by msgctltst. The last msgctl(2) call fails when the program tries to increase the number of bytes in the message queue.

redzone# truss ./msgctltst
execve("msgctltst", 0x08047EA8, 0x08047EB0)  argc = 1
...
...
...
msgget(IPC_PRIVATE, IPC_CREAT)            = 3
msgctl(3, IPC_STAT, 0x08047E00)           = 0
msgctl(3, IPC_SET, 0x08047E00)            = 0
msgctl(3, IPC_SET, 0x08047E00)            Err#1 EPERM [sys_ipc_config]
...
...
...
...
...

 

The apptrace(1) utility runs the specified executable program and traces all function calls that the program makes to the Solaris shared libraries. For each traceable function call, apptrace(1) reports the name of the library interface called, the values of the arguments passed, and the return value. Again, the example below shows the application calling msgctl(2) with the second argument set to IPC_SET (0xb). The last of these calls returns 0xffffffff (-1), indicating failure.

redzone# apptrace ./msgctltst
-> msgctltst -> libc.so.1:atexit(0x80505a8, 0xd27e6fd0, 0x0) ** NR
-> msgctltst -> libc.so.1:atexit(0xd27e6fd0, 0x0, 0x0) ** NR
-> msgctltst -> libc.so.1:atexit(0x80508d9, 0xd27e6fd0, 0x0) ** NR
-> msgctltst -> libc.so.1:void __fpstart(void)
<- msgctltst -> libc.so.1:__fpstart() = 0xd254cc3c
-> msgctltst -> libc.so.1:int msgget(key_t = 0x0, int = 0x200)
<- msgctltst -> libc.so.1:msgget() = 0x1
-> msgctltst -> libc.so.1:int msgctl(int = 0x1, int = 0xc,
                              struct msqid_ds * = 0x8047da0)
<- msgctltst -> libc.so.1:msgctl()
-> msgctltst -> libc.so.1:int msgctl(int = 0x1, int = 0xb,
                              struct msqid_ds * = 0x8047da0)
<- msgctltst -> libc.so.1:msgctl()
-> msgctltst -> libc.so.1:int msgctl(int = 0x1, int = 0xb,
                              struct msqid_ds * = 0x8047da0)
<- msgctltst -> libc.so.1:msgctl() = 0xffffffff
...
...
...
...
...
...
 

The following example monitors the same program with DTrace from the global zone, while msgctltst executes in the non-global zone redzone. The msgctl(2) library function is implemented with the msgsys system call, so the script enables the syscall::msgsys:entry probe and uses a predicate to restrict its action to msgctl calls (arg0 == MSGCTL) whose command argument is IPC_SET. The syscall::msgsys:return probe fires when the system call returns; its predicate reports only traced calls that failed, and prints a user stack trace. As you can see, DTrace is more powerful than truss(1) and apptrace(1) because we can actually inspect data structures, conditionally execute probe actions, and display a call stack trace.

global# cat msgctl.d

#!/usr/sbin/dtrace -Cqs
#include <sys/msg.h>
#include <sys/msg_impl.h>

/* msgsys(MSGCTL, msqid, cmd, buf): arg1 = msqid, arg2 = cmd, arg3 = buf */
syscall::msgsys:entry
/arg0 == MSGCTL && arg2 == IPC_SET/
{
   /* copyin() returns clause-local scratch memory, so use this-> */
   this->ptr = (struct msqid_ds *)copyin(arg3, sizeof (struct msqid_ds));
   printf("\n (%s) msgid=%d msg_qbytes=%d\n", execname,
          arg1, this->ptr->msg_qbytes);
   self->traced = 1;
}
syscall::msgsys:return
/self->traced && errno != 0/
{
   printf("\n msgctl failed (%d)\n", errno);
   ustack();
}
syscall::msgsys:return
/self->traced/
{
   self->traced = 0;
}

global# dtrace -ZCqs msgctl.d &
global# zlogin redzone 
[Connected to zone 'redzone' pts/9] 
Last login: Mon May  9 15:38:17 on pts/7 
Sun Microsystems Inc.   SunOS 5.10      s10_72  December 2004 
redzone# cd zonetest 
redzone# pwd 
/zonetest 
redzone# ./msgctltst 
msgctl(IPC_SET) growing queue,  errno = 1 
redzone# 
(msgctltst) msgid=2 msg_qbytes=65535 

(msgctltst) msgid=2 msg_qbytes=65536 

msgctl failed (1) 

            libc.so.1`_syscall6+0x1b
            msgctltst`main+0x160
            40094c

 

If a non-global zone is not available for testing, you can test your software in the global zone after using the ppriv(1) command to remove, from the program's privilege sets, the privileges that are not available in a non-global zone. If your software fails in the global zone with those privileges removed, it will also fail in a non-global zone. This method does not catch access to device interfaces and libraries that are not available in a non-global zone, so you should still test your software in a non-global zone. Also, the runtime tools described above find errors only in the code paths that are exercised; in no way do they replace a code review.

Defensive Programming and Error Reporting
When Is It Sensible to Have Different Behavior in a Local Zone?
 

Some software cannot work in a non-global zone exactly as it does in the global zone. An example is the tar(1) command. When running in a zone, tar(1) can create archives that preserve the sticky bit on individual files, but it cannot restore the sticky bit when it writes those files back to the file system. The tar(1) command fails silently in this case, because the chmod(2) system call does not report a failure when this occurs.

This is an example of software that should behave differently depending on whether it is running in the global zone or a non-global zone. It would be better if the tar(1) command reported this condition as a warning, so that you would at least know that something was wrong. tar(1) could do this by detecting that it is running in a non-global zone and logging a warning whenever it writes a regular file with the sticky bit set to the file system.
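
A complementary approach, which does not depend on zone detection at all, is to verify the result of each chmod(2) call. The following C sketch applies a mode and then reads it back with stat(2), logging a warning if any bits (such as the sticky bit) were silently dropped. The function name chmod_verified is hypothetical, and the sketch assumes the caller treats a return value of 1 as a warning rather than a fatal error.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: apply a mode with chmod(2), then read it back
 * with stat(2) to detect bits (such as S_ISVTX) that were silently
 * dropped. Returns 0 if the mode was applied as requested, 1 if some
 * bits were dropped (a warning is logged), and -1 on a system call error.
 */
int
chmod_verified(const char *path, mode_t mode)
{
    struct stat st;
    mode_t wanted = mode & 07777;   /* permission, set-id and sticky bits */
    mode_t actual;

    if (chmod(path, wanted) != 0 || stat(path, &st) != 0)
        return (-1);

    actual = st.st_mode & 07777;
    if (actual != wanted) {
        (void) fprintf(stderr,
            "warning: %s: requested mode %04o but got %04o\n",
            path, (unsigned int)wanted, (unsigned int)actual);
        return (1);
    }
    return (0);
}
```

An archive extractor could call chmod_verified() after writing each file and count the warnings, so that the user learns the restored modes differ from the archived ones instead of discovering it later.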

Similarly, it is easy to imagine that applications using other zone-restricted system calls could disable the affected features when executed in a non-global zone. This allows the software to run properly to the extent possible, without pushing the burden of diagnosing the failure onto the users of that software.
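
As a sketch of this kind of graceful degradation, the following C function tries to enlarge a message queue with IPC_SET, as msgctltst.c does. It treats EPERM as a signal to continue with the current queue size and tell the user why, instead of failing outright. The name grow_queue_or_degrade is hypothetical (it is not a system interface), and the sketch uses only the standard System V msgctl(2) calls, so it assumes nothing zone-specific.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Hypothetical helper: try to raise a queue's msg_qbytes limit to
 * 'wanted'. If growing the queue is not permitted (EPERM, as for
 * IPC_SET in a non-global zone), log a warning and keep running with
 * the existing limit. Returns the limit in effect afterwards, or 0 if
 * the queue could not be examined or another error occurred.
 */
unsigned long
grow_queue_or_degrade(int msgid, unsigned long wanted)
{
    struct msqid_ds ds;

    if (msgctl(msgid, IPC_STAT, &ds) != 0)
        return (0);
    if ((unsigned long)ds.msg_qbytes >= wanted)
        return ((unsigned long)ds.msg_qbytes);   /* already large enough */

    ds.msg_qbytes = wanted;
    if (msgctl(msgid, IPC_SET, &ds) == 0)
        return (wanted);
    if (errno != EPERM)
        return (0);

    /* Degrade: re-read the unchanged limit and explain the fallback. */
    if (msgctl(msgid, IPC_STAT, &ds) != 0)
        return (0);
    (void) fprintf(stderr, "warning: cannot grow message queue to %lu "
        "bytes (EPERM); continuing with %lu bytes\n",
        wanted, (unsigned long)ds.msg_qbytes);
    return ((unsigned long)ds.msg_qbytes);
}
```

The caller can size its messages from the returned limit, so the feature keeps working with a smaller queue instead of the program aborting.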

Detecting Execution in a Local Zone
 

Once you have decided that your software requires zone awareness, perhaps for the reasons mentioned above, you can use the following API to determine zone identity.

  • getzoneid(3C) - Returns the zone ID of the calling process
  • getzoneidbyname(3C) - Returns the zone ID corresponding to the named zone
  • getzonenamebyid(3C) - Stores the name of a zone identified by a zone ID, in a user-supplied buffer

The global zone ID, GLOBAL_ZONEID, is defined in /usr/include/sys/zone.h.

global# cat myzone.c
#include <stdio.h>
#include <zone.h>

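int
main(int argc, char **argv)
{
   char zonename[ZONENAME_MAX+1];
   zoneid_t  id;

   /* Compare the caller's zone ID with the global zone ID. */
   if ((id = getzoneid()) == GLOBAL_ZONEID)
      printf("Global Zone!\n");

   /* Look up the name of the caller's zone and print it. */
   if (getzonenamebyid(id, zonename, sizeof (zonename)) > 0)
      printf("%s\n", zonename);

   return (0);
}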
 

Executing the code from the global zone:

global# ls -l myzone
-rwxr-xr-x   1 root     root        5704 Apr 19 23:06 zonename
global# ./myzone
Global Zone!
global
 

Executing the code from the non-global zone redzone:

global# zlogin redzone
[Connected to zone 'redzone' pts/5]
Last login: Tue Apr 19 22:01:08 from 192.168.2.2
Sun Microsystems Inc.   SunOS 5.10      s10_71  December 2004
redzone# cd zonetest
redzone# zoneadm list
redzone
redzone# ./myzone
redzone

Installation and Configuration in a Local Zone

There are two issues that could cause the installation of your software to fail.

  • Read-only file systems
  • CD-ROM Access

When a zone is created, two options are available to create the root file system of the zone, the Sparse Root and Whole Root models. The Whole Root model provides the maximum configurability by installing all of the required and any selected optional Solaris software packages into the private file systems of the zone. The advantages of this model include the ability for zone administrators to customize their zone's file-system layout (for example, creating a /usr/local) and add arbitrary unbundled or third-party packages. The disadvantages of this model include the loss of sharing of text segments from executables and shared libraries by the virtual memory system, and a much heavier disk footprint -- approximately an additional 2 Gbyte -- for each non-global zone configured as such.

The Sparse Root model optimizes the sharing of objects by installing only a subset of the root packages (those with the pkginfo(4) parameter SUNW_PKGTYPE set to root) and using read-only loopback file systems to gain access to other files. This is similar to the way a diskless client is configured, where /usr and other file systems are mounted over the network with NFS. By default with this model, the directories /lib, /platform, /sbin and /usr are mounted as loopback, read-only file systems. The advantages of this model are greater performance due to the efficient sharing of executables and shared libraries, and a much smaller disk footprint for the zone itself. The sparse-root model only requires approximately 100 Mbyte of file system space for the zone itself.

Any installation software that needs to install components in /usr (or any of the other read-only loopback file systems) will fail in a Sparse Root model zone.

The second issue deals with the CD-ROM device. There are a couple of ways to gain access to the CD-ROM. One popular method is to loopback mount the /cdrom directory from the global zone to the non-global zone:

# zonecfg -z myzone
add fs
set dir=/cdrom
set special=/cdrom
set type=lofs
set options=[nodevices]
end
 

If you use this method and your installation requires multiple CD volumes, you will need to eject CDs from the global zone. Any explicit ejects of the CD-ROM device (eject(1)) in the installation scripts will fail. The alternative method of gaining access to the CD-ROM device in a non-global zone, exporting the physical device(s) from the global zone to the non-global zone, is discouraged. If you choose to use this method, note that the Volume Management daemon (vold(1M)) does not function in a non-global zone.

Solaris Packages

Each zone maintains its own package and patch database. A package or a patch can be installed individually into a non-global zone or to all zones from the global zone. The behavior of packaging in a zone environment varies according to the following factors:

  • Use of the -G option in pkgadd(1M)
  • Setting of the SUNW_PKG_ALLZONES, SUNW_PKG_HOLLOW, and SUNW_PKG_THISZONE variables in the pkginfo file (see pkginfo(4) for details)
  • Type of zone, global or non-global, in which pkgadd(1M) is invoked

Table 2 shows the behavior of packaging in a zone environment according to these factors.

| SUNW_PKG_ALLZONES | SUNW_PKG_HOLLOW | SUNW_PKG_THISZONE | Global Zone pkgadd | Global Zone pkgadd -G | Local Zone pkgadd | Local Zone pkgadd -G |
|---|---|---|---|---|---|---|
| false | false | false | Add to gz, current lz and future lz | Add to gz only, not to current or future lz | Add to this lz only | Add to this lz only |
| true | false | false | Add to gz, current lz and future lz | Operation not allowed | Operation not allowed | Operation not allowed |
| true | true | false | Add to gz; add to pkginfo db in current and future lz | Operation not allowed | Operation not allowed | Operation not allowed |
| true | true | true | Invalid option combination | Invalid option combination | Invalid option combination | Invalid option combination |
| false | true | false | Invalid option combination | Invalid option combination | Invalid option combination | Invalid option combination |
| false | true | true | Invalid option combination | Invalid option combination | Invalid option combination | Invalid option combination |
| false | false | true | Add to gz only, not to current or future lz | Add to gz only, not to current or future lz | Add to this lz only | Add to this lz only |
| true | false | true | Invalid option combination | Invalid option combination | Invalid option combination | Invalid option combination |

Legend:
gz = global zone
lz = non-global zone

An "invalid option combination" means the package attribute settings do not make sense - not all possible combinations of settings for these three attributes are legal. They should be caught by pkgmk(1M) and the package should not be created.

An "operation not allowed" means the pkgadd command will output an error message and fail to add the packages based on the combination of command line options, package attribute settings, and the type of zone pkgadd is being run in.
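
The rules in Table 2 are compact enough to express as a lookup function. The following C sketch is a hypothetical restatement of the table, not code taken from pkgadd(1M) or pkgmk(1M); the enum and function names are illustrative. Reading the invalid rows together, a combination is invalid when SUNW_PKG_ALLZONES and SUNW_PKG_THISZONE are both true, or when SUNW_PKG_HOLLOW is true without SUNW_PKG_ALLZONES.

```c
/* Where pkgadd(1M) runs and whether -G is given (the Table 2 columns). */
enum pkgadd_ctx { GZ_PKGADD, GZ_PKGADD_G, LZ_PKGADD, LZ_PKGADD_G };

/*
 * Hypothetical restatement of Table 2. The three flags are the pkginfo
 * settings SUNW_PKG_ALLZONES, SUNW_PKG_HOLLOW and SUNW_PKG_THISZONE
 * (nonzero means "true"); the result is the table cell text.
 */
const char *
pkgadd_behavior(int allzones, int hollow, int thiszone, enum pkgadd_ctx ctx)
{
    /* ALLZONES and THISZONE are exclusive; HOLLOW requires ALLZONES. */
    if ((allzones && thiszone) || (hollow && !allzones))
        return ("Invalid option combination");

    if (allzones) {
        /* Only a plain pkgadd in the global zone may add these. */
        if (ctx != GZ_PKGADD)
            return ("Operation not allowed");
        return (hollow ?
            "Add to gz; add to pkginfo db in current and future lz" :
            "Add to gz, current lz and future lz");
    }

    /* Remaining rows: ALLZONES and HOLLOW are both false. */
    switch (ctx) {
    case GZ_PKGADD:
        return (thiszone ?
            "Add to gz only, not to current or future lz" :
            "Add to gz, current lz and future lz");
    case GZ_PKGADD_G:
        return ("Add to gz only, not to current or future lz");
    default:
        return ("Add to this lz only");
    }
}
```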

Configuration Issues

Getting your software to work in a zone is only part of the challenge. System administrators must understand how to configure their zone appropriately for the software they intend to run. There are many possible configurations of zones, and this section does not go through all of the various possibilities. Instead, this section focuses on strategies for targeting useful configurations and communication of the required zone configuration to your software administration audience.

Zones can be configured in many different ways. As mentioned before, a Sparse Root Zone takes advantage of files that are shared between the global zone and the non-global zone (such as /lib and /usr), while a Whole Root Zone maintains its own copy of all of the files. A Sparse Root zone is a more restrictive environment than the Whole Root Zone, because the shared directories are exposed into the non-global zone as a read-only file system. The additional flexibility provided by a less restrictive zone configuration comes at the cost of additional resources - specifically hard drive space and memory. In order to achieve maximum flexibility in the zone configurations that your software can be deployed into, it is important to target the most restrictive but reasonable zone configuration possible.

The default zone configuration is a Sparse Root Zone with very few devices provisioned into the zone. The directories that are shared with the global zone (with a read-only loopback mount) are /lib, /platform, /sbin, and /usr. This is a fairly restrictive zone for deploying unbundled software, because neither the installation nor the execution of your software can modify or write files in those directories. This default zone configuration is a good starting point to consider for your software. It offers the maximum set of privileges available to zones, and the directory restrictions mentioned above seem to be a good tradeoff in terms of disk space requirements.

Starting from this default zone configuration, it is important to discover and document the configuration necessary to deploy your software successfully in a zone. If your installation includes writing to the read-only directories, a software modification should be considered. If the software cannot be modified to work around the problem, configure the zone accordingly. Remember that you are identifying the most restrictive environment in which your software can be installed and executed. From there, more liberal configurations will present no trouble.

Be sure to keep track of all elements of the required zone configuration, including any inherited directories that must be removed, device configuration, and network requirements. Include this information in your installation documentation or any relevant configuration guides.

The configuration details that fall out of using this strategy will aid system administrators who are faced with the task of configuring a single non-global zone for multiple software libraries and applications. By stating the minimum requirement for each piece of software, the minimum zone configuration for a set of software would simply be all of the minimum zone configuration requirements for each respective software package put together. This scheme simplifies the task of planning required resources for deployment of zones.
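
Putting the requirements together amounts to taking their union. The following C sketch assumes each product documents its minimum zone configuration as a set of flags; the flag names and the combined_requirements() function are hypothetical and cover only a few of the requirement types mentioned above.

```c
#include <stddef.h>

/*
 * Hypothetical encoding of a documented minimum zone configuration.
 * Each flag is one requirement beyond the default Sparse Root zone;
 * the names are illustrative only.
 */
enum zone_req {
    REQ_WRITABLE_USR = 1 << 0,  /* /usr must not be an inherited directory */
    REQ_WRITABLE_LIB = 1 << 1,  /* /lib must not be an inherited directory */
    REQ_CDROM        = 1 << 2,  /* /cdrom loopback-mounted from global zone */
    REQ_EXTRA_NET_IF = 1 << 3   /* an additional network interface */
};

/*
 * A zone hosting several products needs every requirement of every
 * product, so the combined requirement set is the bitwise OR (union).
 */
unsigned int
combined_requirements(const unsigned int *reqs, size_t n)
{
    unsigned int all = 0;
    size_t i;

    for (i = 0; i < n; i++)
        all |= reqs[i];
    return (all);
}
```

Because each product states only its minimum, the union is also the minimum configuration for the shared zone.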

QA

Once you have identified the most restrictive zone configuration for the deployment of your software, you should verify that your software works correctly in that configuration. A non-global zone is a more restricted environment than the global zone, both in terms of permission to make various system calls and in the ability to use specific devices or to modify the contents of specific directories. You can use this fact to your advantage as you move to support the Solaris 10 OS with global and non-global zones. Rather than potentially doubling your QA test matrix to add local zone support, you can do your QA only in the non-global zone.

Rest assured that if the software works completely and correctly in the non-global zone, it will work in the global zone as well. After the QA within the non-global zone, simply verify that the deployment works correctly in the global zone and you are all set.

Key Points

  • If your software didn't need special superuser privileges to run in prior versions of Solaris, it will work in a zone. Concentrate on installation, configuration, and administration tasks.
  • Test to find the most restrictive zone configuration possible when running your software within a zone. This makes it easy to determine the necessary zone configuration for a set of software that runs in a common zone.
  • If your software works completely and correctly in the non-global zone, it will work in the global zone as well. After the QA within the non-global zone, simply verify that the deployment works correctly in the global zone and you are all set.

About the Authors

Paul Lovvik, who has been with Sun for seven years, is lead engineer in a group in the Market Development Engineering organization focused on partner adoption of the Solaris OS for x86 Platforms. Paul and his engineering team have helped many partners add Solaris on x86 support to their products over the past year.

Joseph Balenzano has been with Sun for seven years. His current role is engineer in a group in the Market Development Engineering organization focused on partner adoption of the Solaris OS for x86 Platforms. He has over 20 years of software development experience working for ISVs.
