Design Specification 0.1


This page contains the high-level design specification for the Distributed Network Manager (DNM). The DNM is broken down into five main modules:
  1. Interface Module
  2. Protocol Module
  3. Configuration Control Module
  4. Data Collection and Reporting
  5. Distribution Control
The features shown in green are the suggested features of the prototype that will be complete by the end of Spring Quarter, 1996.

The Design Diagram gives a visual breakdown of the modules.


1.0 Interface Module

  1. Each process component will support a standard interface to other components and possibly to an interactive user
  2. Any process component that supports an interactive user interface will do so only for properly authorized privileged users
  3. Access to a process will require authentication beyond normal password access (Check with Tools)
  4. No remote access to a process component; the user must have physical access to the Big Dog interface
  5. Will use a simple command-line interface for the current version; a window-based GUI will come later
  6. Ability to configure parameters that affect the entire application as well as parameters specific to one or more of the process components.

2.0 Protocol Module

  1. Interaction of Big Dog with other processes/capabilities of network
  2. Abstract set of communication procedures
  3. Protocol Module is a Library or set of Libraries
  4. Use of existing protocol security features for each protocol (don’t reinvent the wheel)
  5. Security routines will exist as a submodule of the Protocol Module
  6. Submodule will provide encrypt/decrypt services in a transparent manner
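
The relationship between the abstract communication procedures and the transparent security submodule can be sketched as follows. This is a minimal illustration, not the DNM design itself: the names Cipher, XorCipher, Transport, LoopbackTransport and SecureChannel are hypothetical, and the XOR cipher is a toy placeholder standing in for whatever security features the underlying protocol already provides.

```python
from abc import ABC, abstractmethod


class Cipher(ABC):
    """Security submodule interface (hypothetical name)."""

    @abstractmethod
    def encrypt(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decrypt(self, data: bytes) -> bytes: ...


class XorCipher(Cipher):
    """Toy placeholder cipher for illustration only; NOT secure."""

    def __init__(self, key: bytes) -> None:
        if not key:
            raise ValueError("key must not be empty")
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def encrypt(self, data: bytes) -> bytes:
        return self._xor(data)

    def decrypt(self, data: bytes) -> bytes:
        return self._xor(data)


class Transport(ABC):
    """Abstract communication procedures; one subclass per protocol."""

    @abstractmethod
    def send_raw(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive_raw(self) -> bytes: ...


class LoopbackTransport(Transport):
    """In-memory stand-in for a real protocol library."""

    def __init__(self) -> None:
        self.wire = []  # payloads as they would appear on the network

    def send_raw(self, payload: bytes) -> None:
        self.wire.append(payload)

    def receive_raw(self) -> bytes:
        return self.wire.pop(0)


class SecureChannel:
    """Callers pass and receive plaintext; encryption happens inside."""

    def __init__(self, transport: Transport, cipher: Cipher) -> None:
        self._transport = transport
        self._cipher = cipher

    def send(self, message: bytes) -> None:
        self._transport.send_raw(self._cipher.encrypt(message))

    def receive(self) -> bytes:
        return self._cipher.decrypt(self._transport.receive_raw())
```

A caller would build SecureChannel(LoopbackTransport(), XorCipher(b"k3y")) and use only send() and receive(); swapping the transport or the cipher does not change the calling code, which is the sense of "transparent" assumed here.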

3.0 Configuration Control Module

  1. Node / Component Discovery
    1. Determine number and types of nodes
    2. Determine number and types of communication links
    3. Get key information (capacity, etc.)
  2. Config Change Notification
      A process component will receive automatic notification
      1. A process component must automatically detect changes
      2. Information will have to be supplied via manual intervention
    1. If the change is local, the appropriate display/report database will be updated.
    2. If the network changes, the change notification will be forwarded to all upstream process components
    3. Unexpected changes may cause an alarm notice

    Alarm Management

      An alarm occurs when:
      1. CPU or memory resource use exceeds threshold values
      2. The utilization of a communication link exceeds specified thresholds, or the error rate for a link reaches a certain level
      3. A processing node (host, router, etc.) becomes unresponsive for a specified time frame

      A mechanism to specify or enable an alarm should identify:

      1. The component and resources being monitored
      2. The threshold value or other event that causes alarm
      3. The action to be taken (handling) when an alarm occurs

      For a defined and enabled alarm, it may be desirable to:

      1. disable the alarm temporarily
      2. edit or delete the alarm specification

      A privileged user can perform the following operations, both within the sphere of control of the local processor and for downstream processes:

      1. alarm definition
      2. alarm enabling
      3. alarm disabling
      4. alarm deletion

      An alarm will have three levels of action when it occurs:

      1. Level 1: The responsible process component records the alarm (increments a counter) and otherwise ignores it, although it may generate another alarm
      2. Level 2: Notify an operator/upstream process; this may change a display/textual report or send a message
      3. Level 3: Notify and react: execute a specific user-supplied procedure or event handler
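
The alarm specification and its three action levels might be modeled roughly as below. This is a sketch under stated assumptions: the names AlarmLevel, AlarmSpec and check are hypothetical, the alarm is assumed to fire when a sampled value exceeds its threshold, and the levels are treated as cumulative (Level 2 also counts, Level 3 also notifies), which the specification does not state explicitly.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List, Optional


class AlarmLevel(IntEnum):
    RECORD = 1   # Level 1: count the alarm and otherwise ignore it
    NOTIFY = 2   # Level 2: also notify an operator or upstream process
    REACT = 3    # Level 3: notify and run a user-supplied handler


@dataclass
class AlarmSpec:
    component: str                 # component being monitored
    resource: str                  # e.g. "cpu", "memory", "link_utilization"
    threshold: float               # alarm fires when a sample exceeds this
    level: AlarmLevel              # action to take when the alarm occurs
    handler: Optional[Callable[[str, float], None]] = None
    enabled: bool = True           # False = temporarily disabled
    count: int = 0                 # Level 1 counter
    notices: List[str] = field(default_factory=list)  # stand-in for operator messages

    def check(self, value: float) -> bool:
        """Evaluate one sample; return True if the alarm fired."""
        if not self.enabled or value <= self.threshold:
            return False
        self.count += 1
        if self.level >= AlarmLevel.NOTIFY:
            self.notices.append(
                f"{self.component}/{self.resource}={value} exceeds {self.threshold}"
            )
        if self.level >= AlarmLevel.REACT and self.handler is not None:
            self.handler(self.component, value)
        return True
```

Disabling an alarm temporarily corresponds to setting enabled to False; editing or deleting a specification would be handled by whatever registry holds the AlarmSpec objects, which is left out of this sketch.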

4.0 Data Collection and Reporting

Overview

  1. Constantly collect information about traffic and errors.
  2. Constantly maintain simple summary statistics
  3. Do more involved statistical computations (vague)
  4. Report on demand or at specific intervals
  5. "Raw" data collection will always take place in active mode.
  6. Broader data collection in both active and surrogate modes.
  7. Report presentation in both active and surrogate modes.

Simple Data Collection

  1. Only replicate functionality of SNMP where unavoidable
  2. Data collection information is initialized via a config file
  3. Privileged User may change collection config manually (at any time)

STATS Subsystem

  1. Different components will keep different statistical data
  2. A component may gather some data, compute some data, and forward both to another component
  3. Simple Statistics
    1. A user may request statistical summaries for nodes, links and/or ports of a node
    2. stats:
      1. periodic utilization
      2. throughput
      3. error rate
    3. Counts (or values) per unit time will be computed
    4. Averages per set of unit times will be computed
    5. Time Units:
      1. seconds
      2. minutes
      3. hours
      4. days
    6. Measurement Periods
      1. time units
      2. weeks
      3. months
    7. Peak time unit during measurement period will be reported
    8. Statistical gathering will have a start and (optional) stop time.
    9. A report is generated after each Measurement Period (and all counters are reset)
    10. Focus will be on MAC-level packets at a minimum (huh?)
  4. Link Cost Computation
    1. Link Cost computation requires 2 components to cooperate:
      1. they must synchronize their time clocks,
      2. periodically measure delays in transmission
    2. Unit Time and Measurement Periods may be specified
    3. Cost is a function of :
      1. delay
      2. transmission rate
      3. error rate
    4. A preset function will be provided
    5. The user may provide a computation algorithm
    6. Cost computation may be different for each link
    7. Route control agents may request link costs from the NMA
  5. Trend Analysis
    1. Inputs to trend analysis are data values from nodes, links and ports.
    2. If a user modifies the data being collected on a node where trend analysis is being performed, they should get a warning. The user may then:
      1. Cancel trend analysis
      2. Modify request (for Data Collection change?)
    3. Trend Analysis may be performed on :
      1. utilization of a link
      2. throughput of a link
      3. error rate at a port
      4. availability at a node or port
    4. Trend Analysis will result in a periodic report being generated (with the period being specified by the privileged user)
    5. Thresholds may be specified (for instance): [correct section??]
      1. utilization > some level
      2. rate of utilization increase changes dramatically
      3. error rate > some value
      4. availability < some value
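
The Simple Statistics items above (counts per unit time, averages per set of unit times, and the peak time unit within a measurement period) can be illustrated with a small sketch. The function name summarize, its parameters, and the (timestamp, count) event format are assumptions made for this example; they are not part of the specification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(events: Iterable[Tuple[float, int]], unit_seconds: float,
              period_start: float, period_seconds: float) -> Dict[str, float]:
    """Summarize (timestamp, count) events for one measurement period.

    Returns the total, the average count per time unit, and the peak
    time unit (as an index from the start of the period) with its count.
    """
    units = int(period_seconds // unit_seconds)
    if units <= 0:
        raise ValueError("measurement period must span at least one time unit")

    per_unit = defaultdict(int)
    for timestamp, count in events:
        offset = timestamp - period_start
        # Ignore events that fall outside this measurement period.
        if 0 <= offset < units * unit_seconds:
            per_unit[int(offset // unit_seconds)] += count

    total = sum(per_unit.values())
    # Ties are resolved in favor of the earliest time unit.
    peak_unit = max(range(units), key=lambda u: per_unit[u])
    return {
        "total": total,
        "average_per_unit": total / units,
        "peak_unit": peak_unit,
        "peak_count": per_unit[peak_unit],
    }
```

A component would call this once per measurement period, emit the result as its report, and start the next period with fresh counters, which matches the "report then reset" behavior described above.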

Reports

  1. Reports may be requested to be delivered to a specific location
  2. A component may generate a report and forward it to an active reporting device

5.0 Distribution Control

Information Flow Management

  1. Each process component will know its peer(s) and parent(s), as well as the nodes that provide collected, computed, or report information
  2. This will provide security as well as a way to distribute information
  3. Primary and secondary destinations must be configured for each component to eliminate loss of data in the case of a failure
  4. This capability will be implemented in a future version of the application
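
The primary and secondary destination requirement in item 3 could eventually look something like the following. This is only a sketch: the function name forward is hypothetical, each destination is assumed to be a callable that raises ConnectionError when it is unavailable, and the specification does not say how failures are actually detected.

```python
from typing import Callable, Sequence


def forward(report: str, destinations: Sequence[Callable[[str], None]]) -> int:
    """Try each destination in order; return the index that accepted the report."""
    for index, send in enumerate(destinations):
        try:
            send(report)
            return index
        except ConnectionError:
            continue  # destination unavailable: fall through to the next one
    raise ConnectionError("no configured destination accepted the report")
```

With destinations ordered as [primary, secondary], a return value of 1 tells the caller that the primary was unavailable and the report went to the secondary instead.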

Network Management Application Configuration Change Control

  1. A peer node must be able to take over a component (or set of components) should the primary component, or the system in which it resides, become unavailable
  2. This transition will be facilitated by their parent node
  3. Every process component must be backed up this way
  4. Thresholds will be set for "resources" (CPU, memory, or link bandwidth)
  5. Each process component will self-monitor its utilization of its resources
  6. If a resource is over-utilized, the parent process will be notified so that alternate process components may be started elsewhere
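
Items 4 through 6 describe a self-monitoring loop, which can be sketched as below. The class names Parent and Component, the method names, and the use of fractional utilization values are all assumptions for illustration; how the parent then starts alternate components is outside this sketch.

```python
from typing import Dict, List


class Parent:
    """Hypothetical parent process that collects over-utilization notices."""

    def __init__(self) -> None:
        self.notices = []  # "component:resource" strings, in arrival order

    def notify_overutilized(self, component: str, resource: str, value: float) -> None:
        self.notices.append(f"{component}:{resource}")


class Component:
    """Process component that checks its own resource use against thresholds."""

    def __init__(self, name: str, parent: Parent, thresholds: Dict[str, float]) -> None:
        self.name = name
        self.parent = parent
        self.thresholds = thresholds  # e.g. {"cpu": 0.8, "memory": 0.9}

    def self_monitor(self, usage: Dict[str, float]) -> List[str]:
        """Compare current usage to thresholds; report each excess to the parent."""
        over = [r for r, limit in self.thresholds.items() if usage.get(r, 0.0) > limit]
        for resource in over:
            self.parent.notify_overutilized(self.name, resource, usage[resource])
        return over
```

In practice self_monitor would be called periodically with measured values; here the caller supplies them directly so the example stays self-contained.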

CSC405, Spring '96
Distributed Network Manager (DNM)