Flow Control in TNPM v1.3 - Components and Disk Usage Monitoring
High-level overview of flow control settings in TNPM v1.3 for managing disk space utilization. Components no longer monitor their own space usage; instead, a Disk Usage Server (DUS) monitors space for each datachannel root directory. When disk space is low, components are instructed to free up space. The DUS regulates space usage based on disk consumption levels, with specific actions triggered at different thresholds to ensure system stability.
Presentation Transcript
TNPM v1.3 Flow Control
High Level
Instead of each component having flow control settings that govern only its own directory, there is now one set of flow control settings for each datachannel root directory, covering all components that live in that directory.
Components no longer monitor their own space usage. Instead, a Disk Usage Server (DUS) inside the AMGR monitors the space for each datachannel root directory on that host.
Components ask the DUS whether there is enough space to write to disk, and stop processing when there is not.
When the overall space consumed in a datachannel root directory becomes too high, the DUS tells all components that live in that root directory to free up some space (or all the space they can).
Components try harder not to overuse space: they acquire only a few hours of data before processing it, and they stop when a few hours of output data are waiting to be picked up.
High Level
Components can still become flow controlled (stopped) because there is not enough space or because the quota for the datachannel root directory has been exceeded.
Components still store old data that is no longer needed in their done directory and delete this data when more space is required.
Flow Control Overview
Components ask the DUS if they can use more disk space.
The DUS tells components to free disk space when necessary.
[Diagram: the AMGR hosts the DiskUsageServer. FTE.1.1, CME.1.1, LDR.1, and DLDR.1 each contain a DiskUsageClient. The component directories /dc/FTE.1.1, /dc/CME.1.1, /dc/LDR.1, and /dc/DLDR.1 sit under the root directory /dc.]
Managing Consumed Space
When disk consumption is < 80%, DiskUsageServer continues to answer yes to space requests.
When disk consumption is >= 80%, DiskUsageServer contacts all components that reside in this root directory and tells them to free up some space as they see fit. For example, each component may delete only 5 hour directories or only 50 files. DiskUsageServer continues to answer yes to space requests.
Managing Consumed Space
When disk consumption is >= 90%, DiskUsageServer contacts all components that reside in this root directory and tells them to free up all the space they can. DiskUsageServer answers no to space requests, which stops all components in this root directory except LDR and DLDR components.
LDR and DLDR components are allowed to run because the system cannot unblock itself unless they do. They are given 9% of the total quota to operate and load data, which can unblock the system provided no errors are occurring.
When disk consumption is >= 99%, DiskUsageServer also answers no to space requests from LDR and DLDR components in this root directory.
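The consumption thresholds above can be summarized in a small decision function. This is a minimal sketch, not the DUS implementation: it assumes the percentages are measured against the quota for the datachannel root directory, and the names `Decision`, `decide`, and `LOADER_TYPES` are hypothetical.

```python
from dataclasses import dataclass

# Component types that may keep running between 90% and 99% (from the slide).
LOADER_TYPES = frozenset({"LDR", "DLDR"})


@dataclass(frozen=True)
class Decision:
    purge: str         # "none", "some", or "all"
    allow_write: bool  # the answer to a component's space request


def decide(consumed_bytes: int, quota_bytes: int, component_type: str) -> Decision:
    """Map consumption (as a share of the quota) to a purge request and an answer."""
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    scaled = consumed_bytes * 100
    if scaled < 80 * quota_bytes:
        return Decision("none", True)
    if scaled < 90 * quota_bytes:
        return Decision("some", True)
    if scaled < 99 * quota_bytes:
        # Only the loaders may continue, so they can drain data into the database.
        return Decision("all", component_type in LOADER_TYPES)
    return Decision("all", False)
```

Under these assumptions, a CME at 92% of quota is refused while an LDR at the same level is still allowed to write.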
Managing Free Space
When free disk space is <= FS_LL (the free space low limit), DiskUsageServer contacts all components that reside in this root directory and tells them to free up all the space they can. DiskUsageServer answers no to space requests, which stops all components in this root directory.
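Taken together with the consumption thresholds, the free space check might look like the sketch below. How the DUS actually orders or combines the two checks is not stated in the slides, so the logic here is an assumption, and `allow_space_request` and its parameters are hypothetical names.

```python
LOADER_TYPES = frozenset({"LDR", "DLDR"})


def allow_space_request(free_bytes: int, fs_ll_bytes: int,
                        consumed_bytes: int, quota_bytes: int,
                        component_type: str) -> bool:
    """Return True if a component may write more data to disk."""
    # Free space at or below the low limit stops every component, loaders included.
    if free_bytes <= fs_ll_bytes:
        return False
    # Otherwise apply the quota thresholds: 99% for loaders, 90% for everything else.
    limit_pct = 99 if component_type in LOADER_TYPES else 90
    return consumed_bytes * 100 < limit_pct * quota_bytes
```

The point of the sketch is that the free space low limit is a hard stop with no loader exemption, unlike the 90% consumption threshold.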
Good Citizen
Components try to behave as good citizens by:
Only acquiring and buffering a few hours of data in advance in their do directory (default is 4 hours). This can be configured at the component level by modifying FC_MAX_DO_HOURS.
Only producing a few hours of data in their output directory, and stopping if this data is not picked up by downstream components (default is 4 hours). This can be configured at the component level by modifying FC_MAX_OUTPUT_HOURS.
Honoring their retention interval and keeping only a certain number of hours of data in the done directory, even if space is available. This has not changed from the previous release. It can be configured at the component level by modifying FC_RETENTION_HOURS.
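The three good-citizen rules can be sketched as simple checks. This is an illustration under stated assumptions rather than component code: the class and function names are hypothetical, the slides give no default for FC_RETENTION_HOURS (so it is a required field here), and whether each limit is inclusive or exclusive is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoodCitizenLimits:
    fc_retention_hours: int       # no default is given in the slides
    fc_max_do_hours: int = 4      # default from the slides
    fc_max_output_hours: int = 4  # default from the slides


def may_acquire(buffered_do_hours: int, limits: GoodCitizenLimits) -> bool:
    """Acquire more input only while the do directory holds less than the limit."""
    return buffered_do_hours < limits.fc_max_do_hours


def may_produce(pending_output_hours: int, limits: GoodCitizenLimits) -> bool:
    """Keep processing only while unclaimed output stays under the limit."""
    return pending_output_hours < limits.fc_max_output_hours


def expired_done_hours(done_ages_hours: list, limits: GoodCitizenLimits) -> list:
    """Return the ages (in hours) of done-directory data past the retention interval."""
    return [age for age in done_ages_hours if age > limits.fc_retention_hours]
```

For example, with `GoodCitizenLimits(fc_retention_hours=24)`, a component holding 4 hours of buffered input would stop acquiring until it has processed some of it.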
Supported Configurations
Single datachannel root directory.
Component directories are on the same disk as the root directory (not mounted or linked).
[Diagram: a datachannel root directory containing FTE.1, all on Disk 1.]
Supported Configurations
Multiple datachannel root directories (which can be on different disks).
Component directories are NOT mounted or linked.
You can create a root directory for each channel, one for all FTEs, or any other organization you choose.
[Diagram: Datachannel Root 1 containing FTE.1.1 on Disk 1; Datachannel Root 2 containing FTE.2.1 on Disk 2.]
New Restrictions
Previously, if you were running low on disk space, you could mount or link a component directory (say CME.1.1) from another file system. This is no longer allowed.
Instead of mounting or linking a component directory, you can mount another datachannel root directory and put some components in it. The new datachannel root directory must have its own DUS configuration settings.
Unsupported Configurations
The datachannel root and component directories are on different disks, which is done by mounting or linking component directories.
This is NOT SUPPORTED and will cause problems.
[Diagram: Datachannel Root 1 on Disk 1, with FTE.1.1 linked or mounted from Disk 2.]
Example DUS Configuration
AMGR.DC1C.DUS.1.FC_FSLL=150000000
AMGR.DC1C.DUS.1.FC_QUOTA=2800000000
AMGR.DC1C.DUS.1.LOCAL_ROOT_DIRECTORY=/opt/datachannel
AMGR.DC1C.DUS.1.REMOTE_PASSWORD=CACCDHDBCCCJ
AMGR.DC1C.DUS.1.REMOTE_ROOT_DIRECTORY=/opt/datachannel
AMGR.DC1C.DUS.1.REMOTE_USERNAME=pvuser
AMGR.DC1C.DUS.1.USE_SECURE_FILE_TRANSFER=TRUE
AMGR.DC1C.DUS.1.PORT_NUMBER=21
DUS Configuration Settings
FC_FSLL is the free space low limit. When the disk has less than this amount of space available (in bytes), components become flow controlled (stopped).
FC_QUOTA is the amount of space (in bytes) you wish to allocate to the components running in this datachannel root directory.
LOCAL_ROOT_DIRECTORY is the full local path to the datachannel root directory.
REMOTE_ROOT_DIRECTORY is the path to the datachannel root directory when accessing it via ftp or sftp.
REMOTE_USERNAME is the username to use when accessing this datachannel root directory via ftp or sftp.
REMOTE_PASSWORD is the password to use when accessing this datachannel root directory via ftp or sftp.
USE_SECURE_FILE_TRANSFER specifies that sftp should be used when accessing this datachannel root directory from another host.
PORT_NUMBER is the port number to use for ftp or sftp.
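To inspect these settings programmatically, the keys can be split on dots. The sketch below assumes each key has the shape `AMGR.<manager>.DUS.<index>.<SETTING>`, which is inferred from the example above rather than documented; `parse_dus_settings` is a hypothetical helper and not part of TNPM.

```python
from collections import defaultdict


def parse_dus_settings(lines):
    """Group DUS settings by (manager name, DUS index).

    Lines that do not match the assumed key shape are ignored.
    """
    settings = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        parts = key.strip().split(".")
        if len(parts) != 5 or parts[0] != "AMGR" or parts[2] != "DUS":
            continue
        _, manager, _, index, name = parts
        settings[(manager, index)][name] = value.strip()
    return dict(settings)


example = [
    "AMGR.DC1C.DUS.1.FC_FSLL=150000000",
    "AMGR.DC1C.DUS.1.FC_QUOTA=2800000000",
    "AMGR.DC1C.DUS.1.LOCAL_ROOT_DIRECTORY=/opt/datachannel",
]
dus = parse_dus_settings(example)[("DC1C", "1")]
quota_bytes = int(dus["FC_QUOTA"])  # values are kept as strings until converted
```

Values are left as strings because the settings mix byte counts, paths, and flags; callers convert the ones they need.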
Log Messages
V1:9017 2010.03.30-18.14.40 UTC AMGR.DC1C-4673:8272 FLOW_CTRL_STATE 1 Dir=/opt/datachannel Actual free space = 416,288,768 Free space low limit = 150,000,000 Actual consumed space = 237,341,696 Space quota = 2,800,000,000 Consumed space calc milliseconds =91
The DUS inside the AMGR logs this message so you can see how much space is currently used and available on the filesystem.
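The figures in a FLOW_CTRL_STATE message can be pulled out with a regular expression to see how close a root directory is to its limits. This is a sketch that assumes the message layout matches the single example above; `parse_flow_ctrl_state` is a hypothetical helper.

```python
import re

# Matches the four labelled byte counts in the example message.
FIELD_RE = re.compile(
    r"(Actual free space|Free space low limit|Actual consumed space|Space quota)"
    r"\s*=\s*([\d,]+)"
)


def parse_flow_ctrl_state(message: str) -> dict:
    """Return the labelled byte counts from a FLOW_CTRL_STATE message as integers."""
    return {label: int(number.replace(",", ""))
            for label, number in FIELD_RE.findall(message)}


message = (
    "FLOW_CTRL_STATE 1 Dir=/opt/datachannel Actual free space = 416,288,768 "
    "Free space low limit = 150,000,000 Actual consumed space = 237,341,696 "
    "Space quota = 2,800,000,000 Consumed space calc milliseconds =91"
)
state = parse_flow_ctrl_state(message)
consumed_pct = 100 * state["Actual consumed space"] / state["Space quota"]
```

In this example the consumed space is well under 80% of the quota and the free space is above the low limit, so no flow control would be expected.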
Log Messages
2010.03.24-15.00.00 UTC DG.1.13-17864:2515 FLOW_CTRL_ON 1 Flow control is being asserted
Components log this message when the system is low on available disk space and the DUS is answering no to components' space requests. The component is flow controlled (stopped) until more space becomes available.
2010.03.24-15.25.59 UTC DG.1.13-17864:2515 FLOW_CTRL_OFF 1 Flow control has been deasserted
Components log this message when space has become available and they are returning to normal processing. The component is no longer flow controlled (stopped).
Log Messages
2010.03.30-18.15.05 UTC FTE.4.8-8977:7706 FLOW_CTRL_PROCESSING_PAUSED GYMDC39209W Processing paused because output at maximum
Components log this message when there is too much data in the output directory waiting to be acquired by downstream components.
2010.03.23-19.33.49 UTC CME.1.2-26344:1784 FLOW_CTRL_PROCESSING_UNPAUSED GYMDC39211I Processing unpaused because no longer at max output
Components log this message when enough output data has been acquired by downstream components.
Log Messages
2010.03.24-17.06.15 UTC AMGR.DCAIX2-1622116:4888 FLOW_CTRL_PURGE_SOME 1 Notifying components in dir (/opt/proviso/datachannel) to purge some
The DUS logs this message when it is telling components to delete some data from their done directory. This is normal and no cause for concern.
2010.03.24-17.06.17 UTC CME.2.2000-1646612:15281 FLOW_CTRL_PURGE_SOME 1 Server requests I purge some
Components log this message when they are told to delete some data from their done directory. This is normal and no cause for concern.
Log Messages
2010.03.24-15.25.40 UTC AMGR.DC1C-4673:11897 FLOW_CTRL_PURGE_ALL 1 Notifying components in dir (/opt/proviso/datachannel) to purge all
The DUS logs this message when it is telling components to delete all data from their done directory.
2010.03.24-15.25.41 UTC CME.1.13-19745:5271 FLOW_CTRL_PURGE_ALL 1 Server requests I purge all
Components log this message when they are told to delete all data from their done directory.
Log Messages
2010.03.24-15.25.40 UTC AMGR.DC1C-4673:11897 FLOW_CTRL_QUOTA_FAILURE GYMDCDC10111 Error: Some error. Unable to get disk consumption for dir: /opt/datachannel
The DUS logs this message when it encounters an error while running the du command.
2010.03.24-15.25.40 UTC AMGR.DC1C-4673:11897 FLOW_CTRL_FS_FAILURE GYMDCDC10157 Error: Some error. Unable to get free disk space for dir: /opt/datachannel
The DUS logs this message when it encounters an error while calculating the amount of free space available on this filesystem.
Troubleshooting Tips
Grep the log for FLOW_CTRL log messages.
Run the du command manually on the root directory to make sure it works.
Run the df command manually to see how much free space is available.
If your system is catching up after some components were stopped, it is normal to see components log FLOW_CTRL_PROCESSING_PAUSED and FLOW_CTRL_PROCESSING_UNPAUSED as they rush ahead and downstream components are unable to keep up with the output of new data.
BCOL and LDR have FLOW_CTRL_SKIP log messages that describe why BCOL or LDR is skipping the acquisition of data. Usually it is because too much data has already been acquired and buffered.
CME logs NOT_ACQUIRING_TUPLES for a number of reasons. It could be flow controlled, or it could have already acquired and buffered too much data. The message can also indicate that CME is receiving data from some inputs but not others, caused by a down collector or a stopped FTE or CME.
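The first and third tips can also be scripted. The sketch below counts FLOW_CTRL message types in a sequence of log lines and reads free space with the standard library's `shutil.disk_usage`, which reports similar figures to df. The function names are hypothetical, and the log line layout is assumed to match the examples in this transcript.

```python
import re
import shutil
from collections import Counter

# Message identifiers such as FLOW_CTRL_ON or FLOW_CTRL_PURGE_SOME.
MESSAGE_RE = re.compile(r"\b(FLOW_CTRL_[A-Z_]+)\b")


def count_flow_ctrl_messages(lines) -> Counter:
    """Count how often each FLOW_CTRL message type appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def free_bytes(path: str) -> int:
    """Return the free bytes on the filesystem that holds path."""
    return shutil.disk_usage(path).free


sample = [
    "2010.03.24-15.00.00 UTC DG.1.13-17864:2515 FLOW_CTRL_ON 1 Flow control is being asserted",
    "2010.03.24-15.25.59 UTC DG.1.13-17864:2515 FLOW_CTRL_OFF 1 Flow control has been deasserted",
    "a line that is not about flow control",
]
counts = count_flow_ctrl_messages(sample)
```

Passing an open log file to `count_flow_ctrl_messages` works as well, since iterating over a file yields its lines.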
Troubleshooting Tips
The system depends on LDR and DLDR being able to load data into the database and then delete that data from the disk. This is why LDR and DLDR are allowed to run even if other components are stopped because the system is low on disk space.
When flow control problems happen, components back up from right to left (see the diagram below). If your LDR is crashing, it will eventually cause CME, then FTE, then UBA to flow control. When you notice a problem, start by looking at the components on the right to see if they are the cause.
[Diagram: UBA, FTE, CME, LDR, DLDR, shown left to right. Caption: Flow control problems cause backups upstream.]
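The right-to-left rule can be expressed as an ordering over the affected components. This sketch assumes the pipeline is the simple linear chain listed in the diagram; real deployments may branch, and `PIPELINE` and `inspection_order` are hypothetical names.

```python
# Left-to-right order as listed in the diagram (assumed to be a linear chain).
PIPELINE = ["UBA", "FTE", "CME", "LDR", "DLDR"]


def inspection_order(affected) -> list:
    """Return the affected component types, most downstream first."""
    affected = set(affected)
    return [name for name in reversed(PIPELINE) if name in affected]


order = inspection_order({"FTE", "CME", "LDR"})
```

Here `order` starts with LDR, matching the advice to check the rightmost affected component before the ones upstream of it.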
Upgrade
All installations before upgrade should have one datachannel root directory per host.
Check that there are no linked or mounted component directories under the datachannel root directory. If there are, they need to be reconfigured so that they are local directories under the main root directory or under a new mounted root directory.
The Topology Editor sums the component quotas and sets the default root directory quota to this sum. Check that this sum is not greater than the amount of disk space available.
Environment Design Guidelines
Never link or mount a component directory under a datachannel root directory.
FC_QUOTA for a root directory should not exceed the amount of actual space available on the filesystem.
FC_FSLL should be large enough to be useful. Setting this number too low makes it very hard to recover if the system runs out of space. Think of this number as the buffer of space that will be available for recovery.
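The quota and low-limit guidelines lend themselves to a quick sanity check. The sketch below is illustrative only: the function names are hypothetical, and the 5% floor used to judge whether FC_FSLL is "large enough" is an arbitrary assumption, since no minimum is specified.

```python
def default_root_quota(component_quotas: dict) -> int:
    """Sum per-component quotas, as the Topology Editor is described as doing on upgrade."""
    return sum(component_quotas.values())


def check_root_settings(fc_quota: int, fc_fsll: int, filesystem_bytes: int,
                        min_fsll_percent: int = 5) -> list:
    """Return warnings for settings that conflict with the design guidelines.

    min_fsll_percent is a hypothetical threshold chosen for illustration.
    """
    warnings = []
    if fc_quota > filesystem_bytes:
        warnings.append("FC_QUOTA exceeds the space available on the filesystem")
    # Integer comparison keeps the boundary case exact.
    if fc_fsll * 100 < min_fsll_percent * filesystem_bytes:
        warnings.append("FC_FSLL may be too small to leave a useful recovery buffer")
    return warnings


quota = default_root_quota({"FTE.1.1": 1_000_000_000, "CME.1.1": 1_800_000_000})
problems = check_root_settings(quota, 150_000_000, 3_000_000_000)
```

In practice the filesystem size could come from `shutil.disk_usage(path).total`, and the floor should be tuned to how much data the loaders need room for during recovery.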