
Associative Computing Models by Johnnie Baker
Explore the world of associative computing models, including SIMD, associative, and multi-associative models, as discussed by Johnnie Baker. Learn about the ASC and MASC models, associated algorithms, and references in the field of synchronous computing.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Synchronous Computing SIMD, Associative, and Multi- Associative Models Johnnie Baker
Synchronous Computing Topics Introduction References for Associative Computing Motivation for the ASC model The ASC and MASC Models A Language Designed for the ASC Model Two ASC Algorithms and Programs ASC and MASC Algorithm Examples ASC version of Prim s MST Algorithm ASC version of QUICKHULL MASC version of QUICKHULL. 2
Associative Computing References Note: Below KSU papers are available on the website: http://www.cs.kent.edu/~parallel/ (Click on the link to papers ) 1. Maher Atwah, Johnnie Baker, and Selim Akl, An Associative Implementation of Classical Convex Hull Algorithms, Proc of the IASTED International Conference on Parallel and Distributed Computing and Systems, 1996, 435-438 Johnnie Baker and Mingxian Jin, Simulation of Enhanced Meshes with MASC, a MSIMD Model, Proc. of the Eleventh IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999, 511-516. 2. 3
Associative Computing References 3. Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, Proc. of the 15th International Parallel and Distributed Processing Symposium, (Workshop on Massively Parallel Processing, San Francisco, April 2001. Jerry Potter, Johnnie Baker, Stephen Scott, Arvind Bansal, Chokchai Leangsuksun, and Chandra Asthagiri, An Associative Computing Paradigm, Special Issue on Associative Processing, IEEE Computer, 27(11):19-25, Nov. 1994. (Note: MASC is called ASC in this article.) Primary Reference Jerry Potter, Associative Computing - A Programming Paradigm for Massively Parallel Computers, Plenum Publishing Company, 1992. 4. 5. 4
Associative Computers Associative Computer: A SIMD computer with a few additional features supported in hardware. These additional features can be supported (less efficiently) in traditional SIMDs in software. The name associative is due to its ability to locate items in the memory of PEs by content rather than location. 5
Associative Models The ASC model (for ASsociative Computing) lists the properties assumed for an associative computer. The MASC (for Multiple ASC) Model Supports multiple SIMD (or MSIMD) computation. Allows model to have more than one Instruction Stream (IS) The IS corresponds to the control unit of a SIMD. ASC is the MASC model with only one IS. The one IS version of the MASC model is sufficiently important to have its own name. 6
ASC & MASC are KSU Models Several professors and their graduate students at Kent State University have worked on models The STARAN and the ASPRO computers fully support the ASC model in hardware. The MPP can easily support ASC but not in hardware. Prof. Batcher was chief architect or consultant Dr. Potter developed a language for ASC Several of my students and I work on algorithms for models and architectures that support associative computing Dr. Walker and his students have investigated a hardware design to support the ASC and MASC models. Dr. Batcher and Dr. Potter are currently not actively working on ASC/MASC models but still provide advice. 7
Motivation The STARAN Computer (Goodyear Aerospace, early 1970 s) and later the ASPRO provided the motivation for the ASC model. ASC provides a parallel computational model that fully supports the data parallel style of programming. ASC provides a practical model to support massive parallelism. MASC supports allowing multiple ASC computations to occur concurrently. Careful descriptions of these models allow them to be compared to other parallel models 8
The ASC Model PEs P E Memory PE IS N E T W O R K Memory PE Memory PE 9
Basic Properties of ASC Instruction Stream The IS has a copy of the program and broadcasts instructions to all PEs in unit time PE Properties Each PE has a local memory that is only accessible by its PE All PEs listen to the IS A PE can be active, inactive, or idle Inactive PEs listens to but does not execute IS commands until reactivated Idle PEs contain no essential data and are available for reassignment Active PEs execute IS commands synchronously 10
Basic Properties of ASC Responder Processing The IS can detect if a data test is satisfied by any of its responder PEs in constant time (i.e., any-responders property). The IS can select an arbitrary responder in constant time (i.e., pick-one property). 11
Basic Properties of ASC Constant Time Global Operations (across PEs) Logical OR and AND of binary values (reductions) Maximum and minimum of numbers (reductions) Associative searches Communications There are at least two networks PE communications network IS broadcast/reduction network Can implemented as one network or two separate networks) 12
Basic Properties of ASC The PE communications network is normally supported by an interconnection network E.g., a 2D mesh The broadcast/reduction network(s) can be supported by a broadcast and a reduction network of circuits (either 2 separate networks or 1 combined network). See posted paper by Jin, Baker, & Batcher (listed in associative references) Control Features PEs, the IS, and the networks all operate synchronously, using the same clock 13
Non-SIMD Properties of ASC Observation: The ASC properties that are unusual for SIMDs are the constant time operations: Constant time responder processing Any-responders? Pick-one Constant time global operations Logical OR and AND of binary values Maximum and minimum value of numbers Associative Searches These timings are justified by implementations using a resolver in the paper by Jin, Baker, & Batcher (listed in associative references and posted). 14
Typical Data Structure for ASC Model Busy- idle On lot Color Model Price Year Make PE1 1 red Dodge 1 1994 0 PE2 0 PE3 1 blue 1996 Ford 1 IS PE4 0 1 1998 white Ford PE5 0 0 PE6 0 0 1 1 Subaru PE7 1997 red Make, Color, Year, etc. are fields the programmer establishes Various data types are supported. Some examples will show string data, but they are not supported in the ASC simulator. 15
The Associative Search Busy- idle On lot Color Model Price Year Make PE1 1 red Dodge 1 1994 0 PE2 0 PE3 1 blue 1996 Ford 1 IS PE4 0 1 1998 white Ford PE5 0 0 PE6 0 0 1 1 Subaru PE7 1997 red IS asks for all cars that are red and on the lot. PE1 and PE7 respond by setting a mask bit in their PE. 16
MASC Model Basic Components An array of PEs, each consisting of a PE and its local memory A PE interconnection network between the PEs Two or more Instruction Streams (ISs) An IS network MASC is a MSIMD model that supports both data and control parallelism associative programming Memory PE Instruc- tion Stream (IS) Memory PE PE Interconnection Network Memory PE IS Network Instruc- tion Stream (IS) Memory PE Memory PE Memory PE Instruc- tion Stream (IS) Memory PE Memory PE 17
MASC Basic Properties Each PE can listen to only one IS PEs can switch ISs in unit time, based on the results of a data test. Each IS and the PEs listening to it follow rules of the ASC model. Control Features: The PEs, ISs, and networks all operate synchronously, using the same clock A master IS coordinates the interaction between the multiple ISs. 18
Characteristics of Associative Programming Consistent use of style of programming called data parallel programming Consistent use of global associative searching and responder processing Usually, frequent use of the constant time global reduction operations: AND, OR, MAX, MIN Broadcast of data using IS bus allows the use of the PE network to be restricted to parallel data movement. 19
Characteristics of Associative Programming Tabular representation of data (i.e., 2D arrays) Use of searching instead of sorting Use of searching instead of pointers Use of searching instead of the ordering provided by linked lists, stacks, queues Promotes an highly intuitive programming style that promotes high productivity Uses structure codes (i.e., numeric representation) to represent data structures such as trees, graphs, embedded lists, and matrices. Examples of use of structure codes are given in Ref: Nov. 1994 IEEE Computer article (see references) Extensive uses in Associative Computing book by Jerry Potter. 20
Languages Designed for the ASC Professor Potter has created several languages for the ASC model. ASC is a C-like language designed for ASC model Compilers and instructions posted under software ACE is a higher level language than ASC that uses natural language syntax; e.g., plurals, pronouns (e.g., its , their , etc.) Language References: ASC Primer Copy available on parallel lab website www.cs.kent.edu/~parallel/ under software . Associative Computing book by Potter [11] some features in this book were never fully implemented in his ASC compiler 21
Algorithms and Programs Implemented in ASC A wide range of algorithms implemented in ASC without the use of the PE network: Graph Algorithms minimal spanning tree shortest path connected components Computational Geometry Algorithms convex hull algorithms (Jarvis March, Quickhull, Graham Scan, etc) Dynamic hull algorithms 22
ASC Algorithms and Programs (not requiring PE network) String Matching Algorithms all exact substring matches all exact matches with don t care (i.e., wild card) characters. Algorithms for NP-complete problems traveling salesperson 2-D knapsack. Data Base Management Software associative data base relational data base 23
ASC Algorithms and Programs (not requiring a PE network) A Two Pass Compiler for ASC not the one we will be using. This compiler runs on an associative computer & uses ASC parallelism. first pass optimization phase Two Rule-Based Inference Engines for AI An Expert System OPS-5 interpreter PPL (Parallel Production Language interpreter) A Context Sensitive Language Interpreter (OPS-5 variables force context sensitivity) An associative PROLOG interpreter 24
Associative Algorithms & Programs (using a network) There are numerous associative algorithms or programs that use a PE network; 2-D Knapsack ASC Algorithm using a 1-D mesh Image processing algorithms using 1-D mesh FFT (Fast Fourier Transform) using 1-D nearest neighbor & Flip networks Matrix Multiplication using 1-D mesh An Air Traffic Control Program (used a Flip network connection between PEs and memory) Demonstrated using live data at Dulles Field in early 70 s. All but first were created and/or implemented in assembler for STARAN at Goodyear Aerospace 25
Example 1 - MST A graph has nodes labeled by some identifying letter or number and arcs which are directional and have weights associated with them. Such a graph could represent a map where the nodes are cities and the arc weights give the mileage between two cities. A B C D E 3 5 2 4 5 26
The MST Problem The MST problem assumes the weights are positive and the graph is undirected and connected. The goal is to find a minimal spanning tree, i.e. a subgraph that is a tree1, that includes all nodes (i.e. it spans), and where the sum of the weights on the arcs of the subgraph is the smallest possible weight (i.e. it is minimal). Why would an algorithm solving this problem be useful? Note: The solution may not be unique. 1 A tree is a set of points called vertices, pairs of distinct vertices called edges, such that (1) there is a path from any vertex to any other, and (2) there are no circuits (i.e., that is, no paths starting from a vertex and returning to the same vertex). 27
An Example 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 As we will see, the algorithm is simple. The ASC program is quite easy to write. A sequential solution is a bit messy because of the data structures needed to hold the data for the problem 28
An Example Step 0 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 We will maintain three sets of nodes whose membership will change during the run. The first, V1, will be nodes selected to be in the tree. The second, V2, will be candidates at the current step to be added to V1. 29 The third, V3, will be nodes not considered yet.
An Example Step 0 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 V1 nodes will be in red with their selected edges being in red also. V2 nodes will be in light blue with their candidate edges in light blue also. V3 nodes and edges will remain white. 30
An Example Step 1 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Select an arbitrary node to place in V1, say A. Put into V2, all nodes incident with A. 31
An Example Step 2 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Choose the edge with the smallest weight and put its node, B, into V1. Mark that edge with red also. Retain the other edge-node combinations in the to be considered list. 32
An Example Step 3 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Add all the nodes incident to B to the to be considered list . However, note that AG has weight 3 and BG has weight 6. So, there is no sense of including BG in the list. 33
An Example Step 4 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Add the node with the smallest weight that is colored light blue and add it to V1. Note the nodes and edges in red are forming a subgraph which is a tree. 34
An Example Step 5 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Update the candidate nodes and edges by including all that are incident to those that are in V1 and colored red. 35
An Example Step 6 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Select I as its edge is minimal. Mark node and edge as red. 36
An Example Step 7 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Add the new candidate edges. Note that IF has weight 5 while AF has weight 7. Thus, we drop AF from consideration at this time. 37
An Example after several more passes, C is added & we have 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 Note that when CH is added, GH is dropped as CH has less weight. Candidate edge BC is also dropped since it would form a back edge between two nodes already in the MST. When there are no more nodes to be considered, i.e. no more in V3, we obtain the final solution. 38
An Example the final solution 2 A B 7 4 3 6 F C G 5 1 2 3 2 6 I H 4 8 2 E D 1 The subgraph is clearly a tree no cycles and connected. The tree spans i.e. all nodes are included. While not obvious, it can be shown that this algorithm always produces a minimal spanning tree. 39 The algorithm is known as Prim s Algorithm for MST.
The ASC Program vs a sequential solution in C, C++, or Java First, think about how you would write the program in C or C++. The usual solution uses some way of maintaining the sets as lists using pointers or references. See solutions to MST in Algorithms texts by Baase in the posted references. In the ASC language, pointers are not even supported as they are not needed and their use is likely to result in inefficient SIMD algorithms An ASC algorithm will be developed for Prim s sequential algorithm using a pseudocode that is based on the ASC language. The ASC language user s guide is posted at www.cs.kent.edu/~parallel/, but its use is not required. The ASC algorithm can be used to create a program in ASC for the ASC simulator or in Cn for ClearSpeed. 40
ASC-MST Algorithm Preliminaries Next, a data structure level presentation of Prim s algorithm for the MST is given. The data structure used is illustrated in the next two slides. This example is from the Nov. 1994 IEEE Computer paper cited in the references. There are two types of variables for the ASC model, namely the parallel variables (i.e., ones for the PEs) the scalar variables (ie., the ones used by the IS). Scalar variables are essentially global variables. Could replace each scalar variable with its scalar value stored in each entry of a parallel variable. 41
ASC-MST Algorithm Preliminaries (cont.) In order to distinguish between variable types here, the parallel variables names will end with a $ symbol. Each step in this algorithm takes constant time. One MST edge is selected during each pass through the loop in this algorithm. Since a spanning tree has n-1 edges, the running time of this algorithm is O(n) and its cost is O(n 2). Definition of cost is (running time) (number of processors) Since the sequential running time of the Prim MST algorithm is O(n 2) and is time optimal, this parallel implementation is cost optimal. Cost & optimality will be covered in parallel algorithm performance evaluation chapter (See Ch 7 of Quinn) 42
Graph used for Data Structure a 2 2 8 7 b c 4 3 9 6 e d 3 f Figure 6 in [Potter, Baker, et. al.] 43
Data Structure for MST Algorithm current_best$ candidate$ parent$ mask$ node$ PEs d$ a$ e$ b$ c$ f$ no a 2 8 no b 2 7 4 3 a 2 yes c 8 7 6 9 b 7 yes d 4 3 b 4 IS yes e 3 6 3 b 3 a root next- node wait b f 9 44
Short Version of Algorithm: ASC-MST-PRIM(root) 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. Initialize candidates to waiting If there are any finite values in root s field, set candidate$ to yes set parent$ to root set current_best$ to the values in root s field set root s candidate field to no Loop while some candidate$ contain yes for them restrict mask$ to mindex(current_best$) set next_node to a node identified in the preceding step set its candidate to no if the value in their next_node s field are less than current_best$, then set current_best$ to value in next_node s field set parent$ to next_node if candidate$ is waiting and the value in its next_node s field is finite set candidate$ to yes set parent$ to next_node set current_best to the values in next_node s field 45
Comments on ASC-MST Algorithm The three preceding slides are Figure 6 in [Potter, Baker, et.al.] IEEE Computer, Nov 1994]. Preceding slide gives a compact, data-structures level pseudo-code description for this algorithm Pseudo-code illustrates Potter s use of pronouns (e.g., them, its) and possessive nouns. The mindex function returns the index of a processor holding the minimal value. This MST pseudo-code is much shorter and simpler than data-structure level sequential MST pseudo- codes e.g., see one of Baase s textbooks cited in references Algorithm given in Baase s books is identical to this parallel algorithm, except for a sequential computer Next, a more detailed explanation of the algorithm in preceding slide will be given next. 46
Tracing 1st Pass of MST Algorithm on Figure 6 (Put below chart & Figure 6 on board) current_best$ candidate$ parent$ mask$ node$ PEs d$ a$ e$ b$ c$ f$ a 2 8 b 2 7 4 3 c 8 7 6 9 d 4 3 IS e 3 6 3 a root next- node f 9 47
Algorithm: ASC-MST-PRIM Initially assign any node to root. All processors set candidate$to wait current-best$ to the candidate fieldfor the rootnode to no All processors whose distance d from their node to root node is finite do Set their candidate$field to yes Set their parent$ field to root. Set current_best$ = d. 48
Algorithm: ASC-MST-PRIM (cont. 2/3) While the candidate field of some processor is yes , Restrict the active processors to those whose candidate field is yes and (for these processors) do Compute the minimum value x of current_best$. Restrict the active processors to those with current_best$ = x and do pick an active processor, say node y. Set the candidate$ value of node yto no Set the scalar variable next-node to y. 49
Algorithm: ASC-MST-PRIM (cont. 3/3) If the value z in the next_node column of a processor is less than its current_best$ value, then Set current_best$ to z. Set parent$ to next_node For all processors, if candidate$is waiting and the distance of its node from next_node y is finite, then Set candidate$to yes Set current_best$ to the distance of its node from y. Set parent$ to y 50