Indivisible. An instruction sequence executed by a strand is atomic if it appears at any moment to any other strand as if either no instructions in the sequence have been executed or all instructions in the sequence have been executed.
An extension to the C and C++ programming languages that allows a programmer to express task parallelism. Important milestones in the history of Cilk include Cilk-1, Cilk-5, Cilk++, Cilk Plus, and OpenCilk.
An important milestone in the history of Cilk developed by CilkArts, Inc., Cilk++ extended Cilk-5 to C++ and introduced reducer hyperobjects as an efficient means for resolving races on nonlocal variables.
The first important milestone in the history of Cilk, Cilk-1 was developed at MIT and provided provably efficient work-stealing runtime support but little linguistic support.
The Cilksan race detector is a tool provided in OpenCilk for finding race condition defects in Cilk code.
A keyword in the Cilk language that indicates a for loop whose iterations can be executed independently in parallel.
A keyword in the Cilk language that indicates that all spawned children within the scoped region must complete before proceeding.
A keyword in the Cilk language that indicates that the named subroutine can execute independently and in parallel with the caller.
A keyword in the Cilk language that indicates that all functions spawned within the current function must complete before statements following the cilk_sync can be executed.
A single processor unit of a multicore chip. The terms "processor" and "CPU" are often used in place of "core," although industry usage varies. Archaic: A solid-state memory made of magnetized toroidal memory elements.
Central Processing Unit. We use this term as a synonym for "core," or a single processor of a multicore chip.
A race condition that occurs when two or more parallel strands, holding no lock in common, access the same memory location and at least one of the strands performs a write. Compare with determinacy race.
A situation when two or more strands are each waiting for another to release a resource, and the "waiting-for" relation forms a cycle so that none can ever proceed.
The property of a program when it behaves identically from run to run when executed on the same inputs. Deterministic programs are usually easier to debug.
Computer storage that is partitioned among several processors. A distributed-memory multiprocessor is a computer in which processors must send messages to remote processors to access data in remote processor memory. Contrast with shared memory.
How long a program takes to execute on a given computer system. Also called running time.
A construct that Cilksan treats as a lock but which behaves like a no-op during actual running of the program. A fake lock can be used to suppress the reporting of an intentional race condition.
The situation that occurs when two strands access different memory locations residing on the same cache block, thereby contending for the cache block. For more information, see the Wikipedia entry https://en.wikipedia.org/wiki/False_sharing.
A variable that is bound outside of all local scopes. See also nonlocal variable.
A linguistic construct supported by the OpenCilk runtime system that allows many strands to coordinate in updating a shared variable or data structure independently by providing different views of the hyperobject to different strands at the same time. The reducer is the only hyperobject currently provided by OpenCilk.
A single operation executed by a processor.
A point at which the end of one strand meets the end of another. If a knot has one incoming strand and one outgoing strand, it is a serial knot. If it has one incoming strand and two outgoing strands, it is a spawn knot. If it has multiple incoming strands and one outgoing strand, it is a sync knot. A Cilk program does not produce serial knots or knots with both multiple incoming and multiple outgoing strands.
A synchronization mechanism for providing atomic operation by limiting concurrent access to a resource. Important operations on locks include acquire (lock) and release (unlock). Many locks are implemented as a mutex, whereby only one strand can hold the lock at any time.
A computer containing multiple general-purpose processors.
A "mutually exclusive" lock that only one strand can acquire at a time, thereby ensuring that only one strand executes the critical section protected by the mutex at a time. For example, the Linux OS supports Pthreads mutexes.
The property of a program when it behaves differently from run to run when executed on exactly the same inputs. Nondeterministic programs are usually hard to debug.
A program variable that is bound outside of the scope of the function, method, or class in which it is used. In Cilk programs, we also use this term to refer to variables with a scope outside a cilk_for loop.
A for loop all of whose iterations can be run independently in parallel. The cilk_for keyword designates a parallel loop.
Speedup equal to the number of processors on which a program runs. See also speedup.
A self-contained concurrent agent that by default executes a serial chain of instructions. More than one thread may run within a process, but a process does not usually share memory with other processes. Scheduling of processes is typically managed by the operating system.
A processor implements the logic to execute program instructions sequentially; we use the term "core" as a synonym. This document does not use the term "processor" to refer to multiple processing units on the same or multiple chips, although other documents may use the term that way.
A variable to receive the result of a function call.
- A default constructor, which initializes the reducer to its identity value.
- A reduce() method, which merges the value of the right reducer into the left (this) reducer.
The time it takes to execute a computation from the time a human user provides an input to the time the user gets the result.
How long a program takes to execute on a given computer system. Also called execution time.
The ability of a parallel application to run efficiently on one or a small number of processors.
The memory model for concurrency wherein the effect of concurrent agents is as if their operations on shared memory were interleaved in a global order consistent with the orders in which each agent executed them. This model was advanced in 1979 by Leslie Lamport.
Execution of the serial projection of a Cilk program.
The C or C++ program that results from stubbing out the keywords of a Cilk program, where cilk_spawn and cilk_sync are elided and cilk_for is replaced with an ordinary for. The serial projection can be used for debugging and, in the case of a converted C/C++ program, will behave exactly as the original C/C++ program. The terms "serialization" and "serial elision" are used in some of the literature. See also serial semantics.
The behavior of a Cilk program when executed as the serial projection of the program. See the following article: Four Reasons Why Parallel Programs Should Have Serial Semantics.
See serial projection.
Computer storage that is shared among several processors. A shared-memory multiprocessor is a computer in which each processor can directly address any memory location. Contrast with distributed memory.
The theoretically fastest execution time for a parallel program when run on an infinite number of processors, discounting overheads for communication and scheduling. Often denoted by T∞ in the literature.
To call a function without waiting for it to return, as the caller would in a normal call. The caller can continue to execute in parallel with the called function. See also cilk_spawn.
How many times faster a program is when run in parallel than when run on one processor. Speedup can be computed by dividing the running time of the program on one processor by its running time on P processors.
Task-parallel platforms provide a layer of software on top of threads to coordinate, schedule, and manage the processors of a multicore. Some task-parallel platforms are built as runtime libraries, but others provide full-fledged parallel languages with compiler and runtime support.
Task-parallel programming allows parallelism to be specified in a "processor-oblivious" fashion, where the programmer identifies what computational tasks may run in parallel but does not indicate which thread or processor performs the task. Thus, the programmer is freed from worrying about communication protocols, load balancing, and other vagaries of thread programming. The task-parallel platform contains a scheduler, which automatically load-balances the tasks across the processors, thereby greatly simplifying the programmer’s chore. Task-parallel algorithms provide a natural extension to ordinary serial algorithms, allowing performance to be reasoned about mathematically using "work/span analysis."
A thread executes a serial instruction chain. Scheduling of threads is typically managed by the operating system.
The number of operations performed per unit time.
The running time of a program when run on one processor, sometimes denoted by T1.
A scheduling strategy where processors post parallel work locally and, when a processor runs out of local work, it steals work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no communication or synchronization overhead when there is ample parallelism. The OpenCilk runtime system employs a work-stealing scheduler.