dcp-sort

This application demonstrates how to sort values in a distributed manner using the Distributed Compute Protocol (DCP). If you have a good understanding of how DCP works and the terminology already please feel free to skip the refresher and jump straight to the set up.

A refresher

Click to see the refresher

DCP is a distributed computing framework made from web-based technology.

Computers and devices in classrooms, computer labs, households, and enterprises can be turned into computing clusters with a single click with DCP. The computational workload, a job, is sub-divided into smaller parts. These smaller parts are called slices. Slices are executed in parallel using DCP. Each slice of the job will correspond to a value in an array that we wish to sort. Slices will be sent to different Workers (for example, laptops, desktops, cell phones, etc.) which contain sandboxes. A sandbox is a clean environment on your Worker that runs your work function.

DCP is a powerful parallel computing framework that allows users to express a computational job as effortlessly as:

job = compute.for(inputSet, workFunction)

resultSet = await job.exec()

A workFunction is mapped onto each element in an inputSet. Each mapping represents a slice. Slices are distributed across DCP networks for computation. Results are returned to the user from DCP networks coherently.

Remember, the work function must contain progress();. Progress is a call made to tell the scheduler that the job is still alive and running. Progress is considered the heart beat of the job, without it the job would die and no results would be returned.

  1. Create compute nodes: go to https://dcp.work on as many devices as you want and click Start to join the public compute group, or to dcp.work/joinKey if you have a private compute group joinKey and joinSecret.

  2. Configure dev environment: load dcp-client and any required packages.

  3. Specify the inputSet: an arbitrary, but enumerable input dataset (parameters, mp3 files, images, blender project file, etc)

  4. Specify the workFunction: an arbitrary work function (physics simulation, inference model, rendering process, etc)

  5. Express the job: Map the workFunction onto the inputSet with job = compute.for(inputSet, workFunction)

  6. (optional) Specify a compute group with job.computeGroups=[{joinKey: name, joinSecret: password}]

  7. Await the resultSet: Execute the job in parallel on DCP via resultSet = await job.exec()

First, sign up for a DCP account at https://portal.distributed.computer. After signing up, download the id.keystore and default.keystore associated with your account. Refer to this tutorial for more information on how to download your keystores.

Set up

First, set up an HTML file. Require the dcp-client library in the first script tag, this allows the application to distribute work. Some pre-written HTML displays the output, but the rest of the tutorial is in JavaScript within the second script tag.

<!DOCTYPE html>
<html>
  <head>
    <script src="https://scheduler.distributed.computer/dcp-client/dcp-client.js"></script>
    <script>
      // The rest of the code will go here!
    </script>
  </head>
  <body>
    <p>Numbers: <input type="text" id="input-total" value="13" /></p>
    <p>Chunks: <input type="text" id="input-chunks" value="3" /></p>
    <input type="button" id="deploy" value="Deploy" onclick="deploy()" />

    <p>Source Set: <span id="source-set"></span></p>
    <p>Chunks Set: <span id="chunks-set"></span></p>
    <p>Sorted Chunks: <span id="sorted-chunks"></span></p>
    <p>Sorted Array: <span id="sorted-array"></span></p>
  </body>
</html>

Next, write the following helper functions. Note that they’re app-specific, not critical to DCP.

  • generateInput generates random numbers to sort.

  • splitInput() splits an array of numbers into different chunks, so that workers sort each chunk in parallel.

function generateInput(amount) {
  const array = [];
  for (let i = 0; i < amount; i += 1) {
    array.push(Math.floor(Math.random() * 1000));
  }
  return array;
}

function splitInput(arr, numChunks) {
  const chunks = [];
  for (let i = 0; i < numChunks; i += 1) chunks[i] = [];
  for (let i = 0; i < arr.length; i += 1) chunks[i % numChunks].push(arr[i]);
  return chunks;
}

Next, create the asynchronous deploy() function that will launch our job. Require the Compute API that accesses DCP. The two constants, TOTAL and CHUNKS will handle input from the user on the web page, and aren’t app-specific.

async function deploy() {
  const { compute } = dcp;
  const TOTAL = document.getElementById('input-total').value;
  const CHUNKS = document.getElementById('input-chunks').value;

  /* INPUT DATA */

  /* WORK FUNCTION */

  /* COMPUTE FOR */

  /* PROCESS RESULTS */
}

Input data

Within deploy(), use the helper functions to generate some random input and split them into chunks.

/* INPUT DATA */
const sourceSet = generateInput(TOTAL);
document
  .getElementById('source-set')
  .appendChild(document.createTextNode(sourceSet));

const inputSet = splitInput(sourceSet, CHUNKS);
for (let i = 0; i < CHUNKS; i += 1) {
  document
    .getElementById('chunks-set')
    .appendChild(document.createElement('br'));
  document
    .getElementById('chunks-set')
    .appendChild(document.createTextNode(inputSet[i]));
}

Work function

Within deploy(), enter the work function. DCP sends workers one element of the input set and performs this work function on it. In this app, the work function sorts a chunk and returns it.

/* WORK FUNCTION *
 * Returns one sorted chunk per slice */
async function workFn(arr) {
  progress();
  return arr.sort(function (a, b) {
    return a - b;
  });
}

compute.for

Within deploy(), write the code that launches the job using compute.for(). This distributes each element of the input set to workers in parallel, who execute the work function on one of those elements.

/* COMPUTE FOR */
const job = compute.for(inputSet, workFn);
job.public.name = 'dcp-sort - Web';

Compute group

If using a compute group, enter the below line with the join information for the specific group. Skip this section if you won’t be using a compute group.

// SKIP IF: you do not need a compute group
job.computeGroups = [{ joinKey: 'Your Key', joinSecret: 'Your Secret' }];

Process results

Within deploy(), await the results to return from job.exec(). The for loop below prints the results, which are sorted chunks. The chunks still need to be merged.

/* PROCESS RESULTS */
let resultSet = await job.exec();
resultSet = Array.from(resultSet);

for (let i = 0; i < CHUNKS; i += 1) {
  document
    .getElementById('sorted-chunks')
    .appendChild(document.createElement('br'));
  document
    .getElementById('sorted-chunks')
    .appendChild(document.createTextNode(resultSet[i]));
}

The last step is to merge these chunks together to make one sorted list. Since the chunks are already sorted, a greedy approach works well. The following function searches the first index of each chunk (which is always the smallest), figures out which chunk has the lowest first value, and pops that value into the new main sorted array. Once none of the chunks have any remaining values, the final array is complete and sorted.

The function chunkSort() handles this, and it’s not DCP-specific.

function chunkSort(arr) {
  const sorted = [];
  while (arr.length) {
    let currentChunk = arr[0];
    let bestChunkIndex = 0;
    let bestValue = currentChunk[0];
    for (let i = 1; i < arr.length; i += 1) {
      currentChunk = arr[i];
      if (currentChunk[0] < bestValue) {
        [bestValue] = currentChunk;
        bestChunkIndex = i;
      }
    }
    sorted.push(arr[bestChunkIndex].shift());
    if (arr[bestChunkIndex].length === 0) arr.splice(bestChunkIndex, 1);
  }
  return sorted;
}

Lastly, run the sorted chunks through chunkSort, and report the final results.

const sortedSet = chunkSort(resultSet);
document
  .getElementById('sorted-array')
  .appendChild(document.createTextNode(sortedSet));

Run it

Open the HTML file in a web browser, and press deploy.

Full code

Click to see full code.
<!DOCTYPE html>
<html>
  <head>
    <script src="https://scheduler.distributed.computer/dcp-client/dcp-client.js"></script>
    <script>
      /* GENERATE INPUT
       * Returns an array of 'amount' random numbers */
      function generateInput(amount) {
        const array = [];
        for (let i = 0; i < amount; i += 1) {
          array.push(Math.floor(Math.random() * 1000));
        }
        return array;
      }

      /* SPLIT INPUT
       * Takes input array arr and splits it into numChunks different chunks */
      function splitInput(arr, numChunks) {
        const chunks = [];
        for (let i = 0; i < numChunks; i += 1) chunks[i] = [];
        for (let i = 0; i < arr.length; i += 1) {
          chunks[i % numChunks].push(arr[i]);
        }
        return chunks;
      }

      async function deploy() {
        const { compute } = dcp;
        const TOTAL = document.getElementById('input-total').value;
        const CHUNKS = document.getElementById('input-chunks').value;

        /* INPUT DATA */
        const sourceSet = generateInput(TOTAL);
        document
          .getElementById('source-set')
          .appendChild(document.createTextNode(sourceSet));
        const inputSet = splitInput(sourceSet, CHUNKS);
        for (let i = 0; i < CHUNKS; i += 1) {
          document
            .getElementById('chunks-set')
            .appendChild(document.createElement('br'));
          document
            .getElementById('chunks-set')
            .appendChild(document.createTextNode(inputSet[i]));
        }

        /* WORK FUNCTION
         * Returns one sorted chunk per slice */
        async function workFn(arr) {
          progress();
          return arr.sort((a, b) => a - b);
        }

        /* COMPUTE FOR */
        const job = compute.for(inputSet, workFn);
        job.public.name = 'dcp-sort - Web';
        // job.computeGroups = [{ joinKey: 'Your Key', joinSecret: 'Your Secret' }];

        // Not mandatory console logs for status updates
        job.on('accepted', () => {
          console.log(` - Job accepted with id: ${job.id}`);
        });
        job.on('result', (ev) => {
          console.log(` - Received result ${ev}`);
        });

        /* PROCESS RESULTS */
        let resultSet = await job.exec();
        resultSet = Array.from(resultSet);
        for (let i = 0; i < CHUNKS; i += 1) {
          document
            .getElementById('sorted-chunks')
            .appendChild(document.createElement('br'));
          document
            .getElementById('sorted-chunks')
            .appendChild(document.createTextNode(resultSet[i]));
        }

        function chunkSort(arr) {
          const sorted = [];
          while (arr.length) {
            let currentChunk = arr[0];
            let bestChunkIndex = 0;
            let bestValue = currentChunk[0];
            for (let i = 1; i < arr.length; i += 1) {
              currentChunk = arr[i];
              if (currentChunk[0] < bestValue) {
                [bestValue] = currentChunk;
                bestChunkIndex = i;
              }
            }
            sorted.push(arr[bestChunkIndex].shift());
            if (arr[bestChunkIndex].length === 0) arr.splice(bestChunkIndex, 1);
          }
          return sorted;
        }

        const sortedSet = chunkSort(resultSet);
        document
          .getElementById('sorted-array')
          .appendChild(document.createTextNode(sortedSet));
        console.log(' - Job Complete');
      }
    </script>
  </head>
  <body>
    <p>Numbers: <input type="text" id="input-total" value="13" /></p>
    <p>Chunks: <input type="text" id="input-chunks" value="3" /></p>
    <input type="button" id="deploy" value="Deploy" onclick="deploy()" />

    <p>Source Set: <span id="source-set"></span></p>
    <p>Chunks Set: <span id="chunks-set"></span></p>
    <p>Sorted Chunks: <span id="sorted-chunks"></span></p>
    <p>Sorted Array: <span id="sorted-array"></span></p>
  </body>
</html>