dSort

A refresher

This application demonstrates how to sort values in a distributed manner using the Distributed Compute Protocol (DCP). DCP is a distributed computing framework made from web-based technology.

Computers and devices in classrooms, computer labs, households, and enterprises can be turned into computing clusters with a single click with DCP. The computational workload, a job, is sub-divided into smaller parts. These smaller parts are called slices. Slices are executed in parallel using DCP. Each slice of the job will correspond to a value in an array that we wish to sort. Slices will be sent to different Workers (for example, laptops, desktops, cell phones, etc.) which contain sandboxes. A sandbox is a clean environment on your Worker that runs your work function.

DCP is a powerful parallel computing framework that allows users to express a computational job as effortlessly as:

job = compute.for(inputSet, workFunction)

resultSet = await job.exec()

A workFunction is mapped onto each element in an inputSet. Each mapping represents a slice. Slices are distributed across DCP networks for computation. Results are returned to the user from DCP networks coherently.

Remember, the work function must contain progress();. Progress is a call made to tell the scheduler that the job is still alive and running. Progress is considered the heart beat of the job, without it the job would die and no results would be returned.

  1. Create compute nodes: go to https://dcp.work on as many devices as you want and click Start to join the public compute group, or to dcp.work/joinKey if you have a private compute group joinKey and joinSecret.

  2. Configure dev environment: load dcp-client and any required packages.

  3. Specify the inputSet: an arbitrary, but enumerable input dataset (parameters, mp3 files, images, blender project file, etc)

  4. Specify the workFunction: an arbitrary work function (physics simulation, inference model, rendering process, etc)

  5. Express the job: Map the workFunction onto the inputSet with job = compute.for(inputSet, workFunction)

  6. (optional) Specify a compute group with job.computeGroups=[{joinKey: name, joinSecret: password}]

  7. Await the resultSet: Execute the job in parallel on DCP via resultSet = await job.exec()

An example of a distributed brute force sorting algorithm:

First, sign up for a DCP account at https://portal.distributed.computer. After signing up, download the id.keystore and default.keystore associated with your account. Refer to this tutorial for more information on how to download your keystores.

Set up

First, set up an HTML file. Require the dcp-client library in the first script tag, this allows the application to distribute work. Some pre-written HTML displays the output, but the rest of the tutorial is in JavaScript within the second script tag.

<!DOCTYPE html>
<html>
  <head>
    <script src="https://scheduler.distributed.computer/dcp-client/dcp-client.js"></script>
    <script>
      // The rest of the code will go here!
    </script>
  </head>
  <body>
    <p>Input Size: <input type="text" id="input-size" value="25" /></p>
    <input type="button" id="deploy" value="Deploy" onclick="deploy()" />
    <p>Input: <span id="input"></span></p>
    <p>Output: <span id="output"></span></p>
  </body>
</html>

Input data

Next, create a function to generate random numbers to use as the input set.

/* INPUT SET */
function generateInputSet(size) {
  const set = [];
  for (let i = 0; i < size; i += 1) {
    set.push(Math.floor(Math.random() * 100));
  }
  return set;
}

Work function

Each Worker runs the work function on one element of the input set. In this case, each Worker takes one input value, and determine its index in the output set.

/* WORK FUNCTION */
function workFn(value, set) {
  let index = 0;
  for (let j = 0; j < set.length; j += 1) {
    progress();
    if (set[value] > set[j]) {
      index += 1;
    }
  }
  return index;
}

Compute for

Now set up the DCP job. The function deploy() launches the job using the compute.for() statement. Pass it an iterable that goes from 0 to inputSet.length - 1 so that each value in the input gets sent to one sandbox. A sandbox is a clean environment on your Worker that runs your work function. Here,[arguments] sends the inputSet to each sandbox. Assign a public name for the job.

async function deploy() {
  // Some application-specific, HTML set up.  Not important to DCP
  const inputSize = document.getElementById('input-size').value;
  const inputSet = generateInputSet(inputSize);
  document
    .getElementById('input')
    .appendChild(document.createTextNode(inputSet));

  /* COMPUTE FOR */
  const { compute } = dcp;
  const job = compute.for(0, inputSet.length - 1, workFn, [inputSet]);
  job.public.name = 'dSort - Web';
}

Compute group

If you don’t need a compute group, skip this section. If you do, enter this line with the key information to join that group.

// SKIP IF: you do not need a compute group
job.computeGroups = [{ joinKey: 'Your Key', joinSecret: 'Your Key' }];

Process results

Next, await the results. The results, resultSet, are stored in an array whose indices correspond to the indices of the input, but whose values correspond to the indices of the output.

/* PROCESS RESULTS */
let resultSet = await job.exec();
resultSet = Array.from(resultSet);
const outputSet = [];

The rest of this is app-specific, not DCP-specific. Next, use the results to fill a sorted array, outputSet. In general, outputSet[resultSet[i]] = inputSet[i]. When two(+) inputs have the same value, they have the same output index. To account for having two(+) inputs of the same value, the dSort algorithm pushes forward an index when it encounters an index that already has the duplicate value.

// Creates sorted list from results (non-distributed)
for (let i = 0; i < resultSet.length; i += 1) {
  // First instance of value
  if (outputSet[resultSet[i]] === undefined) {
    outputSet[resultSet[i]] = inputSet[i];
  } else {
    // Further duplicates
    let next = 1;
    while (outputSet[resultSet[i] + next] !== undefined) {
      next += 1;
    }
    outputSet[resultSet[i] + next] = inputSet[i];
  }
}

document
  .getElementById('output')
  .appendChild(document.createTextNode(outputSet));

Run it

Open your HTML file in a web browser, and press deploy.

Full code

Click to see full code.
<!DOCTYPE html>
<html>
  <head>
    <script src="https://scheduler.distributed.computer/dcp-client/dcp-client.js"></script>
    <script>
      /* INPUT SET */
      function generateInputSet(size) {
        const set = [];
        for (let i = 0; i < size; i += 1) {
          set.push(Math.floor(Math.random() * 100));
        }
        return set;
      }

      /* WORK FUNCTION */
      // Each slice loops through the dataset and counts values below current slice value
      function workFn(value, set) {
        let index = 0;
        for (let j = 0; j < set.length; j += 1) {
          progress();
          if (set[value] > set[j]) {
            index += 1;
          }
        }
        return index;
      }

      async function deploy() {
        // Some application-specific, HTML set up.  Not super important to DCP
        const inputSize = document.getElementById('input-size').value;
        const inputSet = generateInputSet(inputSize);
        document
          .getElementById('input')
          .appendChild(document.createTextNode(inputSet));

        /* COMPUTE FOR */
        const { compute } = dcp;
        const job = compute.for(0, inputSet.length - 1, workFn, [inputSet]);
        job.public.name = 'dSort - Web';
        // job.computeGroups = [{ joinKey: '', joinSecret: '' }];

        // Not mandatory console logs for status updates
        job.on('accepted', () => {
          console.log(` - Job accepted with id: ${job.id}`);
        });
        job.on('result', (ev) => {
          console.log(` - Received result ${ev}`);
        });

        /* PROCESS RESULTS */
        let resultSet = await job.exec();
        resultSet = Array.from(resultSet);
        const outputSet = [];

        // Creates sorted list from results (non-distributed)
        for (let i = 0; i < resultSet.length; i += 1) {
          if (outputSet[resultSet[i]] === undefined) {
            outputSet[resultSet[i]] = inputSet[i];
          } else {
            let next = 1;
            while (outputSet[resultSet[i] + next] !== undefined) {
              next += 1;
            }
            outputSet[resultSet[i] + next] = inputSet[i];
          }
        }
        document
          .getElementById('output')
          .appendChild(document.createTextNode(outputSet));
        console.log(' - Job Complete');
      }
    </script>
  </head>
  <body>
    <p>Input Size: <input type="text" id="input-size" value="25" /></p>
    <input type="button" id="deploy" value="Deploy" onclick="deploy()" />
    <p>Input: <span id="input"></span></p>
    <p>Output: <span id="output"></span></p>
  </body>
</html>