Pipes as input/output files

Force programs and scripts to write to pipes in memory instead of files on disk

This post is about what to do when you have a script or a program that you can't change, which reads its input from a file and/or writes its output to a file, but you want to force it to use pipes instead so that the data never touches the disk.

/dev/stdin and /dev/stdout

Let's say that you have a script that processes some input data.

One way to implement such a script is to read the input data from the standard input and write the results to the standard output.

import sys

# Square each number read from the standard input
for line in sys.stdin:
  print(int(line.strip())**2)

Which you would run like this:

$ echo 4 | python example.py
16

$ echo -e "4\n5" | python example.py
16
25

(Without the -e flag, echo will print \n literally as a backslash followed by an n instead of a newline.)

But sometimes such scripts read the input data from a file and also write the output to a file.

import sys

# Read numbers from the input file, write their squares to the output file
with open(sys.argv[1], 'r') as infile, open(sys.argv[2], 'w') as outfile:
  for line in infile:
    outfile.write(str(int(line.strip())**2) + '\n')

$ echo -e "1\n2\n3" > input.txt
$ python example2.py input.txt output.txt
$ cat output.txt
1
4
9

But you'd like to avoid using files and pipe everything instead.

You can pass /dev/stdin and /dev/stdout as the filenames: whatever the program reads from /dev/stdin comes from its standard input, and whatever it writes to /dev/stdout is displayed on the screen or redirected elsewhere.

Knowing that echo hello > /dev/stdout will display hello on the screen and that /dev/stdin works similarly, we can rewrite our command as:

$ echo -e "1\n2\n3" | python example2.py /dev/stdin /dev/stdout
1
4
9

Multiple output files

What do you do if the script writes to multiple output files?

import sys

# Write squares to one output file and cubes to another
with open(sys.argv[1], 'r') as infile, \
     open(sys.argv[2], 'w') as output2, \
     open(sys.argv[3], 'w') as output3:
  for line in infile:
    n = int(line.strip())
    output2.write(str(n**2) + '\n')
    output3.write(str(n**3) + '\n')

This script will square and cube the input numbers.

$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2.txt output3.txt
$ head output*
==> output2.txt <==
1
4
9

==> output3.txt <==
1
8
27

Here is what you can do to avoid using output files:

$ echo -e "1\n2\n3" | python example3.py /dev/stdin /dev/stdout /dev/stdout
1
1
4
8
9
27

The two output streams are interleaved. Since the script writes one line to each output for every input line, you know the lines come in pairs, so you can read two lines at a time from the single combined stream and split them back into two separate output streams.
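For example, here is a sketch of such a demultiplexer (demux.py is a hypothetical name, not part of the examples above). Note that this only works if the producer really emits its lines in alternating pairs: when writing to a pipe instead of a terminal, stdio typically switches to block buffering, so the lines may arrive grouped by stream instead.

import sys

# Read the combined stream two lines at a time: example3.py writes
# the square first and the cube second for every input line.
while True:
  square = sys.stdin.readline()
  cube = sys.stdin.readline()
  if not square or not cube:
    break
  print('square: ' + square.strip() + ', cube: ' + cube.strip())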

Named pipes

Another option is to use named pipes.

You can create a new named pipe with mkfifo. The pipe shows up in ls and looks like a regular file, but its contents are never written to disk: data is passed in memory from the writing process to the reading process. Opening the pipe for writing blocks until some other process opens it for reading, and a write blocks once the pipe's in-kernel buffer fills up until a reader drains it.
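For example (the pipe name, owner and date below are placeholders; note the leading p in the file type and the size of 0 bytes):

$ mkfifo mypipe
$ ls -l mypipe
prw-r--r-- 1 user user 0 Jan  1 00:00 mypipe
$ rm mypipe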

$ mkfifo output2
$ mkfifo output3
$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2 output3 &
$ cat output2 &
$ cat output3 &
1
4
9
1
8
27

Here is what happens: each write to a named pipe blocks until the pipe's reader consumes it, and even opening a pipe for writing blocks until some process opens it for reading. This is why we need to run three processes at the same time. cat output2 output3 would not work: cat opens output2 and reads it until end of file, but end of file never arrives, because example3.py is stuck waiting for a reader on output3, and cat won't move on to output3 until it finishes with output2. That is a deadlock.
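The two background cat processes above avoid the deadlock by reading both pipes at once. If you'd rather consume both outputs from a single program, it also has to read the pipes concurrently, for example with one thread per pipe. A minimal sketch (reader.py is a hypothetical name):

import threading

def consume(path, label):
  # Opening a FIFO for reading blocks until the writer opens it,
  # so each pipe gets its own thread.
  with open(path, 'r') as pipe:
    for line in pipe:
      print(label + ': ' + line.strip())

threads = [threading.Thread(target=consume, args=('output2', 'square')),
           threading.Thread(target=consume, args=('output3', 'cube'))]
for t in threads:
  t.start()
for t in threads:
  t.join()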

When you're done, don't forget to remove them:

$ rm output2 output3

Unnamed pipes

If you're doing this from bash, you can use process substitution instead, which has nicer syntax: bash replaces each >(command) with a filename like /dev/fd/63 that is connected to the standard input of command.

$ echo -e "1\n2\n3" | python example3.py /dev/stdin >(cat) >(cat)
1
8
27
1
4
9

So instead of cat you could substitute your own programs to process both outputs in parallel.
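For example, with a hypothetical sum.py that adds up the numbers on its standard input:

import sys

# Add up all the numbers on the standard input
print(sum(int(line) for line in sys.stdin))

you could total the squares and the cubes in parallel without any intermediate files:

$ echo -e "1\n2\n3" | python example3.py /dev/stdin >(python sum.py) >(python sum.py)

This should print 14 (the sum of the squares) and 36 (the sum of the cubes), in either order, since the two substituted processes run in parallel.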