This post is about what to do when you have a script or program that you can't change, which reads its input from a file and/or writes its output to a file, but you want to force it to use pipes instead so that the data never touches the disk.
/dev/stdin and /dev/stdout
Let's say that you have a script that processes some input data.
One way to implement this script is to read the input data from standard input and write the results to standard output.
import sys

for line in sys.stdin:
    print(int(line.strip())**2)
You would run it like this:
$ echo 4 | python example.py
16
$ echo -e "4\n5" | python example.py
16
25
(Without the -e flag, echo will print \n literally as a backslash followed by an n instead of a newline.)
But sometimes such scripts read the input data from a file and write their output to a file as well.
import sys

with open(sys.argv[1], 'r') as input, open(sys.argv[2], 'w') as output:
    for line in input:
        output.write(str(int(line.strip())**2) + '\n')
$ echo -e "1\n2\n3" > input.txt
$ python example2.py input.txt output.txt
$ cat output.txt
1
4
9
But you'd like to avoid using files and pipe everything instead.
You can use /dev/stdin and /dev/stdout as filenames: everything written to /dev/stdout is displayed on the screen (or can be redirected elsewhere), and /dev/stdin works similarly for reading. Knowing that echo hello > /dev/stdout will display hello on the screen, we can rewrite our command as:
$ echo -e "1\n2\n3" | python example2.py /dev/stdin /dev/stdout
1
4
9
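To check that this really behaves like piping through the standard streams, here is a small self-contained sketch, assuming a Linux-style /dev; the inline script mirrors example2.py:

```python
import subprocess
import sys

# Inline stand-in for example2.py; assumes /dev/stdin and /dev/stdout
# exist, which is the case on Linux and other POSIX-like systems.
script = (
    "import sys\n"
    "with open(sys.argv[1], 'r') as input, open(sys.argv[2], 'w') as output:\n"
    "    for line in input:\n"
    "        output.write(str(int(line.strip())**2) + '\\n')\n"
)
result = subprocess.run(
    [sys.executable, "-c", script, "/dev/stdin", "/dev/stdout"],
    input="1\n2\n3\n", capture_output=True, text=True,
)
print(result.stdout, end="")
```

The child never sees a pipe on its command line, only two filenames, yet the data flows straight through the standard streams.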
Multiple output files
What should you do if the script has multiple outputs?
import sys
with open(sys.argv[1], 'r') as input, \
open(sys.argv[2], 'w') as output2, \
open(sys.argv[3], 'w') as output3:
for line in input:
output2.write(str(int(line.strip())**2) + '\n')
output3.write(str(int(line.strip())**3) + '\n')
This script will square and cube the input numbers.
$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2.txt output3.txt
$ head output*
==> output2.txt <==
1
4
9
==> output3.txt <==
1
8
27
Here is what you can do to avoid using output files:
$ echo -e "1\n2\n3" | python example3.py /dev/stdin /dev/stdout /dev/stdout
1
1
4
8
9
27
Right: if you know that the script alternates between its two outputs, you can read two lines at a time from the single merged stream and split them back into two separate output streams.
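Assuming the alternation holds (square first, then cube, as the output above suggests), de-interleaving is just a matter of taking every other line; a minimal sketch:

```python
# Lines of the merged stream, in the order shown above.
merged = ["1", "1", "4", "8", "9", "27"]

# Even positions came from the "squares" stream, odd positions from
# "cubes", assuming the script always writes the square before the cube.
squares = merged[0::2]
cubes = merged[1::2]
print(squares)
print(cubes)
```

This only works because the two writes happen in lockstep; if the script buffered its outputs independently, the interleaving guarantee would be gone.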
Named pipes
Another option is to use named pipes.
You can create a new named pipe with mkfifo. This creates a pipe that you can see with ls, and while it looks like a regular file, its contents are never written to disk. When you write to a pipe, the write blocks until the data is read from the other end.
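This blocking behaviour can be observed from Python as well. A minimal sketch, assuming a POSIX system where os.mkfifo is available; the writer thread stalls on open() until the main thread opens the pipe for reading:

```python
import os
import tempfile
import threading

# Create a FIFO in a temporary directory (the path is arbitrary).
fifo = os.path.join(tempfile.mkdtemp(), "demo_fifo")
os.mkfifo(fifo)

def writer():
    # open() blocks here until some process opens the FIFO for reading.
    with open(fifo, "w") as f:
        f.write("1\n4\n9\n")

t = threading.Thread(target=writer)
t.start()

# Opening the read end unblocks the writer; the data passes through
# kernel memory and never reaches the disk.
with open(fifo) as f:
    data = f.read()

t.join()
os.remove(fifo)
print(data, end="")
```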
$ mkfifo output2
$ mkfifo output3
$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2 output3 &
$ cat output2 &
$ cat output3 &
1
4
9
1
8
27
So here is what happens:
1. example3.py reads a line from /dev/stdin
2. example3.py writes a line to output2 and blocks
3. cat output2 unblocks, reads a line, prints it and then tries to read the next line but is blocked
4. example3.py unblocks since output2 was read, writes a line to output3 and blocks
5. cat output3 unblocks, reads a line, prints it and then tries to read the next line but is blocked
6. go to the first step
This is why we need to run three processes at the same time. cat output2 output3 will not work: cat will try to read output2 to the end of file but never get there, because example3.py blocks as soon as output3 stops being read. This is a deadlock.
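The same dance can be replayed from a single Python file, with two reader threads standing in for the two background cat processes; a sketch assuming a POSIX system, where the inline script mirrors example3.py:

```python
import os
import subprocess
import sys
import tempfile
import threading

workdir = tempfile.mkdtemp()
out2 = os.path.join(workdir, "output2")
out3 = os.path.join(workdir, "output3")
os.mkfifo(out2)
os.mkfifo(out3)

# Inline stand-in for example3.py.
script = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as o2, open(sys.argv[2], 'w') as o3:\n"
    "    for line in sys.stdin:\n"
    "        o2.write(str(int(line)**2) + '\\n')\n"
    "        o3.write(str(int(line)**3) + '\\n')\n"
)
child = subprocess.Popen(
    [sys.executable, "-c", script, out2, out3],
    stdin=subprocess.PIPE, text=True,
)

results = {}

def drain(path):
    # Plays the role of `cat output2 &` / `cat output3 &`.
    with open(path) as f:
        results[path] = f.read()

readers = [threading.Thread(target=drain, args=(p,)) for p in (out2, out3)]
for t in readers:
    t.start()

child.communicate("1\n2\n3\n")  # feed stdin, then close it
for t in readers:
    t.join()

print(results[out2], end="")
print(results[out3], end="")
```

Draining the FIFOs sequentially instead of concurrently would reproduce the deadlock described above.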
When you're done, don't forget to remove them:
$ rm output2 output3
Unnamed pipes
If you're going to do this in bash, you can use bash process substitution instead, which has nicer syntax:
$ echo -e "1\n2\n3" | python example3.py /dev/stdin >(cat) >(cat)
1
8
27
1
4
9
So instead of cat you could use your own programs to process both outputs in parallel.
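Under the hood, bash replaces each >(...) with the path of a pipe it sets up for you; on Linux this shows up as a /dev/fd entry. A quick way to see it, assuming bash and a Linux-style /dev/fd:

```python
import subprocess

# Ask bash to echo the path that >(cat) expands to.
out = subprocess.run(
    ["bash", "-c", "echo >(cat)"],
    capture_output=True, text=True,
).stdout
print(out.strip())  # e.g. /dev/fd/63
```

The program being wrapped just sees an ordinary-looking filename, which is exactly why this trick works on scripts you can't change.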