This post is about what to do when you have a script or program that you can't change, which reads its input from a file and/or writes its output to a file, and you want to force it to use pipes instead so that the input/output data never hits the disk.
Let's say that you have a script that processes some input data.
One way to implement this script would be to read the input data from the standard input and output to the standard output.
import sys

for line in sys.stdin:
    print(int(line.strip())**2)
Which you would run like this:
$ echo 4 | python example.py
16
$ echo -e "4\n5" | python example.py
16
25
(The -e option tells echo to interpret backslash escapes. Without it, echo will print \n literally as a backslash followed by an n instead of a newline.)
But sometimes such scripts read the input data from a file and write the output to a file as well.
import sys

# argv[1] is the input file name, argv[2] is the output file name
with open(sys.argv[1], 'r') as input, open(sys.argv[2], 'w') as output:
    for line in input:
        output.write(str(int(line.strip())**2) + '\n')
$ echo -e "1\n2\n3" > input.txt
$ python example2.py input.txt output.txt
$ cat output.txt
1
4
9
But you'd like to avoid using files and pipe everything instead.
You can use /dev/stdin and /dev/stdout as filenames. Everything that is written to /dev/stdout will be displayed on the screen, or you can redirect it elsewhere. For example, echo hello > /dev/stdout will display hello on the screen. Given that /dev/stdin works similarly, we can rewrite our command as:
$ echo -e "1\n2\n3" | python example2.py /dev/stdin /dev/stdout
1
4
9
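If you want to drive such a script from Python rather than from the shell, the same trick should carry over. Here is a minimal sketch, assuming a Unix-like system where /dev/stdin and /dev/stdout exist and Python 3.7+. The embedded SCRIPT is a hypothetical stand-in for the program you can't change, and square_via_pipes is a name made up for this illustration.

```python
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for a script we cannot change: it only takes file names.
SCRIPT = textwrap.dedent("""
    import sys
    with open(sys.argv[1], 'r') as src, open(sys.argv[2], 'w') as dst:
        for line in src:
            dst.write(str(int(line.strip())**2) + '\\n')
""")


def square_via_pipes(numbers):
    """Feed numbers through the file-based script without data files."""
    with tempfile.NamedTemporaryFile('w', suffix='.py') as script:
        script.write(SCRIPT)
        script.flush()
        # The "file names" are the child's own stdin and stdout, which
        # subprocess.run connects to pipes for us.
        result = subprocess.run(
            [sys.executable, script.name, '/dev/stdin', '/dev/stdout'],
            input=''.join('%d\n' % n for n in numbers),
            capture_output=True, text=True, check=True)
    return [int(line) for line in result.stdout.split()]


print(square_via_pipes([1, 2, 3]))  # [1, 4, 9]
```

Only the stand-in script itself is written to a temporary file here; the input and output data travel through pipes.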
Multiple output files
What do you do if the script has multiple outputs?
import sys

# argv[1] is the input, argv[2] receives the squares, argv[3] the cubes
with open(sys.argv[1], 'r') as input, \
     open(sys.argv[2], 'w') as output2, \
     open(sys.argv[3], 'w') as output3:
    for line in input:
        output2.write(str(int(line.strip())**2) + '\n')
        output3.write(str(int(line.strip())**3) + '\n')
This script will square and cube the input numbers.
$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2.txt output3.txt
$ head output*
==> output2.txt <==
1
4
9

==> output3.txt <==
1
8
27
Here is what you can do to avoid using output files:
$ echo -e "1\n2\n3" | python example3.py /dev/stdin /dev/stdout /dev/stdout
1
1
4
8
9
27
The two outputs are now interleaved in a single stream. If you know that the script writes one line to each output for every input line, you can read two lines at a time from that stream and treat them as two separate output streams.
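The two-lines-at-a-time idea can be sketched in a few lines of Python. This only works if the lines really do arrive strictly alternating, which is an assumption: it depends on the script flushing each output after every line, which tends to happen on a terminal but may not when the output is piped. The function name split_interleaved is made up for this example, and the StringIO object stands in for the combined stream.

```python
import io


def split_interleaved(stream):
    """Split a stream of alternating lines into two lists of lines."""
    lines = (line.rstrip('\n') for line in stream)
    first, second = [], []
    # zip() pulls from the same iterator twice per step, so each
    # iteration consumes one line for each output.
    for a, b in zip(lines, lines):
        first.append(a)
        second.append(b)
    return first, second


combined = io.StringIO("1\n1\n4\n8\n9\n27\n")
squares, cubes = split_interleaved(combined)
print(squares)  # ['1', '4', '9']
print(cubes)    # ['1', '8', '27']
```

In real use you would pass sys.stdin (or the stdout of a subprocess) instead of the StringIO object.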
Another option is to use named pipes.
You can create a new named pipe with mkfifo. This will create a new pipe that you can see with ls, and while it appears like a regular file, its contents are never written to disk. Opening a named pipe for writing blocks until another process opens it for reading, and a write blocks when the pipe's buffer is full until a reader drains it.
$ mkfifo output2
$ mkfifo output3
$ echo -e "1\n2\n3" | python example3.py /dev/stdin output2 output3 &
$ cat output2 &
$ cat output3 &
1
4
9
1
8
27
So, conceptually, here is what happens:
- example3.py reads a line from /dev/stdin
- example3.py writes a line to output2
- cat output2 unblocks, reads a line, prints it and then tries to read the next line but is blocked
- example3.py unblocks because output2 was read, writes a line to output3
- cat output3 unblocks, reads a line, prints it and then tries to read the next line but is blocked
- go to the first step
This is why we need to run three processes at the same time. cat output2 output3 will not work since cat will try to read till the end of file but will never succeed because example3.py will block since output3 is not being read. This will cause a deadlock.
When you're done, don't forget to remove the named pipes:
$ rm output2 output3
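The same named-pipe setup can be sketched from Python with os.mkfifo. This is only an illustration under a few assumptions: a Unix-like system, and a writer function that stands in for example3.py (in real use it would be a separate process). The function names and the temporary directory are made up for the example. Each pipe gets its own reader thread, for the same reason that two separate cat processes were needed.

```python
import os
import tempfile
import threading


def writer(path2, path3, numbers):
    # Stand-in for example3.py: open both outputs, then write to each.
    # Each open() blocks until some reader has opened that pipe.
    with open(path2, 'w') as out2, open(path3, 'w') as out3:
        for n in numbers:
            out2.write('%d\n' % n**2)
            out3.write('%d\n' % n**3)


def reader(path, results, key):
    # Read until end of file, i.e. until the writer closes the pipe.
    with open(path) as fifo:
        results[key] = [int(line) for line in fifo]


def run(numbers):
    results = {}
    # The directory (and the pipes in it) is removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        p2 = os.path.join(tmp, 'output2')
        p3 = os.path.join(tmp, 'output3')
        os.mkfifo(p2)
        os.mkfifo(p3)
        threads = [
            threading.Thread(target=writer, args=(p2, p3, numbers)),
            threading.Thread(target=reader, args=(p2, results, 'squares')),
            threading.Thread(target=reader, args=(p3, results, 'cubes')),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results


print(run([1, 2, 3]))
```

Reading the two pipes one after the other from a single thread would hang in the same way as cat output2 output3, because the writer would never get past opening the second pipe.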
If you're doing this in bash, you can use process substitution instead, which has a nicer syntax:
$ echo -e "1\n2\n3" | python example3.py /dev/stdin >(cat) >(cat)
1
8
27
1
4
9
So instead of cat you could use your program to process both outputs in parallel.
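If your own program is written in Python, you can get a similar effect without bash by creating anonymous pipes and handing the script /dev/fd/N paths. This is a rough sketch rather than a definitive recipe: it assumes a Unix-like system where /dev/fd is available, the embedded SCRIPT is a hypothetical stand-in for example3.py, and the function names are made up.

```python
import os
import subprocess
import sys
import tempfile
import textwrap
import threading

# Hypothetical stand-in for example3.py (squares to argv[2], cubes to argv[3]).
SCRIPT = textwrap.dedent("""
    import sys
    with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as out2, open(sys.argv[3], 'w') as out3:
        for line in src:
            n = int(line.strip())
            out2.write(str(n**2) + '\\n')
            out3.write(str(n**3) + '\\n')
""")


def drain(fd, results, key):
    # Read one pipe until end of file; this is where your own
    # processing of that output stream would go.
    with os.fdopen(fd) as pipe:
        results[key] = pipe.read().split()


def run(numbers):
    results = {}
    r2, w2 = os.pipe()
    r3, w3 = os.pipe()
    with tempfile.NamedTemporaryFile('w', suffix='.py') as script:
        script.write(SCRIPT)
        script.flush()
        # pass_fds keeps the write ends open in the child under the same
        # numbers, so /dev/fd/N names them from the child's point of view.
        proc = subprocess.Popen(
            [sys.executable, script.name, '/dev/stdin',
             '/dev/fd/%d' % w2, '/dev/fd/%d' % w3],
            stdin=subprocess.PIPE, pass_fds=(w2, w3), text=True)
        # Close our copies of the write ends so the readers can see EOF.
        os.close(w2)
        os.close(w3)
        readers = [
            threading.Thread(target=drain, args=(r2, results, 'squares')),
            threading.Thread(target=drain, args=(r3, results, 'cubes')),
        ]
        for t in readers:
            t.start()
        proc.communicate(''.join('%d\n' % n for n in numbers))
        for t in readers:
            t.join()
    return results


print(run([1, 2, 3]))
```

As with the two cat processes, both outputs are drained concurrently so that the script is never stuck waiting on a pipe that nobody is reading.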