Marquette University, view of Wisconsin Avenue  

Module 14

Interacting with Files

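Each operating system has its own file system conventions. For instance, Windows identifies files by a drive letter (such as C:) and separates folder names in a path with a backslash (\), whereas UNIX-based systems like macOS or Linux have a single root directory (/) and use forward slashes. Inside an ordinary Python string literal, a backslash starts an escape sequence, which is why Windows paths in Python code are often written with two backslashes, as in "C:\\Users\\student".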
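To see both conventions side by side, here is a small sketch using the standard pathlib module. Its "pure" path classes only manipulate strings and never touch the disk, so both flavors can be built on any operating system. The folder and file names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical paths; pure paths never access the file system.
win = PureWindowsPath("C:/Users/student/Documents/notes.txt")
posix = PurePosixPath("/home/student/documents/notes.txt")

print(str(win))     # C:\Users\student\Documents\notes.txt
print(win.drive)    # C:
print(str(posix))   # /home/student/documents/notes.txt
print(posix.parts)  # ('/', 'home', 'student', 'documents', 'notes.txt')

# In an ordinary string literal a backslash must be doubled;
# a raw string (r"...") keeps the backslash as a literal character.
assert "C:\\Users" == r"C:\Users"
```

PureWindowsPath accepts forward slashes as input and prints the path with backslashes, which is one way to avoid typing doubled backslashes.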

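Files store data, and they can do so in many formats. Even plain text files come in a large variety of encodings, and Python recognizes many of them; as an intermediate Python programmer, you should consult the Python documentation (the codecs module lists the standard encodings) to learn more. Typically, we access files in text mode. This is where the encoding matters, because Python decodes the stored bytes into strings when reading and encodes strings into bytes when writing. The default encoding depends on the platform, but UTF-8 is by far the most common and handles English text (and most other languages) well; passing encoding="utf-8" to open makes the choice explicit. We can also access files in binary mode, in which case the contents are made available as is, as bytes objects, without any interpretation.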
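The difference between text and binary mode can be sketched as follows: we write a word with a non-ASCII letter in text mode using UTF-8, then read the raw bytes back in binary mode. The file name greeting.txt is only an example, and a temporary directory keeps the sketch from leaving files behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "greeting.txt")

    # Text mode: we hand Python a str, and it stores UTF-8 bytes on disk.
    with open(path, "wt", encoding="utf-8") as f:
        f.write("café")

    # Binary mode: we get the stored bytes back with no decoding at all.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)                    # b'caf\xc3\xa9'
print(len("café"), len(raw))  # 4 5  (the é takes two bytes in UTF-8)

# Decoding the same bytes with the wrong encoding gives different text.
print(raw.decode("utf-8"))    # café
print(raw.decode("latin-1"))  # cafÃ©
```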

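Python accesses files by first opening them, then processing them, and then closing them. Leaving a file open unnecessarily is considered bad practice: the program holds on to an operating system resource, data written to the file may still sit in a buffer instead of on disk, and on some systems other applications cannot rename or delete the file while it is open. Fortunately, Python's with statement keeps a file open only while we are using it and closes it automatically at the end of the block, even if an error occurs. The open function takes as its first, mandatory parameter the name of the file (a string), and as its second parameter a string that specifies the mode. The mode is built from the following characters: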

  • "t" for text mode (default)
  • "b" for binary mode
  • "r" for read mode (default)
  • "w" for write mode, where an existing file is replaced by a new, empty one
  • "a" for append mode, where new data is added to the end of an existing file

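These characters can be combined: choose one of "r", "w", or "a", and optionally "t" or "b". Thus "rt" reads a text file and "wb" writes a binary file. Adding "+" opens the file for both reading and writing; for instance, "r+t" reads and writes an existing text file without emptying it first. A combination such as "rwt" is not valid, and open raises a ValueError for it.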
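The following sketch shows how the modes behave on the same file: "w" empties it, "a" adds to the end, and "r+" reads and writes without emptying it. It also checks that the with statement has closed the file afterwards. The name log.txt is just an example inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "log.txt")

    # "w": create the file, or replace an existing one with an empty file.
    with open(path, "wt") as f:
        f.write("first\n")
    with open(path, "wt") as f:
        f.write("second\n")       # "first" is gone now

    # "a": keep the existing content and add new data at the end.
    with open(path, "at") as f:
        f.write("third\n")

    # "r+": read and write the same file without emptying it.
    with open(path, "r+t") as f:
        contents = f.read()       # reading moves the position to the end
        f.write("fourth\n")       # so this lands after "third"

    with open(path, "rt") as f:
        final = f.read()

    print(f.closed)               # True: leaving the with block closed the file

print(repr(contents))  # 'second\nthird\n'
print(repr(final))     # 'second\nthird\nfourth\n'
```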

Reading from and writing to files

Python offers many ways to read from and write to files. The most basic way to write is the write method of the file object, as in this example:

with open('somefile.txt', 'wt') as f:
      f.write(str(500) + "\n")

Here we open the file and call the resulting file object f. The code converts the number 500 into a string, adds a newline to it, and writes the result into the file. For my taste, a simpler way is to redirect the output of the print function from standard output to the file. Then we do not need to convert values into strings manually, and a newline is added automatically at the end unless we change it with the end parameter of print.

with open('somefile.txt', 'wt') as f:
      print(500, file=f)
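To compare the two approaches, this sketch writes the same number once with write and once with print, then uses the sep and end parameters of print. It reads the file back so the exact contents can be inspected; the file lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "somefile.txt")

    with open(path, "wt") as f:
        count = f.write(str(500) + "\n")     # write needs a str; returns characters written
        print(500, file=f)                   # print converts and adds "\n" by itself
        print(1, 2, 3, sep=",", file=f)      # several values with a custom separator
        print("no newline", end="", file=f)  # end="" suppresses the final newline

    with open(path, "rt") as f:
        text = f.read()

print(count)       # 4
print(repr(text))  # '500\n500\n1,2,3\nno newline'
```

Both lines holding 500 come out identical; print simply saves the conversion and the explicit newline.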

We can read a given number of characters from the file (or bytes, in binary mode) using the read method:

with open('somefile.txt','rt') as f:
      data = f.read(10)
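The distinction between characters and bytes shows up as soon as a file contains non-ASCII letters. The sketch below reads the first 10 units of the same UTF-8 file in text and in binary mode. It then reads the file in fixed-size chunks, a common pattern for files too large to load at once; the chunk size of 8 is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "somefile.txt")
    with open(path, "wt", encoding="utf-8") as f:
        f.write("héllo wörld, how are you?")

    with open(path, "rt", encoding="utf-8") as f:
        first = f.read(10)        # up to 10 characters in text mode
        rest = f.read()           # no argument: everything that is left

    with open(path, "rb") as f:
        first_bytes = f.read(10)  # up to 10 bytes in binary mode

    # Read the file piece by piece instead of all at once.
    chunks = []
    with open(path, "rt", encoding="utf-8") as f:
        while True:
            chunk = f.read(8)
            if not chunk:         # an empty string signals the end of the file
                break
            chunks.append(chunk)

print(first)                        # héllo wörl
print(first_bytes.decode("utf-8"))  # héllo wö  (é and ö take two bytes each)
```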

If we leave out the number, then the whole file will be read. More typically, we want to process a text file line by line. In this case,

with open('somefile.txt', 'rt') as f:
      for line in f:
            line = line.strip()
            for word in line.split():
                  print(word)

we read the file line by line, strip whitespace from the beginning and the end of each line, and then divide the line into words using split, which returns a list of the parts of the line that were separated by whitespace. See the Python documentation for the optional arguments of both methods. This is almost the complete paradigm for processing a text file line by line.
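As one possible application of this paradigm, the sketch below counts how often each word occurs in a small file, using Counter from the standard collections module. The sample lines are made up, and the temporary directory keeps the example self-contained.

```python
import os
import tempfile
from collections import Counter

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "somefile.txt")
    with open(path, "wt", encoding="utf-8") as f:
        print("  the cat sat  ", file=f)
        print("", file=f)
        print("on the mat", file=f)

    counts = Counter()
    with open(path, "rt", encoding="utf-8") as f:
        for line in f:                   # each line still ends with "\n"
            line = line.strip()          # drop leading and trailing whitespace
            if not line:                 # skip blank lines
                continue
            counts.update(line.split())  # count every whitespace-separated word

print(counts.most_common(1))  # [('the', 2)]
```

Note that split() without arguments already ignores leading and trailing whitespace, so strip matters most when you keep the whole line or split on an explicit separator such as split(",").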