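I. First, understanding files
To summarize:
In the broad sense, every file on a computer is a binary file.
In the narrow sense, files fall into two classes: binary files and text files.
First, how the data is produced (the write operation):
In a text file, every item of data is a character. The text "encoder/decoder" converts each character to its ASCII or Unicode code and stores the result on disk in binary form. With ASCII, every character occupies exactly 1 byte; with a Unicode encoding such as UTF-8, a single character may occupy more than one byte.
In a binary file, the items do not all have the same length: for example, a short takes 2 bytes, an int 4 bytes, and a double 8 bytes (these are typical sizes only and vary by platform). In this case the write operation copies the data in memory directly into the file.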
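The difference between the two write paths can be sketched with Python's standard `struct` module. This is only an illustration: the value 12345 and the little-endian format codes are choices made for this example, not something taken from the text above.

```python
import struct

value = 12345

# Text write path: the number is turned into characters ("1", "2", ...),
# and ASCII encodes each character as one byte.
as_text = str(value).encode("ascii")

# Binary write path: the integer's fixed-size layout is written directly.
# "<i" means little-endian 4-byte signed int (an explicit choice for this sketch).
as_binary = struct.pack("<i", value)

print(as_text, len(as_text))      # b'12345' 5
print(as_binary, len(as_binary))  # b'90\x00\x00' 4

# Sizes of a few fixed-width types in struct's standard (platform-independent) mode.
sizes = {name: struct.calcsize("<" + code)
         for name, code in [("short", "h"), ("int", "i"), ("float", "f"), ("double", "d")]}
print(sizes)  # {'short': 2, 'int': 4, 'float': 4, 'double': 8}
```

The text form grows with the number of digits, while the binary form always takes 4 bytes for this type.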
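Next, how the data is read:
A file is read in two stages: disk >> file buffer >> application memory.
When we say "there is no difference between text files and binary files", we are really talking about the first stage (disk to buffer), where both are just bytes. If there is no difference, why does the displayed content change depending on how the file is opened? That difference arises in the second stage (buffer to application memory), where the bytes are interpreted.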
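As a rough illustration of the second stage, the same four bytes can be handed to the application in several ways. The byte string `b"ABCD"` is a made-up example, and the three interpretations below are just a few of many possible ones.

```python
import struct

# The same bytes, as they would sit in the file buffer after stage one.
raw = b"ABCD"

# Interpretation 1: a sequence of unsigned 8-bit integers.
as_bytes = list(raw)                   # [65, 66, 67, 68]

# Interpretation 2: ASCII text.
as_text = raw.decode("ascii")          # 'ABCD'

# Interpretation 3: one little-endian unsigned 32-bit integer.
as_int = struct.unpack("<I", raw)[0]

print(as_bytes, as_text, as_int)
```

Nothing on disk changes between the three cases; only the decoding step applied by the program differs.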
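A file really consists of two parts: control information and content. A plain text file is simply one that carries no control or formatting information.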
1. Taking NumPy's multiarray.fromfile as an example
numpy.fromfile()
def fromfile(file, dtype=None, count=-1, sep=''):  # real signature unknown; restored from __doc__
    """
    fromfile(file, dtype=float, count=-1, sep='')

    Construct an array from data in a text or binary file.

    A highly efficient way of reading binary data with a known data-type,
    as well as parsing simply formatted text files. Data written using the
    `tofile` method can be read using this function.

    Parameters
    ----------
    file : file or str
        Open file object or filename.
    dtype : data-type
        Data type of the returned array.
        For binary files, it is used to determine the size and byte-order
        of the items in the file.
    count : int
        Number of items to read. ``-1`` means all items (i.e., the complete
        file).
    sep : str
        Separator between items if file is a text file.
        Empty ("") separator means the file should be treated as binary.
        Spaces (" ") in the separator match zero or more whitespace characters.
        A separator consisting only of spaces must match at least one
        whitespace.

    See also
    --------
    load, save
    ndarray.tofile
    loadtxt : More flexible way of loading data from a text file.

    Notes
    -----
    Do not rely on the combination of `tofile` and `fromfile` for
    data storage, as the binary files generated are not platform
    independent. In particular, no byte-order or data-type information is
    saved. Data can be stored in the platform independent ``.npy`` format
    using `save` and `load` instead.

    Examples
    --------
    Construct an ndarray:

    >>> dt = np.dtype([('time', [('min', int), ('sec', int)]),
    ...                ('temp', float)])
    >>> x = np.zeros((1,), dtype=dt)
    >>> x['time']['min'] = 10; x['temp'] = 98.25
    >>> x
    array([((10, 0), 98.25)],
          dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])

    Save the raw data to disk:

    >>> import os
    >>> fname = os.tmpnam()
    >>> x.tofile(fname)

    Read the raw data from disk:

    >>> np.fromfile(fname, dtype=dt)
    array([((10, 0), 98.25)],
          dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])

    The recommended way to store and load data:

    >>> np.save(fname, x)
    >>> np.load(fname + '.npy')
    array([((10, 0), 98.25)],
          dtype=[('time', [('min', '<i4'), ('sec', '<i4')]), ('temp', '<f8')])
    """
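The Notes section of the docstring says that `tofile` saves no data-type information. A minimal sketch of what that means in practice follows; the array values and the temporary file names are made up for illustration.

```python
import os
import tempfile

import numpy as np

x = np.array([1.5, 2.5, 3.5], dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "x.bin")   # hypothetical file names
    npy_path = os.path.join(tmp, "x.npy")

    x.tofile(raw_path)    # raw bytes only: no dtype, shape or byte order stored
    np.save(npy_path, x)  # .npy: a header describing the array, then the data

    # The reader has to supply the dtype; the correct one recovers the data.
    right = np.fromfile(raw_path, dtype=np.float64)
    # A wrong dtype is accepted silently: the 24 bytes are cut into 6 float32 items.
    wrong = np.fromfile(raw_path, dtype=np.float32)
    # np.load reads the dtype back from the .npy header.
    loaded = np.load(npy_path)

print(right, wrong.size, loaded.dtype)
```

This is why the docstring recommends `save` and `load` for storage and leaves `tofile`/`fromfile` for cases where the layout is already known.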
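Note this line in particular:
Empty ("") separator means the file should be treated as binary.
In other words, by default the file is read as binary and its raw bytes are returned as-is. When a non-empty sep is given, the file is treated as text: the bytes are decoded as characters, split on the separator, and each piece is parsed into a value of the requested dtype.
Take the file test.txt as an example (the ASCII code of "1" is 49 in decimal, and that of "," is 44).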
test.txt
1,1,1,1,1
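(1) Reading with the default sep argument:
import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"
# sep="" (the default): treat the file as binary and return its raw bytes
data = np.fromfile(filepath, dtype=np.uint8, sep="")
print(data)
Output: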
[49 44 49 44 49 44 49 44 49]
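(2) Reading with sep=",":
import numpy as np

filepath = "D://Documents/temp/testForPyStruct.txt"
# sep=",": treat the file as text and parse the comma-separated values
data = np.fromfile(filepath, dtype=np.uint8, sep=",")
print(data)
Output: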
[1 1 1 1 1]
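The two reads can also be reproduced without depending on a fixed path. The sketch below writes the same content to a temporary file (the temporary directory and the `manual` variable are additions for illustration) and spells out by hand roughly what the text-mode read does.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"1,1,1,1,1")  # written without a trailing newline

    # Binary mode: one uint8 per byte in the file.
    as_bytes = np.fromfile(path, dtype=np.uint8, sep="")
    # Text mode: parse the comma-separated numbers.
    as_values = np.fromfile(path, dtype=np.uint8, sep=",")

    # Roughly what text mode does, written out by hand:
    # decode the bytes, split on the separator, convert each token to a number.
    with open(path, "rb") as f:
        text = f.read().decode("ascii")
    manual = np.array([int(token) for token in text.split(",")], dtype=np.uint8)

print(as_bytes)   # [49 44 49 44 49 44 49 44 49]
print(as_values)  # [1 1 1 1 1]
print(manual)     # [1 1 1 1 1]
```

Note that if the file ended with a newline, the binary read would also return that byte (10) as an extra final element.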
2. See also
load, save
ndarray.tofile
loadtxt : More flexible way of loading data from a text file.