Python Data Analysis and Visualization, Part 3: NumPy Data I/O and Functions

Reading and writing data as CSV files

CSV (Comma-Separated Values) is a common file format used to store batches of data.

City,Month-over-month,Year-over-year,Fixed-base
Beijing,101.5,120.7,121.4
Shanghai,101.2,127.3,127.8
Guangzhou,101.3,119.4,120.0
Shenzhen,102.0,140.9,145.5
Shenyang,100.1,101.4,101.6

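np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
• frame : file name or open file object; a name ending in .gz is written gzip-compressed (NumPy's own signature calls this parameter fname)
• array : the array to write to the file (called X in NumPy's signature)
• fmt : format for each written value, for example %d, %.2f, or %.18e
• delimiter : string that separates columns; the default is a single space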
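To see how fmt and delimiter shape the output, here is a minimal sketch that writes to an in-memory io.StringIO buffer rather than a file on disk. The buffer and the variable names are assumptions made for this illustration only.

```python
import io
import numpy as np

a = np.arange(6).reshape(2, 3)

# Write to an in-memory text buffer so the example leaves no file behind.
buf = io.StringIO()
np.savetxt(buf, a, fmt='%.2f', delimiter=',')

# Each row of the array becomes one line; values are joined by the delimiter.
text = buf.getvalue()
print(text)
```

Switching fmt to '%d' would write the same values as plain integers.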

Saving:

import numpy as np
a=np.arange(100).reshape(5,20)
np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimiter=',')

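If the keyword argument is misspelled as delimeter, the call fails:

Traceback (most recent call last):

  File "<ipython-input-7-725d28f63f7c>", line 1, in <module>
    np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimeter=',')

TypeError: savetxt() got an unexpected keyword argument 'delimeter'

Cause:
The keyword is spelled delimiter, not delimeter, so savetxt() rejects the unknown name.
A separate pitfall concerns the path: an address copied from the Windows file manager uses \ to separate folders, and in an ordinary Python string literal \ starts an escape sequence, so the copied path may not be read as intended.

Fix:
Spell the keyword correctly, and change every \ in the copied address to / (a raw string or doubled backslashes also work).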

np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimiter=',')
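Replacing \ with / is one way to handle a copied Windows path. The sketch below shows two alternatives, a raw string and doubled backslashes, and uses pathlib.PureWindowsPath to do the conversion programmatically. It only manipulates the path text and does not touch the file system.

```python
from pathlib import PureWindowsPath

# A raw string keeps backslashes literal, so a copied Windows path works as-is.
raw = r'C:\Users\HASEE\Desktop\a.csv'

# Doubling each backslash gives the same string without the r prefix.
escaped = 'C:\\Users\\HASEE\\Desktop\\a.csv'
same = (raw == escaped)

# PureWindowsPath converts to forward slashes on any operating system.
posix_style = PureWindowsPath(raw).as_posix()
print(same, posix_style)
```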

Loading:

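np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
• frame : file name, open file object, or generator of lines; .gz and .bz2 compressed files are supported
• dtype : data type of the result, optional (default float)
• delimiter : string that separates columns; the default None splits on any whitespace
• unpack : if True, the result is transposed so that each column can be assigned to its own variable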
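As an illustration of unpack, the sketch below loads part of the sample price table from an in-memory string. The skiprows and usecols arguments are additional np.loadtxt parameters not covered in these notes; they are used here to skip the header row and the non-numeric city column. The shortened column labels are made up for the example.

```python
import io
import numpy as np

# Three rows of the sample table, with shortened English labels.
csv_text = """City,MoM,YoY,FixedBase
Beijing,101.5,120.7,121.4
Shanghai,101.2,127.3,127.8
Guangzhou,101.3,119.4,120.0
"""

# skiprows=1 skips the header; usecols keeps only the numeric columns;
# unpack=True returns one array per column instead of one 2-D array.
mom, yoy, fixed = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                             skiprows=1, usecols=(1, 2, 3), unpack=True)
print(mom)
```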

b=np.loadtxt('C:/Users/HASEE/Desktop/a.csv',delimiter=',',dtype=int)

b
Out[13]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
        56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
        76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
        96, 97, 98, 99]])

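Limitations of CSV files:

CSV can only store one- and two-dimensional arrays effectively.

np.savetxt() and np.loadtxt() can only save and load one- and two-dimensional arrays effectively.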
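The limitation can be observed directly. In current NumPy releases, as far as I know, np.savetxt() raises a ValueError for an array with more than two dimensions. The sketch below also shows one possible workaround, flattening to 2-D before saving and reshaping after loading; this assumes the original shape is known when reading.

```python
import io
import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

try:
    np.savetxt(io.StringIO(), a3, fmt='%d', delimiter=',')
    error = None
except ValueError as exc:
    error = exc   # savetxt rejects arrays with more than two dimensions

# Workaround sketch: flatten to 2-D for saving, reshape after loading.
buf = io.StringIO()
np.savetxt(buf, a3.reshape(2, -1), fmt='%d', delimiter=',')
buf.seek(0)
restored = np.loadtxt(buf, delimiter=',', dtype=int).reshape(a3.shape)
```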

Storing and loading multidimensional data

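a.tofile(frame, sep='', format='%s')
• frame : file name or open file object
• sep : separator string between items; if it is the empty string, the file is written as binary
• format : format for each written item (used for text output)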

a=np.arange(100).reshape(5,10,2)
a.tofile('C:/Users/HASEE/Desktop/b.dat',sep=',',format='%d')
c=np.fromfile('C:/Users/HASEE/Desktop/b.dat',sep=',',dtype=int,count=-1)

c
Out[18]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

c.reshape(2,5,10)
Out[20]: 
array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]])
        
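Reading data back this way requires knowing the shape and element type the array had when it was written, because the file stores neither. The array above was saved with shape (5,10,2), so c.reshape(5,10,2) restores the original layout; (2,5,10) is simply another shape that also holds 100 elements.
a.tofile() and np.fromfile() need to be used together.
The extra information can be stored in a separate metadata file.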
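One way to keep that extra information is a small sidecar file next to the data. The sketch below stores the shape and dtype as JSON; the file names and the JSON layout are assumptions chosen for illustration, not a NumPy convention.

```python
import json
import os
import tempfile
import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'b.dat')
    meta_path = os.path.join(tmp, 'b.json')   # hypothetical sidecar file name

    a.tofile(data_path)                        # sep='' writes raw binary
    with open(meta_path, 'w') as f:
        json.dump({'shape': a.shape, 'dtype': str(a.dtype)}, f)

    # Later: read the metadata first, then use it to interpret the raw bytes.
    with open(meta_path) as f:
        meta = json.load(f)
    restored = np.fromfile(data_path, dtype=np.dtype(meta['dtype']))
    restored = restored.reshape(meta['shape'])
```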

NumPy's convenient file I/O (.npy and .npz)

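np.save(fname, array) or np.savez(fname, array)
• fname : file name; np.save writes a single array to a .npy file, np.savez writes an archive with the .npz extension (np.savez_compressed writes a compressed .npz)
• array : the array to save
np.load(fname)
• fname : file name ending in .npy or .npz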
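Unlike tofile(), these files record shape and dtype, so nothing extra has to be remembered. As a small sketch of the .npz side, the example below saves two arrays under keyword names and reads them back by key; the file name and array names are made up for the example.

```python
import os
import tempfile
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arrays.npz')   # hypothetical file name
    np.savez(path, x=x, y=y)                 # keyword names become the keys

    # np.load returns a dict-like object for .npz files.
    with np.load(path) as data:
        names = sorted(data.files)
        x2 = data['x']
        y2 = data['y']
```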

a=np.arange(100).reshape(5,10,2)

np.save('a.npy',a)

b=np.load('a.npy')

b
Out[25]: 
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

NumPy's random number submodule (np.random)

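Function : Description
• rand(d0,d1,..,dn) : creates an array of shape d0..dn with floats uniformly distributed in [0,1)
• randn(d0,d1,..,dn) : creates an array of shape d0..dn with samples from the standard normal distribution
• randint(low[,high,size]) : creates a random integer or an integer array of shape size, with values in [low, high)
• seed(s) : sets the random seed to s; the same seed produces the same sequence of random numbers
• shuffle(a) : randomly reorders array a along its first axis (axis 0), modifying a in place
• permutation(a) : returns a new array shuffled along the first axis (axis 0) of a, leaving a unchanged
• choice(a[,size,replace,p]) : draws elements from the 1-D array a with probabilities p into a new array of shape size; replace says whether an element may be drawn more than once, default True
• uniform(low,high,size) : creates an array with uniformly distributed values; low is the start value, high the end value, size the shape
• normal(loc,scale,size) : creates an array with normally distributed values; loc is the mean, scale the standard deviation, size the shape
• poisson(lam,size) : creates an array with Poisson-distributed values; lam is the expected rate of events, size the shape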
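The effect of seed(s) is easiest to see by drawing twice from the same seed. A minimal sketch follows; the seed value 10 is arbitrary.

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))

np.random.seed(10)                 # reset to the same seed
second = np.random.randint(100, 200, (3, 4))

# Same seed, same sequence of calls: the two arrays match element for element.
identical = np.array_equal(first, second)
print(identical)
```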

import numpy as np
a=np.random.rand(3,4,5)
a
out:
array([[[0.24639253, 0.722497  , 0.06705677, 0.57236565, 0.28976888],
        [0.72545351, 0.63711307, 0.7305934 , 0.62810739, 0.22117966],
        [0.27692999, 0.29420823, 0.881048  , 0.50637681, 0.99317356],
        [0.61826611, 0.13610396, 0.94085436, 0.83689825, 0.05277357]],

       [[0.56759999, 0.48501222, 0.99744752, 0.36442473, 0.10996119],
        [0.30532853, 0.99185963, 0.01528704, 0.9655763 , 0.07883292],
        [0.69017904, 0.34405313, 0.48902329, 0.90762022, 0.94073407],
        [0.99060258, 0.18003825, 0.15771573, 0.49471469, 0.49768674]],

       [[0.88907564, 0.60919579, 0.89118723, 0.72911511, 0.88404285],
        [0.10481751, 0.98548878, 0.66120233, 0.29016637, 0.57104031],
        [0.22982642, 0.14531348, 0.26788788, 0.28058991, 0.46626988],
        [0.684612  , 0.34908288, 0.55960948, 0.67505087, 0.04902906]]])

sn=np.random.randn(3,4,5)

sn
Out[28]: 
array([[[-0.8927074 , -0.07921713, -1.09702413,  0.20266238,
          2.07800266],
        [-0.30521372,  1.07882345,  0.15834808, -0.4657899 ,
         -0.67738772],
        [ 0.52078183,  0.73034311, -0.21416105, -1.77684991,
          0.98170757],
        [ 0.77941776, -0.5389379 ,  0.37604244,  0.31786087,
         -2.37803701]],

       [[ 0.11112126,  0.49939424, -1.06720594,  1.75672316,
          0.18743589],
        [ 3.23782667,  0.3871532 ,  0.8731636 , -0.8501687 ,
         -0.62653135],
        [ 0.99275262,  1.09478903,  0.15127731,  0.00602239,
          0.72496009],
        [ 0.05037592, -0.07816541,  1.07494759, -1.69539531,
         -1.45367689]],

       [[ 0.6453074 , -0.97600581, -0.21570961,  0.2988862 ,
          0.73129948],
        [-0.18624953,  1.17215876,  0.53122232, -1.24010898,
          1.05254842],
        [-1.38374598, -0.11569819, -0.1682294 ,  1.10782766,
         -0.15701692],
        [-1.55098208,  0.55973668,  1.84080928, -1.64429112,
         -0.07670816]]])

b=np.random.randint(100,200,(3,4))
b
Out[30]: 
array([[179, 170, 104, 104],
       [109, 120, 114, 116],
       [128, 175, 198, 163]])
       
a=np.random.randint(100,200,(3,4))

a
Out[32]: 
array([[135, 180, 159, 164],
       [107, 185, 165, 185],
       [107, 149, 165, 140]])

np.random.shuffle(a)

a
Out[34]: 
array([[107, 185, 165, 185],
       [135, 180, 159, 164],
       [107, 149, 165, 140]])

np.random.permutation(a)
Out[35]: 
array([[135, 180, 159, 164],
       [107, 149, 165, 140],
       [107, 185, 165, 185]])

a
Out[36]: 
array([[107, 185, 165, 185],
       [135, 180, 159, 164],
       [107, 149, 165, 140]])
       
b=np.random.randint(100,200,(8,))

b
Out[38]: array([155, 117, 150, 145, 193, 119, 152, 166])

np.random.choice(b,(3,2))
Out[39]: 
array([[152, 117],
       [166, 117],
       [145, 145]])

np.random.choice(b,(3,2),replace=False)
Out[40]: 
array([[166, 145],
       [150, 152],
       [119, 117]])
       
# The larger the value, the higher its probability of being drawn
np.random.choice(b,(3,2),p=b/np.sum(b))
Out[41]: 
array([[193, 193],
       [152, 193],
       [152, 193]])

u=np.random.uniform(0,10,(3,4))

u
Out[43]: 
array([[7.59440693, 9.14123945, 5.69234925, 0.11060994],
       [2.36784092, 0.82206082, 4.70415773, 2.80706347],
       [0.58551754, 0.76346577, 8.6746605 , 0.62805895]])

n=np.random.normal(10,5,(3,4))

n
Out[45]: 
array([[ 2.50436991,  0.45901779, 18.12367847, 14.49488327],
       [ 9.61865032,  6.03216628, -7.30214271, 11.20063806],
       [ 8.36891732,  9.19299271,  5.53896269,  3.87134051]])
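The transcripts above do not include poisson, so here is a short sketch in the same spirit. The rate lam=3 and the seed are arbitrary choices; the large sample is only there to show that the mean and variance of a Poisson sample are both close to lam.

```python
import numpy as np

np.random.seed(0)                     # fixed seed so the run is repeatable
p = np.random.poisson(3, (3, 4))      # lam=3 events per interval on average

# With many samples, both the mean and the variance approach lam.
big = np.random.poisson(3, 100000)
print(p)
print(big.mean(), big.var())
```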

NumPy statistical functions

a=np.arange(15).reshape(3,5)

a
Out[47]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
       
np.mean(a)
Out[50]: 7.0

np.mean(a,axis=1)
Out[51]: array([ 2.,  7., 12.])

np.mean(a,axis=0)
Out[52]: array([5., 6., 7., 8., 9.])

np.average(a,axis=0,weights=[10,5,1])
Out[55]: array([2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
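The weighted average along axis 0 can be reproduced by hand, which makes it clear where 2.1875 comes from. A sketch, using the same array and weights as above:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Weighted mean down each column: sum(w_i * row_i) / sum(w_i).
manual = (w[:, None] * a).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

# First column: (0*10 + 5*5 + 10*1) / 16 = 2.1875
print(manual)
```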

np.std(a)
Out[56]: 4.320493798938574

np.var(a)
Out[58]: 18.666666666666668
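By default np.std() and np.var() compute the population statistics, dividing by N. The sketch below rebuilds both values from the definition and shows the ddof=1 variant, which divides by N-1 instead.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

var_manual = np.mean((a - a.mean()) ** 2)   # population variance (ddof=0)
std_manual = np.sqrt(var_manual)            # standard deviation is its square root
sample_var = np.var(a, ddof=1)              # divides by N-1 instead of N
print(var_manual, std_manual, sample_var)
```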

b=np.arange(15,0,-1).reshape(3,5)

b
Out[60]: 
array([[15, 14, 13, 12, 11],
       [10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

np.max(b)
Out[61]: 15

np.argmax(b)
Out[62]: 0

np.unravel_index(np.argmax(b),b.shape)
Out[64]: (0, 0)

np.ptp(b)
Out[65]: 14

np.median(b)
Out[66]: 8.0

NumPy's gradient function

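Function : Description
• np.gradient(f) : computes the gradient of the elements of array f; when f is multidimensional, returns the gradient along each dimension

Gradient: the rate of change between consecutive values, that is, the slope.
Given the Y values a, b, c at three consecutive X coordinates (unit spacing), the gradient at b is (c-a)/2.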
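The rule can be written out by hand for the 1-D case. The helper gradient_1d below is a hypothetical name introduced for this sketch; it assumes unit spacing and at least two elements, and mirrors np.gradient's default behaviour of using one-sided differences at the two ends.

```python
import numpy as np

def gradient_1d(y):
    """Central differences inside, one-sided differences at both ends."""
    y = np.asarray(y, dtype=float)
    g = np.empty_like(y)
    g[1:-1] = (y[2:] - y[:-2]) / 2.0   # (c - a) / 2 for interior points
    g[0] = y[1] - y[0]                 # first point: forward difference
    g[-1] = y[-1] - y[-2]              # last point: backward difference
    return g

a = np.array([5, 18, 10, 1, 14])
result = gradient_1d(a)
print(result)
```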

a=np.random.randint(0,20,(5))

a
Out[68]: array([ 5, 18, 10,  1, 14])

np.gradient(a)
Out[69]: array([13. ,  2.5, -8.5,  2. , 13. ])
# At the two ends, the difference of the two edge values divided by 1 is used: (18-5)/1=13 and (14-1)/1=13

# 2-D gradient
c=np.random.randint(0,50,(3,5))
c
Out[71]: 
array([[47, 25, 46,  3, 45],
       [47, 17, 15, 21, 22],
       [32,  5, 19, 49, 16]])

np.gradient(c)
Out[72]: 
[array([[  0. ,  -8. , -31. ,  18. , -23. ],
        [ -7.5, -10. , -13.5,  23. , -14.5],
        [-15. , -12. ,   4. ,  28. ,  -6. ]]),
 array([[-22. ,  -0.5, -11. ,  -0.5,  42. ],
        [-30. , -16. ,   2. ,   3.5,   1. ],
        [-27. ,  -6.5,  22. ,  -1.5, -33. ]])]
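The two arrays are the gradients along the outermost dimension (axis 0, computed down each column) and along the second dimension (axis 1, computed across each row), respectively.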
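To request the gradient along a single dimension, np.gradient accepts an axis argument. The sketch below reuses the 3x5 array from the transcript and checks that the axis-0 result is the same as taking the 1-D gradient of each column.

```python
import numpy as np

c = np.array([[47, 25, 46,  3, 45],
              [47, 17, 15, 21, 22],
              [32,  5, 19, 49, 16]])

g_axis0, g_axis1 = np.gradient(c)     # one array per axis
only_axis0 = np.gradient(c, axis=0)   # just the axis-0 gradient
first_col = np.gradient(c[:, 0])      # 1-D gradient of the first column
print(only_axis0)
```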