Lichord

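Using VS Code

Posted on 2019-07-03, updated 2019-09-10, category: Tools

Preview Markdown:
- Ctrl+Shift+P, then choose "Markdown: Open Preview"
- Ctrl+Shift+V
Install extensions:
- Ctrl+P, then type >ext install
Replace text: press Ctrl+F, then click the arrow on the left of the find box to expand the replace field
Export to PDF, HTML, or image: press F1 and type export (assuming an export extension such as Markdown PDF is installed)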

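Common IntelliJ IDEA shortcuts

Posted on 2019-07-03, updated 2019-09-08, category: Tools

Ctrl+D: duplicate the current line
Ctrl+Alt+Enter: start a new line
Ctrl+Alt+T (with code selected): surround the selection with a chosen statement block
Ctrl+N: find a class
Ctrl+R: replace text
Ctrl+click: jump to the source of the class under the cursor
Alt+Enter: generate method implementations for the class under the cursor
Ctrl+H: show the inheritance hierarchy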

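IntelliJ IDEA troubleshooting

Posted on 2019-07-03, updated 2019-09-08, category: Tools

IDEA crashes immediately on startup

Cause: an incompatible plugin. Delete that plugin's folder under C:\Users\HASEE\.IntelliJIdea2018.1\config\plugins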

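Dual-boot installation notes

Posted on 2019-07-02, updated 2019-09-08, category: Computers

Goal: install Ubuntu on a machine that already runs Windows 10, and move the software on drive C to a non-system drive.

Situation

The laptop runs Windows 10 on a single 250 GB SSD with 60 GB free. There is only a C drive, so if the system ever breaks, all installed software and data go with it, and reinstalling that many development tools takes a long time.

I also wanted a Linux dual boot on the same machine, but there was not enough disk space.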

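I. Back up important data

Partition the disk and back up the data to a non-system partition.

With Windows 10's built-in volume shrinking, only 2 GB could be freed.

1. Fixes found online (tried, none worked)

The likely cause is that some files sit near the end of the volume, so the free space is not contiguous and cannot be split off as a new partition.

- Open "This PC", right-click the partition to shrink, choose "Properties", go to the "Tools" tab, and run both "Check" and "Optimize" to tidy up how the files are stored.
- Right-click "This PC" on the desktop, choose "Properties", select "System protection" in the left menu, pick the partition to shrink, click "Configure", choose "Disable system protection", click "OK", and reboot.
- A file in use may be preventing the system from releasing the space, so try turning off virtual memory: right-click "This PC", choose "Properties", select "Advanced system settings" on the left, and on the "Advanced" tab click "Settings" in the first box (Performance). In the dialog that opens, go to the "Advanced" tab and click "Change" in the virtual memory section. In the "Virtual Memory" dialog, untick "Automatically manage paging file size for all drives", select "No paging file", click "Set", then "OK", and reboot when prompted.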

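2. Partitioning with third-party software (worked)

I downloaded AOMEI Partition Assistant (傲梅分区助手) and followed its tutorial. The partitioning succeeded, but the process took a very, very long time.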

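II. Migrating the software

After partitioning I wanted to try moving the software on drive C to drive D, so I downloaded Tencent PC Manager to do the migration.

After another long wait it reported that Anaconda had been migrated successfully. When I tried to start Anaconda the screen went black for unknown reasons; after a reboot it was fine.

Fixing the problems caused by the migration

- First check that the software really was migrated. Running conda --version on the command line reported an invalid command, so I pointed the previously configured environment variables at the new directory. After that the command printed the version.
- The shortcuts created when Anaconda was installed on drive C no longer worked, and I could not find the program entry point in the install directory.

Fix: find the Menu folder under the Anaconda install directory, open a command prompt, and run python .\Lib\_nsis.py mkmenus to recreate the shortcuts.

Reference: https://blog.csdn.net/qq_42580947/article/details/90671836

- Opening the Spyder shortcut failed with: This application failed to start because it could not find or load the Qt platform plugin "windows" in "". Reinstalling the application may fix this problem
- Following advice found online, I copied the platforms folder from \Anaconda3\Library\plugins into \Anaconda and tried again. Nothing happened.
- I then wondered whether the software I had not moved was affected too. PyCharm and IDEA both failed to start with: failed to create jvm error code -4
- Online sources attribute this error to insufficient memory. I suspected heavy fragmentation after shrinking the partition.
- While I was searching for solutions, Chrome and IE froze frequently and showed program-error messages. Chrome also reported that it was out of memory and could not open pages, and the machine even rebooted.
- Open This PC -> click drive C -> click "Manage" at the top of the window -> "Optimize".
- After optimizing drive C, IDEA, PyCharm, and the Anaconda programs all opened again.
- Remembering that I had changed system settings in order to partition, I restored them to their previous values and rebooted. Problem solved.

Summary

Migration changes the install directory, but configuration and other data files stay on drive C, so a system crash is just as painful. Migrating software is therefore not a good approach: a complete migration costs too much time, and reinstalling is easier.

III. Installing the dual-boot system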

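Dual-boot layouts

1. Two disks, one boot partition

Each OS lives on its own disk, one as the primary system and the other as the secondary. Only the primary system's disk has an EFI partition. The boot files of both systems are placed in that EFI partition and share one boot loader, usually grub2, which offers the OS choice at startup. This is what most people do.

Pros: booting is simple, and on most machines you can pick the OS quickly.
Cons: removing the primary system is troublesome, because the secondary system's boot entry has to be rebuilt.

2. Two disks, two boot partitions

This layout suits machines that let you pick the boot disk quickly at power-on.
How to set it up: install each system with only its own disk attached, then put both disks into the machine.
With this layout each disk has its own EFI partition and its own boot loader, and each system's boot files stay on its own disk, so they do not interfere. Linux boots through grub2 and Windows through Windows Boot Manager. You choose the OS by choosing the boot disk at startup rather than through a boot-loader menu.

Pros: you can experiment freely. Breaking one system does not affect the other, and deleting either one does not require rebuilding any boot entry.
Cons: on some machines choosing the boot disk is awkward; it depends on the machine.

I rebooted to check the startup options: F2 opens the BIOS setup, and F7 opens Boot Options, where the boot disk can be chosen. So layout 2 is the safer and more convenient choice.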

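Steps

1. Add a second SSD

2. Create the installation media

- Download the Ubuntu image from https://cn.ubuntu.com/download
- On Windows, UltraISO can write a bootable USB installer, so install UltraISO.
- Prepare an empty USB drive.
- Open UltraISO, select the local Ubuntu ISO image, choose Bootable -> Write Disk Image from the menu bar, and select the USB drive.
- Change the write method to RAW. Otherwise booting from the USB drive fails with "Failed to load ldlinux.c32 Boot failed: please change disks and press a key to continue" and a warning beep. After a RAW write the USB drive shows a smaller capacity; once the system is installed, write it again with the default method to restore the drive.
- Click Write and wait.

3. Install the system

Pitfalls along the way

The first installation went very quickly by following the prompts. The only manual change was to choose the "other options" installation type and set the partition sizes by hand.
The procedure was roughly this: https://www.jianshu.com/p/54d9a3a695cc
After the installation I clicked Restart, the screen froze, and I had to force a reboot.
The remaining problems were that rebooting froze the system and that the desktop was very sluggish, with obvious stutter when moving the mouse.
In short it was a graphics-card problem, and everything after this was one graphics-driver pitfall after another.

Problems hit while trying the fixes found online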

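Boot stops at "/dev/sda1 * :clean, / files,/ blocks"

The usual advice is to press Ctrl+Alt+F1 through F6 (try each one) to reach a text console. None of them worked for me.
So I rebooted and set that boot to start in command-line mode: https://jingyan.baidu.com/article/3052f5a104b9b797f31f86b0.html#!/article/3052f5a104b9b797f31f86b0
After that, I removed the previously installed driver with sudo apt-get purge nvidia*
reboot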

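Login loop: after entering the password, the login screen comes back every time

Attempt 1 (did not work):

- Press Ctrl+Alt+F1 to open a text console and log in with your account.
- Make your Linux user the owner of .Xauthority. Here lixian is my user name: sudo chown lixian:lixian .Xauthority
- If the terminal shows -rw------- 1 lixian lixian ...., the ownership is correct.
- Press Ctrl+Alt+F7 to return to the graphical login.
https://jingyan.baidu.com/article/08b6a591b16dbf14a80922e4.html#!/article/08b6a591b16dbf14a80922e4

Attempt 2 (worked): it was the driver again

sudo apt-get remove --purge nvidia-*
sudo apt-get autoremove
sudo apt-get install -f
sudo reboot
The reboot succeeded.
https://blog.csdn.net/tangwenbo124/article/details/79120677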

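Black screen

The fix for all of these symptoms is the same: find a way into a command line and uninstall the installed driver.

Driver installs successfully but has no effect

The driver installed successfully: nvidia-smi prints the version information and so on.
But running nvidia-settings prints: ERROR: Unable to load info from any available system

Out of options, I reinstalled the system.

For the reinstall, read this: https://www.jianshu.com/p/54d9a3a695cc. Its summary says exactly what I want to say: I tried all sorts of messy fixes from the internet and none of them worked, and it does not need to be that complicated. When booting the installer, before the screen where you choose a language and then pick install or try, press e to open the settings screen. It asks for a language; after choosing it, select Install Ubuntu, delete the --- after quiet splash, add nomodeset, and then run the installation.
- The commands for installing the driver are actually simple. If uninstalling and reinstalling the driver does not help, reinstalling the system is the better option.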

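Building and configuring a blog with Hexo + GitHub Pages

Posted on 2019-07-01, updated 2019-09-08, category: Blog setup

Prerequisites: Node.js and Git are installed, you have a GitHub account, and you have created a repository named {githubname}.github.io

Installing and configuring Hexo

Open a command line and run npm install -g hexo-cli
Create an empty local folder (any name will do)
Change into the new folder
Initialize it: hexo init ./
Install the dependencies: npm install
hexo generate
hexo s --debug
Open http://localhost:4000 in a browser to see your blog

Choosing a theme

The theme I use: https://github.com/tufu9441/maupassant-hexo
Follow this tutorial to install and configure it: https://www.haomwei.com/technology/maupassant-hexo.html

Deploying Hexo to GitHub

Edit the configuration file _config.yml in the {githubname}.github.io directory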

# Deployment
## Docs: https://hexo.io/docs/deployment.html
deploy:
  type: git
  repo: https://github.com/your_githubName/your_githubName.github.io.git

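Use the following commands to generate the blog's static pages and deploy them to the remote GitHub Pages site

# Install the deploy plugin
$ npm install hexo-deployer-git --save
# Delete the static files (the public folder)
$ hexo clean
# Generate the static files (the public folder)
$ hexo generate
# Deploy to the remote site
$ hexo deploy
# Or use the combined command (replaces the two commands above): generate the static files and deploy
$ hexo deploy -g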

Excluding README.md from rendering

# Create README.md in the source directory
# Edit _config.yml and list the files that must not be rendered
skip_render: README.md

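Common hexo commands

hexo new "postName" # create a new post
hexo new page "pageName" # create a new page
hexo generate # generate the static pages into the public directory
hexo server # start the local preview server (port 4000 by default; 'ctrl + c' stops it)
hexo deploy # deploy to GitHub
hexo help  # show help
hexo version  # show the Hexo version

# Abbreviations
hexo n == hexo new
hexo g == hexo generate
hexo s == hexo server
hexo d == hexo deploy

# Combinations
hexo s -g # generate, then preview locally
hexo d -g # generate, then deploy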

Global configuration in _config.yml

# Hexo Configuration
## Docs: https://hexo.io/docs/configuration.html
## Source: https://github.com/hexojs/hexo/

# Site
title: Banana
subtitle: 
description:
keywords:
author: Lixian
language: zh-CN
timezone:

# URL
## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: http://yoursite.com
root: /
permalink: :year/:month/:day/:title/
permalink_defaults:

# Directory
source_dir: source
public_dir: public
tag_dir: tags
archive_dir: archives
category_dir: categories
code_dir: downloads/code
i18n_dir: :lang
skip_render: README.md  

# Writing
new_post_name: :title.md # File name of new posts
default_layout: post
titlecase: false # Transform title into titlecase
external_link: true # Open external links in new tab
filename_case: 0
render_drafts: false
post_asset_folder: false
relative_link: false
future: true
highlight:
  enable: true
  line_number: true
  auto_detect: true
  tab_replace:
  
# Home page setting
# path: Root path for your blogs index page. (default = '')
# per_page: Posts displayed per page. (0 = disable pagination)
# order_by: Posts order. (Order by date descending by default)
index_generator:
  path: ''
  per_page: 10
  order_by: -date
  
# Category & Tag
default_category: uncategorized
category_map:
tag_map:

# Date / Time format
## Hexo uses Moment.js to parse and display date
## You can customize the date format as defined in
## http://momentjs.com/docs/#/displaying/format/
date_format: YYYY-MM-DD
time_format: HH:mm:ss

# Pagination
## Set per_page to 0 to disable pagination
per_page: 10
pagination_dir: page

# Extensions
## Plugins: https://hexo.io/plugins/
## Themes: https://hexo.io/themes/
theme: Pacman

# Deployment
## Docs: https://hexo.io/docs/deployment.html
deploy:
  type: git
  repo: https://github.com/lichord/lichord.github.io.git

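Every colon must be followed by a space, otherwise things may break.

Front matter at the top of a post's .md file

---
title: Building and configuring a blog with Hexo + GitHub Pages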
date: 2019-07-12 10:14:23
tags:
    - hexo
    - tag2
categories:
    - github page
    - category2
---


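Python Data Analysis and Presentation_6_Hand-Drawn Image Effect Example

Posted on 2019-04-05, updated 2019-09-08, category: python

The PIL library

PIL (Python Imaging Library) is an image-processing library
from PIL import Image
Representing an image as an array

An image is a two-dimensional matrix of pixels, and each element is an RGB value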

from PIL import Image
import numpy as np
im=np.array(Image.open('C:/Users/HASEE/DeskTop/flower.jpg'))
# The image is a three-dimensional array: height, width, and the RGB value of each pixel
print(im.shape,im.dtype)
(135, 180, 3) uint8

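Transforming images

Read an image, get the pixel RGB values, modify them, and save the result as a new file

# Complement (invert) every RGB value of the original image
b=[255,255,255]-im
image=Image.fromarray(b.astype('uint8'))
image.save('C:/Users/HASEE/DeskTop/flower2.jpg')

# convert('L') turns a color image into a grayscale image: a two-dimensional array in which every element is a gray level
a=np.array(Image.open('C:/Users/HASEE/DeskTop/flower.jpg').convert('L'))
# Invert the gray levels
b=255-a
# Image.fromarray converts the array back to an image
image=Image.fromarray(b.astype('uint8'))
image.save('C:/Users/HASEE/DeskTop/flower3.jpg')

# Compress the gray range to a width of 100, then add 150 to shift it to [150, 250]
c=(100/255)*a+150
image=Image.fromarray(c.astype('uint8'))
image.save('C:/Users/HASEE/DeskTop/flower4.jpg')

# Square the normalized pixel values
d=255*(a/255)**2
image=Image.fromarray(d.astype('uint8'))
image.save('C:/Users/HASEE/DeskTop/flower5.jpg')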

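The hand-drawn effect

Characteristics of a hand-drawn look:

Black, white, and gray only
Heavy lines at edges
Regions of identical or similar color tend toward white
A slight light-source effect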
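
The post stops at the list of characteristics, so here is a minimal sketch of one common way to get them, not necessarily the course's exact code. It assumes the gray-level gradient serves as a fake surface slope and that a virtual light source shades it. The function name hand_drawn, the parameters depth, elevation, and azimuth, and the small synthetic test images are all assumptions made for illustration.

```python
import numpy as np

def hand_drawn(gray, depth=10.0, elevation=np.pi / 2.2, azimuth=np.pi / 4):
    """Return a uint8 sketch-like image from a 2-D grayscale array."""
    gray = np.asarray(gray, dtype=float)
    # np.gradient returns the change along axis 0 (rows), then axis 1 (columns).
    grad_rows, grad_cols = np.gradient(gray)
    grad_rows = grad_rows * depth / 100.0
    grad_cols = grad_cols * depth / 100.0
    # Treat (grad_cols, grad_rows, 1) as a surface vector and normalise it.
    norm = np.sqrt(grad_cols ** 2 + grad_rows ** 2 + 1.0)
    unit_x, unit_y, unit_z = grad_cols / norm, grad_rows / norm, 1.0 / norm
    # Direction of the virtual light source.
    light_x = np.cos(elevation) * np.cos(azimuth)
    light_y = np.cos(elevation) * np.sin(azimuth)
    light_z = np.sin(elevation)
    shaded = 255 * (light_x * unit_x + light_y * unit_y + light_z * unit_z)
    return shaded.clip(0, 255).astype('uint8')

# Synthetic inputs: a flat gray patch and a patch with one vertical edge.
flat = np.full((4, 5), 128)
edge = np.zeros((4, 6))
edge[:, 3:] = 255

flat_out = hand_drawn(flat)
edge_out = hand_drawn(edge)
print(flat_out[0, 0], edge_out[0, 0], edge_out[0, 2])
```

Flat regions come out close to white, and pixels next to the edge come out much darker, which matches the "similar colors tend toward white" and "heavy edges" items above. For a real photo, the input would be np.array(Image.open(path).convert('L')) and the result could be saved with Image.fromarray(...).save(...).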

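Python Data Analysis and Presentation_5_Pandas Data Feature Analysis

Posted on 2019-04-04, updated 2019-09-08, category: python

Sorting data

Understanding a set of data

A set of data expresses one or more meanings

Summarizing: the lossy process of distilling features from data

Basic statistics (including sorting)
Distribution / cumulative statistics
Data features
Correlation, periodicity, and so on

Data mining (turning data into knowledge)

Sorting in pandas

Sorting by index

.sort_index() sorts along the given axis by index, ascending by default
.sort_index(axis=0, ascending=True)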

b=pd.DataFrame(np.arange(20).reshape(4,5),index=['c','a','d','b'])
b
Out[99]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

b.sort_index()
Out: 
    0   1   2   3   4
a   5   6   7   8   9
b  15  16  17  18  19
c   0   1   2   3   4
d  10  11  12  13  14

b.sort_index(ascending=False)
Out: 
    0   1   2   3   4
d  10  11  12  13  14
c   0   1   2   3   4
b  15  16  17  18  19
a   5   6   7   8   9

b.sort_index(axis=1,ascending=False)
Out[102]: 
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15

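Sorting by value

.sort_values() sorts along the given axis by value, ascending by default
Series.sort_values(axis=0, ascending=True)
DataFrame.sort_values(by, axis=0, ascending=True)

by : a label or list of labels to sort by (a column label when axis=0, a row label when axis=1)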

.sort_values(by, axis=0, ascending=True)

b
Out: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

# Sort by column 2
c=b.sort_values(2,ascending=False)
c
Out: 
    0   1   2   3   4
b  15  16  17  18  19
d  10  11  12  13  14
a   5   6   7   8   9
c   0   1   2   3   4

c=c.sort_values('a',axis=1,ascending=False)
c
Out: 
    4   3   2   1   0
b  19  18  17  16  15
d  14  13  12  11  10
a   9   8   7   6   5
c   4   3   2   1   0

NaN values are always placed at the end of the sort
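
A small sketch of the NaN rule, assuming a reasonably recent pandas: by default NaN goes last whether the sort is ascending or descending, and the na_position argument moves it to the front. The Series below is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0], index=['w', 'x', 'y', 'z'])

ascending = s.sort_values()                     # NaN stays last
descending = s.sort_values(ascending=False)     # NaN still last
nan_first = s.sort_values(na_position='first')  # explicitly move NaN to the front

print(list(ascending.index))
print(list(descending.index))
print(list(nan_first.index))
```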

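Basic statistical analysis

Basic statistics functions

Apply to both Series and DataFrame

Method	Description
.sum()	Sum of the data, computed along axis 0 (same for the methods below)
.count()	Number of non-NaN values
.mean() .median()	Arithmetic mean and median
.var() .std()	Variance and standard deviation
.min() .max()	Minimum and maximum
.describe()	Summary statistics along axis 0 (per column)

Apply to Series

Method	Description
.argmin() .argmax()	Integer position (automatic index) of the minimum and maximum
.idxmin() .idxmax()	Label (custom index) of the minimum and maximum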
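
To make the difference between the two pairs concrete, here is a short sketch. It assumes pandas 1.0 or later, where argmin/argmax return integer positions; some older releases treated them as aliases of idxmin/idxmax.

```python
import pandas as pd

s = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

pos_max = int(s.argmax())   # integer position of the maximum
pos_min = int(s.argmin())   # integer position of the minimum
label_max = s.idxmax()      # custom label of the maximum
label_min = s.idxmin()      # custom label of the minimum

print(pos_max, label_max, pos_min, label_min)
```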

 a
Out[109]: 
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

a=pd.Series([9,8,7,6],index=['a','b','c','d'])

a
Out[111]: 
a    9
b    8
c    7
d    6
dtype: int64

a.describe()
Out[112]: 
count    4.000000
mean     7.500000
std      1.290994
min      6.000000
25%      6.750000
50%      7.500000
75%      8.250000
max      9.000000
dtype: float64

type(a.describe())
Out[113]: pandas.core.series.Series

a.describe()['count']
Out[114]: 4.0

a.describe()['max']
Out[115]: 9.0

b
Out[116]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

b.describe()
Out[117]: 
               0          1          2          3          4
count   4.000000   4.000000   4.000000   4.000000   4.000000
mean    7.500000   8.500000   9.500000  10.500000  11.500000
std     6.454972   6.454972   6.454972   6.454972   6.454972
min     0.000000   1.000000   2.000000   3.000000   4.000000
25%     3.750000   4.750000   5.750000   6.750000   7.750000
50%     7.500000   8.500000   9.500000  10.500000  11.500000
75%    11.250000  12.250000  13.250000  14.250000  15.250000
max    15.000000  16.000000  17.000000  18.000000  19.000000

type(b.describe())
Out[118]: pandas.core.frame.DataFrame

b.describe().loc['max']
Out[119]: 
0    15.0
1    16.0
2    17.0
3    18.0
4    19.0
Name: max, dtype: float64

b.describe()[2]
Out[120]: 
count     4.000000
mean      9.500000
std       6.454972
min       2.000000
25%       5.750000
50%       9.500000
75%      13.250000
max      17.000000
Name: 2, dtype: float64

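Cumulative statistics

Apply to both Series and DataFrame; each is a running computation

Method	Description
.cumsum()	Running sum of the first 1, 2, ..., n values
.cumprod()	Running product of the first 1, 2, ..., n values
.cummax()	Running maximum of the first 1, 2, ..., n values
.cummin()	Running minimum of the first 1, 2, ..., n values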

 b
Out[122]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

b.cumsum()
Out[123]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   7   9  11  13
d  15  18  21  24  27
b  30  34  38  42  46

b.cumprod()
Out[124]: 
   0     1     2     3     4
c  0     1     2     3     4
a  0     6    14    24    36
d  0    66   168   312   504
b  0  1056  2856  5616  9576

b.cummin()
Out[125]: 
   0  1  2  3  4
c  0  1  2  3  4
a  0  1  2  3  4
d  0  1  2  3  4
b  0  1  2  3  4

b.cummax()
Out[126]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

 b
Out[127]: 
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
# Sum of the current row and the row before it (a window of 2 rows)
b.rolling(2).sum()
Out[128]: 
      0     1     2     3     4
c   NaN   NaN   NaN   NaN   NaN
a   5.0   7.0   9.0  11.0  13.0
d  15.0  17.0  19.0  21.0  23.0
b  25.0  27.0  29.0  31.0  33.0
# Sum of the current row and the two rows before it (a window of 3 rows)
b.rolling(3).sum()
Out[129]: 
      0     1     2     3     4
c   NaN   NaN   NaN   NaN   NaN
a   NaN   NaN   NaN   NaN   NaN
d  15.0  18.0  21.0  24.0  27.0
b  30.0  33.0  36.0  39.0  42.0
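
To show what the cumulative and rolling methods compute, here is a plain-Python sketch applied to column 0 of b. The helper rolling_sum is a made-up name, and None stands in for the NaN that pandas prints while the window is not yet full.

```python
from itertools import accumulate

column = [0, 5, 10, 15]  # column 0 of b above

def rolling_sum(values, window):
    """Sum each value with the window-1 values before it; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]))
    return out

cumsum = list(accumulate(column))       # running sum, like .cumsum()
cummax = list(accumulate(column, max))  # running maximum, like .cummax()
roll2 = rolling_sum(column, 2)          # like .rolling(2).sum()
roll3 = rolling_sum(column, 3)          # like .rolling(3).sum()

print(cumsum, cummax, roll2, roll3)
```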

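Correlation analysis

Covariance

Given two quantities X and Y, how do we judge whether they are correlated?

cov(X,Y)= $\frac{\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})}{n-1}$

Covariance > 0: X and Y are positively correlated
Covariance < 0: X and Y are negatively correlated
Covariance = 0: X and Y are linearly uncorrelated (which does not by itself imply independence)

Pearson correlation coefficient

Given two quantities X and Y, how do we judge whether they are correlated?

r= $\frac{\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^n(x_i-\overline{x})^2}\sqrt{\sum_{i=1}^n(y_i-\overline{y})^2}}$

r lies in [-1, 1]; its absolute value is read as:
0.8‐1.0 very strong correlation
0.6‐0.8 strong correlation
0.4‐0.6 moderate correlation
0.2‐0.4 weak correlation
0.0‐0.2 very weak or no correlation

Correlation functions

Apply to both Series and DataFrame

Method	Description
.cov()	Covariance matrix
.corr()	Correlation matrix; Pearson, Spearman, or Kendall coefficients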

hprice=pd.Series([3.04,22.93,12.75,22.6,12.33],index=['2008','2009','2010','2011','2012'])
m2=pd.Series([8.18,18.38,9.13,7.82,6.69],index=['2008','2009','2010','2011','2012'])
hprice.corr(m2)
out:
0.5239439145220387
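
As a cross-check of the sample covariance and Pearson definitions, the sketch below computes both in plain Python for the same hprice and m2 values. The function names covariance and pearson are made up for illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance with the n-1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson r: co-deviation sum divided by the product of the root sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / den

hprice = [3.04, 22.93, 12.75, 22.6, 12.33]
m2 = [8.18, 18.38, 9.13, 7.82, 6.69]

r = pearson(hprice, m2)
c = covariance(hprice, m2)
print(r, c)
```

The value of r agrees with the hprice.corr(m2) output shown above.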

Topic	Methods
Sorting	.sort_index() .sort_values()
Basic statistics	.describe()
Cumulative statistics	.cum*() .rolling().*()
Correlation	.corr() .cov()

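Python Data Analysis and Presentation_4_Getting Started with Pandas

Posted on 2019-04-03, updated 2019-09-08, category: python

Getting started with pandas

Operating on the index is operating on the data, so there is little need to think about dimensions while working with the data
Importing pandas

Pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools
import pandas as pd
Pandas is built on NumPy and is commonly used together with NumPy and Matplotlib

cumsum() computes the running sum of the first n items

Understanding pandas

Pandas mainly provides two data types: Series (one-dimensional data) and DataFrame (two-dimensional and higher-dimensional data)
It also provides operations on these types:
basic operations, arithmetic operations, feature operations, and relational operations

NumPy versus pandas:
NumPy:
Basic data type
Focuses on how data is structured
Dimensions: relationships among data
Pandas:
Extended data types
Focuses on how data is used
Relationships between data and index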

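The Series type in pandas

A Series consists of a set of data and the index associated with it
import pandas as pd
a=pd.Series([9,8,7,6])
a
Output:
0 9
1 8
2 7
3 6
dtype: int64
The first column is the index of the data
The last line is the NumPy data type

The index can be specified explicitly
import pandas as pd
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
b
Output:
a 9
b 8
c 7
d 6
dtype: int64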

A Series can be created from:

• a Python list
• a scalar value
• a Python dict
• an ndarray
• other functions

Creating from a scalar

import pandas as pd
s=pd.Series(25,index=['a','b','c'])
# index cannot be omitted here
s
Output:
a 25
b 25
c 25
dtype: int64

Creating from a dict

import pandas as pd
d=pd.Series({'a':9,'b':8,'c':7})
d
Output:
a 9
b 8
c 7
dtype: int64

# The index argument fixes the layout, and the values are looked up in the dict
import pandas as pd
e=pd.Series({'a':9,'b':8,'c':7},index=['c','a','b','d'])
e
Output:
c    7.0
a    9.0
b    8.0
d    NaN # missing value
dtype: float64

Creating from an ndarray

import pandas as pd
import numpy as np
n=pd.Series(np.arange(5))
n
out:
0    0
1    1
2    2
3    3
4    4
dtype: int32

m=pd.Series(np.arange(5),index=np.arange(9,4,-1))
m
Out: 
9    0
8    1
7    2
6    3
5    4
dtype: int32

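Basic Series operations:

A Series has two parts: index and values
Series operations resemble ndarray operations
Series operations resemble Python dict operations
import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])

b.index
Out: Index(['a', 'b', 'c', 'd'], dtype='object')

b.values
Out: array([9, 8, 7, 6], dtype=int64)

b['b']
out:8

# The automatic positional index is always generated
b[1]
out:8

# The two index schemes coexist but cannot be mixed in one lookup
# (output from an older pandas; recent versions raise KeyError for the missing label 0)
b[['c','d',0]]
out:
c    7.0
d    6.0
0    NaN
dtype: float64

Series operations that resemble ndarray:

• Same indexing syntax, using []
• NumPy operations and functions work on Series
• A list of custom labels can be used to select elements
• Slicing by automatic index also slices the custom index along with it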

import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
b[:3]
out:
a 9
b 8
c 7
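dtype: int64
# Slicing and similar operations on a Series still return a Series
# Operating on a single value of a Series returns the bare value, not a Series with index and values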

b[b>b.median()]
out:
a 9
b 8
dtype:int64

np.exp(b)
out:
a    8103.083928
b    2980.957987
c    1096.633158
d     403.428793
dtype: float64

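Series operations that resemble Python dict:

• Access by custom label
• The in keyword
• The .get() method

import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
# in tests whether a label is in the Series index
'c' in b
out:True
# in does not look at the automatic positional index
0 in b
out:False
# Look up label 'f': return its value if present, otherwise the given default
b.get('f',100)
100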

Series alignment

import pandas as pd
a=pd.Series([1,2,3],['c','d','e'])
b=pd.Series([9,8,7,6],['a','b','c','d'])
a+b
out:
a    NaN
b    NaN
c    8.0
d    8.0
e    NaN
dtype: float64
# Arithmetic on Series automatically aligns data that have different indexes
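
A short sketch of the same alignment, plus one way to avoid the NaN results: the .add() method with fill_value treats a label missing on one side as that value. The choice of fill_value=0 here is only an illustration.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['c', 'd', 'e'])
b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

plain = a + b                    # labels present on only one side become NaN
filled = a.add(b, fill_value=0)  # a missing side is treated as 0 instead

print(plain)
print(filled)
```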

The name attribute of a Series

Both the Series object and its index can have a name, stored in the .name attribute
import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
b.name
b.name="Series对象"
b.index.name="索引列"
b
out:
索引列
a    9
b    8
c    7
d    6
Name: Series对象, dtype: int64

Modifying a Series

A Series object can be modified at any time, and the change takes effect immediately
import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
b['a']=15
b
out:
a    15
b     8
c     7
d     6
dtype: int64

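The Series type

Series is a one-dimensional "labeled" array
index_0 → data_a
Basic Series operations resemble ndarray and dict, and align on the index

The DataFrame type

A DataFrame is a set of columns that share the same index

A DataFrame is a tabular data type; each column can hold a different value type
A DataFrame has both a row index and a column index
A DataFrame is usually used for two-dimensional data but can also express multi-dimensional data

A DataFrame can be created from:
• a two-dimensional ndarray
• a dict of one-dimensional ndarrays, lists, dicts, tuples, or Series
• a Series
• another DataFrame

Creating from a two-dimensional ndarray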

import pandas as pd
import numpy as np
d=pd.DataFrame(np.arange(10).reshape(2,5))
d
Out: 
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

Creating from a dict of one-dimensional objects (here, Series)

import pandas as pd
dt={'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([9,8,7,6],index=['a','b','c','d'])}
d=pd.DataFrame(dt)
d
Out: 
   one  two
a  1.0    9
b  2.0    8
c  3.0    7
d  NaN    6

pd.DataFrame(dt,index=['b','c','d'],columns=['two','three'])
Out: 
   two three
b    8   NaN
c    7   NaN
d    6   NaN

Creating from a dict of lists

import pandas as pd
dl={'one':[1,2,3,4],'two':[9,8,7,6]}
d=pd.DataFrame(dl,index=['a','b','c','d'])
d
Out[35]: 
   one  two
a    1    9
b    2    8
c    3    7
d    4    6

import pandas as pd
dl={'城市':['北京','上海','广州','深圳','沈阳'],
    '环比':[101.5,101.2,101.3,102.0,100.1],
    '同比':[120.7,127.3,119.4,140.9,101.4],
    '定基':[121.4,127.8,120.2,145.5,101.6]
}
d=pd.DataFrame(dl,index=['c1','c2','c3','c4','c5'])
d
out:
    同比  城市     定基     环比
c1  120.7  北京  121.4  101.5
c2  127.3  上海  127.8  101.2
c3  119.4  广州  120.2  101.3
c4  140.9  深圳  145.5  102.0
c5  101.4  沈阳  101.6  100.1

d.index
Out: Index(['c1', 'c2', 'c3', 'c4', 'c5'], dtype='object')

d.columns
Out: Index(['同比', '城市', '定基', '环比'], dtype='object')

d.values
Out: 
array([[120.7, '北京', 121.4, 101.5],
       [127.3, '上海', 127.8, 101.2],
       [119.4, '广州', 120.2, 101.3],
       [140.9, '深圳', 145.5, 102.0],
       [101.4, '沈阳', 101.6, 100.1]], dtype=object)

d['同比']
Out[52]: 
c1    120.7
c2    127.3
c3    119.4
c4    140.9
c5    101.4
Name: 同比, dtype: float64

d.loc['c2']
Out: 
同比    127.3
城市       上海
定基    127.8
环比    101.2
Name: c2, dtype: object  

d['同比']['c2']
Out[51]: 127.3     
       
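DataFrame is a two-dimensional "labeled" array
Basic DataFrame operations resemble Series and are based on the row and column indexes

Operating on the data types

How can a Series or DataFrame object be changed?
- Add or reorder: reindex
- Remove: drop

Reindexing

.reindex() changes or reorders the index of a Series or DataFrame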

d=d.reindex(index=['c5','c4','c3','c2','c1'])
d
out:
 同比  城市     定基     环比
c5  101.4  沈阳  101.6  100.1
c4  140.9  深圳  145.5  102.0
c3  119.4  广州  120.2  101.3
c2  127.3  上海  127.8  101.2
c1  120.7  北京  121.4  101.5

d=d.reindex(columns=['城市','同比','环比','定基'])
d
out:
 城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.2
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4

newc=d.columns.insert(4,'新增')
newd=d.reindex(columns=newc,fill_value=200)
newd
out:
    城市     同比     环比     定基   新增
c5  沈阳  101.4  100.1  101.6  200
c4  深圳  140.9  102.0  145.5  200
c3  广州  119.4  101.3  120.2  200
c2  上海  127.3  101.2  127.8  200
c1  北京  120.7  101.5  121.4  200

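Index types

The index of a Series or DataFrame is of type Index
Index objects are immutable

Common Index methods

Method	Description
.append(idx)	Concatenate another Index, producing a new Index
.difference(idx)	Set difference, producing a new Index
.intersection(idx)	Set intersection
.union(idx)	Set union
.delete(loc)	Delete the element at position loc
.insert(loc,e)	Insert element e at position loc

Using Index methods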

d
out:
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.2
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4

nc=d.columns.delete(2)
ni=d.index.insert(5,'c0')
# The tutorial's form, d.reindex(index=ni,columns=nc,method='ffill'), raises
# ValueError: index must be monotonic increasing or decreasing
# Cause: a pandas version difference; with this form rows and columns cannot be reindexed together
# Fix: reindex first, then forward-fill
nd=d.reindex(index=ni,columns=nc).ffill()
nd
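
The same pattern on a smaller frame, as a sketch: the column names x and y and the labels c1 to c4 are made up for illustration. It shows that insert and delete return new Index objects, and that reindexing followed by .ffill() fills the newly added row from the row above it.

```python
import pandas as pd

d = pd.DataFrame({'x': [10, 20, 30], 'y': [1, 2, 3]}, index=['c1', 'c2', 'c3'])

# Index objects are immutable: insert/delete return new Index objects.
new_index = d.index.insert(3, 'c4')
new_columns = d.columns.delete(1)   # drops 'y' from the column Index

# Reindex rows and columns together, then forward-fill the new, empty row.
nd = d.reindex(index=new_index, columns=new_columns).ffill()
print(nd)
```

The original d is left unchanged; reindex returns a new DataFrame.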

Dropping entries by label

.drop() removes the given row or column labels from a Series or DataFrame
a=pd.Series([9,8,7,6],index=['a','b','c','d'])
a
out:
a    9
b    8
c    7
d    6
dtype: int64

a.drop(['b','c'])
Out: 
a    9
d    6
dtype: int64

d
out:
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.2
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4

d.drop('c5')
out:
    城市     同比     环比     定基
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.2
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4

# Drop the column labeled 同比; axis=1 is required to address columns because axis defaults to 0
d.drop('同比',axis=1)
Out: 
    城市     环比     定基
c5  沈阳  100.1  101.6
c4  深圳  102.0  145.5
c3  广州  101.3  120.2
c2  上海  101.2  127.8
c1  北京  101.5  121.4

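Arithmetic rules

Arithmetic aligns on the row and column indexes, pads missing entries, then computes; results are floating point by default
Missing entries are filled with NaN (null) during alignment
Operations between 2-D and 1-D, or 1-D and 0-D, are broadcast
Binary operations with + ‐ * / produce new objects

Arithmetic on the data types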

import pandas as pd
import numpy as np
a=pd.DataFrame(np.arange(12).reshape(3,4))
a
Out: 
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

b=pd.DataFrame(np.arange(20).reshape(4,5))
b
Out: 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19

a+b
Out: 
      0     1     2     3   4
0   0.0   2.0   4.0   6.0 NaN
1   9.0  11.0  13.0  15.0 NaN
2  18.0  20.0  22.0  24.0 NaN
3   NaN   NaN   NaN   NaN NaN

a*b
Out: 
      0     1      2      3   4
0   0.0   1.0    4.0    9.0 NaN
1  20.0  30.0   42.0   56.0 NaN
2  80.0  99.0  120.0  143.0 NaN
3   NaN   NaN    NaN    NaN NaN

b.add(a,fill_value=100)
Out: 
       0      1      2      3      4
0    0.0    2.0    4.0    6.0  104.0
1    9.0   11.0   13.0   15.0  109.0
2   18.0   20.0   22.0   24.0  114.0
3  115.0  116.0  117.0  118.0  119.0

a.mul(b,fill_value=0)
Out: 
      0     1      2      3    4
0   0.0   1.0    4.0    9.0  0.0
1  20.0  30.0   42.0   56.0  0.0
2  80.0  99.0  120.0  143.0  0.0
3   0.0   0.0    0.0    0.0  0.0

Operations across different dimensions

b
Out: 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19

c=pd.Series(np.arange(4))

c
Out[85]: 
0    0
1    1
2    2
3    3
dtype: int32
# Subtract 10 from every element of c
c-10
Out[86]: 
0   -10
1    -9
2    -8
3    -7
dtype: int32

b-c
Out[87]: 
      0     1     2     3   4
0   0.0   0.0   0.0   0.0 NaN
1   5.0   5.0   5.0   5.0 NaN
2  10.0  10.0  10.0  10.0 NaN
3  15.0  15.0  15.0  15.0 NaN
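Operations across different dimensions are broadcast; a one-dimensional Series takes part along axis 1 by default

The arithmetic methods let a one-dimensional Series take part along axis 0 instead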
b.sub(c,axis=0)
Out: 
    0   1   2   3   4
0   0   1   2   3   4
1   4   5   6   7   8
2   8   9  10  11  12
3  12  13  14  15  16

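Comparison rules

Comparisons only compare elements with the same index; no padding is done
Operations between 2-D and 1-D, or 1-D and 0-D, are broadcast

Binary operations with > < >= <= == != produce boolean objects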

a
Out: 
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

d=pd.DataFrame(np.arange(12,0,-1).reshape(3,4))

d
Out: 
    0   1   2  3
0  12  11  10  9
1   8   7   6  5
2   4   3   2  1

a>d
Out: 
       0      1      2      3
0  False  False  False  False
1  False  False  False   True
2   True   True   True   True

a==d
Out: 
       0      1      2      3
0  False  False  False  False
1  False  False   True  False
2  False  False  False  False

 c=pd.Series(np.arange(4))

c
Out: 
0    0
1    1
2    2
3    3
dtype: int32

# Different dimensions: broadcast, along axis 1 by default
a>c
Out: 
       0      1      2      3
0  False  False  False  False
1   True   True   True   True
2   True   True   True   True

c>0
Out[97]: 
0    False
1     True
2     True
3     True
dtype: bool

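Python Data Analysis and Presentation_3_NumPy Data Storage and Functions

Posted on 2019-04-02, updated 2019-09-08, category: python

Reading and writing CSV files

CSV (Comma‐Separated Values) is a common file format for storing batches of data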

城市,环比,同比,定基
北京,101.5,120.7,121.4
上海,101.2,127.3,127.8
广州,101.3,119.4,120.0
深圳,102.0,140.9,145.5
沈阳,100.1,101.4,101.6

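np.savetxt(fname, X, fmt='%.18e', delimiter=' ')
• fname : file name or file object; a name ending in .gz is written gzip-compressed
• X : the array to write
• fmt : format of each written value, e.g. %d %.2f %.18e
• delimiter : string separating the columns, a single space by default

Saving: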

import numpy as np
a=np.arange(100).reshape(5,20)
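np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimeter=',')

Traceback (most recent call last):

  File "<ipython-input-7-725d28f63f7c>", line 1, in <module>
    np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimeter=',')

TypeError: savetxt() got an unexpected keyword argument 'delimeter'

Cause:
The keyword argument is misspelled: delimeter instead of delimiter, exactly as the TypeError says.
A separate pitfall: a path copied from Explorer uses \ between folders, and inside an ordinary Python string a backslash starts an escape sequence, so such a path can be misread.

Fix:
Spell the keyword delimiter. For copied paths, change every \ to /, or use a raw string (r'...') or doubled backslashes.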

np.savetxt('C:/Users/HASEE/Desktop/a.csv',a,fmt='%d',delimiter=',')

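Loading

np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)
• fname : file, file name, or generator; may be a .gz or .bz2 compressed file
• dtype : data type, optional
• delimiter : string separating the values, any whitespace by default
• unpack : if True, the columns read are returned as separate variables

b=np.loadtxt('C:/Users/HASEE/Desktop/a.csv',delimiter=',',dtype=int)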

b
Out[13]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
        56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
        76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
        96, 97, 98, 99]])

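Limitations of CSV files:

CSV can only store one- and two-dimensional arrays effectively

np.savetxt() and np.loadtxt() can only handle one- and two-dimensional arrays effectively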
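
A sketch of that limitation and one workaround, assuming a recent NumPy in which np.savetxt rejects a 3-D array with ValueError. The array is flattened to 2-D before saving and reshaped after loading, so the original shape has to be known. A temporary directory stands in for the Desktop path used above.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a.csv')
    try:
        np.savetxt(path, a, fmt='%d', delimiter=',')
        saved_3d = True
    except ValueError:
        saved_3d = False  # a 3-D array cannot be written as CSV directly

    # Workaround: store a 2-D view, then restore the shape after loading.
    np.savetxt(path, a.reshape(2, -1), fmt='%d', delimiter=',')
    restored = np.loadtxt(path, delimiter=',', dtype=int).reshape(a.shape)

print(saved_3d, restored.shape)
```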

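Storing multi-dimensional data

a.tofile(fid, sep='', format='%s')
• fid : file object or file name
• sep : separator between values; if it is the empty string, the file is written in binary
• format : format of the written values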

a=np.arange(100).reshape(5,10,2)
a.tofile('C:/Users/HASEE/Desktop/b.dat',sep=',',format='%d')
c=np.fromfile('C:/Users/HASEE/Desktop/b.dat',sep=',',dtype=int,count=-1)

c
Out[18]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

c.reshape(2,5,10)
Out[20]: 
array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]])
        
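This method requires knowing, at read time, the shape and element type the array had when it was written
a.tofile() and np.fromfile() must be used as a pair
The extra information can be stored in a separate metadata file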
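
One possible shape for such a metadata file, as a sketch: the helper names save_with_meta and load_with_meta and the ".json" sidecar convention are assumptions made for illustration, not part of NumPy.

```python
import json
import os
import tempfile

import numpy as np

def save_with_meta(arr, data_path):
    """Write raw binary data plus a JSON sidecar holding shape and dtype."""
    arr.tofile(data_path)
    with open(data_path + '.json', 'w') as fh:
        json.dump({'shape': arr.shape, 'dtype': str(arr.dtype)}, fh)

def load_with_meta(data_path):
    """Read the sidecar first, then rebuild the array from the raw data."""
    with open(data_path + '.json') as fh:
        meta = json.load(fh)
    flat = np.fromfile(data_path, dtype=np.dtype(meta['dtype']))
    return flat.reshape(meta['shape'])

a = np.arange(100).reshape(5, 10, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'b.dat')
    save_with_meta(a, path)
    restored = load_with_meta(path)

print(restored.shape, restored.dtype)
```

np.save and np.load store the same information inside the .npy file itself, which is why they need no sidecar.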

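NumPy's convenient file storage

np.save(fname, array) or np.savez(fname, array)
• fname : file name, with extension .npy (.npz for the archive written by np.savez)
• array : the array variable
np.load(fname)
• fname : file name, with extension .npy (.npz for the archive written by np.savez)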

 a=np.arange(100).reshape(5,10,2)

np.save('a.npy',a)

 b=np.load('a.npy')

b
Out[25]: 
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

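NumPy's random number sub-library (np.random)

Function	Description
rand(d0,d1,..,dn)	Array of random floats of shape d0‐dn, uniformly distributed over [0,1)
randn(d0,d1,..,dn)	Array of random floats of shape d0‐dn, standard normal distribution
randint(low[,high,shape])	Random integer or integer array of the given shape, in the range [low, high)
seed(s)	Seed the generator with s; the same seed produces the same random numbers
shuffle(a)	Shuffle array a along its first axis (axis 0), modifying a in place
permutation(a)	Return a new array shuffled along the first axis (axis 0), leaving a unchanged
choice(a[,size,replace,p])	Draw elements from one-dimensional array a with probabilities p into a new array of shape size; replace says whether an element may be drawn more than once, True by default
uniform(low,high,size)	Array drawn from a uniform distribution; low is the start, high the end, size the shape
normal(loc,scale,size)	Array drawn from a normal distribution; loc is the mean, scale the standard deviation, size the shape
poisson(lam,size)	Array drawn from a Poisson distribution; lam is the event rate, size the shape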

import numpy as np
a=np.random.rand(3,4,5)
a
out:
array([[[0.24639253, 0.722497  , 0.06705677, 0.57236565, 0.28976888],
        [0.72545351, 0.63711307, 0.7305934 , 0.62810739, 0.22117966],
        [0.27692999, 0.29420823, 0.881048  , 0.50637681, 0.99317356],
        [0.61826611, 0.13610396, 0.94085436, 0.83689825, 0.05277357]],

       [[0.56759999, 0.48501222, 0.99744752, 0.36442473, 0.10996119],
        [0.30532853, 0.99185963, 0.01528704, 0.9655763 , 0.07883292],
        [0.69017904, 0.34405313, 0.48902329, 0.90762022, 0.94073407],
        [0.99060258, 0.18003825, 0.15771573, 0.49471469, 0.49768674]],

       [[0.88907564, 0.60919579, 0.89118723, 0.72911511, 0.88404285],
        [0.10481751, 0.98548878, 0.66120233, 0.29016637, 0.57104031],
        [0.22982642, 0.14531348, 0.26788788, 0.28058991, 0.46626988],
        [0.684612  , 0.34908288, 0.55960948, 0.67505087, 0.04902906]]])

sn=np.random.randn(3,4,5)

sn
Out[28]: 
array([[[-0.8927074 , -0.07921713, -1.09702413,  0.20266238,
          2.07800266],
        [-0.30521372,  1.07882345,  0.15834808, -0.4657899 ,
         -0.67738772],
        [ 0.52078183,  0.73034311, -0.21416105, -1.77684991,
          0.98170757],
        [ 0.77941776, -0.5389379 ,  0.37604244,  0.31786087,
         -2.37803701]],

       [[ 0.11112126,  0.49939424, -1.06720594,  1.75672316,
          0.18743589],
        [ 3.23782667,  0.3871532 ,  0.8731636 , -0.8501687 ,
         -0.62653135],
        [ 0.99275262,  1.09478903,  0.15127731,  0.00602239,
          0.72496009],
        [ 0.05037592, -0.07816541,  1.07494759, -1.69539531,
         -1.45367689]],

       [[ 0.6453074 , -0.97600581, -0.21570961,  0.2988862 ,
          0.73129948],
        [-0.18624953,  1.17215876,  0.53122232, -1.24010898,
          1.05254842],
        [-1.38374598, -0.11569819, -0.1682294 ,  1.10782766,
         -0.15701692],
        [-1.55098208,  0.55973668,  1.84080928, -1.64429112,
         -0.07670816]]])

b=np.random.randint(100,200,(3,4))
b
Out[30]: 
array([[179, 170, 104, 104],
       [109, 120, 114, 116],
       [128, 175, 198, 163]])
       
a=np.random.randint(100,200,(3,4))

a
Out[32]: 
array([[135, 180, 159, 164],
       [107, 185, 165, 185],
       [107, 149, 165, 140]])

np.random.shuffle(a)

a
Out[34]: 
array([[107, 185, 165, 185],
       [135, 180, 159, 164],
       [107, 149, 165, 140]])

np.random.permutation(a)
Out[35]: 
array([[135, 180, 159, 164],
       [107, 149, 165, 140],
       [107, 185, 165, 185]])

a
Out[36]: 
array([[107, 185, 165, 185],
       [135, 180, 159, 164],
       [107, 149, 165, 140]])
       
b=np.random.randint(100,200,(8,))

b
Out[38]: array([155, 117, 150, 145, 193, 119, 152, 166])

np.random.choice(b,(3,2))
Out[39]: 
array([[152, 117],
       [166, 117],
       [145, 145]])

np.random.choice(b,(3,2),replace=False)
Out[40]: 
array([[166, 145],
       [150, 152],
       [119, 117]])
       
# The larger the value, the higher its probability of being drawn
np.random.choice(b,(3,2),p=b/np.sum(b))
Out[41]: 
array([[193, 193],
       [152, 193],
       [152, 193]])

u=np.random.uniform(0,10,(3,4))

u
Out[43]: 
array([[7.59440693, 9.14123945, 5.69234925, 0.11060994],
       [2.36784092, 0.82206082, 4.70415773, 2.80706347],
       [0.58551754, 0.76346577, 8.6746605 , 0.62805895]])

n=np.random.normal(10,5,(3,4))

n
Out[45]: 
array([[ 2.50436991,  0.45901779, 18.12367847, 14.49488327],
       [ 9.61865032,  6.03216628, -7.30214271, 11.20063806],
       [ 8.36891732,  9.19299271,  5.53896269,  3.87134051]])

NumPy statistical functions

a
Out[47]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
       
np.mean(a)
Out[50]: 7.0

np.mean(a,axis=1)
Out[51]: array([ 2.,  7., 12.])

np.mean(a,axis=0)
Out[52]: array([5., 6., 7., 8., 9.])

np.average(a,axis=0,weights=[10,5,1])
Out[55]: array([2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
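
As a rough check of the weighted average above, the same numbers can be reproduced by hand: each row is multiplied by its weight, the rows are summed, and the result is divided by the sum of the weights. The names `manual` and `builtin` are illustrative only.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
weights = np.array([10, 5, 1])

# Weight each row, sum down the columns, divide by the total weight.
# First column: (10*0 + 5*5 + 1*10) / 16 = 2.1875
manual = (weights[:, None] * a).sum(axis=0) / weights.sum()

builtin = np.average(a, axis=0, weights=weights)

print(manual)
print(builtin)
```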

np.std(a)
Out[56]: 4.320493798938574

np.var(a)
Out[58]: 18.666666666666668

b=np.arange(15,0,-1).reshape(3,5)

b
Out[60]: 
array([[15, 14, 13, 12, 11],
       [10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

np.max(b)
Out[61]: 15

np.argmax(b)
Out[62]: 0

np.unravel_index(np.argmax(b),b.shape)
Out[64]: (0, 0)

np.ptp(b)
Out[65]: 14

np.median(b)
Out[66]: 8.0

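NumPy gradient function

Function	Description
np.gradient(f)	Computes the gradient of the elements of array f; when f is multidimensional, returns the gradient along each dimension

Gradient: the rate of change between consecutive values, i.e. the slope.
For three consecutive x coordinates on an XY plot with y values a, b, c, the gradient at b is (c-a)/2.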

a=np.random.randint(0,20,(5))

a
Out[68]: array([ 5, 18, 10,  1, 14])

np.gradient(a)
Out[69]: array([13. ,  2.5, -8.5,  2. , 13. ])
# At the two ends, the gradient is the difference between the two outermost values divided by 1: (18-5)/1=13, (14-1)/1=13
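
The rule described above can be written out by hand for the one-dimensional, unit-spacing case. This is only a sketch: `gradient_1d` is a hypothetical helper, not a NumPy function, and it assumes at least two input values.

```python
import numpy as np

def gradient_1d(values):
    """Hypothetical helper: central differences inside, one-sided at the ends."""
    y = np.asarray(values, dtype=float)
    g = np.empty_like(y)
    g[1:-1] = (y[2:] - y[:-2]) / 2   # interior points: (c - a) / 2
    g[0] = y[1] - y[0]               # left edge: difference of the first two values
    g[-1] = y[-1] - y[-2]            # right edge: difference of the last two values
    return g

a = np.array([5, 18, 10, 1, 14])
manual = gradient_1d(a)
print(manual)
print(np.gradient(a))
```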

# 2-D gradient
c=np.random.randint(0,50,(3,5))
c
Out[71]: 
array([[47, 25, 46,  3, 45],
       [47, 17, 15, 21, 22],
       [32,  5, 19, 49, 16]])

np.gradient(c)
Out[72]: 
[array([[  0. ,  -8. , -31. ,  18. , -23. ],
        [ -7.5, -10. , -13.5,  23. , -14.5],
        [-15. , -12. ,   4. ,  28. ,  -6. ]]),
 array([[-22. ,  -0.5, -11. ,  -0.5,  42. ],
        [-30. , -16. ,   2. ,   3.5,   1. ],
        [-27. ,  -6.5,  22. ,  -1.5, -33. ]])]
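The two arrays are the gradients along the outermost dimension (axis 0) and the second dimension (axis 1), computed down the columns and along the rows respectively.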

Python Data Analysis and Visualization 2: Introduction to the NumPy Library

Posted 2019-04-01, updated 2019-09-08, category: python

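Data dimensions

One-dimensional data

One-dimensional data consists of ordered or unordered items that stand in a peer relationship, organized linearly.
It corresponds to concepts such as lists, arrays, and sets.

List: element types may differ
Array: element types are the same

Two-dimensional data

Two-dimensional data is made up of several pieces of one-dimensional data; it is a combination of one-dimensional data.

A table is typical two-dimensional data.
The table header is part of the two-dimensional data.

Multidimensional data

Multidimensional data is formed by extending one- or two-dimensional data along new dimensions.

High-dimensional data

High-dimensional data uses only the most basic binary relationship (key-value pairs) to express complex structure among the data.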

{
"firstName" : "Tian" ,
"lastName" : "Song" ,
"address"  : {
"streetAddr" : "中关村南大街5号" ,
"city"  : "北京市" ,
"zipcode"  : "100081"
} ,
"prof"  : [ "Computer System" , "Security" ]
}
# key-value pairs

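Representing data dimensions in Python

A data dimension is the way data is organized.
One-dimensional data: list and set types
Two-dimensional data: list type
Multidimensional data: list type
High-dimensional data: dictionary type, or a data representation format
such as JSON, XML, or YAML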

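NumPy

NumPy is an open-source foundational library for scientific computing in Python; data-processing and scientific libraries such as SciPy and Pandas are built on it. It provides:

A powerful N-dimensional array object, ndarray
Broadcasting functions
Tools for integrating C/C++/Fortran code
Linear algebra, Fourier transforms, random number generation, and more

The N-dimensional array object

np.array() creates an ndarray.
Python already has a list type, so why is an array object (type) needed?

Example: compute A^2 + B^3, where A and B are one-dimensional arrays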
# Plain Python version
def pySum():
    a=[0,1,2,3,4]
    b=[9,8,7,6,5]
    c=[]
    for i in range(len(a)):
        c.append(a[i]**2+b[i]**3)
    return c
print(pySum())

# NumPy version:
import numpy as np
def npSum():
    a=np.array([0,1,2,3,4])
    b=np.array([9,8,7,6,5])
    c=a**2+b**3
    return c
print(npSum())

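Why a dedicated array object is worthwhile:

An array object removes the loops needed for element-wise operations, so a one-dimensional vector behaves more like a single value.
A dedicated, optimized array object speeds up this kind of computation.
Observation: in scientific computing, all the data along one dimension usually has the same type.
Using one data type for the whole array saves computation and storage.

Structure of an ndarray:
An ndarray is a multidimensional array object made of two parts:
- the actual data
- metadata describing that data (dimensions, data type, and so on)

An ndarray generally requires all elements to have the same type (homogeneous); array indices start at 0.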

a=np.array([[0,1,2,3,4],[9,8,7,6,5]])
a
Out: 
array([[0, 1, 2, 3, 4],
       [9, 8, 7, 6, 5]])
# print() displays an ndarray in [] form, with elements separated by spaces
print(a)
[[0 1 2 3 4]
 [9 8 7 6 5]]

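Axis: a dimension along which data is stored; rank: the number of axes

Attributes of ndarray objects

Attribute	Description
.ndim	Rank, i.e. the number of axes (dimensions)
.shape	Shape of the ndarray; for a matrix, n rows by m columns
.size	Number of elements in the ndarray, equal to n*m from .shape
.dtype	Element type of the ndarray
.itemsize	Size of each element, in bytes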

a.ndim
Out[136]: 2

a.shape
Out[137]: (2, 5)

a.size
Out[138]: 10

a.dtype
Out[139]: dtype('int32')

a.itemsize
Out[140]: 4
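
The attributes are related to one another, which the short sketch below tries to make visible. It fixes `dtype=np.int32` explicitly as an assumption, because the default integer type (and therefore `.dtype` and `.itemsize`) depends on the platform.

```python
import numpy as np

# dtype is fixed here so the numbers do not depend on the platform default
a = np.array([[0, 1, 2, 3, 4], [9, 8, 7, 6, 5]], dtype=np.int32)

rows, cols = a.shape                 # (2, 5)
element_count = rows * cols          # same value as a.size
total_bytes = a.size * a.itemsize    # same value as a.nbytes

print(a.ndim, a.shape, a.size, a.dtype, a.itemsize, total_bytes)
```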

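Element types of ndarray arrays

Type	Description
bool	Boolean, True or False
intc	Same as int in C, usually int32 or int64
intp	Integer used for indexing, same as ssize_t in C; int32 or int64
int8	8-bit (one-byte) integer, range [-128, 127]
int16	16-bit integer, range [-32768, 32767]
int32	32-bit integer, range [-2^31, 2^31-1]
int64	64-bit integer, range [-2^63, 2^63-1]
uint8	8-bit unsigned integer, range [0, 255]
uint16	16-bit unsigned integer, range [0, 65535]
uint32	32-bit unsigned integer, range [0, 2^32-1]
uint64	64-bit unsigned integer, range [0, 2^64-1]
float16	16-bit half-precision float: 1 sign bit, 5 exponent bits, 10 mantissa bits
float32	32-bit single-precision float: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64	64-bit double-precision float: 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64	Complex number; real and imaginary parts are both 32-bit floats
complex128	Complex number; real and imaginary parts are both 64-bit floats

Why does ndarray support so many element types?

For comparison, the Python language itself supports only three numeric types: integer, floating point, and complex.

Scientific computing involves large amounts of data and places high demands on both storage and performance.
Fine-grained element types help NumPy use storage sensibly and optimize performance.

Fine-grained element types also help programmers estimate the size of their programs.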
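
The ranges and bit layouts in the table can be queried from NumPy itself with `np.iinfo` and `np.finfo`, and the storage argument can be seen from `.nbytes`. A small sketch, with array lengths picked arbitrarily for illustration:

```python
import numpy as np

# Integer ranges as reported by NumPy
int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
uint16_max = np.iinfo(np.uint16).max

# Bit layout of single precision: total bits, exponent bits, mantissa bits
single = np.finfo(np.float32)

# The same 1000 zeros take 8x more memory as int64 than as int8
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.int64)

print(int8_range, uint16_max)
print(single.bits, single.nexp, single.nmant)
print(small.nbytes, large.nbytes)
```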

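Non-homogeneous ndarray objects

# An ndarray can be built from non-homogeneous (ragged) objects
# (NumPy 1.24 and later require an explicit dtype=object for this)
x=np.array([[0,1,2,3,4],[9,8,7,6]],dtype=object)
x.shape
Out[143]: (2,)
# Elements of a non-homogeneous ndarray have object type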
x.dtype
Out[144]: dtype('O')

x
Out[145]: array([list([0, 1, 2, 3, 4]), list([9, 8, 7, 6])], dtype=object)

x.itemsize
Out[146]: 8

x.size
Out[147]: 2
# Non-homogeneous ndarray objects cannot take advantage of NumPy's strengths; avoid them where possible

Creating ndarray arrays

Methods

From Python lists, tuples, and similar types
With NumPy functions such as arange, ones, and zeros
From a byte stream (raw bytes)
From a file in a specific format

Creating an ndarray from Python lists, tuples, and similar types

x = np.array(list/tuple)
x = np.array(list/tuple, dtype=np.float32)
When np.array() is called without dtype, NumPy picks a dtype based on the data.

# Created from a list
x=np.array([1,2,3,4])
print(x)
[1 2 3 4]
# Created from a tuple
x=np.array((4,5,6,7))
print(x)
[4 5 6 7]
# Created from a mix of lists and tuples
x=np.array([[1,2],[8,9],(0.1,0.2)])

print(x)
[[1.  2. ]
 [8.  9. ]
 [0.1 0.2]]
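
How the dtype is chosen when none is given can be inspected directly. The sketch below checks only the kind of the integer type, since its exact width is platform-dependent; the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])                        # all integers: an integer dtype
mixed = np.array([[1, 2], [8, 9], (0.1, 0.2)])       # any float present: float64
forced = np.array([1, 2, 3, 4], dtype=np.float32)    # explicit dtype overrides inference

print(ints.dtype, mixed.dtype, forced.dtype)
```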

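Creating an ndarray with NumPy functions such as arange, ones, and zeros

Function	Description
np.arange(n)	Like range(), but returns an ndarray with elements from 0 to n-1
np.ones(shape)	Creates an array of ones with the given shape; shape is a tuple
np.zeros(shape)	Creates an array of zeros with the given shape; shape is a tuple
np.full(shape,val)	Creates an array with the given shape in which every element is val
np.eye(n)	Creates an n*n identity matrix: ones on the diagonal, zeros elsewhere
np.ones_like(a)	Creates an array of ones with the same shape as array a
np.zeros_like(a)	Creates an array of zeros with the same shape as array a
np.full_like(a,val)	Creates an array with the same shape as array a in which every element is val
np.linspace()	Fills an array with evenly spaced values between a start value and an end value
np.concatenate()	Joins two or more arrays into a new array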

np.ones((3,6))
Out[9]: 
array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

np.zeros((3,6),dtype=np.int32)
Out[11]: 
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

np.eye(5)
Out[12]: 
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

x=np.ones((2,3,4))

print(x)
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
 
x.shape
Out: (2, 3, 4)

# Generate 4 evenly spaced numbers; the result is floating point by default
a=np.linspace(1,10,4)
a
Out[18]: array([ 1.,  4.,  7., 10.])
# With endpoint=False the end value 10 is not included
b=np.linspace(1,10,4,endpoint=False)
b
Out[20]: array([1.  , 3.25, 5.5 , 7.75])

# Join two arrays
c=np.concatenate((a,b))
c
Out[22]: array([ 1.  ,  4.  ,  7.  , 10.  ,  1.  ,  3.25,  5.5 ,  7.75])
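
The two linspace results above differ because the spacing is computed differently: with the end value included it is (stop - start) / (num - 1), without it (stop - start) / num. A brief sketch using the `retstep=True` option, which also returns the spacing:

```python
import numpy as np

start, stop, num = 1, 10, 4

# retstep=True additionally returns the spacing that linspace used
closed, closed_step = np.linspace(start, stop, num, retstep=True)
half_open, open_step = np.linspace(start, stop, num, endpoint=False, retstep=True)

print(closed, closed_step)      # spacing is (10 - 1) / (4 - 1)
print(half_open, open_step)     # spacing is (10 - 1) / 4
```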

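Transforming ndarray arrays

Once created, an ndarray can have its dimensions and its element type changed.

Method	Description
.reshape(shape)	Returns an array with the given shape without changing the elements; the original array is unchanged
.resize(shape)	Same as .reshape(), but modifies the original array in place
.swapaxes(ax1,ax2)	Swaps two of the array's n dimensions
.flatten()	Flattens the array and returns a one-dimensional copy; the original array is unchanged

Changing the dimensions of an ndarray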

a
Out: 
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

a.resize((3,8))

a
Out: 
array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])

a.flatten()
Out: 
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1])
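
The distinction between methods that return a new array and methods that change the array in place can be sketched as follows. The variable names are illustrative, and the example assumes the new shape has the same number of elements as the old one.

```python
import numpy as np

a = np.ones((2, 3, 4), dtype=np.int32)

reshaped_shape = a.reshape((3, 8)).shape   # a new array object; `a` keeps its shape
shape_after_reshape = a.shape

result = a.resize((3, 8))                  # changes `a` itself and returns None
shape_after_resize = a.shape

flat = a.flatten()                         # always a copy
flat[0] = 99                               # so writing to it does not touch `a`

# swapaxes exchanges two axes: (2, 3, 4) becomes (4, 3, 2)
swapped_shape = np.arange(24).reshape((2, 3, 4)).swapaxes(0, 2).shape

print(shape_after_reshape, shape_after_resize, swapped_shape)
```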

Changing the element type of an ndarray

a
Out[33]: 
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

b=a.astype(np.float64)

b
Out[35]: 
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])
        
# astype() always creates a new array (a copy of the original data), even when the two types are the same
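
The copying behaviour of astype() with its default arguments can be verified with a short sketch; the variable names are illustrative.

```python
import numpy as np

a = np.full((2, 3, 4), 25, dtype=np.int32)

# Converting to the type the array already has still yields a separate copy
same_type_copy = a.astype(np.int32)
same_type_copy[0, 0, 0] = -1      # does not affect `a`

as_float = a.astype(np.float64)

print(a[0, 0, 0], same_type_copy[0, 0, 0], as_float.dtype)
```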

Converting an ndarray to a list

a=np.full((2,3,4),25,dtype=np.int32)

a
Out[37]: 
array([[[25, 25, 25, 25],
        [25, 25, 25, 25],
        [25, 25, 25, 25]],

       [[25, 25, 25, 25],
        [25, 25, 25, 25],
        [25, 25, 25, 25]]])

a.tolist()
Out[38]: 
[[[25, 25, 25, 25], [25, 25, 25, 25], [25, 25, 25, 25]],
 [[25, 25, 25, 25], [25, 25, 25, 25], [25, 25, 25, 25]]]

Working with ndarray arrays

Indexing and slicing

# Indexing and slicing a 1-D array: similar to Python lists
a=np.array([9,8,7,6,5])

a[2]
Out: 7
# start:stop:step, three elements separated by colons
a[1:4:2]
Out: array([8, 6])

# Indexing a multidimensional array
a=np.arange(24).reshape((2,3,4))

a
Out: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

a[1,2,3]
Out: 23

a[0,1,2]
Out: 6

a[-1,-2,-3]
Out: 17

Slicing a multidimensional array

a
Out[50]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

a[:,1,-3]
Out[51]: array([ 5, 17])

a[:,1:3,:]
Out[52]: 
array([[[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])

a[:,:,::2]
Out[53]: 
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[12, 14],
        [16, 18],
        [20, 22]]])

Arithmetic on ndarray arrays

a
Out[55]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
# Mean of all elements
a.mean()
Out[56]: 11.5

a=a/a.mean()
# An operation between an array and a scalar is applied to every element of the array
a
Out[58]: 
array([[[0.        , 0.08695652, 0.17391304, 0.26086957],
        [0.34782609, 0.43478261, 0.52173913, 0.60869565],
        [0.69565217, 0.7826087 , 0.86956522, 0.95652174]],

       [[1.04347826, 1.13043478, 1.2173913 , 1.30434783],
        [1.39130435, 1.47826087, 1.56521739, 1.65217391],
        [1.73913043, 1.82608696, 1.91304348, 2.        ]]])

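NumPy unary functions

Functions that perform element-wise operations on the data in an ndarray

Function	Description
np.abs(x) np.fabs(x)	Absolute value of each element
np.sqrt(x)	Square root of each element
np.square(x)	Square of each element
np.log(x) np.log10(x) np.log2(x)	Natural, base-10, and base-2 logarithm of each element
np.ceil(x) np.floor(x)	Ceiling or floor of each element
np.rint(x)	Each element rounded to the nearest integer (halves round to the nearest even value)
np.modf(x)	Fractional and integer parts of each element, returned as two separate arrays
np.exp(x)	Exponential of each element
np.sign(x)	Sign of each element: 1 (+), 0, -1 (-)
np.cos(x) np.cosh(x) np.sin(x) np.sinh(x) np.tan(x) np.tanh(x)	Ordinary and hyperbolic trigonometric functions of each element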

a
Out: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
        
np.square(a)
Out[63]: 
array([[[  0,   1,   4,   9],
        [ 16,  25,  36,  49],
        [ 64,  81, 100, 121]],

       [[144, 169, 196, 225],
        [256, 289, 324, 361],
        [400, 441, 484, 529]]], dtype=int32)

a=np.sqrt(a)
a
Out[65]: 
array([[[0.        , 1.        , 1.41421356, 1.73205081],
        [2.        , 2.23606798, 2.44948974, 2.64575131],
        [2.82842712, 3.        , 3.16227766, 3.31662479]],

       [[3.46410162, 3.60555128, 3.74165739, 3.87298335],
        [4.        , 4.12310563, 4.24264069, 4.35889894],
        [4.47213595, 4.58257569, 4.69041576, 4.79583152]]])

np.modf(a)
Out[66]: 
(array([[[0.        , 0.        , 0.41421356, 0.73205081],
         [0.        , 0.23606798, 0.44948974, 0.64575131],
         [0.82842712, 0.        , 0.16227766, 0.31662479]],
 
        [[0.46410162, 0.60555128, 0.74165739, 0.87298335],
         [0.        , 0.12310563, 0.24264069, 0.35889894],
         [0.47213595, 0.58257569, 0.69041576, 0.79583152]]]),
 array([[[0., 1., 1., 1.],
         [2., 2., 2., 2.],
         [2., 3., 3., 3.]],
 
        [[3., 3., 3., 3.],
         [4., 4., 4., 4.],
         [4., 4., 4., 4.]]]))
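
A few of the rounding-related functions are easiest to tell apart on negative numbers and exact halves. The input values below are chosen for illustration only.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25])

frac, whole = np.modf(x)    # fractional parts and integer parts, both keep the sign
signs = np.sign(x)          # -1, 0 or 1 per element
ceilings = np.ceil(x)       # round toward +infinity
floors = np.floor(x)        # round toward -infinity

# rint rounds exact halves to the nearest even integer
rounded = np.rint(np.array([0.5, 1.5, 2.5]))

print(frac, whole)
print(signs, ceilings, floors)
print(rounded)
```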

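NumPy binary functions

Function	Description
+ - * / **	Element-wise arithmetic between two arrays
np.maximum(x,y) np.fmax() np.minimum(x,y) np.fmin()	Element-wise maximum/minimum
np.mod(x,y)	Element-wise modulo
np.copysign(x,y)	Assigns the sign of each element of array y to the corresponding element of array x
> < >= <= == !=	Element-wise comparison; produces a boolean array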

a=np.arange(24).reshape((2,3,4))

b=np.sqrt(a)

a
Out[69]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

b
Out[70]: 
array([[[0.        , 1.        , 1.41421356, 1.73205081],
        [2.        , 2.23606798, 2.44948974, 2.64575131],
        [2.82842712, 3.        , 3.16227766, 3.31662479]],

       [[3.46410162, 3.60555128, 3.74165739, 3.87298335],
        [4.        , 4.12310563, 4.24264069, 4.35889894],
        [4.47213595, 4.58257569, 4.69041576, 4.79583152]]])

np.maximum(a,b)
Out[71]: 
array([[[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]],

       [[12., 13., 14., 15.],
        [16., 17., 18., 19.],
        [20., 21., 22., 23.]]])

a>b
Out[72]: 
array([[[False, False,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]],

       [[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]]])
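
The table lists np.maximum and np.fmax side by side; as far as NumPy's documented behaviour goes, they differ in how they treat NaN. The sketch below illustrates that, along with np.mod and np.copysign, on small arrays made up for the purpose.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

with_nan = np.maximum(x, y)   # NaN wins wherever either input is NaN
skip_nan = np.fmax(x, y)      # NaN is ignored when the other value is a number

# The result of mod takes the sign of the divisor, as Python's % does
remainders = np.mod(np.array([7, -7]), 3)

# Magnitudes come from the first argument, signs from the second
signed = np.copysign(np.array([1.0, 2.0, 3.0]), np.array([-1.0, 1.0, -1.0]))

print(with_nan, skip_nan)
print(remainders, signed)
```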

Summary

ndarray attributes, creation, and transformation
.ndim
.shape
.size
.dtype
.itemsize

np.arange(n)
np.ones(shape)
np.zeros(shape)
np.full(shape,val)
np.eye(n)
np.ones_like(a)
np.zeros_like(a)
np.full_like(a,val)

.reshape(shape)
.resize(shape)
.swapaxes(ax1,ax2)
.flatten()

Array indexing and slicing
Array arithmetic
Unary functions
Binary functions