1.1
This practice covers methods and equations for computing and presenting basic statistics. This practice includes simple descriptive statistics for variable and attribute data, elementary methods of statistical inference, and tabular and graphical methods for variable data. Some interpretation and guidance for use is also included.
1.2
The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.
1.3
This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.4
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
====== Significance and Use ======
4.1
This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.
4.1.1
A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.
4.1.2
The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.
4.2
The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.
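As a minimal sketch of the two attribute data types described in 4.2, the Python below summarizes a hypothetical sequence of binary trials by its sample proportion and a hypothetical set of interval counts by its mean count per interval. The data values and the function names `sample_proportion` and `mean_count` are invented for illustration and are not part of this practice.

```python
from statistics import mean

def sample_proportion(trials):
    """Fraction of inspected items showing the attribute (trials are 0s and 1s)."""
    if any(t not in (0, 1) for t in trials):
        raise ValueError("binary trials must be coded 0 or 1")
    return sum(trials) / len(trials)

def mean_count(counts):
    """Average number of events per inspection interval."""
    return mean(counts)

# Hypothetical data: 10 inspected items and 8 inspection intervals.
binary_trials = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
event_counts = [2, 0, 1, 3, 0, 1, 2, 1]

p_hat = sample_proportion(binary_trials)  # estimates the binomial parameter p
rate_hat = mean_count(event_counts)       # estimates the Poisson mean per interval
print(p_hat, rate_hat)
```

Whether the binomial or Poisson model is actually appropriate depends on how the data were collected; the sketch only computes the sample summaries.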
4.3
For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter, or to test a hypothesis.
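The idea that a statistic is itself a random variable can be illustrated with a small simulation. The sketch below assumes a hypothetical finite population of simulated values and repeatedly draws samples of size 25; the population, the seed, and the helper name `repeated_sample_means` are assumptions made for illustration only.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values drawn from a normal distribution
# with mean 50 and standard deviation 5.
population = [rng.gauss(50.0, 5.0) for _ in range(10_000)]

def repeated_sample_means(pop, n, repeats, rng):
    """Draw `repeats` random samples of size n and return each sample mean."""
    return [mean(rng.sample(pop, n)) for _ in range(repeats)]

means = repeated_sample_means(population, n=25, repeats=200, rng=rng)
print(min(means), max(means))  # the sample mean differs from sample to sample
print(mean(population))        # the population parameter being estimated
```

In practice the population mean is unknown and only one sample is usually available, which is why the variation of the statistic has to be accounted for when drawing conclusions.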
4.4
Descriptive statistics consider numerical, tabular, and graphical methods for summarizing a set of data. The methods considered in this practice are used for summarizing the observations from a single variable. The descriptive statistics described in this practice are: mean, median, min, max, range, mid range, order statistic, quartile, empirical percentile, quantile, interquartile range, variance, standard deviation, Z-score, coefficient of variation, and skewness and kurtosis.
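Several of the statistics listed in 4.4 can be sketched with Python's standard `statistics` module. The data values are hypothetical, and the quartile convention used by `statistics.quantiles` is only one of several in common use, so it may not match the definition given elsewhere in this practice. Skewness and kurtosis are omitted from the sketch.

```python
import statistics as st

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.6, 12.2, 12.0]  # hypothetical measurements

n = len(data)
mean = st.mean(data)
median = st.median(data)
minimum, maximum = min(data), max(data)
data_range = maximum - minimum
mid_range = (minimum + maximum) / 2
variance = st.variance(data)  # sample variance, divisor n - 1
std_dev = st.stdev(data)      # sample standard deviation
cv = std_dev / mean           # coefficient of variation (often reported as a percentage)
z_scores = [(x - mean) / std_dev for x in data]

# Quartiles: the default method of statistics.quantiles is an assumption here
# and may differ from the convention this practice defines.
q1, q2, q3 = st.quantiles(data, n=4)
iqr = q3 - q1

print(n, mean, median, data_range, mid_range, variance, std_dev, cv, iqr)
```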
4.5
Statistical inference is drawing conclusions about the population or its parameters. Methods for statistical inference described in this practice are: degrees of freedom, standard error, confidence intervals, prediction intervals, tolerance intervals, and statistical hypothesis tests.
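A minimal sketch of two of these ideas, the standard error of the mean and a confidence interval for the population mean, is given below. It assumes a standard normal quantile, which is a large-sample approximation; for a small sample, a Student's t quantile with n - 1 degrees of freedom would give a wider interval. The data and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Approximate two-sided confidence interval for the population mean.

    Assumption: a standard normal quantile is used in place of a
    Student's t quantile with n - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    std_err = stdev(data) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar - z * std_err, x_bar + z * std_err, std_err

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.6, 12.2, 12.0]  # hypothetical measurements
lower, upper, std_err = mean_confidence_interval(data)
print(lower, upper, std_err)
```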
4.6
Tabular methods described in this practice are: frequency distribution, relative frequency distribution, cumulative frequency distribution, and cumulative relative frequency distribution.
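The four tabular summaries can be sketched together for a small set of discrete observations. The data and the function name `frequency_table` are hypothetical; continuous data would first need to be grouped into classes, which this sketch does not attempt.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    distinct = sorted(counts)
    freqs = [counts[v] for v in distinct]
    cum_freqs = list(accumulate(freqs))
    return [(v, f, f / n, c, c / n) for v, f, c in zip(distinct, freqs, cum_freqs)]

data = [3, 1, 2, 3, 3, 2, 4, 1, 3, 2]  # hypothetical discrete observations
table = frequency_table(data)
for row in table:
    print(row)
```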
4.7
Graphical methods described in this practice are: histogram, ogive, boxplot, dotplot, normal probability plot, and q-q plot.
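As one illustration, the coordinates of a normal probability plot can be computed without any plotting library. The sketch below pairs each sorted observation with a standard normal quantile, assuming the plotting position (i - 0.5) / n; other plotting positions are in common use and this practice may specify a different one. The data and the function name `normal_probability_points` are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Assumption: plotting position (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.6, 12.2, 12.0]  # hypothetical measurements
points = normal_probability_points(data)
for quantile, value in points:
    print(round(quantile, 3), value)
```

Plotting the observed values against the quantiles and finding the points close to a straight line would suggest that the data are consistent with a normal distribution.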
4.8
While the methods described in this practice may be used to summarize any set of observations, the results obtained by using them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product, all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger of mixing two or more distinctly different sets of data.
4.8.1
If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.
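The effect of merging subgroups can be seen in a small sketch. It assumes two hypothetical sets of measurements made under different production conditions and compares the per-subgroup summaries with the summary of the merged data; the names and values are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical measurements from two different production conditions.
subgroups = {
    "condition_A": [10.1, 9.9, 10.0, 10.2, 9.8],
    "condition_B": [12.0, 12.3, 11.9, 12.1, 12.2],
}

def summarize(values):
    """Return (n, sample mean, sample standard deviation)."""
    return len(values), mean(values), stdev(values)

per_group = {name: summarize(values) for name, values in subgroups.items()}
merged = summarize([x for values in subgroups.values() for x in values])

# The merged standard deviation is dominated by the difference between
# the two conditions rather than by variation within either one.
for name, summary in per_group.items():
    print(name, summary)
print("merged", merged)
```

The merged mean describes neither condition, which is why each subgroup is summarized separately.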
4.9
The methods developed in Sections 6, 7, 8, and 9 apply to the sample data. Unless indicated otherwise, a term such as “mean” refers to the sample mean, not the population mean. It is understood that there is a data set containing n observations. The data set may be denoted as:
$x_1, x_2, x_3, \ldots, x_n$
4.9.1
There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).
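The distinction can be sketched as follows, assuming that parenthesized subscripts denote order statistics, that is, the observations sorted from smallest to largest (the usual convention). The data values and variable names are hypothetical.

```python
data = [12.1, 11.8, 12.4, 12.0, 11.9]  # observations in the order they were recorded

ordered = sorted(data)                 # observations sorted from smallest to largest

x_1 = data[0]       # plain subscript: first observation recorded, no size implied
x_min = ordered[0]  # parenthesized subscript (1): the smallest observation
x_max = ordered[-1] # parenthesized subscript (n): the largest observation
print(x_1, x_min, x_max)
```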