java - Transforming data with a TransformProcess when using a DataSetIterator

2019-11-11 09:12:16  Views: 291  Source: Internet

Tags: csv deeplearning4j java


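I have a CSV dataset that contains both numeric and nominal attributes. I defined a schema for the dataset that lists every possible value of each nominal attribute. I then created a TransformProcess that uses CategoricalToOneHotTransform to convert the nominal values to numeric ones. How can I use this TransformProcess with a RecordReaderDataSetIterator to prepare the data for my neural network?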

    Schema schema = new Schema.Builder()
        .addColumnInteger("age")
        .addColumnCategorical("workclass", "Private", "Self-emp-not-inc", "Self-emp-inc", "Federal-gov", "Local-gov", "State-gov", "Without-pay", "Never-worked")
        .addColumnInteger("fnlwgt")
        .addColumnCategorical("education", "Bachelors", "Some-college", "11th", "HS-grad", "Prof-school", "Assoc-acdm", "Assoc-voc", "9th", "7th-8th", "12th", "Masters", "1st-4th", "10th", "Doctorate", "5th-6th", "Preschool")
        .addColumnInteger("education-num")
        .addColumnCategorical("marital-status", "Married-civ-spouse", "Divorced", "Never-married", "Separated", "Widowed", "Married-spouse-absent", "Married-AF-spouse")
        .addColumnCategorical("occupation", "Tech-support", "Craft-repair", "Other-service", "Sales", "Exec-managerial", "Prof-specialty", "Handlers-cleaners", "Machine-op-inspct", "Adm-clerical", "Farming-fishing", "Transport-moving", "Priv-house-serv", "Protective-serv", "Armed-Forces")
        .addColumnCategorical("relationship", "Wife", "Own-child", "Husband", "Not-in-family", "Other-relative", "Unmarried")
        .addColumnCategorical("race", "White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black")
        .addColumnCategorical("sex", "Female", "Male")
        .addColumnInteger("capital-gain")
        .addColumnInteger("capital-loss")
        .addColumnInteger("hours-per-week")
        .addColumnCategorical("native-country", "United-States", "Cambodia", "England", "Puerto-Rico", "Canada", "Germany", "Outlying-US(Guam-USVI-etc)", "India", "Japan", "Greece", "South", "China", "Cuba", "Iran", "Honduras", "Philippines", "Italy", "Poland", "Jamaica", "Vietnam", "Mexico", "Portugal", "Ireland", "France", "Dominican-Republic", "Laos", "Ecuador", "Taiwan", "Haiti", "Columbia", "Hungary", "Guatemala", "Nicaragua", "Scotland", "Thailand", "Yugoslavia", "El-Salvador", "Trinadad&Tobago", "Peru", "Hong", "Holand-Netherlands")
        .addColumnCategorical("class", ">50K", "<=50K")
        .build();

    TransformProcess tp = new TransformProcess.Builder(schema)
        .transform(new CategoricalToOneHotTransform("workclass"))
        .transform(new CategoricalToOneHotTransform("education"))
        .transform(new CategoricalToOneHotTransform("marital-status"))
        .transform(new CategoricalToOneHotTransform("occupation"))
        .transform(new CategoricalToOneHotTransform("relationship"))
        .transform(new CategoricalToOneHotTransform("race"))
        .transform(new CategoricalToOneHotTransform("sex"))
        .transform(new CategoricalToOneHotTransform("native-country"))
        .transform(new CategoricalToIntegerTransform("class"))
        .build();

    Schema outputSchema = tp.getFinalSchema();

    int numLinesToSkip = 0;
    String delimiter = ",";
    CSVRecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter);
    recordReader.initialize(new FileSplit(Paths.get("..\\adult.data").toFile()));


    int labelIndex = outputSchema.getColumnNames().size() - 1;
    int numClasses = 2;
    int batchSize = 2000;

    RecordReaderDataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numClasses);

    DataSet allData = iterator.next();
    allData.shuffle();
    SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.65);

Solution:

Take a look at: https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/main/java/org/datavec/api/records/reader/impl/transform/TransformProcessRecordReader.java

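RecordReaderDataSetIterator takes a record reader and handles the vectorization process. TransformProcessRecordReader wraps a record reader and outputs the transformed records, which you then feed into the RecordReaderDataSetIterator.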
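
The wrapping described above might look roughly like the sketch below. It is not taken from the original answer: the class name TransformedIteratorSketch, the helper buildIterator, and the tiny three-column in-memory schema are hypothetical stand-ins so the example runs without adult.data. It assumes DataVec and Deeplearning4j are on the classpath and that the last column of the transformed schema is the label, as in the question.

```java
import java.util.Arrays;
import java.util.List;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.collection.CollectionRecordReader;
import org.datavec.api.records.reader.impl.transform.TransformProcessRecordReader;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.transform.transform.categorical.CategoricalToIntegerTransform;
import org.datavec.api.transform.transform.categorical.CategoricalToOneHotTransform;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;

final class TransformedIteratorSketch {

    // Hypothetical helper: wraps the raw reader so that every record passes
    // through the TransformProcess before the iterator vectorizes it.
    static RecordReaderDataSetIterator buildIterator(
            RecordReader rawReader, TransformProcess tp, int batchSize, int numClasses) {
        RecordReader transformed = new TransformProcessRecordReader(rawReader, tp);
        // Assumption: the label is the last column of the transformed schema.
        int labelIndex = tp.getFinalSchema().getColumnNames().size() - 1;
        return new RecordReaderDataSetIterator(transformed, batchSize, labelIndex, numClasses);
    }

    // Small in-memory stand-in for the CSV file, using a reduced schema.
    static DataSet demoBatch() {
        Schema schema = new Schema.Builder()
            .addColumnInteger("age")
            .addColumnCategorical("sex", "Female", "Male")
            .addColumnCategorical("class", ">50K", "<=50K")
            .build();

        TransformProcess tp = new TransformProcess.Builder(schema)
            .transform(new CategoricalToOneHotTransform("sex"))
            .transform(new CategoricalToIntegerTransform("class"))
            .build();

        List<List<Writable>> rows = Arrays.asList(
            Arrays.<Writable>asList(new IntWritable(39), new Text("Male"), new Text("<=50K")),
            Arrays.<Writable>asList(new IntWritable(52), new Text("Female"), new Text(">50K")));

        RecordReader rawReader = new CollectionRecordReader(rows);
        RecordReaderDataSetIterator iterator = buildIterator(rawReader, tp, 2, 2);
        return iterator.next();
    }

    public static void main(String[] args) {
        DataSet batch = demoBatch();
        System.out.println("feature columns: " + batch.getFeatures().columns());
        System.out.println("label columns: " + batch.getLabels().columns());
    }
}
```

With the code from the question, the same helper would presumably be called as buildIterator(recordReader, tp, batchSize, numClasses) after recordReader.initialize(...), in place of constructing the RecordReaderDataSetIterator directly on the raw CSVRecordReader.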

Source: https://codeday.me/bug/20191111/2018933.html
