
Preparing a Data Set for the SOFM Algorithm

2021-10-31 13:04:25  Views: 178  Source: Internet

Tags: SOFM, IMAGE, printf, algorithm, font, picrange, set, boxrange, dot matrix


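Abstract: To prepare a data set for the SOFM part of an artificial neural network course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.

Keywords: ASCII character dot matrix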

 

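§01 SOFM Data Set


  This data set is for the second assignment (on competitive networks) of the 2021 artificial neural network course. Last year's assignment used the three characters from the lecture slides as the SOM data set. This year the plan is to switch to a different data set.

▲ Figure 1.1 Data set used in the 2020 assignment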

  Self-Organizing Maps and Applications

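I. Data Set

1. Originally Planned Data Set

  The original plan was to take a 7×9 dot matrix from the web and pick G, H, I, N, O, Q, U and Z, the eight distinct letters of ZHUOQING, in two different fonts. Two more groups would then be generated from these two fonts by perturbing each character to a Hamming distance of 2, and all four groups would be clustered.

  GHINOQUZ would serve as the training samples. Among them, H-N and O-Q are hard to tell apart because the Hamming distance within each pair is small.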
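The Hamming distance used above is the number of positions at which two equal-length 0-1 strings differ. The following is a minimal sketch of how a variant at distance 2 could be generated; `hamming` and `flip_bits` are hypothetical helper names, and the sample pattern is the 5×7 "H" of font 1 from the conversion results later in this article, not a 7×9 glyph.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length 0-1 strings differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

def flip_bits(pattern, count, rng):
    """Return a copy of pattern with `count` distinct bits inverted."""
    chars = list(pattern)
    for pos in rng.sample(range(len(chars)), count):
        chars[pos] = '1' if chars[pos] == '0' else '0'
    return ''.join(chars)

# "H" of font 1 (5x7, row scan), taken from the conversion results.
H = "10001100011000111111100011000110001"
rng = random.Random(0)          # fixed seed so the variant is reproducible
variant = flip_bits(H, 2, rng)
print(hamming(H, variant))      # prints 2
```

Because `rng.sample` picks distinct positions, the variant is always exactly `count` bits away from the original.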

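  A web search, however, showed that 5×7 dot-matrix character sets are far more common.

2. 5×7 Dot-Matrix Character Sets

  Six ASCII dot-matrix fonts were collected, as shown below.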

▲ Figure 1.1.1 5×7 dot-matrix font

▲ Figure 1.1.2 5×7 dot-matrix font

▲ Figure 1.1.3 5×7 dot-matrix font

▲ Figure 1.1.4 5×7 dot-matrix font

▲ Figure 1.1.5 5×7 dot-matrix font

▲ Figure 1.1.6 5×7 dot-matrix font

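II. Converting the Image Data

  The dot-matrix templates obtained above are all images, so they have to be converted into 0-1 strings in row-scan order. Each character becomes a string of 35 characters (7 rows × 5 columns), each either 0 or 1.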
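To make the row-scan encoding concrete, here is a small sketch that flattens a 7×5 grid into a 35-character string and back. The function names `grid_to_bits` and `bits_to_grid` are hypothetical, and the hand-drawn "T" is only an illustration.

```python
ROWS, COLS = 7, 5

def grid_to_bits(grid):
    """Flatten a 7x5 grid of '#'/'.' rows into a 35-character 0-1 string."""
    if len(grid) != ROWS or any(len(row) != COLS for row in grid):
        raise ValueError("expected 7 rows of 5 cells")
    return ''.join('1' if cell == '#' else '0' for row in grid for cell in row)

def bits_to_grid(bits):
    """Split a 35-character 0-1 string back into 7 printable rows."""
    if len(bits) != ROWS * COLS:
        raise ValueError("expected 35 characters")
    return [bits[r * COLS:(r + 1) * COLS].replace('1', '#').replace('0', '.')
            for r in range(ROWS)]

letter_T = ["#####",
            "..#..",
            "..#..",
            "..#..",
            "..#..",
            "..#..",
            "..#.."]
bits = grid_to_bits(letter_T)
print(bits)                           # 35 characters, top row first
print('\n'.join(bits_to_grid(bits)))  # the same glyph drawn again
```

Row scan means the first five characters are the top row, the next five the second row, and so on down to the bottom row.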

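1. Image Enhancement and Inversion

  First, each image is converted in an image editor so that the foreground is dark and the background is light. If the original image has it the other way round, its colors are inverted.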
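The article does this step by hand in an image editor. As an alternative, the same rule could be sketched in code. The sketch below assumes an 8-bit gray image stored as a list of rows and assumes the background covers most of the picture, so a low average gray level means a dark background. All function names are hypothetical.

```python
def mean_gray(gray):
    """Average gray level (0-255) of an image stored as a list of rows."""
    total = sum(sum(row) for row in gray)
    return total / (len(gray) * len(gray[0]))

def invert(gray):
    """Swap light and dark: every level v becomes 255 - v."""
    return [[255 - v for v in row] for row in gray]

def dark_on_light(gray):
    """Invert only when the image is mostly dark, i.e. has a dark background."""
    return invert(gray) if mean_gray(gray) < 128 else gray

# Tiny 3x4 example: light strokes (255) on a dark background (0).
sample = [[0, 255, 0, 0],
          [0, 255, 0, 0],
          [0, 0,   0, 0]]
fixed = dark_on_light(sample)
print(fixed[0])   # strokes are now 0, background 255
```

An image that is already dark on light passes through unchanged, so the function can be applied to every font picture without checking first.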

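▲ Figure 1.2.1 Image converted to a dark foreground on a light background

2. Marking Character Boundaries

  In the TEASOFT software, a bounding box is drawn around each character's dot-matrix image, character by character.

▲ Figure 1.2.2 Character boundaries marked in order

3. Conversion Program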

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                  -- by Dr. ZhuoQing 2021-10-31
#
# Note: Convert a picture of 5x7 dot-matrix characters into
#       35-character 0-1 strings, one string per character.
#============================================================
from headm import *     # author's helper module: printf, tspget*, array, shape, ...
from PIL import Image

# boxid[0] is the TEASOFT box holding the picture; boxid[1:] are the
# boxes drawn around the 26 characters, in order.
boxid = [2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
         23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
picfile = tspgetdopfile(boxid[0])   # picture file passed to Image.open below
picrange = tspgetrange(boxid[0])    # used below as [left, top, right, bottom]
#printf(picrange)
printf(picfile)
#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230   # mean gray level above this counts as background

def image2Density(size, imagePixels):
    # size is (width, height) of the picture in pixels;
    # imagePixels is the gray image indexed as [row, column].
    imageWidth, imageHeight = size

    # Scale factors from TEASOFT box coordinates to pixel coordinates.
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight

    asciidim = []
    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        # Top-left corner of the character box relative to the picture.
        boxleft = boxrange[0] - picrange[0]
        boxtop = boxrange[1] - picrange[1]
        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]

        asciistr = ''
        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxtop) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxtop) * heightRatio)
            col = []
            for j in range(IMAGE_COL):
                startCol = int((j * boxwidth / IMAGE_COL + boxleft) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxleft) * widthRatio)

                # Mean gray level of the cell: divide by the number of
                # pixels actually summed (at least 1 to avoid dividing by zero).
                pixelNum = max(1, (endRow - startRow) * (endCol - startCol))
                pixelSigma = 0
                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += imagePixels[ii, jj]
                pixelSigma = int(pixelSigma / pixelNum)
                #printf(pixelSigma)

                # Light cell -> '0' (background), dark cell -> '1' (dot).
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else:
                    col.append('1')

            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))   # preview of one row
            asciistr = asciistr + str01
        printf('  ')
        asciidim.append(asciistr)
    return asciidim
#------------------------------------------------------------
img = Image.open(picfile).convert("RGB")   # also accepts gray or RGBA pictures
size = img.size                            # (width, height)
print(size)
# Average the three channels, then reshape to (height, width).
img = array(img.getdata()).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])
printf(shape(imgdata))
result = image2Density(size, imgdata)
for s in result:
    printf(s)
#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================
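The listing depends on the TEASOFT helpers in `headm`, so it cannot be run elsewhere as is. The core idea, averaging each of the 7×5 cells of a character box and thresholding the mean, can be tried on a synthetic image. The sketch below is a stand-alone illustration under that assumption: `cell_bits` and `render` are hypothetical names, the box is given directly in pixel coordinates, and it must be at least 5 pixels wide and 7 pixels high.

```python
ROWS, COLS = 7, 5
THRESHOLD = 230   # same value as PIXEL_THRESHOLD in the listing

def cell_bits(gray, left, top, right, bottom):
    """Encode the box [left, right) x [top, bottom) as 35 row-scan bits."""
    width, height = right - left, bottom - top
    bits = []
    for i in range(ROWS):
        r0 = top + i * height // ROWS
        r1 = top + (i + 1) * height // ROWS
        for j in range(COLS):
            c0 = left + j * width // COLS
            c1 = left + (j + 1) * width // COLS
            cell = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(cell) / len(cell)
            bits.append('0' if mean > THRESHOLD else '1')
    return ''.join(bits)

def render(bits, scale):
    """Build a synthetic gray image: each bit becomes a scale x scale block."""
    gray = []
    for i in range(ROWS):
        row = []
        for j in range(COLS):
            level = 0 if bits[i * COLS + j] == '1' else 255
            row.extend([level] * scale)
        gray.extend([list(row) for _ in range(scale)])
    return gray

L_bits = "10000100001000010000100001000011111"   # "L" of font 1
image = render(L_bits, scale=4)                  # 20 x 28 pixels
print(cell_bits(image, 0, 0, COLS * 4, ROWS * 4) == L_bits)   # prints True
```

On real scans the cells are not perfectly black or white, which is why the threshold sits well above mid-gray: a cell only counts as background when it is almost entirely light.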

4. Conversion Results

(1) Font 1

▲ Figure 1.2.3 Font 1

01110100011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000101111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00001000010000100001100011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000110001111101000010000
01110100011000110001101011001101111
11110100011000110001111101000110001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011101110001
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111
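For training, each string has to become a numeric vector. The sketch below shows one way to do that. It assumes the 26 lines of a font run from A to Z, which matches the shapes of the strings above but is not stated explicitly in the article; `to_vector` and `label_patterns` are hypothetical names.

```python
import string

def to_vector(bits):
    """Turn a 35-character 0-1 string into a list of 35 integers."""
    if len(bits) != 35 or set(bits) - {'0', '1'}:
        raise ValueError("expected 35 characters of 0 or 1")
    return [int(ch) for ch in bits]

def label_patterns(lines):
    """Pair each pattern with a letter, assuming the lines run from A onward."""
    return {letter: to_vector(bits)
            for letter, bits in zip(string.ascii_uppercase, lines)}

# First three strings of font 1 (A, B, C).
font1_head = ["01110100011000110001111111000110001",
              "11110100011000111110100011000111110",
              "01110100011000010000100001000101110"]
samples = label_patterns(font1_head)
print(sorted(samples))        # ['A', 'B', 'C']
print(sum(samples['A']))      # number of lit dots in "A"
```

Passing all 26 lines of a font gives a dictionary from letter to 35-element vector, ready to be stacked into a training matrix.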

(2) Font 2

▲ Figure 1.2.4 Font 2

00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011000101110
11111000010001000100010001000011111

(3) Font 3

▲ Figure 1.2.5 Font 3

01110010101101110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010100101101001010110101011110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010111100011000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000110001000010110101101001110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111111110101100011000110001
10001110011110110101101111001110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110100101001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000111011010100111000100
10001100011010110101111111101111011
10001110110101000100010101101110001
10001110110101001110001000010000100
11110001100010000100010000100011110

(4) Font 4

▲ Figure 1.2.6 Font 4

00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000010000100001000010100100110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011100101110
11111000010001000100010001000011111

(5) Font 5

▲ Figure 1.2.7 Font 5

00100010101000110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010010100101001010010100111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010011100011000101111
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000100001000010000101001001100
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110101101011010101010
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111

(6) Font 6

▲ Figure 1.2.8 Font 6

01110100011000111111100011000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001101111
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100011000101110100011000110001
10001100011000101110001000010000100
11111000010001000100010001000011111

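5. Analysis of the Character Sets

  Among the six fonts converted above, some characters are encoded almost identically in every font, for example C, V and K. Others differ considerably from font to font, for example A, B and Y.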
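This observation can be checked numerically by taking the largest pairwise Hamming distance among the six versions of a letter. The sketch below does so for "C" and "A", with the strings copied from the conversion results; `spread` is a hypothetical name for this measure.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two equal-length 0-1 strings differ."""
    return sum(x != y for x, y in zip(a, b))

def spread(patterns):
    """Largest pairwise Hamming distance among the versions of one letter."""
    return max(hamming(a, b) for a, b in combinations(patterns, 2))

# "C" and "A" in fonts 1 to 6, copied from the conversion results.
C = ["01110100011000010000100001000101110"] * 6
A = ["01110100011000110001111111000110001",
     "00111010011000110001111111000110001",
     "01110010101101110001111111000110001",
     "00111010011000110001111111000110001",
     "00100010101000110001111111000110001",
     "01110100011000111111100011000110001"]
print("C spread:", spread(C))
print("A spread:", spread(A))
```

A spread of 0 means the letter is identical in all six fonts, so it adds no within-class variation to the clustering task; letters with a large spread are the ones that make the task harder.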

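▲ Figure 2.1 The six fonts

※ Summary ※


  To prepare a data set for the SOFM part of an artificial neural network course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.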


■ boxid lists used with ASCIIDOT.PY

  The final copy of ASCIIDOT.PY is the same as the listing given earlier in this article except for the boxid list. It carries the lists below, presumably one per font image, with only the last one active. Together with the list in the earlier listing this gives six lists, matching the six fonts.

#boxid = [3, 10, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
#boxid = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91]
#boxid = [6, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
#boxid = [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148]
boxid = [149, 150, 151, 152, 153, 154, 155, 156, 157, 173, 172, 171, 170, 174, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 175]

Source: https://blog.csdn.net/zhuoqingjoking97298/article/details/121060703
