java - Storing a large map in memory

2019-08-28 23:12:31  Views: 192  Source: Internet

Tags: java performance guava graph


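First, some background on the problem: I have a very large graph that takes about 4 GB to store, with roughly 3M nodes and 34M edges. My program takes this large graph and recursively builds smaller graphs from it. At each level of the recursion I hold two graphs: the original graph and the graph created from it. The recursion continues until the graph has been reduced to a very small one, about 10 nodes.

Since I need these graphs for the entire execution of the program, memory efficiency is critical to my application.

Here is the problem I am currently facing.
This is the algorithm that creates the smaller graph from the larger one: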

public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) {
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) {
        for (Edge e : g.adj(i)) {
            int v = e.getEndpoint(i);
            if (i < v) {
                edges.add(e);
            }
        }
    }

    Table<Integer, Integer, Double> coarseEgdes = HashBasedTable.create(seeds.size(),seeds.size());
    //compute coarse weights
    edges.stream().forEach((e) -> {
        int v = e.getV();
        int u = e.getU();
        if (g.isC(u) && g.isC(v)) {
            addToTable(coarseEgdes, u, v, e.getWeight());
        }else if(!g.isC(u) && g.isC(v)){ //F-C
            for(Edge cEdge: g.cAdj(u)){//get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(u);
                if(nb != v){
                    addToTable(coarseEgdes, v, nb, cEdge.getPij() * e.getWeight());

                }
            }
        }else if(g.isC(u) && !g.isC(v)){//C-F
            for(Edge cEdge: g.cAdj(v)){//get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(v);
                if(nb != u){
                    addToTable(coarseEgdes, u, nb, cEdge.getPij() * e.getWeight());
                }
            }
        }else{//F-F
            for(Edge cEdgeU: g.cAdj(u)){//get coarse neighbors of the fine edges
                int uNb = cEdgeU.getEndpoint(u);
                for(Edge cEdgeV: g.cAdj(v)){
                    int vNb = cEdgeV.getEndpoint(v);
                    if(uNb != vNb){
                        addToTable(coarseEgdes, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij());
                    }
                }
            }
        }
    });

    return createGraph(g, coarseEgdes); //use the edges to build new graph. Basically loops through coarseEdges and add edge and weight to the new graph.
}

private static void addToTable(Table<Integer, Integer,Double> tbl, int r, int c, double val){
    // Normalize the unordered pair so that (r, c) and (c, r) share one cell.
    int mn = Math.min(r, c);//the smaller of the two nodeIds
    int mx = Math.max(r, c);//the larger of the two nodeIds
    if(tbl.contains(mn, mx)){
        tbl.put(mn, mx, tbl.get(mn, mx) + val);
    }else{
        tbl.put(mn, mx,val);
    }
}
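
To make the accumulation step concrete, here is a minimal, dependency-free sketch of what addToTable is meant to do: treat (r, c) and (c, r) as the same key and sum the weights. It uses a plain java.util.HashMap with a small composite key in place of the Guava Table. PairKey and PairSums are hypothetical names chosen for illustration; they are not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: PairKey and PairSums are hypothetical names, not from the original code.
final class PairKey {
    final int lo; // smaller node id
    final int hi; // larger node id

    PairKey(int a, int b) {
        this.lo = Math.min(a, b);
        this.hi = Math.max(a, b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) return false;
        PairKey other = (PairKey) o;
        return lo == other.lo && hi == other.hi;
    }

    @Override
    public int hashCode() {
        return 31 * lo + hi;
    }
}

final class PairSums {
    private final Map<PairKey, Double> weights = new HashMap<>();

    // Adds val to the weight stored for the unordered pair {r, c}.
    void add(int r, int c, double val) {
        weights.merge(new PairKey(r, c), val, Double::sum);
    }

    // Returns the accumulated weight, or 0.0 if the pair was never added.
    double get(int r, int c) {
        return weights.getOrDefault(new PairKey(r, c), 0.0);
    }

    int size() {
        return weights.size();
    }
}
```

With this sketch, add(3, 7, 1.5) followed by add(7, 3, 2.0) leaves a single entry whose weight is 3.5. Each entry still costs a key object and a boxed Double, so this mainly removes the nested row maps rather than the boxing overhead.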

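Now, when I do this, I quickly run out of memory. I profiled the application with YourKit: memory usage goes through the roof (more than 6 GB before running out), and so does CPU usage. coarseEdges can get very large. Is there a better in-memory Map implementation that scales to large datasets? Or is there a better way to do this without storing coarseEdges?

PS: Please note that my graph cannot retrieve an edge (u,v) in constant time. It is basically a list of lists, which gives better performance for other critical parts of my application.

**Also see my graph implementation code below:**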
public class Graph{
    private final int SIZE;
    private final EdgeList[] nodes;
    private final float[] volumes;
    private final double[] weightedSum;
    private final double[] weightedCoarseSum;
    private final int[] nodeDegrees;
    private final int[] c_nodeDegrees;
    private int edge_count=0;
    private final boolean[] coarse;
    private final EdgeList[] coarse_neighbors;
    public Graph(int SIZE){
        this.SIZE =SIZE;
        nodes = new EdgeList[SIZE];
        coarse_neighbors = new EdgeList[SIZE];

        volumes = new float[SIZE];
        coarse = new boolean[SIZE];

        //initialize data
        weightedSum = new double[SIZE];
        weightedCoarseSum = new double[SIZE];
        nodeDegrees= new int[SIZE];
        c_nodeDegrees = new int[SIZE];

        for(int i=0;i<SIZE;i++){
            nodes[i]=new EdgeList();
            coarse_neighbors[i] = new EdgeList();
            volumes[i]=1;
        }
    }

    public void addEdge(int u, int v, double w){
        //graph is undirected
        //In order to traverse edges in order such that u < v. We store edge u,v such that u<v
        Edge e=null;
        if(u<v){
            e = new Edge(u,v,w);
        }else if(u>v){
            e = new Edge(v,u,w);
        }else{
            throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine
        }

        nodes[u].add(e);
        nodes[v].add(e);

        //update the weighted sum of each endpoint
        weightedSum[u] += w;
        weightedSum[v] += w;

        //update the degree of each endpoint
        ++nodeDegrees[u];
        ++nodeDegrees[v];

        ++edge_count;
    }

    public int size(){
        return SIZE;
    }

    public EdgeList adj(int v){
        return nodes[v];
    }

    public EdgeList cAdj(int v){
        return coarse_neighbors[v];
    }

    public void sortAdj(int u, Comparator<Edge> c){
        nodes[u].sort(c);
    }

    public void sortCoarseAdj(int u, Comparator<Edge> c){
        coarse_neighbors[u].sort(c);
    }

    public void setCoarse(int node, boolean c){
        coarse[node] = c;
        if(c){
            //update the neighborHood of node
            for(Edge e: adj(node)){
                int v = e.getEndpoint(node);
                coarse_neighbors[v].add(e);
                weightedCoarseSum[v] += e.getWeight();
                ++c_nodeDegrees[v];
            }
        }
    }

    public int getEdgeCount(){
        return edge_count;
    }

    public boolean isC(int id){
        return coarse[id];
    }

    public double weightedDegree(int node){
        return weightedSum[node];
    }

    public double weightedCoarseDegree(int node){
        return weightedCoarseSum[node];
    }

    public int degree(int u){
        return nodeDegrees[u];
    }

    public int cDegree(int u){
        return c_nodeDegrees[u];
    }

    public Edge getCNeighborAt(int u,int idx){
        return coarse_neighbors[u].getAt(idx);
    }

    public float volume(int u){
        return volumes[u];
    }

    public void setVolume(int node, float v){
        volumes[node] = v;
    }

    @Override
    public String toString() {
        return "Graph[nodes:"+SIZE+",edges:"+edge_count+"]";
    }

}


//Edges are first class objects.
public class Edge {
    private boolean deleted=false;
    private int u;
    private int v;
    private double weight;
    private double pij;
    private double algebraicDist = (1/Constants.EPSILON);

    public Edge(int u, int v, double weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    public Edge() {
    }

    public int getU() {
        return u;
    }

    public void setU(int u) {
        this.u = u;
    }

    public int getV() {
        return v;
    }

    public void setV(int v) {
        this.v = v;
    }

    public int getEndpoint(int from){
        if(from == v){
            return u;
        }

        return v;
    }

    public double getPij() {
        return pij;
    }

    public void setPij(double pij) {
        this.pij = pij;
    }

    public double getAlgebraicDist() {
        return algebraicDist;
    }

    public void setAlgebraicDist(double algebraicDist) {
        this.algebraicDist = algebraicDist;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }

    @Override
    public String toString() {
        return "Edge[u:"+u+", v:"+v+"]";
    }
}


// The Edge iterable
public class EdgeList implements Iterable<Edge>{
    private final ArrayList<Edge> data = new ArrayList<>();

    public void add(Edge e){
        data.add(e);
    }

    @Override
    public Iterator<Edge> iterator() {
        Iterator<Edge> it = new IteratorImpl();
        return it;
    }

    private class IteratorImpl implements Iterator<Edge> {

        public IteratorImpl() {
        }
        private int currentIndex = 0;
        private final int N = data.size();
        @Override
        public boolean hasNext() {

            //skip deleted
            while(currentIndex < N && data.get(currentIndex).isDeleted()){
                currentIndex++;
            }

            return currentIndex < N;
        }

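        @Override
        public Edge next() {
            //hasNext() skips deleted edges, so next() is safe even when called on its own
            if(!hasNext()){
                throw new java.util.NoSuchElementException();
            }
            return data.get(currentIndex++);
        }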

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public Edge getAt(int idx){
        return data.get(idx);
    }

    public void sort(Comparator<Edge> c){
        data.sort(c);
    }
}

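Solution:

A few stabs in the dark here; you will need to implement them to see how much they help.

1) You could consider using a composite key (int, int) with a hashmap instead of the guava table. That will certainly be more efficient if you only need the edge weights. If you also need to query the edges going out of a given vertex, the benefit is less obvious, and you will have to weigh the CPU vs. memory tradeoff.

2) If you go with a plain hashmap, you could consider one of the off-heap implementations. Take a look at https://github.com/OpenHFT/Chronicle-Map, for example; it might

3) If you stay in memory and want to squeeze out some extra space, you can do some dirty tricks with primitive maps. Use a long->double map, such as http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html or http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html, encode each pair of int vertices as a single long, and see how much it helps. On a 64-bit JVM an Integer can take 16 bytes (assuming compressed oops) and a Double 24 bytes, which gives 32 + 24 = 56 bytes per entry, versus 8 + 8 with a primitive map.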
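
As a rough illustration of the primitive-map idea, the sketch below packs an unordered pair of int node ids into one long and accumulates weights in parallel primitive arrays with linear probing. PackedEdgeWeights is a hypothetical, hand-rolled class written only so the example runs without dependencies; it is not the HPPC or Trove API, and in practice one of those libraries would likely be the better choice. It assumes node ids are non-negative and that entries are never removed.

```java
// Illustrative sketch: PackedEdgeWeights is a hypothetical class, not HPPC or Trove.
final class PackedEdgeWeights {
    private long[] keys = new long[16];        // packed (lo, hi) node pairs
    private double[] values = new double[16];  // accumulated weights
    private boolean[] used = new boolean[16];  // slot occupancy flags
    private int size = 0;

    // Packs the unordered pair {a, b}: smaller id in the high 32 bits, larger in the low 32 bits.
    static long pack(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    static int first(long key) { return (int) (key >>> 32); }
    static int second(long key) { return (int) key; }

    // Adds val to the weight stored for {a, b}, inserting the pair if absent.
    void add(int a, int b, double val) {
        if ((size + 1) * 4L > keys.length * 3L) grow(); // keep load factor at or below 0.75
        long key = pack(a, b);
        int i = slot(key, keys, used);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] += val;
    }

    // Returns the accumulated weight, or 0.0 if the pair was never added.
    double get(int a, int b) {
        int i = slot(pack(a, b), keys, used);
        return used[i] ? values[i] : 0.0;
    }

    int size() { return size; }

    // Linear probing: returns the slot holding key, or the first free slot on its probe path.
    private static int slot(long key, long[] ks, boolean[] us) {
        int mask = ks.length - 1; // table length is always a power of two
        int i = mix(key) & mask;
        while (us[i] && ks[i] != key) i = (i + 1) & mask;
        return i;
    }

    // Scrambles the key bits so that consecutive node ids spread across the table.
    private static int mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    // Doubles the table and re-inserts every occupied slot.
    private void grow() {
        long[] nk = new long[keys.length * 2];
        double[] nv = new double[nk.length];
        boolean[] nu = new boolean[nk.length];
        for (int j = 0; j < keys.length; j++) {
            if (!used[j]) continue;
            int i = slot(keys[j], nk, nu);
            nu[i] = true;
            nk[i] = keys[j];
            nv[i] = values[j];
        }
        keys = nk;
        values = nv;
        used = nu;
    }
}
```

Each slot here costs 8 bytes for the key, 8 for the value, and one byte for the occupancy flag, with no per-entry objects. The real cost per stored pair is higher than 16 bytes because the table is never completely full, so it is worth measuring against the boxed version on real data before committing to it.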

Source: https://codeday.me/bug/20190828/1756715.html
