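First, some background on the problem: I have a very large graph that costs about 4 GB to store, with roughly 3M nodes and 34M edges. My program takes this large graph and recursively builds smaller graphs from it. At each level of recursion I have two graphs: the original one and the one created from it. The recursion continues until the graph has been reduced to a very small one of about 10 nodes.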



import java.util.ArrayList;

import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) {
    //collect every undirected edge exactly once (only from its smaller endpoint)
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) {
        for (Edge e : g.adj(i)) {
            int v = e.getEndpoint(i);
            if (i < v) {
                edges.add(e);
            }
        }
    }

    Table<Integer, Integer, Double> coarseEdges = HashBasedTable.create(seeds.size(), seeds.size());
    //compute coarse weights
    edges.stream().forEach((e) -> {
        int v = e.getV();
        int u = e.getU();
        if (g.isC(u) && g.isC(v)) { //C-C
            addToTable(coarseEdges, u, v, e.getWeight());
        } else if (!g.isC(u) && g.isC(v)) { //F-C
            for (Edge cEdge : g.cAdj(u)) { //get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(u);
                if (nb != v) {
                    addToTable(coarseEdges, v, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else if (g.isC(u) && !g.isC(v)) { //C-F
            for (Edge cEdge : g.cAdj(v)) { //get coarse neighbors of the fine edges
                int nb = cEdge.getEndpoint(v);
                if (nb != u) {
                    addToTable(coarseEdges, u, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else { //F-F
            for (Edge cEdgeU : g.cAdj(u)) { //get coarse neighbors of the fine edges
                int uNb = cEdgeU.getEndpoint(u);
                for (Edge cEdgeV : g.cAdj(v)) {
                    int vNb = cEdgeV.getEndpoint(v);
                    if (uNb != vNb) {
                        addToTable(coarseEdges, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij());
                    }
                }
            }
        }
    });

    return createGraph(g, coarseEdges); //use the edges to build new graph. Basically loops through coarseEdges and add edge and weight to the new graph.
}

private static void addToTable(Table<Integer, Integer, Double> tbl, int r, int c, double val) {
    int mn = Math.min(r, c); //the smaller of the two nodeIds
    int mx = Math.max(r, c); //the larger of the two nodeIds
    if (tbl.contains(mn, mx)) {
        //accumulate onto the existing weight
        tbl.put(mn, mx, tbl.get(mn, mx) + val);
    } else {
        tbl.put(mn, mx, val);
    }
}

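Now, when I do this, I quickly run out of memory. I profiled the application with YourKit: memory usage goes through the roof (more than 6 GB before it runs out), and so does CPU usage. coarseEdges can get very large. Is there a better in-memory Map implementation that scales to large datasets? Or is there a better way to do this without storing coarseEdges?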


**Also see my graph implementation code below:**
import java.util.Comparator;

public class Graph{
    private final int SIZE;
    private final EdgeList[] nodes;
    private final float[] volumes;
    private final double[] weightedSum;
    private final double[] weightedCoarseSum;
    private final int[] nodeDegrees;
    private final int[] c_nodeDegrees;
    private int edge_count=0;
    private final boolean[] coarse;
    private final EdgeList[] coarse_neighbors;

    public Graph(int SIZE){
        this.SIZE =SIZE;
        nodes = new EdgeList[SIZE];
        coarse_neighbors = new EdgeList[SIZE];

        volumes = new float[SIZE];
        coarse = new boolean[SIZE];

        //initialize data
        weightedSum = new double[SIZE];
        weightedCoarseSum = new double[SIZE];
        nodeDegrees= new int[SIZE];
        c_nodeDegrees = new int[SIZE];

        for(int i=0;i<SIZE;i++){
            nodes[i]=new EdgeList();
            coarse_neighbors[i] = new EdgeList();
        }
    }

    public void addEdge(int u, int v, double w){
        //graph is undirected
        //In order to traverse edges in order such that u < v. We store edge u,v such that u<v
        Edge e=null;
        if(u<v){
            e = new Edge(u,v,w);
        }else if(u>v){
            e = new Edge(v,u,w);
        }else{
            throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine
        }

        //the same Edge object is shared by both adjacency lists
        nodes[u].add(e);
        nodes[v].add(e);

        //update the weighted sum of each edge
        weightedSum[u] += w;
        weightedSum[v] += w;

        //update the degree of each edge
        nodeDegrees[u]++;
        nodeDegrees[v]++;
        edge_count++;
    }

    public int size(){
        return SIZE;
    }

    public EdgeList adj(int v){
        return nodes[v];
    }

    public EdgeList cAdj(int v){
        return coarse_neighbors[v];
    }

    public void sortAdj(int u, Comparator<Edge> c){
        nodes[u].sort(c);
    }

    public void sortCoarseAdj(int u, Comparator<Edge> c){
        coarse_neighbors[u].sort(c);
    }

    public void setCoarse(int node, boolean c){
        coarse[node] = c;
        if(c){
            //update the neighborHood of node
            for(Edge e: adj(node)){
                int v = e.getEndpoint(node);
                //assumption: the edge is also recorded as a coarse neighbor of v and counted in its coarse degree
                coarse_neighbors[v].add(e);
                weightedCoarseSum[v] += e.getWeight();
                c_nodeDegrees[v]++;
            }
        }
    }

    public int getEdgeCount(){
        return edge_count;
    }

    public boolean isC(int id){
        return coarse[id];
    }

    public double weightedDegree(int node){
        return weightedSum[node];
    }

    public double weightedCoarseDegree(int node){
        return weightedCoarseSum[node];
    }

    public int degree(int u){
        return nodeDegrees[u];
    }

    public int cDegree(int u){
        return c_nodeDegrees[u];
    }

    public Edge getCNeighborAt(int u,int idx){
        return coarse_neighbors[u].getAt(idx);
    }

    public float volume(int u){
        return volumes[u];
    }

    public void setVolume(int node, float v){
        volumes[node] = v;
    }

    @Override
    public String toString() {
        return "Graph[nodes:"+SIZE+",edges:"+edge_count+"]";
    }
}

//Edges are first class objects.
public class Edge {
    private boolean deleted=false;
    private int u;
    private int v;
    private double weight;
    private double pij;
    private double algebraicDist = (1/Constants.EPSILON);

    public Edge(int u, int v, double weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    public Edge() {
    }

    public int getU() {
        return u;
    }

    public void setU(int u) {
        this.u = u;
    }

    public int getV() {
        return v;
    }

    public void setV(int v) {
        this.v = v;
    }

    //returns the endpoint opposite to 'from'
    public int getEndpoint(int from){
        if(from == v){
            return u;
        }
        return v;
    }

    public double getPij() {
        return pij;
    }

    public void setPij(double pij) {
        this.pij = pij;
    }

    public double getAlgebraicDist() {
        return algebraicDist;
    }

    public void setAlgebraicDist(double algebraicDist) {
        this.algebraicDist = algebraicDist;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }

    @Override
    public String toString() {
        return "Edge[u:"+u+", v:"+v+"]";
    }
}

// The Edge iterable
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class EdgeList implements Iterable<Edge>{
    private final ArrayList<Edge> data= new ArrayList<>();

    public void add(Edge e){
        data.add(e);
    }

    @Override
    public Iterator<Edge> iterator() {
        return new IteratorImpl();
    }

    private class IteratorImpl implements Iterator<Edge> {
        private int currentIndex = 0;
        private final int N = data.size();

        public IteratorImpl() {
        }

        @Override
        public boolean hasNext() {
            //skip deleted
            while(currentIndex < N && data.get(currentIndex).isDeleted()){
                currentIndex++;
            }
            return currentIndex < N;
        }

        @Override
        public Edge next() {
            //calling hasNext() first skips deleted edges even if the caller did not
            if(!hasNext()){
                throw new NoSuchElementException();
            }
            return data.get(currentIndex++);
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public Edge getAt(int idx){
        return data.get(idx);
    }

    public void sort(Comparator<Edge> c){
        data.sort(c);
    }
}

A few stabs in the dark here; you would need to implement them to see how much they help.



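3) If you stay in memory and want to squeeze out some extra space, you can play some dirty tricks with primitive maps. Use a long -> double map, for example http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html or http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html, encode each pair of int vertex ids as a single long, and see how much it helps. On a 64-bit JVM, an Integer can take 16 bytes (assuming compressed oops) and a Double 24 bytes, which gives 32 + 24 = 56 bytes per entry, versus 8 + 8 with a primitive map.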
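
To make the key-packing idea in point 3 concrete, here is a minimal sketch in plain Java. It does not use HPPC or Trove: `PackedEdgeWeights` and all of its methods are hypothetical names invented for this illustration, and the hand-rolled open-addressing table only stands in for what a primitive long -> double map from one of those libraries would do. It assumes node ids are non-negative ints, so a packed key is never negative and `-1` can mark an empty slot.

```java
import java.util.Arrays;

/** Hypothetical sketch: accumulates weights per unordered (u, v) pair in primitive arrays. */
final class PackedEdgeWeights {
    // Assumption: node ids are non-negative, so no packed key ever equals -1.
    private static final long EMPTY = -1L;

    private long[] keys;
    private double[] values;
    private int size;

    PackedEdgeWeights(int expectedEntries) {
        int capacity = 16;
        // Power-of-two capacity, sized so the load factor starts at or below 0.5.
        while (capacity < expectedEntries * 2L && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        keys = new long[capacity];
        values = new double[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Packs an unordered pair of ids into one long: smaller id in the high 32 bits. */
    static long pack(int a, int b) {
        int mn = Math.min(a, b);
        int mx = Math.max(a, b);
        return ((long) mn << 32) | (mx & 0xFFFFFFFFL);
    }

    static int first(long key) { return (int) (key >>> 32); }

    static int second(long key) { return (int) key; }

    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative mixing of the key bits
        return (int) (h >>> 32) & mask;
    }

    /** Adds val to the weight stored for (u, v), inserting the pair if it is absent. */
    void add(int u, int v, double val) {
        long key = pack(u, v);
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (keys[i] == EMPTY) {
            keys[i] = key;
            values[i] = val;
            if (++size * 2 > keys.length) {
                grow();
            }
        } else {
            values[i] += val;
        }
    }

    /** Returns the accumulated weight for (u, v), or 0.0 if the pair was never added. */
    double get(int u, int v) {
        long key = pack(u, v);
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return 0.0;
    }

    int size() { return size; }

    // Doubles the arrays and reinserts every stored pair.
    private void grow() {
        long[] oldKeys = keys;
        double[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new double[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        int mask = keys.length - 1;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(oldKeys[j], mask);
                while (keys[i] != EMPTY) {
                    i = (i + 1) & mask;
                }
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

In the question's code, a call such as `addToTable(coarseEdges, u, v, w)` would become `weights.add(u, v, w)` on an instance of this class. Each slot costs 8 + 8 bytes with no boxed objects, but empty slots are paid for as well, so the real saving depends on the load factor and is worth measuring.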

