Importing data with the official ETL tool neo4j-import

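10 million records (digits and English letters only, no Chinese): import took 27s 932ms
10 million records (with Chinese property values): import took 1m 50s 9ms
110 million records (110 million nodes, 110 million relationships, 220 million properties): import took 15m 9s 37ms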
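
To compare the three runs, the durations above can be turned into a rough throughput figure. This is only a sketch: the helper name parse_duration is made up for illustration, and the record counts are the rounded figures quoted above, not exact counts.

```python
import re

def parse_duration(text):
    """Convert a duration such as '1m 50s 9ms' into seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
    total = 0.0
    # "ms" is listed first so that it is not mistaken for minutes.
    for value, unit in re.findall(r"(\d+)\s*(ms|h|m|s)\b", text):
        total += int(value) * units[unit]
    return total

# (description, rounded record count, duration as printed by the tool)
runs = [
    ("10M, digits and English only", 10_000_000, "27s 932ms"),
    ("10M, Chinese property values", 10_000_000, "1m 50s 9ms"),
    ("110M", 110_000_000, "15m 9s 37ms"),
]

for name, records, duration in runs:
    seconds = parse_duration(duration)
    print(f"{name}: {seconds:.3f}s, about {records / seconds:,.0f} records/s")
```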

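Note:
neo4j-import and neo4j-admin import do the same job, but their parameters differ in many ways. neo4j-import has more parameters and feels more convenient to use.
However, Neo4j will replace neo4j-import with neo4j-admin import in future versions.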

2017-04-11 import tests

(1) Importing 10 million records on Linux

node.csv sample data

uuid:ID(Person),name:String,:Label
2a3e275d9abc4c45913d8e7e619db87a,"张忆耕",Label1
db6ee76baff64db5956b6a5deb80acbf,"傅某评",Label2
8d2f4a74e7e7429390b3389d64d77637,"王苏维",Label3
4d0a5c3fa89a49e89f81a152f2aa259c,"蓝波",Label4

relationship.csv sample data

:START_ID(Person),:END_ID(Person),:TYPE
2a3e275d9abc4c45913d8e7e619db87a,db6ee76baff64db5956b6a5deb80acbf,Relationship1
8d2f4a74e7e7429390b3389d64d77637,4d0a5c3fa89a49e89f81a152f2aa259c,Relationship2
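
Test files in this layout can be generated with a short script. The sketch below is not how the original test data was produced; the function name write_sample, the generated names and the single label and relationship type are placeholders. The header rows are copied from the samples above.

```python
import csv
import tempfile
import uuid
from pathlib import Path

def write_sample(node_path, rel_path, count):
    """Write `count` nodes and `count - 1` relationships that chain them together."""
    ids = [uuid.uuid4().hex for _ in range(count)]  # 32 hex characters, like the sample IDs

    with open(node_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["uuid:ID(Person)", "name:String", ":Label"])
        for i, node_id in enumerate(ids):
            writer.writerow([node_id, f"name{i}", "Label1"])

    with open(rel_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID(Person)", ":END_ID(Person)", ":TYPE"])
        for start, end in zip(ids, ids[1:]):
            writer.writerow([start, end, "Relationship1"])

    return ids

# Write a tiny sample into a temporary directory and show the node file.
with tempfile.TemporaryDirectory() as tmp:
    node_csv = Path(tmp) / "node.csv"
    rel_csv = Path(tmp) / "relationship.csv"
    write_sample(node_csv, rel_csv, 3)
    print(node_csv.read_text(encoding="utf-8"))
```
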
[wkq@wkq bin]$ ./neo4j-import --into /home/wkq/neo4j/neo4j-community-3.1.0/data/databases/test_10000000_graph.db --nodes /home/wkq/neo4j/node.csv  --relationships /home/wkq/neo4j/relathionship.csv --trim-strings true --input-encoding UTF-8 --id-type INTEGER --stacktrace true --bad-tolerance 0 --skip-bad-relationships true --skip-duplicate-nodes false
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Neo4j version: 3.1.0
Importing the contents of these files into /home/wkq/neo4j/neo4j-community-3.1.0/data/databases/test_10000000_graph.db:
Nodes:
  /home/wkq/neo4j/node.csv
Relationships:
  /home/wkq/neo4j/relathionship.csv

Available resources:
  Free machine memory: 28.05 GB
  Max heap memory : 26.67 GB
  Processors: 32

Nodes
[>:23.85 MB/s-|PROPERTIES-----|NODE:68.66 MB---|LABEL SCAN-----|*v:41.50 MB/s-----------------]8.38M
Done in 10s 106ms
Prepare node index
[*SPLIT:209.81 MB-----------------------------------------------------------------------------]9.67M
Done in 2s 162ms
Calculate dense nodes
[>:115.63 M|TYPE-----------|*PREPARE(32)========================|CALCULATE(2)=================] 5.5M
Done in 2s 904ms
Relationships [:Friend] (1/1)
[>:47.30 MB|*PREPARE(10)==================|RECORDS-|P|RELATIONSHIP----------|v:61.33 MB/s-----]5.63M
Done in 4s 591ms
Node --> Relationship [:Friend] (1/1)

Done in 10ms
Relationship --> Relationship [:Friend] (1/1)
[>:??----------------------------------|*LINK------------------------------------------------|]3.82M
Done in 1s 605ms
Node --> Relationship Sparse
[>------------------------------------|LINK----|*v:140.00 MB/s--------------------------------]9.78M
Done in 1s 65ms
Relationship --> Relationship Sparse

Done in 1s 503ms
Count groups

Done in 10ms
Gather
[*>:??----------------------------------------------------------------------------------------]    0
Done in 1ms
Write

Done in 11ms
Node --> Group

Done in 10ms
Node counts

Done in 938ms
Relationship counts

Done in 779ms

IMPORT DONE in 27s 932ms.
Imported:
  10000000 nodes
  10000000 relationships
  20000000 properties
Peak memory usage: 209.81 MB
[wkq@wkq bin]$

(2) Importing about 3,000 nodes on Windows (nodes only)

C:\ProfessionSofware\Neo4j\neo4j-community-3.1.2\bin>neo4j-import --into test.db  --id-type string  --nodes:Test C:/User/wdb/2017-04-06_test.csv  --stacktrace true
Warning: This command does not appear to be running with administrative rights.  Some commands may fail e.g. Start/Stop
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Neo4j version: 3.1.2
Importing the contents of these files into test.db:
Nodes:
  :Test
  C:\User\wdb\2017-04-06_test.csv

Available resources:
  Free machine memory: 3.09 GB
  Max heap memory : 1.77 GB
  Processors: 4

Nodes

Done in 441ms
Prepare node index

Done in 30ms
Calculate dense nodes

Done in 11ms
Node --> Relationship Sparse

Done in 11ms
Relationship --> Relationship Sparse

Done in 1ms
Count groups
[*>:??----------------------------------------------------------------------------------------]    0
Done in 15ms
Gather

Done in 2ms
Write

Done in
Node --> Group

Done in
Node counts

Done in 96ms
Relationship counts

Done in 3ms

IMPORT DONE in 2s 234ms.
Imported:
  3466 nodes
  0 relationships
  17897 properties
Peak memory usage: 33.85 kB

C:\ProfessionSofware\Neo4j\neo4j-community-3.1.2\bin>

(3) Importing 100,000 records on Windows (100,000 nodes, 100,000 relationships)

node.csv sample data

uuid:ID(Person),name:String,:Label
2a3e275d9abc4c45913d8e7e619db87a,"张忆耕",Label1
db6ee76baff64db5956b6a5deb80acbf,"傅某评",Label2
8d2f4a74e7e7429390b3389d64d77637,"王苏维",Label3
4d0a5c3fa89a49e89f81a152f2aa259c,"蓝波",Label4

relationship.csv sample data

:START_ID(Person),:END_ID(Person),:TYPE
2a3e275d9abc4c45913d8e7e619db87a,db6ee76baff64db5956b6a5deb80acbf,Relationship1
8d2f4a74e7e7429390b3389d64d77637,4d0a5c3fa89a49e89f81a152f2aa259c,Relationship2
D:\ProfessionalSoftWare\Neo4j\neo4j-community-3.4.1\bin
λ neo4j-import --into D:/ProfessionalSoftWare/Neo4j/neo4j-community-3.4.1/data/databases/graph2.db --nodes D:/home/bonc/neo4j/node_uuid_10w.csv --relationships D:/home/bonc/neo4j/relathionship_uuid_10w.csv --trim-strings true --input-encoding UTF-8 --id-type String --stacktrace true --bad-tolerance 10000 --skip-bad-relationships true --skip-duplicate-nodes true
Warning: This command does not appear to be running with administrative rights.  Some commands may fail e.g. Start/Stop
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.
Neo4j version: 3.4.1
Importing the contents of these files into D:\ProfessionalSoftWare\Neo4j\neo4j-community-3.4.1\data\databases\graph2.db:
Nodes:
  D:\home\bonc\neo4j\node_uuid_10w.csv
Relationships:
  D:\home\bonc\neo4j\relathionship_uuid_10w.csv

Available resources:
  Total machine memory: 7.95 GB
  Free machine memory: 3.42 GB
  Max heap memory : 1.77 GB
  Processors: 4
  Configured max memory: 5.56 GB
  High-IO: false

Import starting 2018-07-10 13:41:28.551+0800
  Estimated number of nodes: 112.07 k
  Estimated number of node properties: 224.14 k
  Estimated number of relationships: 100.00 k
  Estimated number of relationship properties: 0.00
  Estimated disk space usage: 13.72 MB
  Estimated required memory usage: 1021.42 MB

InteractiveReporterInteractions command list (end with ENTER):
  c: Print more detailed information about current stage
  i: Print more detailed information

(1/4) Node import 2018-07-10 13:41:28.669+0800
  Estimated number of nodes: 112.07 k
  Estimated disk space usage: 10.47 MB
  Estimated required memory usage: 1021.42 MB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

(2/4) Relationship import 2018-07-10 13:41:30.376+0800
  Estimated number of relationships: 100.00 k
  Estimated disk space usage: 3.24 MB
  Estimated required memory usage: 1.00 GB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

(3/4) Relationship linking 2018-07-10 13:41:30.787+0800
  Estimated required memory usage: 1021.06 MB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

(4/4) Post processing 2018-07-10 13:41:31.445+0800
  Estimated required memory usage: 1020.01 MB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%


IMPORT DONE in 4s 585ms.
Imported:
  100000 nodes
  100000 relationships
  200000 properties
Peak memory usage: 1.00 GB

Other tests

With a relative path, the imported database ends up under the bin directory (the directory the command was run from):

neo4j-import --into test.db --id-type string --nodes:Test C:/User/wdb/2017-04-06_test.csv --stacktrace true

With an absolute path, the imported database ends up in the specified directory:

neo4j-import --into C:/ProfessionSofware/Neo4j/databases/test.db --id-type string --nodes:Test C:/User/wdb/2017-04-06_test.csv --stacktrace true
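
The behaviour observed in these two tests matches ordinary path resolution against the working directory. The sketch below only illustrates that assumption and is not taken from the tool's source; resolve_into is a hypothetical helper.

```python
from pathlib import PureWindowsPath

def resolve_into(into, working_dir):
    """Return where an --into value would land, assuming relative paths
    are resolved against the directory the command is run from."""
    path = PureWindowsPath(into)
    return path if path.is_absolute() else PureWindowsPath(working_dir) / path

bin_dir = "C:/ProfessionSofware/Neo4j/neo4j-community-3.1.2/bin"
print(resolve_into("test.db", bin_dir))
print(resolve_into("C:/ProfessionSofware/Neo4j/databases/test.db", bin_dir))
```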

2018-04-26 experiments

Windows: small-data test
neo4j-import --into C:/ProfessionSofware/Neo4j/neo4j-community-3.1.0/data/databases/test_graph.db --nodes C:/tmp/neo4j/node.csv --relationships C:/tmp/neo4j/relathionship.csv --trim-strings true --input-encoding UTF-8 --id-type INTEGER --stacktrace true --bad-tolerance 0 --skip-bad-relationships true --skip-duplicate-nodes false

Linux: actual import of 10 million records
neo4j-import --into /home/wkq/neo4j/neo4j-community-3.1.0/data/databases/test_10000000_graph.db --nodes /home/wkq/neo4j/node.csv --relationships /home/wkq/neo4j/relathionship.csv --trim-strings true --input-encoding UTF-8 --id-type String --stacktrace true --bad-tolerance 0 --skip-bad-relationships true --skip-duplicate-nodes false

Errors encountered

(1) '--nodes' to have at least 1 valid item, but had 0 []

C:\ProfessionSofware\Neo4j\neo4j-community-3.1.2\bin>neo4j-import -into test.db  --id-type string  --nodes [:Customer] customers.csv
Warning: This command does not appear to be running with administrative rights.  Some commands may fail e.g. Start/Stop
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []
Caused by:Expected '--nodes' to have at least 1 valid item, but had 0 []
java.lang.IllegalArgumentException: Expected '--nodes' to have at least 1 valid item, but had 0 []
        at org.neo4j.kernel.impl.util.Validators.lambda$atLeast$6(Validators.java:125)
        at org.neo4j.helpers.Args.validated(Args.java:640)
        at org.neo4j.helpers.Args.interpretOptionsWithMetadata(Args.java:608)
        at org.neo4j.tooling.ImportTool.extractInputFiles(ImportTool.java:503)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:388)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:334)
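
Things to check:
  1. The file header is wrong: check that the header of nodes.csv contains :ID and that the header of relathionship.csv contains :START_ID, :END_ID and :TYPE
  2. The argument after --nodes is wrong: use either --nodes followed directly by the CSV path, or --nodes:Label followed by the absolute CSV path
  3. Check the path carefully for typos; use / as the directory separator on both Linux and Windows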
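
The first check can be automated before starting a long import. A minimal sketch, assuming a comma-delimited header line; field_type and missing_fields are made-up helper names, and the set of required fields is the one listed in item 1.

```python
import csv

def field_type(field):
    """'uuid:ID(Person)' -> 'ID', ':TYPE' -> 'TYPE', 'name' -> ''."""
    _, sep, spec = field.rpartition(":")
    if not sep:
        return ""
    # Drop an optional id space such as "(Person)" and normalise the case.
    return spec.split("(", 1)[0].strip().upper()

def missing_fields(header_line, required):
    """Return the required field types that the header line does not declare."""
    fields = next(csv.reader([header_line]))
    present = {field_type(f) for f in fields}
    return sorted(set(required) - present)

print(missing_fields("uuid:ID(Person),name:String,:Label", {"ID"}))
print(missing_fields(":START_ID(Person),:END_ID(Person)", {"START_ID", "END_ID", "TYPE"}))
```

An empty list means every required field is present; the second call reports that TYPE is missing.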

(2) Error in input data ERROR in input

C:\ProfessionSofware\Neo4j\neo4j-community-3.1.2\bin>neo4j-import -into test.db  --id-type string  --nodes:Test C:/User/wdb/2017-04-06_test.csv  --stacktrace true  --skip-duplicate-nodes true
Warning: This command does not appear to be running with administrative rights.  Some commands may fail e.g. Start/Stop
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Neo4j version: 3.1.2
Importing the contents of these files into test.db:
Nodes:
  :Test
  C:\User\wdb\2017-04-06_test.csv

Available resources:
  Free machine memory: 3.74 GB
  Max heap memory : 1.77 GB
  Processors: 4

Nodes
Error in input data
Caused by:ERROR in input
  data source: BufferedCharSeeker[source:C:\User\wdb\2017-04-06_test.csv, position:654, line:11]
  in field: mobile:int:6
  for header: [dep:string, uid:int, name:string, tel:int, fax:string, mobile:int, email:string, id:string]
  raw field value: 13900001234
  original error: Not supported a.t.m

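The data in the CSV file was at fault. The failing field reported above is mobile:int with the raw value 13900001234, which is larger than the maximum 32-bit int (2147483647). The problem was finally solved by changing the type of tel to string; phone-number columns such as tel and mobile are safer declared as string.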
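
Values like this can be found before the import by scanning the columns declared as int. A rough sketch, assuming int means a 32-bit signed integer; the header is the one from the error message, while the data row is invented for illustration apart from the mobile value.

```python
import csv
import io

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def int_overflows(csv_text):
    """Yield (line_number, column, raw_value) for values in ':int' columns
    that are not integers within the 32-bit range."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    int_cols = [i for i, field in enumerate(header) if field.lower().endswith(":int")]
    for line_no, row in enumerate(reader, start=2):
        for i in int_cols:
            raw = row[i].strip()
            if not raw:
                continue  # empty values are not checked here
            try:
                ok = INT32_MIN <= int(raw) <= INT32_MAX
            except ValueError:
                ok = False
            if not ok:
                yield line_no, header[i], raw

sample = (
    "dep:string,uid:int,name:string,tel:int,fax:string,mobile:int,email:string,id:string\n"
    "sales,1,demo,1234567,,13900001234,demo@example.com,a1\n"  # made-up row
)
for problem in int_overflows(sample):
    print(problem)
```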

neo4j-import command parameters

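// Directory to store the imported database in; it must not contain an existing database
--into <store-dir>

// Name of the database to import into
--database <database-name>

// Node files (header plus data). The first line of the first file must be the header; multiple files are logically treated as one big file; a file group must be enclosed in quotation marks
--nodes[:Label1:Label2] "<file1>,<file2>,..."

// Relationship files (header plus data). Multiple files are logically treated as one big file; a file group must be enclosed in quotation marks
--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,..."

// Field delimiter; defaults to a comma
--delimiter <delimiter-character>

// Array delimiter; defaults to a semicolon
--array-delimiter <array-delimiter-character>

// Quotation character; remember to escape it
--quote <quotation-character>

// Whether fields in the input source can span multiple lines, i.e. contain newline characters. Default: false
--multiline-fields <true/false>

// Whether whitespace should be trimmed from strings. Default: false
--trim-strings <true/false>

// Encoding of the CSV files; UTF-8 is recommended
--input-encoding <character set>

// Ignore empty strings, i.e. treat them as null. Default: false
--ignore-empty-strings <true/false>

// ID type: integer, string or actual. integer is faster; string is more general
--id-type <id-type>

// Maximum number of processors. For best performance this value should not exceed the number of available processors
--processors <max processor count>

// Enable printing of error stack traces. Default: false
--stacktrace <true/false>

// Number of bad entries before the import is considered failed. This tolerance threshold is about relationships referring to missing nodes; format errors in the input data are still treated as errors. Default: 1000
--bad-tolerance <max number of bad entries>

// Whether to skip relationships that refer to missing node IDs, i.e. whose start or end node ID/group was not specified by the node input data. Skipped entries are logged, up to the number of entities specified by bad-tolerance. Default: true
--skip-bad-relationships <true/false>

// Whether to skip nodes that have the same ID/group. If multiple nodes within the same group have the same ID, the first one encountered is imported and the following ones are skipped. Skipped nodes are logged, up to the number of entities specified by bad-tolerance. Default: false
--skip-duplicate-nodes <true/false>

// Whether to ignore extra columns in the data that are not specified by the header. Default: false
--ignore-extra-columns <true/false>

// File specifying database-specific configuration
--db-config <path/to/neo4j.conf>

// File specifying database-specific configuration
--additional-config <path/to/neo4j.conf>

// Page size (in bytes)
--page-size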

Example:
        bin/neo4j-import --into retail.db --id-type string --nodes:Customer customers.csv
        --nodes products.csv --nodes orders_header.csv,orders1.csv,orders2.csv
        --relationships:CONTAINS order_details.csv
        --relationships:ORDERED customer_orders_header.csv,orders1.csv,orders2.csv
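
When an import is scripted, it can help to assemble the command line from data instead of typing it by hand. The following is a sketch under that assumption: build_import_command is a made-up helper that only builds an argument list (it does not run the tool), and the file names are the ones from the example above.

```python
import shlex

def _render(value):
    """The tool's boolean options are written as lowercase true/false."""
    return str(value).lower() if isinstance(value, bool) else str(value)

def build_import_command(into, nodes, relationships=(), **options):
    """Assemble a neo4j-import argument list.

    nodes / relationships: iterables of (label_or_type, [files]) pairs;
    pass None as label_or_type to leave it out.
    options: extra flags, e.g. id_type="string" becomes --id-type string.
    """
    cmd = ["neo4j-import", "--into", into]
    for key, value in options.items():
        cmd += ["--" + key.replace("_", "-"), _render(value)]
    for flag, groups in (("--nodes", nodes), ("--relationships", relationships)):
        for name, files in groups:
            # Several files form one group, joined by commas in a single argument.
            cmd += [flag + (":" + name if name else ""), ",".join(files)]
    return cmd

cmd = build_import_command(
    "retail.db",
    nodes=[
        ("Customer", ["customers.csv"]),
        (None, ["products.csv"]),
        (None, ["orders_header.csv", "orders1.csv", "orders2.csv"]),
    ],
    relationships=[
        ("CONTAINS", ["order_details.csv"]),
        ("ORDERED", ["customer_orders_header.csv", "orders1.csv", "orders2.csv"]),
    ],
    id_type="string",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to subprocess.run once the neo4j-import executable is on the PATH; that step is left out here.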

neo4j-3.1.2 neo4j-import help

C:\ProfessionSofware\Neo4j\neo4j-community-3.1.2\bin
λ neo4j-import
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Neo4j Import Tool
        neo4j-import is used to create a new Neo4j database from data in CSV files. See
        the chapter "Import Tool" in the Neo4j Manual for details on the CSV file format
        - a special kind of header is required.
Usage:
--into <store-dir>
        Database directory to import into. Must not contain existing database.
--database <database-name>
        Database name to import into. Must not contain existing database.
--nodes[:Label1:Label2] "<file1>,<file2>,..."
        Node CSV header and data. Multiple files will be logically seen as one big file
        from the perspective of the importer. The first line must contain the header.
        Multiple data sources like these can be specified in one import, where each data
        source has its own header. Note that file groups must be enclosed in quotation
        marks.
--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,..."
        Relationship CSV header and data. Multiple files will be logically seen as one
        big file from the perspective of the importer. The first line must contain the
        header. Multiple data sources like these can be specified in one import, where
        each data source has its own header. Note that file groups must be enclosed in
        quotation marks.
--delimiter <delimiter-character>
        Delimiter character, or 'TAB', between values in CSV data. The default option is
        ,.
--array-delimiter <array-delimiter-character>
        Delimiter character, or 'TAB', between array elements within a value in CSV
        data. The default option is ;.
--quote <quotation-character>
        Character to treat as quotation character for values in CSV data. The default
        option is ". Quotes inside quotes escaped like """Go away"", he said." and "\"Go
        away\", he said." are supported. If you have set "'" to be used as the quotation
        character, you could write the previous example like this instead: '"Go away",
        he said.'
--multiline-fields <true/false>
        Whether or not fields from input source can span multiple lines, i.e. contain
        newline characters. Default value: false
--trim-strings <true/false>
        Whether or not strings should be trimmed for whitespaces. Default value: false
--input-encoding <character set>
        Character set that input data is encoded in. Provided value must be one out of
        the available character sets in the JVM, as provided by
        Charset#availableCharsets(). If no input encoding is provided, the default
        character set of the JVM will be used.
--ignore-empty-strings <true/false>
        Whether or not empty string fields, i.e. "" from input source are ignored, i.e.
        treated as null. Default value: false
--id-type <id-type>
        One out of [STRING, INTEGER, ACTUAL] and specifies how ids in node/relationship
        input files are treated.
        STRING: arbitrary strings for identifying nodes.
        INTEGER: arbitrary integer values for identifying nodes.
        ACTUAL: (advanced) actual node ids. The default option is STRING. Default value:
        STRING
--processors <max processor count>
        (advanced) Max number of processors used by the importer. Defaults to the number
        of available processors reported by the JVM (in your case 4). There is a certain
        amount of minimum threads needed so for that reason there is no lower bound for
        this value. For optimal performance this value shouldn't be greater than the
        number of available processors.
--stacktrace <true/false>
        Enable printing of error stack traces. Default value: false
--bad-tolerance <max number of bad entries>
        Number of bad entries before the import is considered failed. This tolerance
        threshold is about relationships refering to missing nodes. Format errors in
        input data are still treated as errors. Default value: 1000
--skip-bad-relationships <true/false>
        Whether or not to skip importing relationships that refers to missing node ids,
        i.e. either start or end node id/group referring to node that wasn't specified
        by the node input data. Skipped nodes will be logged, containing at most number
        of entites specified by bad-tolerance. Default value: true
--skip-duplicate-nodes <true/false>
        Whether or not to skip importing nodes that have the same id/group. In the event
        of multiple nodes within the same group having the same id, the first
        encountered will be imported whereas consecutive such nodes will be skipped.
        Skipped nodes will be logged, containing at most number of entities specified by
        bad-tolerance. Default value: false
--ignore-extra-columns <true/false>
        Whether or not to ignore extra columns in the data not specified by the header.
        Skipped columns will be logged, containing at most number of entities specified
        by bad-tolerance. Default value: false
--db-config <path/to/neo4j.conf>
        (advanced) File specifying database-specific configuration. For more information
        consult manual about available configuration options for a neo4j configuration
        file. Only configuration affecting store at time of creation will be read.
        Examples of supported config are:
        dbms.relationship_grouping_threshold
        unsupported.dbms.block_size.strings
        unsupported.dbms.block_size.array_properties
--additional-config <path/to/neo4j.conf>
        (advanced) File specifying database-specific configuration. For more information
        consult manual about available configuration options for a neo4j configuration
        file. Only configuration affecting store at time of creation will be read.
        Examples of supported config are:
        dbms.relationship_grouping_threshold
        unsupported.dbms.block_size.strings
        unsupported.dbms.block_size.array_properties
--legacy-style-quoting <true/false>
        Whether or not backslash-escaped quote e.g. \" is interpreted as inner quote.
        Default value: true
Example:
        bin/neo4j-import --into retail.db --id-type string --nodes:Customer customers.csv
        --nodes products.csv --nodes orders_header.csv,orders1.csv,orders2.csv
        --relationships:CONTAINS order_details.csv
        --relationships:ORDERED customer_orders_header.csv,orders1.csv,orders2.csv

Official documentation: 10.1. Import

This chapter covers importing data into Neo4j.

The import tool is used to create a new Neo4j database from data in CSV files.

This chapter explains how to use the tool and format the input data. For in-depth examples of using the import tool, see Section B.4, “Use the Import tool”.

These are some things you will need to keep in mind when creating your input files:

Fields are comma separated by default but a different delimiter can be specified.
All files must use the same delimiter.
Multiple data sources can be used for both nodes and relationships.
A data source can optionally be provided using multiple files.
A header which provides information on the data fields must be on the first row of each data source.
Fields without corresponding information in the header will not be read.
UTF-8 encoding is used.

Indexes are not created during the import. Instead, you will need to add indexes afterwards (see Developer Manual → Indexes).

Data cannot be imported into an existing database using this tool. If you want to load small to medium sized CSV files use LOAD CSV (see Developer Manual → LOAD CSV).

10.1.1. CSV file header format

This section explains the header format of CSV files when using the Neo4j import tool.

The header row of each data source specifies how the fields should be interpreted. The same delimiter is used for the header row as for the rest of the data.

The header contains information for each field, with the format: <name>:<field_type>. The <name> is used as the property key for values, and ignored in other cases. The following settings can be used for both nodes and relationships:

Property value
Use one of int, long, float, double, boolean, byte, short, char, string to designate the data type. If no data type is given, this defaults to string. To define an array type, append [] to the type. By default, array values are separated by ;. A different delimiter can be specified with --array-delimiter.
IGNORE
Ignore this field completely.

See below for the specifics of node and relationship data source headers.
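
The header rules described above can be illustrated with a small parser. This is a simplified sketch, not the import tool's own parsing code: parse_field is a hypothetical name, and the handling of the parenthesised id space anticipates the ID spaces section of this chapter.

```python
def parse_field(field):
    """Split a header field '<name>:<field_type>' into
    (name, field_type, is_array, id_space)."""
    name, sep, spec = field.partition(":")
    if not sep or not spec:
        spec = "string"  # no data type given: defaults to string
    id_space = None
    if "(" in spec and spec.endswith(")"):
        # e.g. "ID(Person)" -> type "ID", id space "Person"
        spec, id_space = spec[:-1].split("(", 1)
    is_array = spec.endswith("[]")
    if is_array:
        spec = spec[:-2]
    return name, spec, is_array, id_space

for field in ["uuid:ID(Person)", "name", "age:int", "tags:string[]", ":START_ID(Person)"]:
    print(field, "->", parse_field(field))
```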

10.1.1.1. Nodes

The following field types do additionally apply to node data sources:

ID
Each node must have a unique id which is used during the import. The ids are used to find the correct nodes when creating relationships. Note that the id has to be unique across all nodes in the import, even nodes with different labels.
LABEL
Read one or more labels from this field. Like array values, multiple labels are separated by ;, or by the character specified with --array-delimiter.

10.1.1.2. Relationships

For relationship data sources, there are three mandatory fields:

TYPE
The relationship type to use for the relationship.
START_ID
The id of the start node of the relationship to create.
END_ID
The id of the end node of the relationship to create.

10.1.1.3. ID spaces

The import tool assumes that node identifiers are unique across node files. If this is not the case then we can define an id space. Id spaces are defined in the ID field of node files.

For example, to specify the Person id space we would use the field type ID(Person) in our persons node file. We also need to reference that id space in our relationships file i.e. START_ID(Person) or END_ID(Person).
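
One way to picture id spaces is as part of the lookup key used when relationships are connected to nodes. The sketch below is a conceptual model only, not the importer's implementation; the Movie id space and all ids are invented for illustration.

```python
# Ids as they might appear in two node files whose ids overlap.
persons = ["1", "2"]  # from a file with header personId:ID(Person)
movies = ["1", "2"]   # from a file with header movieId:ID(Movie)

nodes = {}
for space, ids in (("Person", persons), ("Movie", movies)):
    for raw_id in ids:
        key = (space, raw_id)  # an id only has to be unique within its id space
        if key in nodes:
            raise ValueError(f"duplicate id {raw_id!r} in id space {space!r}")
        nodes[key] = len(nodes)  # stand-in for the internal node id

# One relationship row under the header :START_ID(Person),:END_ID(Movie),:TYPE
start = nodes[("Person", "1")]
end = nodes[("Movie", "1")]
print(start, end)
```

Without the id spaces, the two files would each contribute a node with id "1" and the relationship endpoints would be ambiguous.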

10.1.2. Command line usage

This section covers how to use the Neo4j import tool from the command line.

10.1.2.1. Linux

Under Unix/Linux/OSX, the command is named neo4j-import. Depending on the installation type, the tool is either available globally, or used by executing ./bin/neo4j-import from inside the installation directory.

10.1.2.2. Windows

Under Windows, the tool is used by executing bin\neo4j-import from inside the installation directory.

For help with running the import tool under Windows, see the reference in Windows.

10.1.2.3. Options

--into <store-dir>
Database directory to import into. Must not contain existing database.

--nodes[:Label1:Label2] "<file1>,<file2>,..."
Node CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks.

--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,..."
Relationship CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks.

--delimiter <delimiter-character>
Delimiter character, or TAB, between values in CSV data. The default option is ,.

--array-delimiter <array-delimiter-character>
Delimiter character, or TAB, between array elements within a value in CSV data. The default option is ;.

--quote <quotation-character>
Character to treat as quotation character for values in CSV data. The default option is ". Quotes inside quotes escaped like """Go away"", he said." and "\"Go away\", he said." are supported. If you have set ' to be used as the quotation character, you could write the previous example like this instead: '"Go away", he said.'

--multiline-fields <true/false>
Whether or not fields from input source can span multiple lines, i.e. contain newline characters. Default value: false

--input-encoding <character set>
Character set that input data is encoded in. Provided value must be one out of the available character sets in the JVM, as provided by Charset#availableCharsets(). If no input encoding is provided, the default character set of the JVM will be used.

--ignore-empty-strings <true/false>
Whether or not empty string fields ("") from input source are ignored, i.e. treated as null. Default value: false

--id-type <id-type>
One out of [STRING, INTEGER, ACTUAL] and specifies how ids in node/relationship input files are treated. STRING: arbitrary strings for identifying nodes. INTEGER: arbitrary integer values for identifying nodes. ACTUAL: (advanced) actual node ids. Default value: STRING

--processors <max processor count>
(advanced) Max number of processors used by the importer. Defaults to the number of available processors reported by the JVM. There is a certain amount of minimum threads needed so for that reason there is no lower bound for this value. For optimal performance this value shouldn’t be greater than the number of available processors.

--stacktrace <true/false>
Enable printing of error stack traces.

--bad-tolerance <max number of bad entries>
Number of bad entries before the import is considered failed. This tolerance threshold is about relationships referring to missing nodes. Format errors in input data are still treated as errors. Default value: 1000

--skip-bad-relationships <true/false>
Whether or not to skip importing relationships that refers to missing node ids, i.e. either start or end node id/group referring to node that wasn’t specified by the node input data. Skipped nodes will be logged, containing at most number of entities specified by bad-tolerance. Default value: true

--skip-duplicate-nodes <true/false>
Whether or not to skip importing nodes that have the same id/group. In the event of multiple nodes within the same group having the same id, the first encountered will be imported whereas consecutive such nodes will be skipped. Skipped nodes will be logged, containing at most number of entities specified by bad-tolerance. Default value: false

--ignore-extra-columns <true/false>
Whether or not to ignore extra columns in the data not specified by the header. Skipped columns will be logged, containing at most number of entities specified by bad-tolerance. Default value: false

--db-config <path/to/neo4j.conf>

(advanced) File specifying database-specific configuration. For more information consult manual about available configuration options for a neo4j configuration file. Only configuration affecting store at time of creation will be read. Examples of supported config are:

    dbms.relationship_grouping_threshold
    unsupported.dbms.block_size.strings
    unsupported.dbms.block_size.array_properties

10.1.2.4. Verbose error information

In some cases if an unexpected error occurs it might be useful to supply the command line option --stacktrace to the import (and rerun the import to actually see the additional information). This will have the error printed with additional debug information, useful for both developers and issue reporting.

10.1.2.5. Output and statistics

While an import is running through its different stages, some statistics and figures are printed in the console. The general interpretation of that output is to look at the horizontal line, which is divided up into sections, each section representing one type of work going on in parallel with the other sections. The wider a section is, the more time is spent there relative to the other sections, the widest being the bottleneck, also marked with *. If a section has a double line, instead of just a single line, it means that multiple threads are executing the work in that section. To the far right a number is displayed telling how many entities (nodes or relationships) have been processed by that stage.

As an example:

[*>:20,25 MB/s------------------|PREPARE(3)====================|RELATIONSHIP(2)===============] 16M

Would be interpreted as:

> data being read, and perhaps parsed, at 20,25 MB/s, data that is being passed on to ...
PREPARE preparing the data for ...
RELATIONSHIP creating actual relationship records and ...
v writing the relationships to the store. This step is not visible in this example, because it is so cheap compared to the other sections.

Observing the section sizes can give hints about where performance can be improved. In the example above, the bottleneck is the data read section (marked with >), which might indicate that the disk is being slow, or is poorly handling simultaneous read and write operations (since the last section often revolves around writing to disk).
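
The reading rules above can be applied mechanically to a captured progress line. The sketch below assumes the layout described in this section (sections separated by |, the bottleneck marked with *); the sections helper is made up for illustration, and the sample line is taken from the Linux import log earlier in this post.

```python
def sections(progress_line):
    """Split a progress line into (name, width, is_bottleneck) tuples."""
    body = progress_line[progress_line.index("[") + 1 : progress_line.rindex("]")]
    result = []
    for part in body.split("|"):
        if not part:
            continue  # skip an empty trailing section
        is_bottleneck = "*" in part
        # Strip the marker and the filler characters to recover the section name.
        name = part.replace("*", "").rstrip("-=") or "(unnamed)"
        result.append((name, len(part), is_bottleneck))
    return result

line = "[>:23.85 MB/s-|PROPERTIES-----|NODE:68.66 MB---|LABEL SCAN-----|*v:41.50 MB/s-----------------]8.38M"
for name, width, is_bottleneck in sections(line):
    marker = "<- bottleneck" if is_bottleneck else ""
    print(f"{name:<16} width={width:<3} {marker}")
```

For this line the widest section is the write step (v), which is also the one carrying the * marker.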

References

[1] https://neo4j.com/developer/guide-import-csv/
[2] http://neo4j.com/docs/operations-manual/current/tools/import/
[3] http://neo4j.com/docs/operations-manual/current/tools/import/command-line-usage/
[4] http://neo4j.com/docs/operations-manual/current/tutorial/import-tool/