parsing large JSON with java/GSON, can't read the JSON structure

Ask Time:2021-09-22T05:25:42         Author:Dimitri Petrucci

I'm trying to parse, using Java and Gson, a large (about 10 GB) database dump in JSON format from Musicbrainz.org.

The JSON file has the structure below: no '[' ']' to indicate that this is an array of objects, and no ',' between the objects. I don't know why, but the file is just like that.

    {
        "id": "d0ab06e1-751a-414b-a976-da72670391b1",
        "name": "Arcing Wires",
        "sort-name": "Arcing Wires"
    }
    {
        "id": "6f0c2c16-dd7e-4268-a484-bc7b2ac78108",
        "name": "Another",
        "sort-name": "Another"
    }
    {
        "id": "e062b6cd-5506-47b0-afdb-72f4279ec38c",
        "name": "Agent S",
        "sort-name": "Agent S"
    }

and this is the code that I'm using:

    try (JsonReader jsonReader = new JsonReader(
            new InputStreamReader(
                    new FileInputStream(jsonFilePath), StandardCharsets.UTF_8))) {
        Gson gson = new GsonBuilder().create();
        while (jsonReader.hasNext()) {
            Artist mapped = gson.fromJson(jsonReader, Artist.class);
            //TODO do something with the object
        }
    } catch (UnsupportedEncodingException e) {
    } catch (FileNotFoundException e) {
    } catch (IOException e) {
    }

and the class that I mapped is this:

    public class Artist {

        public String id;
        public String name;
        public String sortName;
    }


the error I'm getting:

Exception in thread "main" java.lang.IllegalStateException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at line 1 column 2 path $
at com.google.gson.stream.JsonReader.beginArray(JsonReader.java:350)
at DBLoader.parse(DBLoader.java:39)
at DBLoader.main(DBLoader.java:23)

I believe that Gson expects a different structure from what I declared, but I don't understand how I should define this kind of JSON with no commas and no brackets. Any clues? Thanks.

Author: Dimitri Petrucci, reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/69275727/parsing-large-json-with-java-gson-cant-read-the-json-structure
terrorrussia-keeps-killing :

JSON by default declares one top-level value only (and yes, this would be a valid JSON document), but there is also JSON streaming, which uses various techniques to concatenate multiple JSON elements into a single stream, assuming that the stream consumer can parse it.

Gson supports a so-called lenient mode for JsonReader, setLenient, that turns off the "one top-level value only" restriction (and does some more things irrelevant to the question). With lenient mode on, you can read JSON elements one by one, and it turns out that this mode can be used to read line-delimited JSON and concatenated JSON values, since they are simply delimited by zero or more whitespace characters that Gson ignores (the more exotic record separator-delimited JSON and length-prefixed JSON are therefore unsupported).

The reason it does not work for you is that your initial code assumes the stream contains a single JSON array, and it obviously does not: it is a stream of elements that does not conform to the JSON array syntax.

A simple generic JSON stream support might look like this (it uses the Stream API because that API is richer than Iterator's; it is enough to show the idea, and you can easily adapt it to iterators, callbacks, observable streams, whatever you like):

    import com.google.gson.stream.JsonReader;
    import com.google.gson.stream.JsonToken;
    import lombok.experimental.UtilityClass;

    import javax.annotation.WillNotClose;
    import java.io.IOException;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.function.Function;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    @UtilityClass
    public final class JsonStreamSupport {

        public static <T> Stream<T> parse(@WillNotClose final JsonReader jsonReader, final Function<? super JsonReader, ? extends T> readElement) {
            // Remember the caller's setting so it can be restored when the stream is closed.
            final boolean isLenient = jsonReader.isLenient();
            jsonReader.setLenient(true);
            final Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
                @Override
                public boolean tryAdvance(final Consumer<? super T> action) {
                    try {
                        // END_DOCUMENT means there are no more top-level values.
                        final JsonToken token = jsonReader.peek();
                        if ( token == JsonToken.END_DOCUMENT ) {
                            return false;
                        }
                        // TODO: read more elements in batch
                        final T element = readElement.apply(jsonReader);
                        action.accept(element);
                        return true;
                    } catch ( final IOException ex ) {
                        throw new RuntimeException(ex);
                    }
                }
            };
            return StreamSupport.stream(spliterator, false)
                    .onClose(() -> jsonReader.setLenient(isLenient));
        }

    }

And then:

    JsonStreamSupport.<Artist>parse(jsonReader, jr -> gson.fromJson(jr, Artist.class))
            .forEach(System.out::println);

Output (assuming Artist has Lombok-generated toString()):

    Artist(id=d0ab06e1-751a-414b-a976-da72670391b1, name=Arcing Wires, sortName=Arcing Wires)
    Artist(id=6f0c2c16-dd7e-4268-a484-bc7b2ac78108, name=Another, sortName=Another)
    Artist(id=e062b6cd-5506-47b0-afdb-72f4279ec38c, name=Agent S, sortName=Agent S)

How many bytes does JSON streaming actually save, such that the service you're trying to consume uses it? I don't know.
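
For readers who prefer to keep the plain while loop from the question, the lenient-mode idea described above can also be sketched without the Stream wrapper. This is only a minimal sketch under stated assumptions: Gson is on the classpath, the names ConcatenatedJsonDemo and forEachArtist are hypothetical, a StringReader stands in for the real dump file, and @SerializedName("sort-name") is one assumed way to map the hyphenated key onto the sortName field (the original post does not show how that mapping was done).

```java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class ConcatenatedJsonDemo {

    // Assumed mapping class; @SerializedName binds the "sort-name" key to sortName.
    static class Artist {
        String id;
        String name;
        @SerializedName("sort-name")
        String sortName;
    }

    // Hypothetical helper: reads whitespace-separated top-level objects one at a time,
    // passes each to the handler, and returns how many were read.
    static int forEachArtist(Reader source, Consumer<Artist> handler) throws IOException {
        Gson gson = new Gson();
        int count = 0;
        try (JsonReader jsonReader = new JsonReader(source)) {
            // Lenient mode lets the reader accept more than one top-level value.
            jsonReader.setLenient(true);
            // peek() reports END_DOCUMENT once no further top-level value follows.
            while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
                Artist artist = gson.fromJson(jsonReader, Artist.class);
                handler.accept(artist);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Two concatenated objects with no enclosing array and no commas between them.
        String dump =
                "{\"id\": \"d0ab06e1-751a-414b-a976-da72670391b1\", \"name\": \"Arcing Wires\", \"sort-name\": \"Arcing Wires\"}\n"
              + "{\"id\": \"6f0c2c16-dd7e-4268-a484-bc7b2ac78108\", \"name\": \"Another\", \"sort-name\": \"Another\"}\n";
        int read = forEachArtist(new StringReader(dump),
                a -> System.out.println(a.id + " | " + a.name + " | " + a.sortName));
        System.out.println("objects read: " + read);
    }
}
```

Because each Artist is handed to the callback and not retained, memory use should stay roughly flat however large the file is. For the real dump, the StringReader would be replaced by an InputStreamReader over a FileInputStream, as in the question.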