irc:1443564000 [2017/05/27 13:44]

[09:00:12] <Narigo> pmlopes, good morning - what does CTE mean in this context? :) https://github.com/vert-x3/vertx-mysql-postgresql-client/issues/25#issuecomment-143770949

[09:01:32] <pmlopes> hello, CTE stands for "Common Table Expression" which in SQL (in a simplistic way) are expressions that use the WITH keyword

[09:02:20] <pmlopes> http://www.postgresql.org/docs/9.4/static/queries-with.html

[09:02:43] <pmlopes> https://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx

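[Editor's note: the WITH syntax pmlopes describes can be tried outside Vert.x; a minimal sketch using Python's built-in sqlite3 module (SQLite also supports CTEs), purely for illustration - the table and data are made up:]

```python
import sqlite3

# In-memory database, illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (20,), (30,)])

# A CTE: the WITH clause names a subquery ("big") that the outer query reuses
row = conn.execute(
    """
    WITH big AS (SELECT amount FROM orders WHERE amount > 15)
    SELECT SUM(amount) FROM big
    """
).fetchone()
print(row[0])  # 50 (20 + 30)
```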
[09:05:58] <Narigo> pmlopes, I guess in MySQL we only have the option to query for "last_insert_id()" to get the latest inserted id in an auto_increment table. How does the JDBC client return the id? Does it always send this extra query, even if you don't want to check the latest inserted id?

[09:08:32] <pmlopes> narigo, that specific part is driver specific, but it does not mean that they always issue a second query. As I wrote in the issue, the driver could rewrite the query to append " returning pk_column_name", which would make the query atomic; no other queries would interfere with the generated values.

[09:10:35] <Narigo> pmlopes, is that really the way the underlying jdbc drivers work? I mean, are they really changing your statement when you add a parameter to return auto_increment ids?

[09:11:05] <pmlopes> narigo, the thing gets more complicated, for example, when you insert 2 rows; then last_insert_id() will only return the last one.

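[Editor's note: the "only the last id survives" point can be sketched with sqlite3, whose cursor.lastrowid plays a role analogous to MySQL's last_insert_id(); this is an analogy, not the MySQL driver's actual behavior:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

cur = conn.cursor()
cur.execute("INSERT INTO t (v) VALUES ('a')")
first_id = cur.lastrowid   # generated id of the first insert
cur.execute("INSERT INTO t (v) VALUES ('b')")
second_id = cur.lastrowid  # overwritten: only the most recent id is visible
print(first_id, second_id)  # 1 2
```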
[09:11:32] <pmlopes> narigo, I am not into the details of the jdbc drivers, but I can try to have a look

[09:11:52] <Narigo> pmlopes, that's true.. how does it work in JDBC drivers then? :)

[09:12:48] <Narigo> Yeah, let's try to see how they operate first... I doubt that they are rewriting the statements themselves

[09:14:53] <pmlopes> narigo, postgres does not use last_insert_id, it rewrites the query, here for example: https://github.com/impossibl/pgjdbc-ng/blob/844be8b94b8e0491370eebcc02b0f141a3aa6a7e/src/main/java/com/impossibl/postgres/jdbc/PGSimpleStatement.java#L128

[09:15:14] <pmlopes> however that is not the official driver, let me see how the official driver does it...

[09:17:08] <Narigo> https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/jdbc3/AbstractJdbc3Statement.java#L146

[09:17:10] <Narigo> pmlopes,

[09:17:24] <pmlopes> the official also, https://github.com/pgjdbc/pgjdbc/blob/543ffd0cc012ed39654d3360c9cf816589ff3f1d/org/postgresql/jdbc3/AbstractJdbc3Statement.java#L146

[09:17:25] <Narigo> it also adds returning

[09:31:35] <Narigo> pmlopes, I guess we could do something similar in our code, without getting into the driver itself, WDYT? In PostgreSQL it's done very simply - I doubt that it works with multiple statements (just changes the statement)

[09:31:58] <Narigo> "just changes the statement in a very simple way"

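[Editor's note: the pgjdbc rewrite linked above boils down to appending a RETURNING clause to the INSERT before sending it. A hypothetical sketch of that rewriting step - append_returning and the pk_column parameter are illustrative, not the driver's actual code, and the "returning" check is deliberately naive:]

```python
def append_returning(sql: str, pk_column: str) -> str:
    """Append a RETURNING clause to a single INSERT statement, mimicking
    what a statement-rewriting driver does. Illustration only."""
    stripped = sql.rstrip().rstrip(";")
    # Naive guard: skip if the statement already asks for generated keys
    if "returning" in stripped.lower():
        return stripped
    return f"{stripped} RETURNING {pk_column}"

print(append_returning("INSERT INTO users (name) VALUES ('bob');", "id"))
# INSERT INTO users (name) VALUES ('bob') RETURNING id
```

As noted in the chat, this kind of rewrite only makes sense for a single statement; multiple statements in one string would need real parsing.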
[09:47:39] <Narigo> pmlopes, another thing: in JDBC, it looks like you can tell the driver if you want the auto-generated ids or not. In the vertx-jdbc-client, you always return it - is this correct?

[09:48:12] <pmlopes> yes, it is always on

[09:48:54] <pmlopes> I am not aware why that decision was taken, but since we need to keep API compatibility we cannot change that now

[09:49:03] <Narigo> As in https://github.com/vert-x3/vertx-sql-common/blob/master/src/main/java/io/vertx/ext/sql/SQLConnection.java we never say anything about "it returns the ids implicitly"... a client can always get the ids by "doing it manually" (SELECT last_insert_id() or ... RETURNING id)

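[Editor's note: the "doing it manually" option reads the generated id back with an explicit follow-up query on the same connection; sketched here with SQLite's last_insert_rowid() standing in for MySQL's last_insert_id() - an analogy, not Vert.x client code:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Explicit follow-up query on the same connection: the "manual" route
(new_id,) = conn.execute("SELECT last_insert_rowid()").fetchone()
print(new_id)  # 1
```

The caveat discussed above applies: the follow-up query must run on the same connection, and nothing makes the pair atomic unless the driver rewrites the statement instead.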
[09:54:09] <Narigo> I'm looking at the MySQL JDBC driver implementation currently and trying to find out how it works on their end...

[10:04:35] <pmlopes> I jumped ship from mysql to psql like 5 years ago, so I'm not so fresh on its internals, but I don't recall anything like "insert ... returning id"

[10:20:59] <Narigo> pmlopes, so after digging deep into the MySQL driver code, I believe it always sends the last inserted id and update count in its binary protocol (https://github.com/mysql/mysql-connector-j/blob/release/5.1/src/com/mysql/jdbc/MysqlIO.java#L3057)

[10:25:43] <Narigo> pmlopes, I guess that Mauricio's MySQL driver actually returns the last inserted id, if it finds one: https://github.com/mauricio/postgresql-async/blob/2c18a425c7b1d2d87d3b6530a0c1d48bfeb192cb/mysql-async/src/main/scala/com/github/mauricio/async/db/mysql/MySQLConnection.scala#L146

[10:26:59] <pmlopes> then we need to check if it really works, because there are reports on the forum that it doesn't

[10:29:07] <Narigo> pmlopes, can you give me a link? It shouldn't work for PostgreSQL, but it might already work for MySQL then

[10:30:03] <pmlopes> https://groups.google.com/d/topic/vertx/ZFUSx3_HNsw/discussion

[10:30:13] <pmlopes> the issue was related to mysql

[10:30:58] <Narigo> Ok, I need to check that

[10:34:18] <purplefox> cescoffier: pmlopes: temporalfox: Hi folks - there are a few core PRs that I'd like to merge, all pretty simple stuff. any chance someone could take a look?

[10:34:35] <cescoffier> purplefox: I'm in a code review day

[10:34:41] <cescoffier> so gonna have a look

[10:34:44] <purplefox> ah good :)

[10:34:50] <purplefox> thanks

[10:35:03] <cescoffier> BTW, do you remember, at some point, the Starter class was adding . to the classpath if it was not set

[10:46:36] <pmlopes> purplefox, I'll also take a look

[10:47:20] <pmlopes> do you have a list? or just run over all open PRs on core?

[10:56:29] <temporalfox> purplefox hello

[10:56:42] <purplefox> hi

[10:56:57] <temporalfox> I've got some info about the cluster problem

[10:57:05] <temporalfox> at least I understand the behavior

[10:57:28] <temporalfox> I'm able to fix it, but I'm not sure this is the right thing to do; the change at least makes the problem go away

[10:57:39] <purplefox> ok

[10:57:45] <purplefox> sounds like progress

[10:58:50] <temporalfox> the problem is that the HAManager uses a distributed map for the clusterMap

[10:58:54] <temporalfox> and when two nodes are killed

[10:59:05] <temporalfox> it does not have all the info to clean the subs

[10:59:12] <temporalfox> it does have info for one node but not the other

[10:59:37] <temporalfox> I made a quick change in Hazelcast to use ReplicatedMap instead, and the problem goes away

[11:00:06] <purplefox> hmm, but the distributed map should have backup > 0 so should be ok

[11:00:08] <temporalfox> I think also adding more nodes to the cluster would solve the problem as distribution would be better

[11:00:26] <temporalfox> yes but two nodes die at the same time

[11:00:31] <temporalfox> with 3 nodes

[11:00:50] <purplefox> I see

[11:00:56] <temporalfox> like a quorum issue probably

[11:01:12] <purplefox> that makes sense

[11:01:37] <purplefox> so, yes, ReplicatedMap seems to make sense here

[11:01:47] <temporalfox> perhaps we can use a replicated map only for this particular map

[11:02:08] <purplefox> agreed, we don't want to do this in general

[11:02:10] <temporalfox> by checking the map name in the getSyncMap

[11:02:22] <purplefox> and users can always edit cluster.xml if they want more replication for other maps

[11:02:23] <temporalfox> and agree that this special map used by vertx needs to be replicated

[11:02:42] <temporalfox> the __vertx.haInfo name

[11:02:46] <purplefox> we can just change the settings in default_cluster.xml

[11:02:53] <temporalfox> ok

[11:02:56] <temporalfox> I will try

[11:02:59] <purplefox> no need to check the name

[11:03:03] <temporalfox> cool

[11:03:19] <temporalfox> will do that and try to have a reproducer in HZ for this case

[11:03:33] <purplefox> actually.. we should make the naming consistent; right now we use _vertx.haInfo for the cluster map but "subs" for the topic map

[11:03:44] <purplefox> we should probably use _vertx.subs for the topic map

[11:03:48] <temporalfox> yes

[11:03:51] <temporalfox> speaking of names

[11:04:13] <temporalfox> for internal event bus registration

[11:04:20] <temporalfox> like generated reply addresses

[11:04:28] <temporalfox> we could also use a prefix

[11:05:30] <temporalfox> anyway going to have a look at the HZ project reproducer and config

[11:05:32] <purplefox> do you mean for easier debugging so you can recognise them?

[11:05:35] <temporalfox> yes

[11:05:38] <temporalfox> also

[11:05:40] <temporalfox> for metrics

[11:05:53] <purplefox> yes maybe, but this could give a performance penalty if you have many addresses

[11:06:00] <temporalfox> this allows you to understand which metrics are business and which are not

[11:06:10] <temporalfox> ok

[11:06:15] <purplefox> as you will be scanning the first X chars every time (times millions) which are always the same

[11:06:20] <purplefox> to find a matching address

[11:06:33] <purplefox> maybe we could enable this in "debug mode"?

[11:06:37] <purplefox> (e.g. with a system property)

[11:16:49] <purplefox> temporalfox: also.. in the user's case, it also might be that the subs entry is on a different node, so he will need to make the subs map replicated or increase the backup size to guarantee it works

[11:17:44] <purplefox> perhaps we should add a note to the hazelcast cluster manager - "if more than N nodes die at the same time, where N = backup count, then there is a possibility that topic information can be lost"

[11:20:00] <purplefox> so.... really this is a user configuration issue - we can't guarantee the integrity of the event bus state with concurrent failure of more than N nodes where N = backup count

[11:20:04] <purplefox> default backup count = 2

[11:20:09] <purplefox> =1 I mean

[11:20:18] <purplefox> we should make this clearer though

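[Editor's note: if this stays a configuration issue, the fix presumably amounts to raising backup-count for the Vert.x maps in the Hazelcast cluster.xml; a hedged sketch - the map names follow the chat above, and the counts are illustrative, not a recommended default:]

```xml
<hazelcast>
  <!-- Vert.x HA cluster map: tolerate up to 2 concurrent node failures -->
  <map name="__vertx.haInfo">
    <backup-count>2</backup-count>
  </map>
  <!-- event bus subscriptions map (named "subs" at the time of this chat) -->
  <map name="subs">
    <backup-count>2</backup-count>
  </map>
</hazelcast>
```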
[11:23:58] <temporalfox> so do we fix it with a replicated map or do we just add a note?

[11:25:46] <purplefox> do you know how to configure the map as replicated in default-cluster.xml?

[11:26:03] <temporalfox> I haven't looked yet

[11:26:12] <temporalfox> I was about to write a test for this particular case

[11:26:24] <temporalfox> in the vertx cluster bus tests

[11:26:38] <temporalfox> but now I need to pick up my daughter at school :-)

[11:26:58] <purplefox> np. I think we could just make this a config issue and add something to the docs

[11:27:10] <purplefox> but I think we also need to make the naming consistent so the config covers both maps

[11:27:47] <purplefox> then we can just say "either make the map replicated, or increase backup counts if you want to cope with concurrent failure of more than N nodes"

[11:28:28] <purplefox> although increasing backup counts does not guarantee it, it just reduces the probability

[11:58:51] <cescoffier> pmlopes: Is the Route#last(boolean) method gone, or just moved?

[12:01:02] <s0va> purplefox: https://github.com/eclipse/vert.x/blob/fb7fd9624d6034a8a8be80515317998c92ec4f4e/src/main/java/io/vertx/core/eventbus/impl/EventBusImpl.java#L438 => I'm getting null pointer exceptions here sometimes

[12:03:04] <s0va> maybe there should be if (sid == null) { do something }

[12:04:13] <purplefox> s0va: you're probably trying to use an event bus instance before it's finished starting up, but without seeing your code it's hard to say

[12:11:22] <temporalfox> purplefox I'm not sure we can solve this by config because HazelcastInstance has separate methods for creating maps: getReplicatedMap and getMap

[12:16:43] <purplefox> temporalfox: it's not possible to configure in the config?

[12:22:27] <temporalfox> it's not the same API and the config XML does not have replicated map config

[12:22:51] <temporalfox> replicated map returns a ReplicatedMap object and distributed map returns an IMap interface

[12:22:57] <temporalfox> both extend more or less java.util.Map

[12:28:34] <purplefox> ok I think it's ok just to use the distributed map and make a note about backup count

[12:28:47] <purplefox> (and change the naming)

[12:29:31] <purplefox> looking at the docs it says that ReplicatedMap is weakly consistent and might get lost or missing updates, which won't work for us anyway

[12:40:42] <pmlopes> @cescoffier it dropped the parameter; it wasn't used and was giving the wrong semantics, since it was being used to set a given route as the last entry of the router

[12:48:41] <purplefox> temporalfox: is the GenModule annotation no longer used? (vertx-sync doesn't build with it)

[12:58:23] <temporalfox> now it's ModuleGen

[12:58:31] <temporalfox> like VertxGen

[12:58:38] <temporalfox> it was changed a couple of weeks ago

[12:59:35] <temporalfox> purplefox ok I will do that and check the case is solved by increasing the backup count

[13:00:42] <purplefox> ok, I removed it from vertx-sync and it still seems to build without it

[13:06:14] <temporalfox> if you don't use codegen, you don't need it

[13:06:21] <cescoffier> pmlopes: thanks

[13:09:55] <temporalfox> purplefox also this has an impact not only on the event bus but also on HA

[13:10:08] <purplefox> +1

[13:10:14] <temporalfox> when the failover node is chosen, an absence of data in the clusterMap

[13:10:20] <temporalfox> would prevent a node from being chosen for failover

[13:10:54] <temporalfox> and also affect quorum size

[13:11:03] <temporalfox> and give a lower quorum size than expected

[13:12:34] <purplefox> I think it's just a distributed systems problem, not really Vert.x specific - if you have N replicas you can cope with at most N - 1 concurrent failures

[13:12:51] <purplefox> but we should document it

[13:13:01] <purplefox> the behaviour here is also specific to the cluster manager

[13:13:09] <purplefox> e.g. another cluster manager might not use a distributed map

[13:13:11] <temporalfox> yes

[13:19:53] <cescoffier> pmlopes: can you check the redis client changes: https://github.com/vert-x3/wiki/wiki/3.1.0---Breaking-changes

[13:20:21] <pmlopes> yes

[13:21:42] <cescoffier> thanks

[13:22:00] <cescoffier> looks like all methods taking a list as parameters have been removed

[13:44:07] <cescoffier> purplefox, pmlopes, temporalfox - https://github.com/vert-x3/wiki/wiki/3.1.0---Breaking-changes contains all breaking changes from the official stack. Not much!

[14:08:45] <purplefox> cescoffier: thanks. in general we shouldn't make breaking changes outside major versions unless:

[14:08:53] <purplefox> a) there's a security issue that needs to be fixed

[14:09:01] <purplefox> b) it's required by a bug fix

[14:09:31] <purplefox> this will be even more important in a Vert.x product where they are very strict about this

[14:09:54] <cescoffier> in core, 2 typos have been fixed, while the eventLoop method is something that is more or less internal

[14:10:52] <cescoffier> I've tried to use a new plugin to detect the changes, works quite well. Before I was using clirr, but clirr does not support Java 8 (because of BCEL, which does not support Java 8)

[14:11:25] <cescoffier> I will add the plugin configuration to our parent, so checking compatibility will be "simpler"

[14:15:32] <purplefox> cool, great job!

[14:34:53] <pmlopes> @cescoffier the redis updates are mostly in the wiki; should I also add the new methods? pretty much all the changes were to keep the API consistent: for commands that take many args the method name is "command" + "Many", and just "command" when there is just 1 argument

[14:36:20] <pmlopes> this affected like 8 methods in the whole API

[14:45:03] <purplefox> pmlopes: I think that is probably ok for now

[14:45:14] <purplefox> pmlopes: but we should be more careful going forward

[14:48:17] <pmlopes> agree, the main issue was that redis 3.0.0 was not tested properly and lots of issues were found and fixed in 3.1.0, but I don't see any API changes in the near future; there is a big PR now to support redis sentinel but that will not break the API once it is fully reviewed

[17:07:04] *** ChanServ sets mode: +o purplefox

[17:37:44] *** ChanServ sets mode: +o purplefox